IV-learner: learning conditional average treatment effects using instrumental variables
Abstract: Instrumental variable methods are very popular in econometrics and biostatistics for inferring average causal effects of an exposure on an outcome when there is unmeasured confounding. However, their use in combination with machine learning for learning heterogeneous treatment effects, such as conditional average treatment effects (CATE), is somewhat limited.
A generic approach that allows the use of arbitrary machine learning algorithms can be based on the popular two-stage principle. In this two-stage approach, we can learn causal treatment effects by regressing the outcome on the predicted exposure, based on a first-stage regression of exposure on instrumental variables (and pre-exposure covariates). This gives rise to the IV-double machine learning (IV-DML) approach of Foster and Syrgkanis (2023).
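The two-stage principle can be illustrated with a minimal sketch, assuming linear working models in place of arbitrary machine learning algorithms. The simulated data-generating process, the function name two_stage_cate and all variable names are hypothetical choices made for illustration; this is neither the IV-DML estimator of Foster and Syrgkanis (2023) nor the IV-learner.

```python
import numpy as np


def two_stage_cate(y, a, z, x):
    """Two-stage sketch with linear working models (illustrative assumption).

    Returns (b0, b1) such that the estimated CATE is b0 + b1 * x.
    """
    ones = np.ones(len(y))
    # Stage 1: regress the exposure on the instrument and the covariate.
    W = np.column_stack([ones, x, z])
    gamma = np.linalg.lstsq(W, a, rcond=None)[0]
    a_hat = W @ gamma  # predicted exposure
    # Stage 2: regress the outcome on the predicted exposure,
    # letting its coefficient vary linearly with the covariate.
    D = np.column_stack([ones, x, a_hat, a_hat * x])
    beta = np.linalg.lstsq(D, y, rcond=None)[0]
    return beta[2], beta[3]


# Hypothetical data-generating process with an unmeasured confounder u.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                      # pre-exposure covariate
z = rng.binomial(1, 0.5, size=n)            # instrument
u = rng.normal(size=n)                      # unmeasured confounder
a = 0.5 * x + 2.0 * z + u + rng.normal(size=n)   # exposure
y = (1.0 + 0.5 * x) * a + x + 2.0 * u + rng.normal(size=n)  # true CATE: 1 + 0.5 x

b0, b1 = two_stage_cate(y, a, z, x)

# Naive comparison: regress the outcome on the observed exposure directly.
D_naive = np.column_stack([np.ones(n), x, a, a * x])
naive = np.linalg.lstsq(D_naive, y, rcond=None)[0]
naive_b0, naive_b1 = naive[2], naive[3]

print("two-stage CATE coefficients:", b0, b1)
print("naive CATE coefficients:    ", naive_b0, naive_b1)
```

In this simulated setting the first stage is correctly specified, so the two-stage coefficients should land close to the true values, while the naive regression is biased by the unmeasured confounder. With flexible learners in the first stage, that favourable behaviour is no longer guaranteed.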
Unfortunately, the slow convergence rates of the data-adaptive estimators behind the first-stage predictions propagate into the resulting CATE estimates. In view of this, we make an alternative proposal, the IV-learner. It is inspired by infinite-dimensional targeted learning procedures (Vansteelandt 2023, van der Laan et al. 2024) and strategically tailors the first-stage predictions to perform well in their ultimate task: CATE estimation. The resulting targeted Neyman-orthogonal learner is easy to construct based on arbitrary, off-the-shelf learners. We study the finite-sample performance of our proposal using simulations and compare it to existing methods. We also illustrate it using a real data example.
This is joint work with Stijn Vansteelandt, Stephen O’Neill, and Richard Grieve.