Methodological development for estimating marginal causal effects of non-randomized treatments on time-to-event outcomes
Research project
This project falls under the broad statistical research area called causal inference.
This is a challenging and active area of research which is of strategic importance for Sweden, since opportunities to conduct complex and comprehensive observational studies are plentiful thanks to the possibility of linking socio-economic, demographic, geographical and health data at the individual level.
The main purpose of this project is to develop new and improved statistical methods for analysing time-to-event data in studies where treatment assignment is not randomized. Within the causal inference literature, the time-to-event setting has received little attention compared to other settings. Providing novel theoretical results and practical recommendations useful for empirical scientists working with this type of data is therefore welcomed by researchers both in causal inference and in the empirical sciences.
Randomized controlled trials (RCTs) are generally considered the gold standard for estimating causal effects of, e.g., medical treatments on survival time or time to disease recurrence. In many scientific areas it is infeasible or unethical to perform randomized controlled trials, and this is one reason that researchers increasingly conduct observational studies. Although treatment assignment is not randomized, observational studies can be designed to approximate RCTs, and measures similar to those reported from RCTs can be estimated from such studies, e.g., population-level treatment effects such as marginal hazard ratios. This project addresses some of the issues researchers face when designing and conducting observational studies.
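For concreteness, one common way to formalize the marginal hazard ratio uses potential-outcomes notation; the notation below is an illustration added here and is not taken from the project description. Under a proportional-hazards assumption the ratio is constant over time, which corresponds to the single number typically reported from an RCT.

    % Illustrative notation: T(1) and T(0) denote potential survival times under
    % treatment and control for a subject drawn from the study population.
    \lambda_a(t) = \lim_{h \downarrow 0} \frac{1}{h}\,
        P\bigl( t \le T(a) < t + h \mid T(a) \ge t \bigr), \qquad a \in \{0, 1\},
    \qquad
    \mathrm{HR}_{\mathrm{marginal}}(t) = \frac{\lambda_1(t)}{\lambda_0(t)}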
In the last decade, propensity score matching (PSM) and inverse probability of treatment weighting using the propensity score (IPTW-PS) have become enormously popular and widely used within the medical, and other empirical, sciences. However, these methods are not without drawbacks. In short: both PSM and IPTW-PS require specifying a model for the propensity score; PSM approximates a completely randomized experiment, which is less efficient than approximating a fully blocked randomized experiment; and IPTW-PS can lead to large standard errors, in particular when some estimated propensity scores are close to 0 or 1.
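A minimal sketch of the variance issue, assuming simulated data and a logistic-regression propensity score model (the code is illustrative and not part of the project): subjects with estimated propensity scores near 0 or 1 receive very large ATE-type weights, which shrinks the effective sample size and inflates standard errors.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 2000
    x = rng.normal(size=(n, 3))                    # baseline covariates
    ps_true = 1 / (1 + np.exp(-2.5 * x[:, 0]))     # strong covariate -> extreme propensity scores
    z = rng.binomial(1, ps_true)                   # non-randomized treatment assignment

    # The propensity score model that both PSM and IPTW-PS rely on.
    ps_hat = LogisticRegression().fit(x, z).predict_proba(x)[:, 1]

    # ATE-type inverse probability of treatment weights.
    w = z / ps_hat + (1 - z) / (1 - ps_hat)

    # A few extreme weights dominate and reduce the effective sample size (Kish's formula).
    ess_treated = w[z == 1].sum() ** 2 / (w[z == 1] ** 2).sum()
    print("largest weight:", w.max())
    print("effective sample size among treated:", ess_treated, "out of", (z == 1).sum())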
A different issue, which can lead to uncertain estimates and conclusions, is the problem of missing data. A common way to handle missing data is multiple imputation, and matched or weighted samples can subsequently be created for each imputed dataset. For the parameter of interest, the point estimates from the imputed datasets can be pooled into a single estimate, but it is not always straightforward to correctly estimate the standard error (needed for confidence interval construction). In cases where bootstrap standard errors are to be computed, it is somewhat unclear how to draw bootstrap samples when both matching/weighting and multiple imputation have been performed.
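To make the ordering question concrete, the sketch below shows one of the possible orderings (resample individuals first, then impute and weight within each bootstrap sample) on toy data with a continuous outcome. It is only an illustration of the pipeline structure, not the project's recommendation, and impute_once and weighted_effect are simple stand-ins introduced here for the actual multiple-imputation and matching/weighting steps.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    # Toy data: covariate x (partly missing), non-randomized treatment z, outcome y.
    n = 500
    x = rng.normal(size=n)
    z = rng.binomial(1, 1 / (1 + np.exp(-x)))
    y = 1.0 * z + x + rng.normal(size=n)
    x[rng.random(n) < 0.2] = np.nan

    def impute_once(x, rng):
        # Stand-in for one draw of a multiple-imputation procedure.
        out = x.copy()
        miss = np.isnan(out)
        out[miss] = rng.normal(out[~miss].mean(), out[~miss].std(), miss.sum())
        return out

    def weighted_effect(x, z, y):
        # Stand-in for the matched/weighted analysis step: an IPTW difference
        # in means based on a logistic propensity score model.
        ps = LogisticRegression().fit(x.reshape(-1, 1), z).predict_proba(x.reshape(-1, 1))[:, 1]
        w = z / ps + (1 - z) / (1 - ps)
        return (np.average(y[z == 1], weights=w[z == 1])
                - np.average(y[z == 0], weights=w[z == 0]))

    B, M, boot_estimates = 200, 5, []
    for _ in range(B):
        idx = rng.integers(0, n, n)                           # 1. resample individuals
        xb, zb, yb = x[idx], z[idx], y[idx]
        est = [weighted_effect(impute_once(xb, rng), zb, yb)  # 2. impute, 3. weight and estimate
               for _ in range(M)]
        boot_estimates.append(np.mean(est))                   # 4. pool the M estimates
    print("bootstrap standard error:", np.std(boot_estimates, ddof=1))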
Last but not least, regardless of whether matching, weighting or some other method is used, the researcher has to select a covariate set such that, given this set, the effect of treatment on the outcome is plausibly no longer confounded by any other variables. To aid in this selection, reliable data-driven covariate selection procedures suited for time-to-event outcomes need to be developed.
The project has the following specific goals:
To develop model-free matching and weighting estimators of population-level average treatment effects, specifically marginal hazard ratios. The new estimators utilize cardinality matching and stable balancing weights, so there is no need to estimate the propensity score (see the sketch after this list).
To investigate how to properly perform bootstrapping for standard error estimation and confidence interval construction when missing data are handled by multiple imputation and matching or weighting is subsequently performed on the imputed datasets.
To develop new data-driven covariate selection procedures, aimed at selecting sets of covariates that confound the relation between treatment and outcome, suited for time-to-event data.
To apply the developed methods when studying disparities in quality of stroke care, using Riksstroke data.
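As a rough illustration of how the estimators in the first goal could be used downstream, the sketch below takes a set of balancing weights as given and estimates the marginal hazard ratio by fitting a weighted Cox model with treatment as the only covariate, using the lifelines package. The weights here are ordinary IPTW-PS weights used purely as a placeholder: constructing model-free weights via cardinality matching or stable balancing weights is precisely the methodological work the project proposes and is not implemented in this sketch.

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n = 1000

    # Simulated observational data: covariate-driven treatment and exponential event times.
    x = rng.normal(size=n)
    z = rng.binomial(1, 1 / (1 + np.exp(-x)))
    t = rng.exponential(1 / np.exp(0.5 * z + 0.8 * x))   # conditional log hazard ratio 0.5
    event = (t < 2.0).astype(int)                        # administrative censoring at t = 2
    t = np.minimum(t, 2.0)

    # Placeholder weights: ordinary IPTW-PS weights stand in for the project's
    # propensity-score-free cardinality matching / stable balancing weights.
    ps = LogisticRegression().fit(x.reshape(-1, 1), z).predict_proba(x.reshape(-1, 1))[:, 1]
    w = z / ps + (1 - z) / (1 - ps)

    df = pd.DataFrame({"time": t, "event": event, "treatment": z, "w": w})

    # Weighted Cox model with treatment as the only covariate: its coefficient targets
    # the marginal (population-level) log hazard ratio; robust=True requests a sandwich
    # variance that accounts for the weighting.
    cph = CoxPHFitter()
    cph.fit(df, duration_col="time", event_col="event", weights_col="w", robust=True)
    print(cph.summary)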
Marginal hazard ratios are of interest to policy makers and are the type of parameters estimated in randomized controlled trials. The proposed methods are less model-dependent than existing methods. Adjusting for confounding variables and performing correct bootstrap sampling are important for obtaining unbiased results and valid inference.