Univariate Conditional Average Treatment Effect Estimation

**Author:** Philippe Boileau

`uniCATE` implements statistical inference procedures for variable importance measures that assess the treatment effect modification capabilities of individual pre-treatment biomarkers in high-dimensional randomized control trials. This variable importance measure is defined as the vector of simple linear regression slope coefficients obtained by regressing the difference in potential outcomes on each biomarker. This parameter, which we dub the *univariate conditional average treatment effect*, is a reasonable indicator of treatment effect modification in all but pathological biomarker-outcome relationships, and can therefore be used to identify predictive biomarkers. Assumption-lean estimation and testing procedures based on semiparametric theory are made available for continuous, binary, and right-censored time-to-event outcomes.
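As a rough illustration of this parameter, the base R sketch below simulates both potential outcomes for a single hypothetical biomarker and computes the slope obtained by regressing their difference on it. The variable names and the data-generating process are assumptions made for this example only. In a real trial only one potential outcome is observed per patient, so this slope cannot be computed directly and must instead be estimated.

```r
# Hypothetical simulation: both potential outcomes are known here,
# which is never the case with real trial data.
set.seed(1)
n <- 1000
bio <- rnorm(n)        # a single pre-treatment biomarker
y0 <- rnorm(n)         # potential outcome under control
y1 <- y0 + 2 * bio     # potential outcome under treatment
po_diff <- y1 - y0     # individual-level treatment effect

# Slope from the simple linear regression of the difference on the biomarker
slope_lm <- unname(coef(lm(po_diff ~ bio))["bio"])

# The same slope written as a covariance-to-variance ratio
slope_cov <- cov(po_diff, bio) / var(bio)
```

Both `slope_lm` and `slope_cov` recover the value 2 used in the simulation, since the treatment effect here is exactly linear in the biomarker.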

The *development version* of the package may be installed from GitHub using `remotes`:

`remotes::install_github("insightsengineering/uniCATE")`

`unicate()` should be used when the outcome is continuous or binary. For right-censored time-to-event outcomes, use `sunicate()`.

We simulate a randomized control trial in which there is a heterogeneous treatment effect for biomarkers 1 and 2. `unicate()` successfully identifies these biomarkers as effect modifiers.

```
# load the packages used below; dplyr provides tibble(), mutate() and %>%
library(uniCATE)
library(dplyr)
library(sl3)

# set the seed for reproducibility
set.seed(514)

# simulate some randomized control data
n <- 100
data <- tibble("treatment" = rbinom(n, 1, 0.5)) %>%
  mutate(
    bio1 = rnorm(n, mean = 2, sd = 0.2),
    bio2 = rnorm(n, mean = -2, sd = 0.2),
    bio3 = rnorm(n, mean = 0, sd = 0.1),
    bio4 = rnorm(n, mean = 0, sd = 0.1),
    covar = 0.2 * rbinom(n, 1, 0.4),
    response = covar + bio1 * treatment + bio2 * treatment
  )

# define the required arguments
covariates <- c("bio1", "bio2", "bio3", "bio4", "covar")
biomarkers <- c("bio1", "bio2", "bio3", "bio4")
propensity_score_ls <- list("1" = 0.5, "0" = 0.5)

# create a simple SuperLearner using a linear model and a random forest
interactions <- lapply(biomarkers, function(b) c(b, "treatment"))
lrnr_interactions <- sl3::Lrnr_define_interactions$new(interactions)
lrnr_glm <- sl3::make_learner(
  sl3::Pipeline, lrnr_interactions, sl3::Lrnr_glm$new()
)
lrnr_sl <- Lrnr_sl$new(
  learners = make_learner(
    Stack, Lrnr_ranger$new(), lrnr_glm
  ),
  metalearner = make_learner(Lrnr_nnls)
)

# apply uniCATE to the simulated data
unicate(
  data,
  outcome = "response",
  treatment = "treatment",
  covariates = covariates,
  biomarkers = biomarkers,
  propensity_score_ls = propensity_score_ls,
  super_learner = lrnr_sl,
  v_folds = 2L
)
#> # A tibble: 4 x 7
#>   biomarker  coef     se     z  p_value p_value_bh p_value_holm
#>   <chr>     <dbl>  <dbl> <dbl>    <dbl>      <dbl>        <dbl>
#> 1 bio1      0.876 0.0988 8.87  7.52e-19   3.01e-18     3.01e-18
#> 2 bio2      0.841 0.115  7.30  2.97e-13   5.94e-13     8.91e-13
#> 3 bio3      0.117 0.255  0.457 6.48e- 1   6.64e- 1     1   e+ 0
#> 4 bio4      0.113 0.260  0.434 6.64e- 1   6.64e- 1     1   e+ 0
```
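The columns of the output table can be related to one another with base R. The sketch below assumes, based on the printed values and column names, that `z` is the ratio `coef / se`, that `p_value` is a two-sided normal p-value, and that `p_value_bh` and `p_value_holm` are the Benjamini-Hochberg and Holm adjustments. The inputs are the rounded numbers copied from the table, so the results agree with it only approximately.

```r
# Rounded estimates and standard errors copied from the table above
coef_est <- c(bio1 = 0.876, bio2 = 0.841, bio3 = 0.117, bio4 = 0.113)
se_est <- c(0.0988, 0.115, 0.255, 0.260)

# Assumed relationship: z is the estimate divided by its standard error
z <- coef_est / se_est

# Assumed relationship: two-sided p-value from the standard normal
p_value <- 2 * pnorm(-abs(z))

# Multiplicity adjustments applied to the reported (rounded) p-values
p_reported <- c(7.52e-19, 2.97e-13, 6.48e-1, 6.64e-1)
p_bh <- p.adjust(p_reported, method = "BH")
p_holm <- p.adjust(p_reported, method = "holm")
```

Under these assumptions, `p_bh` and `p_holm` line up with the `p_value_bh` and `p_value_holm` columns, and only `bio1` and `bio2` remain significant at conventional levels after adjustment.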

If you encounter any bugs or have any specific feature requests, please file an issue.

Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.

The contents of this repository are distributed under the Apache 2.0 license. See the `LICENSE.md` and `LICENSE` files for details.