
Subgroup analyses are routinely performed in clinical trials. This package implements shrinkage methods to estimate treatment effects in overlapping subgroups with a binary or time-to-event endpoint, as described in Wolbers et al. (2024). Both Bayesian estimation with a regularized horseshoe prior and penalized frequentist estimation with lasso or ridge penalties are implemented. The Bayesian approach provides both point estimates and credible intervals, whereas the penalized frequentist methods provide point estimates only. The estimators are intended to complement the standard subgroup-specific estimators routinely displayed in forest plots; they typically have a substantially smaller overall mean squared error than the standard estimator.

Installation

Please note that on Windows you will need to install Rtools, because a working C++ toolchain is required to compile the Stan models.

Release

You can install the current release version of bonsaiforest from CRAN with:

install.packages("bonsaiforest")

Development

You can install the development version of bonsaiforest from GitHub with:

# install.packages("remotes")
remotes::install_github("insightsengineering/bonsaiforest")

Getting started

See the introductory vignette or get started by trying out the example:

library(bonsaiforest)
str(example_data)
#> 'data.frame':    1000 obs. of  14 variables:
#>  $ id    : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ arm   : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 2 1 ...
#>  $ x_1   : Factor w/ 2 levels "a","b": 2 2 1 1 2 1 1 2 2 1 ...
#>  $ x_2   : Factor w/ 2 levels "a","b": 1 2 1 1 1 2 2 1 2 2 ...
#>  $ x_3   : Factor w/ 2 levels "a","b": 2 1 2 1 2 2 2 1 2 2 ...
#>  $ x_4   : Factor w/ 3 levels "a","b","c": 2 1 3 3 3 3 3 1 3 3 ...
#>  $ x_5   : Factor w/ 4 levels "a","b","c","d": 4 4 1 1 4 1 3 4 4 3 ...
#>  $ x_6   : Factor w/ 2 levels "a","b": 2 1 2 2 2 1 2 2 2 2 ...
#>  $ x_7   : Factor w/ 2 levels "a","b": 2 1 2 2 2 1 1 2 2 2 ...
#>  $ x_8   : Factor w/ 3 levels "a","b","c": 3 2 1 3 3 3 1 1 2 3 ...
#>  $ x_9   : Factor w/ 2 levels "a","b": 2 1 2 2 2 1 1 2 2 2 ...
#>  $ x_10  : Factor w/ 3 levels "a","b","c": 3 3 3 3 3 1 2 2 2 3 ...
#>  $ tt_pfs: num  0.9795 3.4762 1.7947 0.0197 2.2168 ...
#>  $ ev_pfs: num  1 0 1 1 0 0 0 0 0 0 ...

horseshoe_model <- horseshoe(
  resp = "tt_pfs", trt = "arm",
  subgr = c("x_1", "x_2", "x_3", "x_4"),
  covars = c(
    "x_1", "x_2", "x_3", "x_4", "x_5",
    "x_6", "x_7", "x_8", "x_9", "x_10"
  ),
  data = example_data, resptype = "survival",
  status = "ev_pfs", chains = 2, seed = 0,
  control = list(adapt_delta = 0.95)
)
#> Compiling Stan program...
#> Start sampling
#> 
#> SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 0.000254 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.54 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 6.543 seconds (Warm-up)
#> Chain 1:                4.446 seconds (Sampling)
#> Chain 1:                10.989 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 7.5e-05 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.75 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 6.227 seconds (Warm-up)
#> Chain 2:                8.713 seconds (Sampling)
#> Chain 2:                14.94 seconds (Total)
#> Chain 2:
#> Warning: There were 3 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems

summary_horseshoe <- summary(horseshoe_model, conf = 0.9)
summary_horseshoe
#>   subgroup trt.estimate   trt.low  trt.high
#> 1     x_1a    0.6561374 0.5260189 0.8162805
#> 2     x_1b    0.6535959 0.5176018 0.8102726
#> 3     x_2a    0.6550734 0.5189214 0.8256944
#> 4     x_2b    0.6534071 0.5261231 0.8076542
#> 5     x_3a    0.6308748 0.4582799 0.7998477
#> 6     x_3b    0.6631032 0.5325865 0.8224179
#> 7     x_4a    0.6437881 0.5059100 0.8047302
#> 8     x_4b    0.6431772 0.4921286 0.8127279
#> 9     x_4c    0.6727510 0.5257881 0.8657248

plot(summary_horseshoe)
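
The penalized frequentist estimators mentioned in the introduction can be fitted in a similar fashion. The sketch below is only illustrative: it assumes that elastic_net() accepts the same data arguments as horseshoe() and that alpha = 1 requests the lasso penalty (alpha = 0 the ridge); see ?elastic_net for the exact interface.

lasso_model <- elastic_net(
  resp = "tt_pfs", trt = "arm",
  subgr = c("x_1", "x_2", "x_3", "x_4"),
  covars = c(
    "x_1", "x_2", "x_3", "x_4", "x_5",
    "x_6", "x_7", "x_8", "x_9", "x_10"
  ),
  data = example_data, resptype = "survival",
  status = "ev_pfs",
  alpha = 1 # assumed: 1 = lasso penalty, 0 = ridge penalty
)
summary(lasso_model)

As noted above, the penalized frequentist methods provide point estimates only, so the resulting summary contains no interval estimates.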