draws fits the base imputation model to the observed outcome data
according to the given multiple imputation methodology.
According to the user's method specification, it returns either draws from the posterior distribution of the
model parameters as required for Bayesian multiple imputation or frequentist parameter estimates from the
original data and bootstrapped or leave-one-out datasets as required for conditional mean imputation.
The purpose of the imputation model is to estimate model parameters
in the absence of intercurrent events (ICEs) handled using reference-based imputation methods.
For this reason, any observed outcome data after ICEs, for which reference-based imputation methods are
specified, are removed and considered as missing for the purpose of estimating the imputation model, and for
this purpose only. The imputation model is a mixed model for repeated measures (MMRM) that is valid
under a missing-at-random (MAR) assumption.
It can be fit using maximum likelihood (ML) or restricted ML (REML) estimation,
a Bayesian approach, or an approximate Bayesian approach according to the user's method specification.
The ML/REML approaches and the approximate Bayesian approach support several possible covariance structures,
while the Bayesian approach based on MCMC sampling supports only an unstructured covariance structure.
In all cases, the covariance matrix can be assumed to be either common to all groups or different for each group.
Usage
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
# S3 method for class 'approxbayes'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
# S3 method for class 'condmean'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
# S3 method for class 'bmlmi'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
# S3 method for class 'bayes'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
Arguments
- data
A data.frame containing the data to be used in the model. See details.
- data_ice
A data.frame that specifies the information related to the ICEs and the imputation strategies. See details.
- vars
A vars object as generated by set_vars(). See details.
- method
A method object as generated by either method_bayes(), method_approxbayes(), method_condmean() or method_bmlmi(). It specifies the multiple imputation methodology to be used. See details.
- ncores
A single numeric specifying the number of cores to use in creating the draws object (default = 1). This parameter is ignored for method_bayes(). Can also be a cluster object generated by make_rbmi_cluster().
- quiet
Logical; if TRUE, suppresses the progress information that is otherwise printed to the console.
Value
A draws object which is a named list containing the following:
- data: R6 longdata object containing all relevant input data information.
- method: A method object as generated by either method_bayes(), method_approxbayes(), method_condmean() or method_bmlmi().
- samples: list containing the estimated parameters of interest. Each element of samples is a named list containing the following:
  - ids: vector of characters containing the ids of the subjects included in the original dataset.
  - beta: numeric vector of estimated regression coefficients.
  - sigma: list of estimated covariance matrices (one for each level of vars$group).
  - theta: numeric vector of transformed covariances.
  - failed: Logical. TRUE if the model fit failed.
  - ids_samp: vector of characters containing the ids of the subjects included in the given sample.
- fit: if method_bayes() is chosen, returns the MCMC Stan fit object. Otherwise NULL.
- n_failures: absolute number of failures of the model fit. Relevant only for method_condmean(type = "bootstrap"), method_approxbayes() and method_bmlmi().
- formula: fixed effects formula object used for the model specification.
Details
draws performs the first step of the multiple imputation (MI) procedure: fitting the
base imputation model. The goal is to estimate the parameters of interest needed
for the imputation phase (i.e. the regression coefficients and the covariance matrices
from an MMRM).
The function distinguishes between the following methods:
- Bayesian MI based on MCMC sampling: draws returns the draws from the posterior distribution of the parameters using a Bayesian approach based on MCMC sampling. This method can be specified by using method = method_bayes().
- Approximate Bayesian MI based on bootstrapping: draws returns the draws from the posterior distribution of the parameters using an approximate Bayesian approach, where the sampling from the posterior distribution is simulated by fitting the MMRM model on bootstrap samples of the original dataset. This method can be specified by using method = method_approxbayes().
- Conditional mean imputation with bootstrap re-sampling: draws returns the MMRM parameter estimates from the original dataset and from n_samples bootstrap samples. This method can be specified by using method = method_condmean() with argument type = "bootstrap".
- Conditional mean imputation with jackknife re-sampling: draws returns the MMRM parameter estimates from the original dataset and from each leave-one-subject-out sample. This method can be specified by using method = method_condmean() with argument type = "jackknife".
- Bootstrapped Maximum Likelihood MI: draws returns the MMRM parameter estimates from a given number of bootstrap samples needed to perform random imputations of the bootstrapped samples. This method can be specified by using method = method_bmlmi().
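As a rough sketch, the five method specifications above could be constructed as follows. The sample sizes passed here are illustrative values only, not recommendations, and the block only runs if the rbmi package is installed.

```r
# Sketch only: the argument values below are illustrative assumptions.
if (requireNamespace("rbmi", quietly = TRUE)) {
    library(rbmi)

    methods <- list(
        # Bayesian MI based on MCMC sampling
        bayes = method_bayes(n_samples = 150),
        # Approximate Bayesian MI based on bootstrapping
        approxbayes = method_approxbayes(n_samples = 20),
        # Conditional mean imputation with bootstrap re-sampling
        condmean_boot = method_condmean(type = "bootstrap", n_samples = 100),
        # Conditional mean imputation with jackknife re-sampling
        condmean_jack = method_condmean(type = "jackknife"),
        # Bootstrapped Maximum Likelihood MI
        bmlmi = method_bmlmi(B = 20, D = 2)
    )
}
```

Any element of this list can then be passed as the method argument of draws().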
Bayesian MI based on MCMC sampling has been proposed in Carpenter, Roger, and Kenward (2013) who first introduced reference-based imputation methods. Approximate Bayesian MI is discussed in Little and Rubin (2002). Conditional mean imputation methods are discussed in Wolbers et al (2022). Bootstrapped Maximum Likelihood MI is described in Von Hippel & Bartlett (2021).
The argument data contains the longitudinal data. It must have at least the following variables:
- subjid: a factor vector containing the subject ids.
- visit: a factor vector containing the visit the outcome was observed on.
- group: a factor vector containing the group that the subject belongs to.
- outcome: a numeric vector containing the outcome variable. It might contain missing values.
Additional baseline or time-varying covariates must be included in data.
data must have one row per visit per subject. This means that missing
outcome values must be recorded as NA rather than by omitting the corresponding row.
Missing values in the covariates are not allowed.
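A minimal sketch of this layout, using a made-up toy dataset. The column names, the age covariate and all values are hypothetical and only serve to illustrate the one-row-per-visit-per-subject requirement.

```r
# Hypothetical toy data: 2 subjects x 3 visits, one row per subject per visit.
dat <- data.frame(
    subjid  = factor(rep(c("S1", "S2"), each = 3)),
    visit   = factor(rep(c("V1", "V2", "V3"), times = 2),
                     levels = c("V1", "V2", "V3")),
    group   = factor(rep(c("Control", "Treatment"), each = 3)),
    age     = rep(c(54, 61), each = 3),       # covariate, must be complete
    outcome = c(1.2, 0.8, NA, 2.1, NA, NA)    # missing outcomes are NA, rows are kept
)

# Every subject-visit combination appears exactly once
stopifnot(all(table(dat$subjid, dat$visit) == 1))

# Covariates must not contain missing values
stopifnot(!anyNA(dat$age))
```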
If data is missing rows, the expand_locf() helper function can be used to insert them,
using Last Observation Carried Forward (LOCF) imputation to fill in the covariate values.
Note that LOCF is generally not a principled imputation method and should only be used when appropriate
for the specific covariate.
Please note that there is no special provisioning for the baseline outcome values. If you do not want baseline
observations to be included in the model as part of the response variable then these should be removed in advance
from the outcome variable in data. At the same time, if you want to include the baseline outcome as a covariate in
the model, then this should be included as a separate column of data (as any other covariate).
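One way to prepare such data in base R is sketched below. The dataset, the visit labels and the column name outcome_bl are hypothetical; the sketch removes the baseline rows from the response and carries the baseline value as a separate covariate column.

```r
# Hypothetical long-format data in which baseline is stored as a visit
raw <- data.frame(
    subjid  = factor(rep(c("S1", "S2"), each = 3)),
    visit   = rep(c("Baseline", "Week4", "Week8"), times = 2),
    outcome = c(10, 9, 7, 12, 11, NA),
    stringsAsFactors = FALSE
)

# One row per subject holding the baseline value
base <- raw[raw$visit == "Baseline", c("subjid", "outcome")]
names(base)[2] <- "outcome_bl"

# Remove baseline rows from the response, then attach baseline as a covariate
post <- raw[raw$visit != "Baseline", ]
dat <- merge(post, base, by = "subjid")
dat$visit <- factor(dat$visit, levels = c("Week4", "Week8"))
```

The resulting outcome_bl column can then be listed among the covariates like any other covariate.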
Character covariates will be explicitly cast to factors. If you use a custom analysis function
that requires specific reference levels for the character covariates (for example in the computation
of the least square means) then you are advised to manually cast your character covariates
to factor in advance of running draws().
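A small base R sketch of such a manual cast. The covariate name sex and its values are hypothetical; the point is that a default factor() call picks its own level order, whereas an explicit levels argument fixes the reference level.

```r
# Hypothetical character covariate
dat <- data.frame(sex = c("M", "F", "F", "M"), stringsAsFactors = FALSE)

# A default cast sorts the levels, so "F" would come first here
default_levels <- levels(factor(dat$sex))

# Explicit cast: make "M" the reference (first) level before calling draws()
dat$sex <- factor(dat$sex, levels = c("M", "F"))
```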
The argument data_ice contains information about the occurrence of ICEs. It is a data.frame with 3 columns:
- Subject ID: a character vector containing the ids of the subjects that experienced the ICE. This column must be named as specified in vars$subjid.
- Visit: a character vector containing the first visit after the occurrence of the ICE (i.e. the first visit affected by the ICE). The visits must be equal to one of the levels of data[[vars$visit]]. If multiple ICEs happen for the same subject, then only the first non-MAR visit should be used. This column must be named as specified in vars$visit.
- Strategy: a character vector specifying the imputation strategy to address the ICE for this subject. This column must be named as specified in vars$strategy. Possible imputation strategies are:
  - "MAR": Missing At Random.
  - "CIR": Copy Increments in Reference.
  - "CR": Copy Reference.
  - "JR": Jump to Reference.
  - "LMCF": Last Mean Carried Forward.
For explanations of these imputation strategies, see Carpenter, Roger, and Kenward (2013), Cro et al (2020), and Wolbers et al (2022). Please note that user-defined imputation strategies can also be set.
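A minimal sketch of a data_ice data.frame, assuming the variable names subjid, visit and strategy and hypothetical visit levels V1 to V3. The checks at the end are illustrative sanity checks written for this example, not part of the package.

```r
# Hypothetical ICE information: one row per subject that experienced an ICE
dat_ice <- data.frame(
    subjid   = c("S1", "S2"),
    visit    = c("V3", "V2"),    # first visit affected by the ICE
    strategy = c("JR", "MAR"),
    stringsAsFactors = FALSE
)

# Assumed levels of data[[vars$visit]] for this example
visit_levels <- c("V1", "V2", "V3")

# ICE visits must match the visit levels of the longitudinal data
stopifnot(all(dat_ice$visit %in% visit_levels))
# At most one ICE record per subject
stopifnot(!anyDuplicated(dat_ice$subjid))
# Only the built-in strategies are used in this example
stopifnot(all(dat_ice$strategy %in% c("MAR", "CIR", "CR", "JR", "LMCF")))
```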
The data_ice argument is necessary at this stage since (as explained in Wolbers et al (2022)) the model is fitted
after removing the observations which are incompatible with the imputation model, i.e.
any observed data on or after data_ice[[vars$visit]] that are addressed with an imputation
strategy different from MAR are excluded from the model fit. However, such observations
will not be discarded from the data in the imputation phase
(performed with the function impute()). To summarize, at this stage only pre-ICE data
and post-ICE data after ICEs for which MAR imputation is specified are used.
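The exclusion rule can be illustrated with a short base R sketch. This is not the package's internal code; it only marks, for hypothetical toy data, which observations would enter the model fit under the rule described above.

```r
# Hypothetical toy data: 3 subjects x 3 visits
dat <- data.frame(
    subjid  = rep(c("S1", "S2", "S3"), each = 3),
    visit   = factor(rep(c("V1", "V2", "V3"), times = 3),
                     levels = c("V1", "V2", "V3")),
    outcome = c(1, 2, 3, 4, 5, 6, 7, 8, NA),
    stringsAsFactors = FALSE
)
dat_ice <- data.frame(
    subjid   = c("S1", "S2"),
    visit    = c("V2", "V3"),
    strategy = c("JR", "MAR"),
    stringsAsFactors = FALSE
)

# Attach each subject's ICE visit and strategy (NA for subjects without an ICE)
m <- merge(dat, dat_ice, by = "subjid", all.x = TRUE, suffixes = c("", "_ice"))

visit_idx <- as.integer(m$visit)
ice_idx   <- match(m$visit_ice, levels(dat$visit))

# On or after the ICE visit, and the strategy is not MAR
post_ice_non_mar <- !is.na(ice_idx) & visit_idx >= ice_idx & m$strategy != "MAR"

# Observed outcomes that are not flagged above are used for the model fit
m$used_in_fit <- !is.na(m$outcome) & !post_ice_non_mar
m <- m[order(m$subjid, m$visit), ]
```

In this example S1 contributes only the V1 observation, S2 contributes all three observations because its ICE is handled under MAR, and S3 (no ICE record) contributes its two observed values.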
If the data_ice argument is omitted, or if a subject doesn't have a record within data_ice, then it is
assumed that all of the relevant subject's data is pre-ICE and as such all missing
visits will be imputed under the MAR assumption and all observed data will be used to fit the base imputation model.
Please note that the ICE visit cannot be updated via the update_strategy argument
in impute(); this means that subjects who didn't have a record in data_ice will always have their
missing data imputed under the MAR assumption even if their strategy is updated.
The vars argument is a named list that specifies the names of key variables within
data and data_ice. This list is created by set_vars() and contains the following named elements:
- subjid: name of the column in data and data_ice which contains the subject ids variable.
- visit: name of the column in data and data_ice which contains the visit variable.
- group: name of the column in data which contains the group variable.
- outcome: name of the column in data which contains the outcome variable.
- covariates: vector of characters which contains the covariates to be included in the model (including interactions which are specified as "covariateName1*covariateName2"). If no covariates are provided the default model specification of outcome ~ 1 + visit + group will be used. Please note that the group*visit interaction is not included in the model by default.
- strata: covariates used as stratification variables in the bootstrap sampling. By default only the vars$group is set as stratification variable. Needed only for method_condmean(type = "bootstrap") and method_approxbayes().
- strategy: name of the column in data_ice which contains the subject-specific imputation strategy.
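A hedged sketch of building the vars object with set_vars(). The column names and the age covariate are hypothetical, and the block only runs if rbmi is installed. Because the group*visit interaction is not part of the default model, it is listed explicitly among the covariates here.

```r
# Sketch only: column names are assumptions about the user's data.
if (requireNamespace("rbmi", quietly = TRUE)) {
    library(rbmi)

    vars <- set_vars(
        subjid     = "subjid",
        visit      = "visit",
        group      = "group",
        outcome    = "outcome",
        covariates = c("age", "group*visit"),  # interaction added explicitly
        strategy   = "strategy"
    )

    # With suitable `dat` and `dat_ice` data.frames (not defined here),
    # the base imputation model could then be fitted along these lines:
    # drawobj <- draws(
    #     data = dat, data_ice = dat_ice, vars = vars,
    #     method = method_condmean(type = "jackknife"), quiet = TRUE
    # )
}
```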
In our experience, Bayesian MI (method = method_bayes()) with a relatively low number of
samples (e.g. n_samples below 100) frequently triggers Stan warnings about R-hat such as
"The largest R-hat is X.XX, indicating chains have not mixed". In many instances, this warning
might be spurious, i.e. standard diagnostic analyses of the MCMC samples do not indicate any
issues and results look reasonable. Increasing the number of samples to e.g. above 150 usually
gets rid of the warning.
References
James R Carpenter, James H Roger, and Michael G Kenward. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics, 23(6):1352–1371, 2013.
Suzie Cro, Tim P Morris, Michael G Kenward, and James R Carpenter. Sensitivity analysis for clinical trials with missing continuous outcome data using controlled multiple imputation: a practical guide. Statistics in Medicine, 39(21):2815–2842, 2020.
Roderick J. A. Little and Donald B. Rubin. Statistical Analysis with Missing Data, Second Edition. John Wiley & Sons, Hoboken, New Jersey, 2002. [Section 10.2.3]
Marcel Wolbers, Alessandro Noci, Paul Delmar, Craig Gower-Page, Sean Yiu, Jonathan W. Bartlett. Standard and reference-based conditional mean imputation. https://arxiv.org/abs/2109.11162, 2022.
Paul T von Hippel and Jonathan W Bartlett. Maximum likelihood multiple imputation: Faster imputations and consistent standard errors without posterior draws. 2021.
See also
method_bayes(), method_approxbayes(), method_condmean(), method_bmlmi() for setting method.
set_vars() for setting vars.
expand_locf() for expanding data in case of missing rows.
For more details see the quickstart vignette: vignette("quickstart", package = "rbmi").