Introduction to Chevron
Adrian Waddell
2022-10-14
chevron.Rmd
Introduction
The chevron R package provides functions to produce standard tables, listings and graphs (TLGs) used to analyze and report clinical trials data. The ensemble of functions used to produce a particular output is stored in an S4 object of class chevron_tlg, also called a pipeline. Each standard output is associated with one pipeline. A pipeline contains the following objects:
* A main function, also referred to as the TLG-function.
* A preprocess function.
* A postprocess function.
* An adam_datasets character vector with the names of the ADaM datasets required to create the output.
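The structure described above can be pictured with a small, self-contained S4 sketch. The class toy_tlg and the helper toy_run are hypothetical names used only for illustration, and a plain list of data frames stands in for the dm object. This is not how chevron itself is implemented; it only shows how three functions and a dataset vector can travel together in one object.

```r
library(methods)

# Hypothetical class mirroring the four components listed above.
setClass(
  "toy_tlg",
  representation(
    main = "function",
    preprocess = "function",
    postprocess = "function",
    adam_datasets = "character"
  )
)

# Hypothetical runner: pre-process, build the output, then post-process.
toy_run <- function(object, adam_db, ...) {
  # Fail early if a required dataset is absent.
  stopifnot(all(object@adam_datasets %in% names(adam_db)))
  db <- object@preprocess(adam_db, ...)
  out <- object@main(db, ...)
  object@postprocess(out, ...)
}

toy <- new(
  "toy_tlg",
  main = function(adam_db, ...) table(adam_db$adsl$ARM),
  preprocess = function(adam_db, ...) adam_db,
  postprocess = function(tlg, ...) tlg,
  adam_datasets = "adsl"
)

# Toy data: a named list of data frames instead of a dm object.
toy_db <- list(
  adsl = data.frame(USUBJID = c("1", "2", "3"), ARM = c("A", "B", "A"))
)
res <- toy_run(toy, toy_db)
```

Here res holds the count of subjects per arm, produced by passing the toy data through all three steps in order.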
TLG-functions
The TLG-functions in chevron use other packages to produce the final outputs; for example, rtables and tern are used to create listings and tables, and ggplot2, lattice, and grid are used to create graphs.
TLG-functions in chevron such as dmt01_1_main, aet02_1_main, aet02_2_main have the following properties:
- they produce a narrowly defined output (currently standards in Roche GDS). Note that the naming convention <gds template id>_<i>_main indicates that a Roche GDS defined standard may have different implementations. Alternatively, a GDS template id can be regarded as a guideline and the function name in chevron as a standard.
- have very few arguments to modify the standard. Generally, arguments may change the structure of the table (arm variable, which variables are summarized) but not parameterize the cell content (e.g. alpha-level for p-value).
- always have adam_db as the first argument, which is the collection of ADaM datasets (ADSL, ADAE, ADRS, etc.). Please read the "The adam_db Argument" vignette in this package for more details.
- have a .study argument; read the "The .study argument" vignette for more detail.
- have the ... argument to facilitate their incorporation in a pipeline.
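As a rough illustration of these conventions, the sketch below defines a function with adam_db first, a structural argument whose default is read from .study, and a ... argument. The names toy_main and armvar, and the use of a plain list of data frames in place of a dm object, are assumptions made for this example and are not taken from chevron.

```r
# Hypothetical TLG-style function; `toy_main` and the `armvar` entry of
# `.study` are illustration-only names.
toy_main <- function(adam_db,
                     armvar = .study$armvar,
                     ...,
                     .study = list(armvar = "ARM")) {
  # `armvar` is evaluated lazily, so its default picks up whatever
  # `.study` holds when the function is called.
  adsl <- adam_db$adsl
  table(adsl[[armvar]])
}

# Toy data: a named list of data frames instead of a dm object.
toy_db <- list(
  adsl = data.frame(
    USUBJID = c("1", "2", "3"),
    ARM = c("A", "B", "A"),
    ACTARM = c("A", "A", "A")
  )
)

by_planned <- toy_main(toy_db)                                   # default arm
by_actual <- toy_main(toy_db, .study = list(armvar = "ACTARM"))  # study-level
by_explicit <- toy_main(toy_db, armvar = "ACTARM")               # per call
```

The last two calls give the same table: a study-level setting passed through .study and an explicit argument are two routes to the same structural change.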
pre-processing
The pre-process functions in chevron use the dm and dunlin packages to process dm objects and turn them into suitable input for TLG-functions. The pre-processing step typically includes checks that ensure that the dm input can later be processed by the TLG-functions.
Pre-process functions in chevron such as dmt01_1_pre, aet02_1_pre, aet02_2_pre have the following properties:
1. they return a dm object amenable to processing by a TLG-function, or quickly return an understandable error message.
1. have very few arguments to modify the standard.
1. always have adam_db as the first argument, which is the collection of ADaM datasets (ADSL, ADAE, ADRS, etc.). Please read the "The adam_db Argument" vignette in this package for more details.
1. can have the .study argument and other arguments of the corresponding TLG-function, as these may tell the function which columns will be required and so facilitate the checking process.
1. have the ... argument to facilitate their incorporation in a pipeline.
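The "check first, fail with a clear message" idea from the list above might look like the following sketch. It is written in base R against a plain list of data frames rather than a dm object, and toy_pre, armvar and the wording of the error message are hypothetical choices for this example, not chevron code.

```r
# Hypothetical pre-process function: validate the input, then transform it.
toy_pre <- function(adam_db,
                    armvar = .study$armvar,
                    ...,
                    .study = list(armvar = "ARM")) {
  # Knowing `armvar` lets the check target exactly the columns needed later.
  required <- c("USUBJID", armvar)
  missing_cols <- setdiff(required, names(adam_db$adsl))
  if (length(missing_cols) > 0) {
    stop(
      "adsl is missing required column(s): ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  # Example transformation: store the arm variable as a factor.
  adam_db$adsl[[armvar]] <- as.factor(adam_db$adsl[[armvar]])
  adam_db
}

good_db <- list(adsl = data.frame(USUBJID = c("1", "2"), ARM = c("A", "B")))
bad_db <- list(adsl = data.frame(USUBJID = c("1", "2")))

ok <- toy_pre(good_db)
msg <- tryCatch(toy_pre(bad_db), error = function(e) conditionMessage(e))
```

Sharing the armvar argument with the downstream function is what allows the check to name the missing column before any table code runs.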
Example AET02
For example, the GDS template aet02 is implemented in chevron with the pipelines named aet02_*. The object documentation, accessible with the help function, e.g. help('aet02_1'), documents what is particular to the _1 implementation.
We first define the data and put it in a dm object; you can find more details about this in the adam_db vignette.
library(dm)
#>
#> Attaching package: 'dm'
#> The following object is masked from 'package:stats':
#>
#> filter
library(scda)
#>
syn_data <- synthetic_cdisc_data("latest")
adam_study_data <- dm(adsl = syn_data$adsl, adae = syn_data$adae) %>%
  dm_add_pk(adsl, c("USUBJID", "STUDYID")) %>%
  dm_add_fk(adae, c("USUBJID", "STUDYID"), ref_table = "adsl") %>%
  dm_add_pk(adae, c("USUBJID", "STUDYID", "ASTDTM", "AETERM", "AESEQ"))
# dm_validate() replaces validate_dm(), which was deprecated in dm 0.3.0.
dm_validate(adam_study_data)
The aet02_1 output is then created as follows:
library(chevron)
#> Registered S3 method overwritten by 'tern':
#> method from
#> tidy.glm broom
run(aet02_1, adam_study_data)
#> MedDRA System Organ Class                                      A: Drug X        B: Placebo     C: Combination
#>   MedDRA Preferred Term                                         (N=134)          (N=134)          (N=132)
#> —————————————————————————————————————————————————————————————————————————————————————————————————————————————
#> Total number of patients with at least one adverse event      100 (74.6%)       98 (73.1%)       103 (78%)
#> Overall total number of events                                    502              480              604
#> cl A.1
#>   Total number of patients with at least one adverse event     68 (50.7%)       58 (43.3%)       76 (57.6%)
#>   Total number of events                                          115               99              137
#>   dcd A.1.1.1.1                                                45 (33.6%)       31 (23.1%)       52 (39.4%)
#>   dcd A.1.1.1.2                                                41 (30.6%)       39 (29.1%)       42 (31.8%)
#> cl B.2
#>   Total number of patients with at least one adverse event     62 (46.3%)       56 (41.8%)       74 (56.1%)
#>   Total number of events                                          102              106              127
#>   dcd B.2.2.3.1                                                38 (28.4%)       40 (29.9%)       45 (34.1%)
#>   dcd B.2.1.2.1                                                39 (29.1%)       34 (25.4%)       46 (34.8%)
#> cl D.1
#>   Total number of patients with at least one adverse event     64 (47.8%)       54 (40.3%)       68 (51.5%)
#>   Total number of events                                          106               84              114
#>   dcd D.1.1.1.1                                                42 (31.3%)       32 (23.9%)       46 (34.8%)
#>   dcd D.1.1.4.2                                                38 (28.4%)       34 (25.4%)       40 (30.3%)
#> cl D.2
#>   Total number of patients with at least one adverse event     37 (27.6%)       46 (34.3%)       50 (37.9%)
#>   Total number of events                                           49               57               65
#>   dcd D.2.1.5.3                                                37 (27.6%)       46 (34.3%)       50 (37.9%)
#> cl C.2
#>   Total number of patients with at least one adverse event     28 (20.9%)       36 (26.9%)       48 (36.4%)
#>   Total number of events                                           39               40               57
#>   dcd C.2.1.2.1                                                28 (20.9%)       36 (26.9%)       48 (36.4%)
#> cl B.1
#>   Total number of patients with at least one adverse event     38 (28.4%)       37 (27.6%)       36 (27.3%)
#>   Total number of events                                           44               43               50
#>   dcd B.1.1.1.1                                                38 (28.4%)       37 (27.6%)       36 (27.3%)
#> cl C.1
#>   Total number of patients with at least one adverse event     36 (26.9%)       34 (25.4%)       36 (27.3%)
#>   Total number of events                                           47               51               54
#>   dcd C.1.1.1.3                                                36 (26.9%)       34 (25.4%)       36 (27.3%)
The function associated with a particular slot can be retrieved with the corresponding method: get_main, get_preprocess and get_postprocess.
get_main(aet02_1)
#> function (adam_db, armvar = .study$actualarm, lbl_overall = .study$lbl_overall,
#>     prune_0 = TRUE, deco = std_deco("AET02"), .study = list(actualarm = "ACTARM",
#>         lbl_overall = NULL))
#> {
#>     dbsel <- get_db_data(adam_db, "adsl", "adae")
#>     lyt <- aet02_1_lyt(armvar = armvar, lbl_overall = lbl_overall,
#>         deco = deco)
#>     tbl <- build_table(lyt, dbsel$adae, alt_counts_df = dbsel$adsl)
#>     if (prune_0) {
#>         tbl <- smart_prune(tbl)
#>     }
#>     tbl_sorted <- tbl %>% sort_at_path(path = c("AEBODSYS"),
#>         scorefun = cont_n_allcols) %>% sort_at_path(path = c("AEBODSYS",
#>         "*", "AEDECOD"), scorefun = score_occurrences)
#>     tbl_sorted
#> }
#> <bytecode: 0x556acaa4c1a0>
#> <environment: namespace:chevron>
These are standard functions that can be used on their own:
get_preprocess(aet02_1)(adam_study_data)
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `adsl`, `adae`
#> Columns: 148
#> Primary keys: 2
#> Foreign keys: 1
# or
foo <- aet02_1@preprocess
foo(adam_study_data)
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `adsl`, `adae`
#> Columns: 148
#> Primary keys: 2
#> Foreign keys: 1
Pipeline customization
In some instances it is useful to customize the pipeline, for instance by changing the pre-processing function. Be aware that you have to think carefully about argument names and compatibility with downstream functions.
aet02_1@preprocess <- function(adam_db) adam_db
get_preprocess(aet02_1)
#> function(adam_db) adam_db
Note that this operation creates a local version of the pipeline. The package version of the pipeline (accessible with chevron::aet02_1) remains unchanged.
Custom Pipeline creation
To create a pipeline from scratch, use the provided constructor:
my_pipeline <- chevron_tlg(
  main = aet02_1_main,
  preprocess = aet02_1_pre,
  postprocess = function(tlg, ...) {
    print(paste("Finished at", Sys.time()))
    tlg
  },
  adam_datasets = c("adsl", "adae")
)
run(my_pipeline, adam_study_data)
#> [1] "Finished at 2022-10-14 01:37:42"
#> MedDRA System Organ Class                                      A: Drug X        B: Placebo     C: Combination
#>   MedDRA Preferred Term                                         (N=134)          (N=134)          (N=132)
#> —————————————————————————————————————————————————————————————————————————————————————————————————————————————
#> Total number of patients with at least one adverse event      100 (74.6%)       98 (73.1%)       103 (78%)
#> Overall total number of events                                    502              480              604
#> cl A.1
#>   Total number of patients with at least one adverse event     68 (50.7%)       58 (43.3%)       76 (57.6%)
#>   Total number of events                                          115               99              137
#>   dcd A.1.1.1.1                                                45 (33.6%)       31 (23.1%)       52 (39.4%)
#>   dcd A.1.1.1.2                                                41 (30.6%)       39 (29.1%)       42 (31.8%)
#> cl B.2
#>   Total number of patients with at least one adverse event     62 (46.3%)       56 (41.8%)       74 (56.1%)
#>   Total number of events                                          102              106              127
#>   dcd B.2.2.3.1                                                38 (28.4%)       40 (29.9%)       45 (34.1%)
#>   dcd B.2.1.2.1                                                39 (29.1%)       34 (25.4%)       46 (34.8%)
#> cl D.1
#>   Total number of patients with at least one adverse event     64 (47.8%)       54 (40.3%)       68 (51.5%)
#>   Total number of events                                          106               84              114
#>   dcd D.1.1.1.1                                                42 (31.3%)       32 (23.9%)       46 (34.8%)
#>   dcd D.1.1.4.2                                                38 (28.4%)       34 (25.4%)       40 (30.3%)
#> cl D.2
#>   Total number of patients with at least one adverse event     37 (27.6%)       46 (34.3%)       50 (37.9%)
#>   Total number of events                                           49               57               65
#>   dcd D.2.1.5.3                                                37 (27.6%)       46 (34.3%)       50 (37.9%)
#> cl C.2
#>   Total number of patients with at least one adverse event     28 (20.9%)       36 (26.9%)       48 (36.4%)
#>   Total number of events                                           39               40               57
#>   dcd C.2.1.2.1                                                28 (20.9%)       36 (26.9%)       48 (36.4%)
#> cl B.1
#>   Total number of patients with at least one adverse event     38 (28.4%)       37 (27.6%)       36 (27.3%)
#>   Total number of events                                           44               43               50
#>   dcd B.1.1.1.1                                                38 (28.4%)       37 (27.6%)       36 (27.3%)
#> cl C.1
#>   Total number of patients with at least one adverse event     36 (26.9%)       34 (25.4%)       36 (27.3%)
#>   Total number of events                                           47               51               54
#>   dcd C.1.1.1.3                                                36 (26.9%)       34 (25.4%)       36 (27.3%)
Note that to ensure the correct execution of the run function, the name of the first argument of the main function must be adam_db, the input dm object used to create the TLG output. The name of the first argument of the preprocess function must also be adam_db, the input dm object to pre-process. Finally, the name of the first argument of the postprocess function must be tlg, the input TableTree object to post-process. Validation criteria enforce these rules upon creation of a pipeline.
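A check of this kind can be sketched in a few lines of base R by inspecting the formal arguments of each function. The helpers first_arg_is and check_pipeline_args are hypothetical and only illustrate the idea; chevron's actual validation may be written quite differently. Following the postprocess function shown in the constructor call above, the sketch assumes the lowercase name tlg.

```r
# Hypothetical helper: TRUE when the first formal argument of `f`
# has the expected name.
first_arg_is <- function(f, expected) {
  identical(names(formals(f))[1], expected)
}

# Hypothetical validation over the three pipeline functions.
# Returns TRUE, or a character vector describing each problem found.
check_pipeline_args <- function(main, preprocess, postprocess) {
  problems <- character(0)
  if (!first_arg_is(main, "adam_db")) {
    problems <- c(problems, "first argument of main must be 'adam_db'")
  }
  if (!first_arg_is(preprocess, "adam_db")) {
    problems <- c(problems, "first argument of preprocess must be 'adam_db'")
  }
  if (!first_arg_is(postprocess, "tlg")) {
    problems <- c(problems, "first argument of postprocess must be 'tlg'")
  }
  if (length(problems) > 0) problems else TRUE
}

valid <- check_pipeline_args(
  main = function(adam_db, ...) adam_db,
  preprocess = function(adam_db, ...) adam_db,
  postprocess = function(tlg, ...) tlg
)

invalid <- check_pipeline_args(
  main = function(adam_db, ...) adam_db,
  preprocess = function(db, ...) db,
  postprocess = function(x, ...) x
)
```

Returning either TRUE or a character vector of messages matches the contract of an S4 validity function, so a check like this could be attached to a class definition.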