Input `ADaM` data in a teal application
NEST CoreDev
2022-04-20
including-adam-data-in-teal.Rmd
Introduction
To include ADaM data in a teal app, use the `teal.data::cdisc_data` function.
The `cdisc_data` function allows teal applications to include multiple datasets, identifying merge keys and providing information to produce R code for reproducibility.
Passing CDISC datasets that adhere to ADaM standards to these functions keeps the required code to a minimum. However, the dataset-related functions are also flexible enough to work with non-standard datasets, provided that the merge keys and the relationships between the datasets are specified.
The examples below illustrate the usage of these dataset functions, for example `cdisc_dataset` and `dataset`. For more information, see the documentation in `teal.data`.
Keys
Primary keys serve as unique row identifiers in individual datasets and thus need to be specified for each dataset and dataset connector. They can be passed to the most general dataset constructor, `dataset`; `cdisc_dataset` derives them automatically for standard datanames, as shown below.
library(teal)

# using cdisc_dataset, keys are automatically derived for standard datanames
# (although they can be overwritten)
adsl <- data.frame(
  STUDYID = "study",
  USUBJID = 1:10,
  SEX = sample(c("F", "M"), 10, replace = TRUE),
  AGE = rpois(10, 40)
)
dataset_adsl <- cdisc_dataset("ADSL", adsl)
class(dataset_adsl)
## [1] "CDISCTealDataset" "TealDataset" "R6"
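As a rough illustration of what "unique row identifier" means, the base R sketch below checks whether a candidate key identifies each row exactly once. The helper `is_primary_key()` is hypothetical and not part of teal or teal.data. The data frames mirror the dummy data used in this vignette, with fixed values instead of random ones.

```r
# Hypothetical helper (not part of teal or teal.data): returns TRUE when the
# combination of `keys` columns identifies every row of `df` exactly once.
is_primary_key <- function(df, keys) {
  stopifnot(all(keys %in% names(df)))
  anyDuplicated(df[, keys, drop = FALSE]) == 0
}

adsl <- data.frame(
  STUDYID = "study",
  USUBJID = 1:10,
  SEX = rep(c("F", "M"), 5),
  AGE = 31:40
)

# One row per subject, so STUDYID + USUBJID is a valid primary key
is_primary_key(adsl, c("STUDYID", "USUBJID"))

# Three parameters per subject: the same columns no longer identify a row
adtte <- rbind(adsl, adsl, adsl)
adtte$PARAMCD <- rep(c("OS", "EFS", "PFS"), each = 10)
is_primary_key(adtte, c("STUDYID", "USUBJID"))

# Adding PARAMCD restores uniqueness
is_primary_key(adtte, c("STUDYID", "USUBJID", "PARAMCD"))
```

Under these assumptions, the first and third checks should pass and the second should fail, which matches the keys listed for ADSL and ADTTE later in this vignette.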
When passing multiple datasets to the `cdisc_data` function, dataset relationships are set using `join_keys` and `join_key`, and these are used to merge datasets together within teal apps.
In the example below, two standard CDISC datasets (ADSL and ADTTE) are passed to the aforementioned function. In the case of CDISC datasets that adhere to ADaM standards, the merge keys do not need to be manually specified. Keys are automatically added if `dataname` matches one of the implemented standards as documented in the `cdisc_dataset` function. This minimizes the code needed to allow data merges, as seen in this example:
adsl <- data.frame(
  STUDYID = "study",
  USUBJID = 1:10,
  SEX = sample(c("F", "M"), 10, replace = TRUE),
  AGE = rpois(10, 40)
)
adtte <- rbind(adsl, adsl, adsl)
adtte$PARAMCD <- rep(c("OS", "EFS", "PFS"), each = 10)
adtte$AVAL <- c(
  rnorm(10, mean = 700, sd = 200), # dummy OS level
  rnorm(10, mean = 400, sd = 100), # dummy EFS level
  rnorm(10, mean = 450, sd = 200) # dummy PFS level
)
cdisc_data_obj <- cdisc_data(
  cdisc_dataset(
    dataname = "ADSL",
    x = adsl,
    code = '
      adsl <- data.frame(
        STUDYID = "study",
        USUBJID = 1:10,
        SEX = sample(c("F", "M"), 10, replace = TRUE),
        AGE = rpois(10, 40)
      )'
  ),
  cdisc_dataset(
    dataname = "ADTTE",
    x = adtte,
    code = '
      adtte <- rbind(adsl, adsl, adsl)
      adtte$PARAMCD <- rep(c("OS", "EFS", "PFS"), each = 10)
      adtte$AVAL <- c(
        rnorm(10, mean = 700, sd = 200),
        rnorm(10, mean = 400, sd = 100),
        rnorm(10, mean = 450, sd = 200)
      )'
  )
)
class(cdisc_data_obj)
## [1] "TealData" "TealDataAbstract" "R6"
# which is equivalent to:
example_data <- cdisc_data(
  cdisc_dataset(
    dataname = "ADSL",
    x = adsl,
    code = '
      adsl <- data.frame(
        STUDYID = "study",
        USUBJID = 1:10,
        SEX = sample(c("F", "M"), 10, replace = TRUE),
        AGE = rpois(10, 40)
      )',
    keys = c("STUDYID", "USUBJID")
  ),
  cdisc_dataset(
    dataname = "ADTTE",
    x = adtte,
    code = '
      adtte <- rbind(adsl, adsl, adsl)
      adtte$PARAMCD <- rep(c("OS", "EFS", "PFS"), each = 10)
      adtte$AVAL <- c(
        rnorm(10, mean = 700, sd = 200),
        rnorm(10, mean = 400, sd = 100),
        rnorm(10, mean = 450, sd = 200)
      )',
    keys = c("STUDYID", "USUBJID", "PARAMCD")
  ),
  join_keys = join_keys(
    join_key("ADSL", "ADSL", c("STUDYID", "USUBJID")),
    join_key("ADTTE", "ADTTE", c("USUBJID", "STUDYID", "PARAMCD")),
    join_key("ADSL", "ADTTE", c("STUDYID", "USUBJID"))
  )
)
app <- init(
  data = example_data,
  modules = example_module()
)
if (interactive()) {
  shinyApp(app$ui, app$server)
}
The `teal.data::join_keys()` function is used to specify keys:
- `teal.data::join_keys()` is a collection of multiple `teal.data::join_key()` entries
- `teal.data::join_key()` specifies the relation between two datasets:
  - `dataset_1`, `dataset_2` - names of the two datasets
  - `key` - (optionally) named vector of column names

Note that join keys are assumed to be symmetric, i.e. `join_key("x", "y", c("x_col" = "y_col"))` will enable merging from "x" to "y" and vice versa.
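The relationship that a join key describes can be mimicked with base R `merge()`. This is only a sketch of the idea, not how teal performs its merges internally. The small data frames and the `x`/`y` example are made up for illustration, and a named key such as `c("x_col" = "y_col")` is assumed to map the left-hand column name onto the right-hand one.

```r
# Base R only: illustrates the relationship a join key expresses.
adsl <- data.frame(STUDYID = "study", USUBJID = 1:3, AGE = c(35, 42, 51))
adtte <- data.frame(
  STUDYID = "study",
  USUBJID = rep(1:3, times = 2),
  PARAMCD = rep(c("OS", "PFS"), each = 3),
  AVAL = c(700, 650, 720, 400, 380, 450)
)

# Same column names on both sides, analogous to
# join_key("ADSL", "ADTTE", c("STUDYID", "USUBJID"))
merged <- merge(adtte, adsl, by = c("STUDYID", "USUBJID"))
nrow(merged) # every ADTTE row picks up its subject's AGE

# Differently named columns: a named key maps onto by.x / by.y
x <- data.frame(x_col = 1:3, x_val = c("a", "b", "c"))
y <- data.frame(y_col = c(1L, 1L, 3L), y_val = c(10, 20, 30))
key <- c("x_col" = "y_col")

x_to_y <- merge(x, y, by.x = names(key), by.y = unname(key))
y_to_x <- merge(y, x, by.x = unname(key), by.y = names(key))

# Symmetry: both directions pair up the same rows
nrow(x_to_y) == nrow(y_to_x)
```

Because the same pairs of rows match in either direction, a single key definition is enough to describe the relationship both ways.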
For more information about preprocessing, reproducibility, relationships between datasets and DDL, please refer to the `teal.data` package.