Input ADaM data in a teal application
NEST CoreDev
2022-04-20
including-adam-data-in-teal.Rmd
Introduction
To include ADaM
data in a teal app, the
teal.data::cdisc_data
function is used.
The cdisc_data
function allows teal
applications to include multiple datasets, identifying merge keys and
providing information to produce R code for reproducibility.
There is an advantage to passing CDISC
datasets that
adhere to ADaM
standards to these functions in that the
code is minimized. However, the dataset-related functions also include
the flexibility to work with non-standard datasets provided that merge
keys and the relationship between the datasets are specified.
The examples below illustrate the usage of these different dataset
functions for example cdisc_dataset
and
dataset
. For more information, see documentation in
teal.data
.
Keys
Primary keys serve as unique row identifiers in individual datasets
and thus need to be specified for each dataset and dataset connector.
These can be specified on the most general dataset constructor
dataset
as shown below.
library(teal)
library(scda)
# using cdisc_dataset, keys are automatically derived for standard datanames
# (although they can be overwritten)
adsl <- synthetic_cdisc_data("latest")$adsl
dataset_adsl <- cdisc_dataset("ADSL", adsl)
class(dataset_adsl)
## [1] "CDISCTealDataset" "TealDataset" "R6"
When passing multiple datasets to the cdisc_data
function, dataset relationship are set using join_keys
and
join_key
and these are used to merge datasets together
within teal apps.
In the example below, two standard CDISC datasets (ADSL and ADTTE)
are passed to the aforementioned function. In the case of
CDISC
datasets that adhere to ADaM
standards,
the merge keys do not need to be manually specified. Keys are
automatically added if dataname
matches one of the
implemented standards as documented in the cdisc_dataset
function. This minimizes the code needed to allow data merges as seen in
this example:
adsl <- synthetic_cdisc_data("latest")$adsl
adtte <- synthetic_cdisc_data("latest")$adtte
cdisc_data_obj <- cdisc_data(
cdisc_dataset(dataname = "ADSL", x = adsl, code = "ADSL <- synthetic_cdisc_data(\"latest\")$adsl"),
cdisc_dataset(dataname = "ADTTE", x = adtte, code = "ADTTE <- synthetic_cdisc_data(\"latest\")$adtte")
)
class(cdisc_data_obj)
## [1] "TealData" "TealDataAbstract" "R6"
# which is equivalent to:
example_data <- cdisc_data(
cdisc_dataset(
dataname = "ADSL",
x = adsl,
code = "ADSL <- synthetic_cdisc_data(\"latest\")$adsl",
keys = c("STUDYID", "USUBJID")
),
cdisc_dataset(
dataname = "ADTTE",
x = adtte,
code = "ADTTE <- synthetic_cdisc_data(\"latest\")$adtte",
parent = "ADSL",
keys = c("USUBJID", "STUDYID", "PARAMCD")
),
join_keys = join_keys(
join_key("ADSL", "ADSL", c("STUDYID", "USUBJID")),
join_key("ADTTE", "ADTTE", c("USUBJID", "STUDYID", "PARAMCD")),
join_key("ADSL", "ADTTE", c("STUDYID", "USUBJID"))
)
)
app <- init(
data = example_data,
modules = example_module()
)
if (interactive()) {
shinyApp(app$ui, app$server)
}
The [teal.data::join_keys()] function is used to specify keys:
- [teal.data::join_keys()] is a collection of multiple [teal.data::join_key()] entries
- [teal.data::join_key()] specifies the relation between two datasets:
-
dataset_1
,dataset_2
- name of two datasets -
key
- (optionally) named vector of column names
-
Note that it is assumed that join keys are symmetric,
i.e. join_key("x", "y", "x_col" = "y_col")
will enable
merge from “x” to “y” and vice-versa.
For more information about preprocessing, reproducibility,
relationships between datasets and DDL, please refer to the teal.data
package.