Input `ADaM` data in a teal application

Introduction

To include ADaM data in a teal app, the teal.data::cdisc_data function is used.

The cdisc_data function allows teal applications to include multiple datasets, identifying merge keys and providing information to produce R code for reproducibility.

There is an advantage to passing CDISC datasets that adhere to ADaM standards to these functions in that the code is minimized. However, the dataset-related functions also include the flexibility to work with non-standard datasets provided that merge keys and the relationship between the datasets are specified.

The examples below illustrate the usage of these different dataset functions for example cdisc_dataset and dataset. For more information, see documentation in teal.data.

Keys

Primary keys serve as unique row identifiers in individual datasets and thus need to be specified for each dataset and dataset connector. These can be specified on the most general dataset constructor dataset as shown below.

library(teal)

# using cdisc_dataset, keys are automatically derived for standard datanames
# (although they can be overwritten)
adsl <- data.frame(
  STUDYID = "study",
  USUBJID = 1:10,
  SEX = sample(c("F", "M"), 10, replace = TRUE),
  AGE = rpois(10, 40)
)
dataset_adsl <- cdisc_dataset("ADSL", adsl)
class(dataset_adsl)

## [1] "CDISCTealDataset" "TealDataset"      "R6"

When passing multiple datasets to the cdisc_data function, dataset relationship are set using join_keys and join_key and these are used to merge datasets together within teal apps.

In the example below, two standard CDISC datasets (ADSL and ADTTE) are passed to the aforementioned function. In the case of CDISC datasets that adhere to ADaM standards, the merge keys do not need to be manually specified. Keys are automatically added if dataname matches one of the implemented standards as documented in the cdisc_dataset function. This minimizes the code needed to allow data merges as seen in this example:

adsl <- data.frame(
  STUDYID = "study",
  USUBJID = 1:10,
  SEX = sample(c("F", "M"), 10, replace = TRUE),
  AGE = rpois(10, 40)
)
adtte <- rbind(adsl, adsl, adsl)
adtte$PARAMCD <- rep(c("OS", "EFS", "PFS"), each = 10)
adtte$AVAL <- c(
  rnorm(10, mean = 700, sd = 200), # dummy OS level
  rnorm(10, mean = 400, sd = 100), # dummy EFS level
  rnorm(10, mean = 450, sd = 200) # dummy PFS level
)

cdisc_data_obj <- cdisc_data(
  cdisc_dataset(
    dataname = "ADSL",
    x = adsl,
    code = '
    adsl <- data.frame(
      STUDYID = "study",
      USUBJID = 1:10,
      SEX = sample(c("F", "M"), 10, replace = TRUE),
      AGE = rpois(10, 40)
    )'
  ),
  cdisc_dataset(
    dataname = "ADTTE",
    x = adtte,
    code = '
    adtte <- rbind(adsl, adsl, adsl)
    adtte$PARAMCD <- rep(c("OS", "EFS", "PFS"), each = 10)
    adtte$AVAL <- c(
      rnorm(10, mean = 700, sd = 200),
      rnorm(10, mean = 400, sd = 100),
      rnorm(10, mean = 450, sd = 200)
    )'
  )
)
class(cdisc_data_obj)

## [1] "TealData"         "TealDataAbstract" "R6"

# which is equivalent to:
example_data <- cdisc_data(
  cdisc_dataset(
    dataname = "ADSL",
    x = adsl,
    code = '
    adsl <- data.frame(
      STUDYID = "study",
      USUBJID = 1:10,
      SEX = sample(c("F", "M"), 10, replace = TRUE),
      AGE = rpois(10, 40)
    )',
    keys = c("STUDYID", "USUBJID")
  ),
  cdisc_dataset(
    dataname = "ADTTE",
    x = adtte,
    code = '
    adtte <- rbind(adsl, adsl, adsl)
    adtte$PARAMCD <- rep(c("OS", "EFS", "PFS"), each = 10)
    adtte$AVAL <- c(
      rnorm(10, mean = 700, sd = 200),
      rnorm(10, mean = 400, sd = 100),
      rnorm(10, mean = 450, sd = 200)
    )',
    keys = c("STUDYID", "USUBJID", "PARAMCD")
  ),
  join_keys = join_keys(
    join_key("ADSL", "ADSL", c("STUDYID", "USUBJID")),
    join_key("ADTTE", "ADTTE", c("USUBJID", "STUDYID", "PARAMCD")),
    join_key("ADSL", "ADTTE", c("STUDYID", "USUBJID"))
  )
)

app <- init(
  data = example_data,
  modules = example_module()
)

if (interactive()) {
  shinyApp(app$ui, app$server)
}

The [teal.data::join_keys()] function is used to specify keys:

[teal.data::join_keys()] is a collection of multiple [teal.data::join_key()] entries
[teal.data::join_key()] specifies the relation between two datasets:
- dataset_1, dataset_2 - name of two datasets
- key - (optionally) named vector of column names

Note that it is assumed that join keys are symmetric, i.e. join_key("x", "y", "x_col" = "y_col") will enable merge from “x” to “y” and vice-versa.

For more information about preprocessing, reproducibility, relationships between datasets and DDL, please refer to the teal.data package.

NEST CoreDev

2022-04-20

Introduction

Keys