Examining Delayed Data Objects
Dawid Kałędkowski
15.03.2022
testing-delayed-data.Rmd
Overview
Once your delayed data object has been created as described in Delayed Data Objects, teal.data provides a useful set of functions to examine the object outside of a shiny application, for example in the global environment. Below is an exhaustive list of all such functions:
| Purpose | TealDataset | TealDatasetConnector | TealDataConnector & TealData |
|---|---|---|---|
| Get Reproducible Code (Optionally Deparsed) | get_code | get_code | get_code |
| Get data.frame | get_raw_data | get_raw_data | get_raw_data |
| Get Dataset Name | get_dataname | get_dataname | get_dataname |
| Get Single Dataset Object | get_dataset | get_dataset | get_dataset |
| Get All Dataset Objects | - | - | get_datasets |
| Load Data | - | load_dataset | load_datasets |
| Check if Loaded | - | is_pulled | is_pulled |
| Mutate Single Dataset | mutate_dataset | mutate_dataset | mutate_dataset |
| Mutate All Datasets | - | - | mutate_data |
The most basic function, get_dataname, returns the name of the dataset or datasets in your delayed data object:
library(scda)
library(teal.data)
adsl_cf <- callable_function(function() synthetic_cdisc_data("latest")$adsl)
adsl <- cdisc_dataset_connector(
dataname = "ADSL",
pull_callable = adsl_cf,
keys = get_cdisc_keys("ADSL")
)
get_dataname(adsl) # "ADSL"
## [1] "ADSL"
adae_cf <- callable_function(function() synthetic_cdisc_data("latest")$adae)
adae <- cdisc_dataset_connector(
dataname = "ADAE",
pull_callable = adae_cf,
keys = get_cdisc_keys("ADAE")
)
delayed_data <- cdisc_data(adsl, adae)
get_dataname(delayed_data) # "ADSL" "ADAE"
## [1] "ADSL" "ADAE"
All of the delayed data objects described above also contain a launch method, which can be used to test the data loading screen:
if (interactive()) {
delayed_data$launch()
}
There is also a pull method to test that the data can be loaded without launching a shiny app. See Delayed Data Advanced.
Alternatively, teal.data provides a load_dataset function for <...>Dataset<...> objects, which pulls the data without launching the delayed loading screen, and a load_datasets function for <...>Data<...> objects, which launches the delayed loading screen to pull the datasets from the connection.
After loading the data, you can check that it has been successfully pulled using the is_pulled function:
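if (interactive()) {
load_datasets(delayed_data)
}
# FALSE when run non-interactively (e.g. when this vignette is built),
# because the load_datasets() call above is skipped
is_pulled(delayed_data)
## [1] FALSE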
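To make the idea of pulling and checking concrete, here is a minimal base-R sketch of delayed loading. It assumes nothing about teal.data internals: the hypothetical make_delayed() helper stores a function and only calls it when pull() is invoked, so no data exist until then. The names make_delayed, pull, is_pulled and get_raw_data inside this sketch are illustrative list elements, not the teal.data API, and the built-in iris data stand in for a real data source.

```r
# A hypothetical, minimal model of a delayed dataset connector in base R.
# It is NOT the teal.data implementation; it only illustrates lazy pulling.
make_delayed <- function(dataname, pull_fun) {
  state <- new.env()
  state$data <- NULL
  state$pulled <- FALSE
  list(
    dataname = dataname,
    # Run the stored function only when explicitly asked to
    pull = function() {
      state$data <- pull_fun()
      state$pulled <- TRUE
      invisible(state$data)
    },
    is_pulled = function() state$pulled,
    get_raw_data = function() {
      if (!state$pulled) stop("dataset '", dataname, "' has not been pulled yet")
      state$data
    }
  )
}

# Count how often the stand-in data source is actually queried
calls <- 0
conn <- make_delayed("IRIS", function() {
  calls <<- calls + 1
  head(iris, 3)
})

conn$is_pulled()          # FALSE: nothing has been loaded yet
conn$pull()
conn$is_pulled()          # TRUE
nrow(conn$get_raw_data()) # 3
```

The design choice being illustrated is that creating the object is cheap: the data source is queried exactly once, at the moment pull() is called.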
Aside: Loading page UI
It is possible to set default values of the input boxes on the loading page using the set_ui_input method:
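library(shinyWidgets) # provides pickerInput()
# the inputId is wrapped in ns() so that it is namespaced within the loading screen
adae$set_ui_input(function(ns) {
list(pickerInput(ns("name"), label = "Version of the dataset",
choices = ls_synthetic_cdisc_data(), selected = "latest"))
})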
Testing data loading continued
Once the data are loaded, it is also possible to access the individual dataset objects using the get_dataset function or, for <...>Data<...> objects, to retrieve all dataset objects using the get_datasets function:
# pull every connector in delayed_data without launching a shiny app
lapply(delayed_data$get_items(), function(item) item$pull())
# return a particular dataset by name
get_dataset(delayed_data, dataname = "ADSL")
# or return all datasets
load_datasets(delayed_data)
get_datasets(delayed_data)
Note that when a connector is loaded, the result is a dataset object:
# "CDISCTealDatasetConnector" "TealDatasetConnector" "R6"
class(adsl)
## [1] "CDISCTealDatasetConnector" "TealDatasetConnector"
## [3] "R6"
# "CDISCTealDataset" "TealDataset" "R6"
class(get_dataset(adsl))
## [1] "CDISCTealDataset" "TealDataset" "R6"
To view the raw data.frame object, use the get_raw_data function:
# for a single <...>Dataset<...> object
head(get_raw_data(adsl), 2)
## STUDYID USUBJID SUBJID SITEID AGE AGEU SEX
## 1 AB12345 AB12345-CHN-3-id-128 id-128 CHN-3 32 YEARS M
## 2 AB12345 AB12345-CHN-15-id-262 id-262 CHN-15 35 YEARS M
## RACE ETHNIC COUNTRY DTHFL INVID
## 1 ASIAN HISPANIC OR LATINO CHN Y INV ID CHN-3
## 2 BLACK OR AFRICAN AMERICAN NOT HISPANIC OR LATINO CHN N INV ID CHN-15
## INVNAM ARM ARMCD ACTARM ACTARMCD TRT01P
## 1 Dr. CHN-3 Doe A: Drug X ARM A A: Drug X ARM A A: Drug X
## 2 Dr. CHN-15 Doe C: Combination ARM C C: Combination ARM C C: Combination
## TRT01A TRT02P TRT02A REGION1 STRATA1 STRATA2 BMRKR1
## 1 A: Drug X B: Placebo A: Drug X Asia C S2 14.424934
## 2 C: Combination B: Placebo C: Combination Asia C S1 4.055463
## BMRKR2 ITTFL SAFFL BMEASIFL BEP01FL AEWITHFL RANDDT TRTSDTM
## 1 MEDIUM Y Y Y Y N 2019-02-22 2019-02-24 11:09:18
## 2 LOW Y Y N N Y 2019-02-26 2019-02-26 09:05:00
## TRTEDTM TRT01SDTM TRT01EDTM
## 1 2022-02-12 03:55:58 2019-02-24 11:09:18 2021-02-11 22:06:46
## 2 2022-02-26 02:32:36 2019-02-26 09:05:00 2021-02-25 20:43:24
## TRT02SDTM TRT02EDTM AP01SDTM
## 1 2021-02-11 22:06:46 2022-02-12 03:55:58 2019-02-24 11:09:18
## 2 2021-02-25 20:43:24 2022-02-26 02:32:36 2019-02-26 09:05:00
## AP01EDTM AP02SDTM AP02EDTM EOSSTT
## 1 2021-02-11 22:06:46 2021-02-11 22:06:46 2022-02-12 03:55:58 DISCONTINUED
## 2 2021-02-25 20:43:24 2021-02-25 20:43:24 2022-02-26 02:32:36 COMPLETED
## EOTSTT EOSDT EOSDY DCSREAS DTHDT DTHCAUS DTHCAT
## 1 DISCONTINUED 2022-02-12 1084 DEATH 2022-03-06 ADVERSE EVENT ADVERSE EVENT
## 2 COMPLETED 2022-02-26 1096 <NA> <NA> <NA> <NA>
## LDDTHELD LDDTHGR1 LSTALVDT DTHADY ADTHAUT study_duration_secs
## 1 22 <=30 2022-03-06 1106 Yes 63113904
## 2 NA <NA> 2022-03-17 NA <NA> 63113904
# or for a <...>Data<...> object containing multiple datasets, specify the name of the dataset of interest
raw <- get_raw_data(delayed_data, "ADSL")
head(raw, 2)
## STUDYID USUBJID SUBJID SITEID AGE AGEU SEX
## 1 AB12345 AB12345-CHN-3-id-128 id-128 CHN-3 32 YEARS M
## 2 AB12345 AB12345-CHN-15-id-262 id-262 CHN-15 35 YEARS M
## RACE ETHNIC COUNTRY DTHFL INVID
## 1 ASIAN HISPANIC OR LATINO CHN Y INV ID CHN-3
## 2 BLACK OR AFRICAN AMERICAN NOT HISPANIC OR LATINO CHN N INV ID CHN-15
## INVNAM ARM ARMCD ACTARM ACTARMCD TRT01P
## 1 Dr. CHN-3 Doe A: Drug X ARM A A: Drug X ARM A A: Drug X
## 2 Dr. CHN-15 Doe C: Combination ARM C C: Combination ARM C C: Combination
## TRT01A TRT02P TRT02A REGION1 STRATA1 STRATA2 BMRKR1
## 1 A: Drug X B: Placebo A: Drug X Asia C S2 14.424934
## 2 C: Combination B: Placebo C: Combination Asia C S1 4.055463
## BMRKR2 ITTFL SAFFL BMEASIFL BEP01FL AEWITHFL RANDDT TRTSDTM
## 1 MEDIUM Y Y Y Y N 2019-02-22 2019-02-24 11:09:18
## 2 LOW Y Y N N Y 2019-02-26 2019-02-26 09:05:00
## TRTEDTM TRT01SDTM TRT01EDTM
## 1 2022-02-12 03:55:58 2019-02-24 11:09:18 2021-02-11 22:06:46
## 2 2022-02-26 02:32:36 2019-02-26 09:05:00 2021-02-25 20:43:24
## TRT02SDTM TRT02EDTM AP01SDTM
## 1 2021-02-11 22:06:46 2022-02-12 03:55:58 2019-02-24 11:09:18
## 2 2021-02-25 20:43:24 2022-02-26 02:32:36 2019-02-26 09:05:00
## AP01EDTM AP02SDTM AP02EDTM EOSSTT
## 1 2021-02-11 22:06:46 2021-02-11 22:06:46 2022-02-12 03:55:58 DISCONTINUED
## 2 2021-02-25 20:43:24 2021-02-25 20:43:24 2022-02-26 02:32:36 COMPLETED
## EOTSTT EOSDT EOSDY DCSREAS DTHDT DTHCAUS DTHCAT
## 1 DISCONTINUED 2022-02-12 1084 DEATH 2022-03-06 ADVERSE EVENT ADVERSE EVENT
## 2 COMPLETED 2022-02-26 1096 <NA> <NA> <NA> <NA>
## LDDTHELD LDDTHGR1 LSTALVDT DTHADY ADTHAUT study_duration_secs
## 1 22 <=30 2022-03-06 1106 Yes 63113904
## 2 NA <NA> 2022-03-17 NA <NA> 63113904
# note the raw data is now just a regular R data frame (here a tibble)
class(raw)
## [1] "tbl_df" "tbl" "data.frame"
Use the get_code function to check that the processing code is as expected and to obtain code that reproduces the data:
get_code(delayed_data)
## [1] "ADSL <- (function() synthetic_cdisc_data(\"latest\")$adsl)()\nADAE <- (function() synthetic_cdisc_data(\"latest\")$adae)()"
See the section on pre-processing Delayed Data for how to specify additional code that transforms your delayed data; this code is also added to the output of get_code.
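The output of get_code is ordinary R source that, when evaluated in a fresh environment, rebuilds the datasets. The base-R sketch below illustrates that idea under stated assumptions: build_code() is a hypothetical helper, the built-in iris data stand in for synthetic_cdisc_data(), and this is not how teal.data generates its code internally. Each dataset contributes one assignment, pre-processing code is appended after it, and evaluating the combined text reproduces the final data.

```r
# Hypothetical helper: turn a dataset name and the expression that loads it
# into one line of R code such as "IRIS <- head(iris, 5)"
build_code <- function(dataname, expr) {
  deparse(call("<-", as.name(dataname), expr))
}

# Loading code (a stand-in for the synthetic_cdisc_data() calls above)
code <- build_code("IRIS", quote(head(iris, 5)))

# Pre-processing code is appended after the loading code
code <- c(code, "IRIS$Ratio <- round(IRIS$Sepal.Length / IRIS$Sepal.Width, 2)")

# Show the collected code, one statement per line
cat(code, sep = "\n")

# Evaluating the collected code in a clean environment reproduces the data
env <- new.env()
eval(parse(text = code), envir = env)
head(env$IRIS, 2)
```

Keeping the code as text, rather than only the resulting data, is what makes the result reproducible: anyone can rerun the same statements outside the application.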
Aside: Piping functions
There is a natural sequence to loading and inspecting a delayed data object: create it, optionally mutate it, load it and then inspect it. For this reason, the magrittr pipe %>% works well for chaining these steps:
library(teal.data)
library(scda)
library(magrittr)
adsl_cf <- callable_function(function() synthetic_cdisc_data("latest")$adsl)
cdisc_dataset_connector(
dataname = "ADSL",
pull_callable = adsl_cf,
keys = get_cdisc_keys("ADSL")
) %>%
mutate_dataset("ADSL$TRTDUR <- round(as.numeric(ADSL$TRTEDTM - ADSL$TRTSDTM), 1)") %>%
load_dataset() %>%
get_raw_data() %>%
head(n = 2)
## STUDYID USUBJID SUBJID SITEID AGE AGEU SEX
## 1 AB12345 AB12345-CHN-3-id-128 id-128 CHN-3 32 YEARS M
## 2 AB12345 AB12345-CHN-15-id-262 id-262 CHN-15 35 YEARS M
## RACE ETHNIC COUNTRY DTHFL INVID
## 1 ASIAN HISPANIC OR LATINO CHN Y INV ID CHN-3
## 2 BLACK OR AFRICAN AMERICAN NOT HISPANIC OR LATINO CHN N INV ID CHN-15
## INVNAM ARM ARMCD ACTARM ACTARMCD TRT01P
## 1 Dr. CHN-3 Doe A: Drug X ARM A A: Drug X ARM A A: Drug X
## 2 Dr. CHN-15 Doe C: Combination ARM C C: Combination ARM C C: Combination
## TRT01A TRT02P TRT02A REGION1 STRATA1 STRATA2 BMRKR1
## 1 A: Drug X B: Placebo A: Drug X Asia C S2 14.424934
## 2 C: Combination B: Placebo C: Combination Asia C S1 4.055463
## BMRKR2 ITTFL SAFFL BMEASIFL BEP01FL AEWITHFL RANDDT TRTSDTM
## 1 MEDIUM Y Y Y Y N 2019-02-22 2019-02-24 11:09:18
## 2 LOW Y Y N N Y 2019-02-26 2019-02-26 09:05:00
## TRTEDTM TRT01SDTM TRT01EDTM
## 1 2022-02-12 03:55:58 2019-02-24 11:09:18 2021-02-11 22:06:46
## 2 2022-02-26 02:32:36 2019-02-26 09:05:00 2021-02-25 20:43:24
## TRT02SDTM TRT02EDTM AP01SDTM
## 1 2021-02-11 22:06:46 2022-02-12 03:55:58 2019-02-24 11:09:18
## 2 2021-02-25 20:43:24 2022-02-26 02:32:36 2019-02-26 09:05:00
## AP01EDTM AP02SDTM AP02EDTM EOSSTT
## 1 2021-02-11 22:06:46 2021-02-11 22:06:46 2022-02-12 03:55:58 DISCONTINUED
## 2 2021-02-25 20:43:24 2021-02-25 20:43:24 2022-02-26 02:32:36 COMPLETED
## EOTSTT EOSDT EOSDY DCSREAS DTHDT DTHCAUS DTHCAT
## 1 DISCONTINUED 2022-02-12 1084 DEATH 2022-03-06 ADVERSE EVENT ADVERSE EVENT
## 2 COMPLETED 2022-02-26 1096 <NA> <NA> <NA> <NA>
## LDDTHELD LDDTHGR1 LSTALVDT DTHADY ADTHAUT study_duration_secs TRTDUR
## 1 22 <=30 2022-03-06 1106 Yes 63113904 1083.7
## 2 NA <NA> 2022-03-17 NA <NA> 63113904 1095.7
Since these functions modify the (R6) objects passed to them in place, there is no need to assign the intermediate results.
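No assignment is needed because R6 objects, like R environments, have reference semantics: a function that changes the object also changes the caller's object. A minimal base-R sketch of this behavior, assuming a plain environment as a stand-in for an R6 object and a hypothetical add_column() helper (not a teal.data function), is:

```r
# A plain environment stands in for an R6 object (R6 objects are environments)
obj <- new.env()
obj$data <- data.frame(x = 1:3)

# Hypothetical helper: modifies the object in place and returns it invisibly,
# so it can also be used in a chain of calls
add_column <- function(o, name, values) {
  o$data[[name]] <- values
  invisible(o)
}

add_column(obj, "y", c(10, 20, 30)) # result not assigned
names(obj$data)                     # "x" "y": the change persisted

# A list has copy-on-modify semantics, so the same call loses the change
lst <- list(data = data.frame(x = 1:3))
add_column(lst, "y", c(10, 20, 30))
names(lst$data)                     # "x": the caller's list is unchanged
```

Returning the object invisibly is what lets such functions work both as standalone statements and as steps in a pipe.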
For an introduction to pipes, refer to the documentation for %>% or other resources on pipes.