Data merge
NEST coreDev
2022-05-11
Combining datasets is a crucial step when using modules with more
than one dataset. In the context of teal, we use the term
“merge” for combining datasets, and two functions are offered for it:
merge_expression_module and
merge_expression_srv. Which one to use depends on the
scenario.
When no processing of the data_extract list is required,
merge_expression_module reads the data and the list of
data_extract_spec objects and applies the
merging. It is a wrapper that combines
data_extract_multiple_srv() and
merge_expression_srv() (see below for more details). When the
data_extract list input needs additional processing,
merge_expression_srv() can be combined with
data_extract_multiple_srv() or
data_extract_srv() to customize the
selector_list input.
In the coming sections, we will show examples of both scenarios.
merge_expression_module
With merge_expression_module alone, all you need
is a list of data_extract_spec objects for the
data_extract argument, a list of reactive or non-reactive
data.frame objects, and a list of join keys corresponding to
every data.frame object.
App code
library(teal.transform)
#> Loading required package: magrittr
library(shiny)
adsl_extract <- teal.transform::data_extract_spec(
  dataname = "ADSL",
  select = select_spec(
    label = "Select variable:",
    choices = c("AGE", "BMRKR1"),
    selected = "AGE",
    multiple = TRUE,
    fixed = FALSE
  )
)
adtte_extract <- teal.transform::data_extract_spec(
  dataname = "ADTTE",
  select = select_spec(
    choices = c("AVAL", "ASEQ"),
    selected = "AVAL",
    multiple = TRUE,
    fixed = FALSE
  )
)
data_extracts <- list(adsl_extract = adsl_extract, adtte_extract = adtte_extract)
merge_ui <- function(id, data_extracts) {
  ns <- NS(id)
  teal.widgets::standard_layout(
    output = teal.widgets::white_small_well(
      verbatimTextOutput(ns("expr")),
      dataTableOutput(ns("data"))
    ),
    encoding = div(
      teal.transform::data_extract_ui(
        ns("adsl_extract"), # must correspond with data_extracts list names
        label = "ADSL extract",
        data_extracts[[1]]
      ),
      teal.transform::data_extract_ui(
        ns("adtte_extract"), # must correspond with data_extracts list names
        label = "ADTTE extract",
        data_extracts[[2]]
      )
    )
  )
}
merge_module <- function(id, datasets, data_extracts, join_keys) {
  moduleServer(id, function(input, output, session) {
    merged_data <- teal.transform::merge_expression_module(
      data_extract = data_extracts,
      datasets = datasets,
      join_keys = join_keys,
      merge_function = "dplyr::left_join"
    )
    ANL <- reactive({ # nolint
      eval(envir = list2env(datasets), expr = as.expression(merged_data()$expr))
    })
    output$expr <- renderText(paste(merged_data()$expr, collapse = "\n"))
    output$data <- renderDataTable(ANL())
  })
}
# Define data.frame objects
ADSL <- teal.transform::rADSL # nolint
ADTTE <- teal.transform::rADTTE # nolint
# create a list of data.frame objects
datasets <- list(ADSL = ADSL, ADTTE = ADTTE)
# create join_keys
join_keys <- teal.data::join_keys(
  teal.data::join_key("ADSL", "ADSL", c("STUDYID", "USUBJID")),
  teal.data::join_key("ADSL", "ADTTE", c("STUDYID", "USUBJID")),
  teal.data::join_key("ADTTE", "ADTTE", c("STUDYID", "USUBJID", "PARAMCD"))
)
data_extract_multiple_srv + merge_expression_srv
In the scenario above, if the user deselects the ADTTE
variable, the merge between ADTTE and ADSL
still takes place even though ADTTE is no longer
needed. To avoid this, the developer can update the
selector_list input reactively, based on conditions of
their choosing. Below, we reuse the
inputs from above and change the app server so that
adtte_extract is removed from the
selector_list input when no ADTTE variable is
selected; the resulting reactive_selector_list is then passed to
merge_expression_srv:
merge_module <- function(id, datasets, data_extracts, join_keys) {
  moduleServer(id, function(input, output, session) {
    selector_list <- teal.transform::data_extract_multiple_srv(data_extracts, datasets, join_keys)
    reactive_selector_list <- reactive({
      if (is.null(selector_list()$adtte_extract) || length(selector_list()$adtte_extract()$select) == 0) {
        selector_list()[names(selector_list()) != "adtte_extract"]
      } else {
        selector_list()
      }
    })
    merged_data <- teal.transform::merge_expression_srv(
      selector_list = reactive_selector_list,
      datasets = datasets,
      join_keys = join_keys,
      merge_function = "dplyr::left_join"
    )
    ANL <- reactive({ # nolint
      eval(envir = list2env(datasets), expr = as.expression(merged_data()$expr))
    })
    output$expr <- renderText(paste(merged_data()$expr, collapse = "\n"))
    output$data <- renderDataTable(ANL())
  })
}
Shiny app
shinyApp(
  ui = fluidPage(merge_ui("data_merge", data_extracts)),
  server = function(input, output, session) {
    merge_module("data_merge", datasets, data_extracts, join_keys)
  }
)
merge_expression_module is replaced here with three
parts:
- selector_list: output of data_extract_multiple_srv, which loops over the given list of data_extract objects and runs data_extract_srv for each one, returning a list of reactive objects.
- reactive_selector_list: intermediate reactive list that updates the selector_list content.
- merged_data: output of merge_expression_srv using reactive_selector_list as input.
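The filtering step inside reactive_selector_list can be tried outside of Shiny. The sketch below is an illustration only: plain functions stand in for the reactive objects returned per selector, and the helper drop_empty_adtte is a hypothetical name that is not part of teal.transform.

```r
# Plain functions stand in for the reactives returned per selector.
# Assumption: each one returns a list with a `select` element, as used in
# the server code above.
selector_list <- list(
  adsl_extract = function() list(dataname = "ADSL", select = "AGE"),
  adtte_extract = function() list(dataname = "ADTTE", select = character(0))
)

# Hypothetical helper mirroring the condition inside reactive_selector_list:
# drop the ADTTE selector when it is missing or has no selected variable.
drop_empty_adtte <- function(selectors) {
  adtte <- selectors$adtte_extract
  if (is.null(adtte) || length(adtte()$select) == 0) {
    selectors[names(selectors) != "adtte_extract"]
  } else {
    selectors
  }
}

kept <- names(drop_empty_adtte(selector_list))
print(kept)
```

Because the ADTTE stand-in has an empty selection here, only the ADSL selector remains and no merge with ADTTE would be requested.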
Output from merging
Both merge functions, merge_expression_srv and
merge_expression_module, return a reactive object which
contains a list of the following elements:
- expr: code needed to replicate the merged dataset
- columns_source: list of columns selected per selector
- keys: the keys of the merged dataset
- filter_info: filters that are applied on the data
These elements can be further used inside the server to retrieve and use information about the selections, data, and filters.
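In the server code above, the expr element is evaluated against the list of datasets to build ANL. The sketch below reproduces that evaluation pattern in base R. The list of calls is written by hand as a stand-in for merged_data()$expr, so its content is an assumption and not what teal.transform actually generates; the two data frames are made-up stand-ins as well.

```r
# Small stand-in data frames (hypothetical, not the rADSL/rADTTE datasets).
datasets <- list(
  ADSL = data.frame(USUBJID = c("01", "02"), AGE = c(34, 51)),
  ADTTE = data.frame(USUBJID = c("01", "02"), AVAL = c(120, 98))
)

# Hand-written list of calls standing in for merged_data()$expr.
expr <- list(
  quote(ANL_1 <- ADSL[, c("USUBJID", "AGE")]),
  quote(ANL_2 <- ADTTE[, c("USUBJID", "AVAL")]),
  quote(ANL <- merge(ANL_1, ANL_2, by = "USUBJID", all.x = TRUE))
)

# Same pattern as in the app: the calls are evaluated in order inside an
# environment that holds the datasets; the value of the last call is returned.
ANL <- eval(envir = list2env(datasets), expr = as.expression(expr))
print(ANL)
```

Evaluating in an environment built from the datasets list keeps the intermediate objects out of the calling environment.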
Merging of non-CDISC datasets
General datasets do not share the predefined relationships that
CDISC datasets have, so these relationships must be specified
with the join_keys functions. For more information, please
refer to the Join Keys vignette.
The data merge module respects the relationships given by the user.
When multiple datasets are merged, the merge order follows
the order of elements in the data_extract argument of the
merge_expression_module function. Merging groups of
datasets with complex relationships can quickly become challenging to
specify, so please take extra care when setting this up.
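To see why the order of the datasets matters, consider a left join, which keeps every row of the first dataset. The sketch below uses base R merge() on two made-up data frames (subjects and visits are hypothetical names). It illustrates the general behavior of a left join and is not the code that the module generates.

```r
# Hypothetical general-purpose datasets sharing an `id` key.
subjects <- data.frame(id = c(1, 2, 3), age = c(34, 51, 47))
visits <- data.frame(id = c(1, 2), score = c(0.8, 0.5))

# Left join with subjects first: all three subjects are kept,
# and the subject without a visit gets NA for `score`.
subjects_first <- merge(subjects, visits, by = "id", all.x = TRUE)

# Left join with visits first: only ids present in visits are kept.
visits_first <- merge(visits, subjects, by = "id", all.x = TRUE)

print(nrow(subjects_first))
print(nrow(visits_first))
```

The same pair of datasets gives a different number of rows depending on which one comes first, which is why the order of the data_extract elements deserves attention.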