Data merge
NEST coreDev
2022-05-11
data-merge.Rmd
Combining datasets is a crucial step when a module uses more than one dataset. In the context of `teal`, we use the term “merge” for combining datasets, and two functions are offered for it: `data_merge_module` and `data_merge_srv`. Which one to use depends on the specific scenario.
When the `data_extract` list needs no extra processing, use `data_merge_module`. It reads the data and the list of `data_extract_spec` objects and applies the merge. It is a wrapper that combines `data_extract_multiple_srv()` and `data_merge_srv()`, which are described in more detail below. When the `data_extract` list input needs additional processing, `data_merge_srv()` can be combined with `data_extract_multiple_srv()` or `data_extract_srv()` to customize the `selector_list` input.
The following sections show an example of each scenario.
data_merge_module
With `data_merge_module` alone, all you need is a list of `data_extract_spec` objects for the `data_extract` argument and a `FilteredData` object for the `datasets` argument.
App code
```r
library(teal.transform)
library(shiny)

# ADSL selector: the user can pick AGE and/or BMRKR1
adsl_extract <- teal.transform::data_extract_spec(
  dataname = "ADSL",
  select = select_spec(
    label = "Select variable:",
    choices = c("AGE", "BMRKR1"),
    selected = "AGE",
    multiple = TRUE,
    fixed = FALSE
  )
)

# ADTTE selector: the user can pick AVAL and/or ASEQ
adtte_extract <- teal.transform::data_extract_spec(
  dataname = "ADTTE",
  select = select_spec(
    choices = c("AVAL", "ASEQ"),
    selected = "AVAL",
    multiple = TRUE,
    fixed = FALSE
  )
)

# list names must match the input IDs used in merge_ui()
data_extracts <- list(adsl_extract = adsl_extract, adtte_extract = adtte_extract)

merge_ui <- function(id, data_extracts) {
  ns <- NS(id)
  teal.widgets::standard_layout(
    output = teal.widgets::white_small_well(
      verbatimTextOutput(ns("expr")),
      dataTableOutput(ns("data"))
    ),
    encoding = div(
      teal.transform::data_extract_ui(
        ns("adsl_extract"), # must correspond with data_extracts list names
        label = "ADSL extract",
        data_extracts$adsl_extract
      ),
      teal.transform::data_extract_ui(
        ns("adtte_extract"), # must correspond with data_extracts list names
        label = "ADTTE extract",
        data_extracts$adtte_extract
      )
    )
  )
}

merge_module <- function(id, datasets, data_extracts) {
  moduleServer(id, function(input, output, session) {
    # read the selections and merge the selected datasets in one call
    merged_data <- teal.transform::data_merge_module(
      data_extract = data_extracts,
      datasets = datasets,
      merge_function = "dplyr::left_join"
    )
    output$expr <- renderText(merged_data()$expr)
    output$data <- renderDataTable(merged_data()$data())
  })
}

sample_filtered_data <- function() {
  # create TealData from a single draw of the synthetic CDISC data
  cdisc <- scda::synthetic_cdisc_data("latest")
  adsl <- teal.data::cdisc_dataset("ADSL", cdisc$adsl)
  adtte <- teal.data::cdisc_dataset("ADTTE", cdisc$adtte)
  data <- teal.data::cdisc_data(adsl, adtte)
  # convert TealData to FilteredData
  datasets <- teal.slice:::filtered_data_new(data)
  teal.slice:::filtered_data_set(data, datasets)
  datasets
}

datasets <- sample_filtered_data()
```
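The `merge_function = "dplyr::left_join"` argument above requests a left join. As a rough illustration of what that means for the rows, the sketch below uses base R's `merge()` with `all.x = TRUE`, which returns the same rows as a left join (possibly in a different order), on small hypothetical data frames shaped like `ADSL` and `ADTTE`. The values and the join keys `STUDYID` and `USUBJID` are assumptions for illustration only, and the sketch leaves out the filtering and column selection that the real merge also applies.

```r
# Hypothetical toy data: one row per subject, as in ADSL
adsl_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = c("S1-01", "S1-02", "S1-03"),
  AGE = c(34, 51, 47)
)
# Hypothetical toy data: several event records per subject, as in ADTTE
adtte_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = c("S1-01", "S1-01", "S1-02"),
  PARAMCD = c("OS", "PFS", "OS"),
  AVAL = c(120, 80, 200)
)

# Left join on the assumed keys: all.x = TRUE keeps every ADSL row
anl <- merge(adsl_toy, adtte_toy, by = c("STUDYID", "USUBJID"), all.x = TRUE)
nrow(anl) # 4: S1-01 twice, S1-02 once, S1-03 once with NA in AVAL
```

Every `ADSL` subject is kept, a subject with several `ADTTE` records appears once per record, and a subject without records gets `NA` in the `ADTTE` columns. With `dplyr` installed, `dplyr::left_join(adsl_toy, adtte_toy, by = c("STUDYID", "USUBJID"))` returns the same rows.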
data_extract_multiple_srv + data_merge_srv
In the scenario above, if the user deselects the `ADTTE` variable, `ADTTE` is still merged with `ADSL` even though it is neither used nor needed. In such cases, the developer can update the `selector_list` input reactively, based on conditions the developer defines. Below, we reuse the UI and `data_extracts` from above and update the module server so that `adtte_extract` is removed from the `selector_list` input when no `ADTTE` variable is selected; the resulting `reactive_selector_list` is then passed to `data_merge_srv`:
```r
merge_module <- function(id, datasets, data_extracts) {
  moduleServer(id, function(input, output, session) {
    # list of reactives, one per element of data_extracts
    selector_list <- teal.transform::data_extract_multiple_srv(data_extracts, datasets)

    # drop the ADTTE selector when no ADTTE variable is selected
    reactive_selector_list <- reactive({
      if (is.null(selector_list()$adtte_extract) ||
        length(selector_list()$adtte_extract()$select) == 0) {
        selector_list()[names(selector_list()) != "adtte_extract"]
      } else {
        selector_list()
      }
    })

    merged_data <- teal.transform::data_merge_srv(
      selector_list = reactive_selector_list,
      datasets = datasets,
      merge_function = "dplyr::left_join"
    )
    output$expr <- renderText(merged_data()$expr)
    output$data <- renderDataTable(merged_data()$data())
  })
}
```
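The condition inside `reactive_selector_list` can be generalized to drop every selector with an empty selection, not only `adtte_extract`. The sketch below uses a hypothetical helper, `drop_empty_selectors()`, which is not part of `teal.transform`, on plain lists that stand in for the values behind `selector_list()`; that stand-in is an assumption made so the logic can run without Shiny. Inside the module each element is a reactive, so the predicate would need to call it first, e.g. `length(x()$select) > 0`.

```r
# Hypothetical helper (not part of teal.transform): keep only the selectors
# whose `select` element contains at least one column
drop_empty_selectors <- function(selectors) {
  Filter(function(x) !is.null(x) && length(x$select) > 0, selectors)
}

# Plain lists standing in for the evaluated selector reactives (assumption)
selectors <- list(
  adsl_extract = list(dataname = "ADSL", select = "AGE"),
  adtte_extract = list(dataname = "ADTTE", select = character(0))
)

names(drop_empty_selectors(selectors))
#> [1] "adsl_extract"
```

`Filter()` keeps the list names, just as the subsetting with `[` in the module above does, so each remaining selector stays associated with its extract ID.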
Shiny app
```r
shinyApp(
  ui = fluidPage(merge_ui("data_merge", data_extracts)),
  server = function(input, output, session) {
    merge_module("data_merge", datasets, data_extracts)
  }
)
```
`data_merge_module` is replaced here by three parts:
- `selector_list`: the output of `data_extract_multiple_srv`, which loops over the given list of `data_extract_spec` objects and runs `data_extract_srv` for each one, returning a list of reactive objects.
- `reactive_selector_list`: an intermediate reactive that updates the content of `selector_list`.
- `merged_data`: the output of `data_merge_srv`, which takes `reactive_selector_list` as input.
Output from merging
Both merge functions, `data_merge_srv` and `data_merge_module`, return a reactive object that contains a list with the following elements:
- `data`: the merged dataset, after filtering and reshaping, containing the selected columns (the examples above access it as `merged_data()$data()`)
- `expr`: the code needed to reproduce the merged dataset
- `chunks`: a chunks R6 object (see `teal.code`)
- `columns_source`: a list of the columns selected in each selector
- `keys`: the keys of the merged dataset
- `filter_info`: the filters applied to the data
These elements can be used inside the server to retrieve information about the selections, data, filters, and so on.
Merging non-CDISC datasets
General datasets do not come with the predefined relationships that CDISC datasets have, so these relationships must be specified with the `join_keys` functions. For more information, please refer to the Join Keys vignette.
The data merge module respects the relationships given by the user. When several datasets are merged, the merge order follows the order of the elements in the `data_extract` argument of `data_merge_module`. Merging groups of datasets with complex relationships can quickly become challenging to specify, so please take extra care when setting this up.
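To see why the order of the `data_extract` elements can matter, the sketch below chains left joins over a list of data frames in the order given, using base R's `Reduce()` and `merge()`. It is a simplification for illustration, not how `teal.transform` builds its merge code; the helper `left_join_in_order()`, the toy data and the keys `STUDYID`/`USUBJID` are all hypothetical.

```r
# Hypothetical toy data keyed by STUDYID and USUBJID
adsl_toy <- data.frame(STUDYID = "S1", USUBJID = c("S1-01", "S1-02", "S1-03"), AGE = c(34, 51, 47))
adtte_toy <- data.frame(STUDYID = "S1", USUBJID = c("S1-01", "S1-01", "S1-02"), AVAL = c(120, 80, 200))

# Hypothetical helper: left-join the data frames one after another, in list order,
# so the first data frame decides which rows are kept
left_join_in_order <- function(dfs, by) {
  Reduce(function(x, y) merge(x, y, by = by, all.x = TRUE), dfs)
}

keys <- c("STUDYID", "USUBJID")
nrow(left_join_in_order(list(adsl_toy, adtte_toy), keys)) # 4: every subject is kept
nrow(left_join_in_order(list(adtte_toy, adsl_toy), keys)) # 3: only subjects with records
```

With a symmetric join such as a full join the row set would not depend on the order, but with left joins the first dataset decides which rows survive.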