{rtables} Advanced Usage
Gabriel Becker
2025-01-10
Source:vignettes/advanced_usage.Rmd
advanced_usage.Rmd
NOTE
This vignette is currently under development. Any code or prose which
appears in a version of this vignette on the main
branch of
the repository will work/be correct, but they likely are not in their
final form.
Initialization
Control splitting with provided function (limited customization)
rtables provides an array of functions to control the splitting logic
without creating an entirely new split functions. By default
split_*_by
facets data based on categorical variable.
d1 <- subset(ex_adsl, AGE < 25)
d1$AGE <- as.factor(d1$AGE)
lyt1 <- basic_table() %>%
split_cols_by("AGE") %>%
analyze("SEX")
build_table(lyt1, d1)
## 20 21 23 24
## ————————————————————————————————————
## F 0 2 4 5
## M 1 1 2 3
## U 0 0 0 0
## UNDIFFERENTIATED 0 0 0 0
For continuous variables, the split_*_by_cutfun
can be
leveraged to create categories and the corresponding faceting, when the
break points are dependent from the data.
sd_cutfun <- function(x) {
cutpoints <- c(
min(x),
mean(x) - sd(x),
mean(x) + sd(x),
max(x)
)
names(cutpoints) <- c("", "Low", "Medium", "High")
cutpoints
}
lyt1 <- basic_table() %>%
split_cols_by_cutfun("AGE", cutfun = sd_cutfun) %>%
analyze("SEX")
build_table(lyt1, ex_adsl)
## Low Medium High
## ——————————————————————————————————————
## F 36 165 21
## M 21 115 30
## U 1 8 0
## UNDIFFERENTIATED 0 1 2
Alternatively, split_*_by_cuts
can be used when
breakpoints are predefined and split_*_by_quartiles
when
the data should be faceted by quantile.
lyt1 <- basic_table() %>%
split_cols_by_cuts(
"AGE",
cuts = c(0, 30, 60, 100),
cutlabels = c("0-30 y.o.", "30-60 y.o.", "60-100 y.o.")
) %>%
analyze("SEX")
build_table(lyt1, ex_adsl)
## 0-30 y.o. 30-60 y.o. 60-100 y.o.
## ———————————————————————————————————————————————————————
## F 71 150 1
## M 48 116 2
## U 2 7 0
## UNDIFFERENTIATED 1 2 0
Custom Split Functions
Adding an Overall Column Only When The Split Would Already Define 2+ Facets
Our custom split functions can do anything, including conditionally applying one or more other existing custom split functions.
Here we define a function constructor which accepts the variable name we want to check, and then return a custom split function that has the behavior you want using functions provided by rtables for both cases:
picky_splitter <- function(var) {
function(df, spl, vals, labels, trim) {
orig_vals <- vals
if (is.null(vals)) {
vec <- df[[var]]
vals <- if (is.factor(vec)) levels(vec) else unique(vec)
}
if (length(vals) == 1) {
do_base_split(spl = spl, df = df, vals = vals, labels = labels, trim = trim)
} else {
add_overall_level(
"Overall",
label = "All Obs", first = FALSE
)(df = df, spl = spl, vals = orig_vals, trim = trim)
}
}
}
d1 <- subset(ex_adsl, ARM == "A: Drug X")
d1$ARM <- factor(d1$ARM)
lyt1 <- basic_table() %>%
split_cols_by("ARM", split_fun = picky_splitter("ARM")) %>%
analyze("AGE")
This gives us the desired behavior in both the one column corner case:
build_table(lyt1, d1)
## A: Drug X
## ————————————————
## Mean 33.77
and the standard multi-column case:
build_table(lyt1, ex_adsl)
## A: Drug X B: Placebo C: Combination All Obs
## ————————————————————————————————————————————————————————
## Mean 33.77 35.43 35.43 34.88
Notice we use add_overall_level which is itself a function constructor, and then immediately call the constructed function in the more-than-one-columns case.
Leveraging .spl_context
What Is .spl_context
?
.spl_context
(see ?spl_context
) is a
mechanism by which the rtables
tabulation machinery gives
custom split, analysis or content (row-group summary) functions
information about the overarching facet-structure the splits or cells
they generate will reside in.
In particular .spl_context
ensures that your functions
know (and thus do computations based on) the following types of
information:
Different Formats For Different Values Within A Row-Split
dta_test <- data.frame(
USUBJID = rep(1:6, each = 3),
PARAMCD = rep("lab", 6 * 3),
AVISIT = rep(paste0("V", 1:3), 6),
ARM = rep(LETTERS[1:3], rep(6, 3)),
AVAL = c(9:1, rep(NA, 9)),
CHG = c(1:9, rep(NA, 9))
)
my_afun <- function(x, .spl_context) {
n <- sum(!is.na(x))
meanval <- mean(x, na.rm = TRUE)
sdval <- sd(x, na.rm = TRUE)
## get the split value of the most recent parent
## (row) split above this analyze
val <- .spl_context[nrow(.spl_context), "value"]
## do a silly thing to decide the different format precisiosn
## your real logic would go here
valnum <- min(2L, as.integer(gsub("[^[:digit:]]*", "", val)))
fstringpt <- paste0("xx.", strrep("x", valnum))
fmt_mnsd <- sprintf("%s (%s)", fstringpt, fstringpt)
in_rows(
n = n,
"Mean, SD" = c(meanval, sdval),
.formats = c(n = "xx", "Mean, SD" = fmt_mnsd)
)
}
lyt <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("AVISIT") %>%
split_cols_by_multivar(vars = c("AVAL", "CHG")) %>%
analyze_colvars(my_afun)
build_table(lyt, dta_test)
## A B C
## AVAL CHG AVAL CHG AVAL CHG
## ———————————————————————————————————————————————————————————————————————————
## V1
## n 2 2 1 1 0 0
## Mean, SD 7.5 (2.1) 2.5 (2.1) 3.0 (NA) 7.0 (NA) NA NA
## V2
## n 2 2 1 1 0 0
## Mean, SD 6.50 (2.12) 3.50 (2.12) 2.00 (NA) 8.00 (NA) NA NA
## V3
## n 2 2 1 1 0 0
## Mean, SD 5.50 (2.12) 4.50 (2.12) 1.00 (NA) 9.00 (NA) NA NA
Simulating ‘Baseline Comparison’ In Row Space
my_afun <- function(x, .var, .spl_context) {
n <- sum(!is.na(x))
meanval <- mean(x, na.rm = TRUE)
sdval <- sd(x, na.rm = TRUE)
## get the split value of the most recent parent
## (row) split above this analyze
val <- .spl_context[nrow(.spl_context), "value"]
## we show it if its not a CHG within V1
show_it <- val != "V1" || .var != "CHG"
## do a silly thing to decide the different format precisiosn
## your real logic would go here
valnum <- min(2L, as.integer(gsub("[^[:digit:]]*", "", val)))
fstringpt <- paste0("xx.", strrep("x", valnum))
fmt_mnsd <- if (show_it) sprintf("%s (%s)", fstringpt, fstringpt) else "xx"
in_rows(
n = if (show_it) n, ## NULL otherwise
"Mean, SD" = if (show_it) c(meanval, sdval), ## NULL otherwise
.formats = c(n = "xx", "Mean, SD" = fmt_mnsd)
)
}
lyt <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("AVISIT") %>%
split_cols_by_multivar(vars = c("AVAL", "CHG")) %>%
analyze_colvars(my_afun)
build_table(lyt, dta_test)
## A B C
## AVAL CHG AVAL CHG AVAL CHG
## ———————————————————————————————————————————————————————————————————————————
## V1
## n 2 1 0
## Mean, SD 7.5 (2.1) 3.0 (NA) NA
## V2
## n 2 2 1 1 0 0
## Mean, SD 6.50 (2.12) 3.50 (2.12) 2.00 (NA) 8.00 (NA) NA NA
## V3
## n 2 2 1 1 0 0
## Mean, SD 5.50 (2.12) 4.50 (2.12) 1.00 (NA) 9.00 (NA) NA NA
We can further simulate the formal modeling of reference row(s) using
the extra_args
machinery
my_afun <- function(x, .var, ref_rowgroup, .spl_context) {
n <- sum(!is.na(x))
meanval <- mean(x, na.rm = TRUE)
sdval <- sd(x, na.rm = TRUE)
## get the split value of the most recent parent
## (row) split above this analyze
val <- .spl_context[nrow(.spl_context), "value"]
## we show it if its not a CHG within V1
show_it <- val != ref_rowgroup || .var != "CHG"
fmt_mnsd <- if (show_it) "xx.x (xx.x)" else "xx"
in_rows(
n = if (show_it) n, ## NULL otherwise
"Mean, SD" = if (show_it) c(meanval, sdval), ## NULL otherwise
.formats = c(n = "xx", "Mean, SD" = fmt_mnsd)
)
}
lyt2 <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("AVISIT") %>%
split_cols_by_multivar(vars = c("AVAL", "CHG")) %>%
analyze_colvars(my_afun, extra_args = list(ref_rowgroup = "V1"))
build_table(lyt2, dta_test)
## A B C
## AVAL CHG AVAL CHG AVAL CHG
## —————————————————————————————————————————————————————————————————————
## V1
## n 2 1 0
## Mean, SD 7.5 (2.1) 3.0 (NA) NA
## V2
## n 2 2 1 1 0 0
## Mean, SD 6.5 (2.1) 3.5 (2.1) 2.0 (NA) 8.0 (NA) NA NA
## V3
## n 2 2 1 1 0 0
## Mean, SD 5.5 (2.1) 4.5 (2.1) 1.0 (NA) 9.0 (NA) NA NA