Quick start: `substitute` for NSE
Francois Collin
29.10.2020
Source:vignettes/quickstart_substitute.Rmd
quickstart_substitute.RmdIntroduction
Considering an expression, R usually evaluates it and returns its
value. Instead of focusing on the value, it is also possible to work
with the code which generated the
value. This is where non standard evaluation, or NSE,
starts. The function substitute is an important element of
non-standard evaluation. For instance, if we consider a
defined such as a <- 5, then the expression
a returns 5, the substitute(a) returns the
code to obtain the value: a.
The principle on which relies teal is to:
- generate expressions.
- return the result of the expression in the result panel of the app.
- return the corresponding code (or expression) with
Show R Code.
The expression returning the displayed value must be reactive. The information in the encoding in one hand and the filtering panel on the other hand modify the expression and the displayed value. As so, teal needs to work both on expressions and values and relies heavily on NSE.
The NSE is an advanced notion and mixing it with Shiny app development is a source of difficulties:
- hindered efficient coding as the Shiny app must be run to check the good execution of the code.
- limited possibilities for testing.
As an alternative, it is possible to focus first on the NSE aspects
in plain R, and secondly, only once ready, integrate it in the Shiny
App. Hereafter, are few practical examples demonstrating how NSE works.
The choice was made to focus on substitute.
Basic
Principle
non_evaluated_expression <- substitute(expr = a + b)
non_evaluated_expression
## a + b
eval(non_evaluated_expression)
## Error in eval(non_evaluated_expression): object 'b' not foundWhat happened?
-
substitutereturns the code and not the value, - it even does not try the code, therefore it is possible to return an expression which does not make sense (yet), for instance involving two non defined objects.
- If the values of
aandbexist, the expression can run without error (below).
a <- 1
b <- 5
eval(non_evaluated_expression)
## [1] 6Now, the function name substitute is for a reason. Not
only returning the expression, it also operates
substitutions of some terms within a given expression.
fun <- function(a, b) {
substitute(expr = a + b)
}
non_evaluated_expression <- fun(5, -2)
non_evaluated_expression
## 5 + -2
eval(non_evaluated_expression)
## [1] 3What happened?
- the objects
aandbexist in the function environment wheresubstituteis called. - the terms of the expression within
substitutewere replaced by the values ofaandb.
Indeed, before returning the expression, substitute
verifies if a and b don’t have any value
existing in the evaluation environment. If so, values of a
and b are used in the expression.
It is also possible to use the second argument of
substitute, env, an environment (or a list)
containing objects. If the expression submitted in
substitute has corresponding objects in env,
the terms within the expression will be substituted with provided
values:
non_evaluated_expression <- substitute(
expr = a + b,
env = list(a = 5, b = 5)
)
non_evaluated_expression
## 5 + 5
eval(non_evaluated_expression)
## [1] 10What happened?
- The environment in which the values of
aandbwere taken from was directly declared within thesubstituteexpression (argumentexpr) and the values were substituted (argumentenv). -
substitutereturned a non-evaluated expression, useeval()to evaluate it.
With a slightly more elaborated expression:
non_evaluated_expression <- substitute(
expr = plot(x = x, y = exp(x), main = text),
env = list(x = 0:10, text = "A graph")
)
non_evaluated_expression
## plot(x = 0:10, y = exp(0:10), main = "A graph")
eval(non_evaluated_expression)
Note that:
-
xas an argument name in plot has been preserved, whilexas an object has been replaced.
Replace an object name
In formulas, character strings are not accepted, how to operate the substitution?
# Error expected:
plot_expr <- substitute(
expr = plot(y ~ x, data = iris, main = text),
env = list(
x = Sepal.Length,
y = Sepal.Width,
text = "Iris, again ..."
)
)
## Error in eval(expr, envir, enclos): object 'Sepal.Length' not found
# Error expected:
plot_expr <- substitute(
expr = plot(y ~ x, data = iris, main = text),
env = list(
x = "Sepal.Length",
y = "Sepal.Width",
text = "Iris, again ..."
)
)
plot_expr
## plot("Sepal.Width" ~ "Sepal.Length", data = iris, main = "Iris, again ...")
eval(plot_expr)
## Error in terms.formula(formula, data = data): invalid term in model formulaThe object names have a special name class;
as.names coerces a character string to an object name
(alternatively, as.symbol provides identical result):
plot_expr <- substitute(
expr = plot(y ~ x, data = iris, main = text),
env = list(
x = as.name("Sepal.Length"),
y = as.symbol("Sepal.Width"),
text = "Iris, again ..."
)
)
plot_expr
## plot(Sepal.Width ~ Sepal.Length, data = iris, main = "Iris, again ...")
eval(plot_expr)
What with dataframe name?
Lets imagine a pipe-flavored expression, with df being
the term corresponding to the dataframe which should be substituted:
df %>% plot(y ~ x, data = ., main = text).
The principle exposed above can work directly without addition.
However, df in the expression is then replaced directly by
the value of the object provided and not the expression generating the
dataframe: the pipeline is working but not humanly readable.
library(magrittr)
short_iris <- head(iris)
plot_expr <- substitute(
expr = df %>% plot(y ~ x, data = ., main = text),
env = list(
df = short_iris,
x = as.name("Sepal.Length"),
y = as.symbol("Sepal.Width"),
text = "Iris, again ..."
)
)
eval(plot_expr)
plot_expr
## list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4), Sepal.Width = c(3.5,
## 3, 3.2, 3.1, 3.6, 3.9), Petal.Length = c(1.4, 1.4, 1.3, 1.5,
## 1.4, 1.7), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.4), Species = c(1L,
## 1L, 1L, 1L, 1L, 1L)) %>% plot(Sepal.Width ~ Sepal.Length, data = .,
## main = "Iris, again ...")How can we replace the value by the expression generating this value?
That is pretty much the topic of the vignette:
substitute.
plot_expr <- substitute(
expr = df %>% plot(y ~ x, data = ., main = text),
env = list(
df = substitute(iris),
x = as.name("Sepal.Length"),
y = as.symbol("Sepal.Width"),
text = "Iris, again ..."
)
)
plot_expr
## iris %>% plot(Sepal.Width ~ Sepal.Length, data = ., main = "Iris, again ...")
eval(plot_expr)
In a nutshell
-
substituteis relevant when the expression needs to be modified. It takes 2 arguments:-
exprthe expression to be (eventually) substituted. -
envthe environment in which potential replacement value might be needed.
-
- If the replacement value should be slightly more special like:
- an object name (like in formulas
e.g.
y ~ x) then, useas.nameoras.symbol. - a data frame name (like
iris) then, usesubstitute.
- an object name (like in formulas
e.g.
rtables
Direct use of substitute
The substitute approach can be used with the new
rtables pipelines.
Lets prepare an example for reporting data from LB domain. The
example is based on the template LBT01; the target is to
report in columns the lab test result per study arm, as values
(AVAL) and changes from baseline (CHG), per
analysis visit in rows.
The data can be prepared as follows:
library(scda)
##
library(rtables)
## Loading required package: formatters
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tern)
## Registered S3 method overwritten by 'tern':
## method from
## tidy.glm broom
adlb <- synthetic_cdisc_data("latest")$adlb
adlb_f <- adlb %>%
filter(
PARAM == "Alanine Aminotransferase Measurement" &
ARMCD %in% c("ARM A", "ARM B") & AVISIT == "WEEK 1 DAY 8"
)And the rtables expression is obtained as:
rtables_expr <- substitute(
expr = basic_table() %>%
split_cols_by(arm, split_fun = drop_split_levels) %>%
split_rows_by(visit, split_fun = drop_split_levels) %>%
split_cols_by_multivar(
vars = c("AVAL", "CHG"),
varlabels = c("Value", "Change")
) %>%
summarize_colvars() %>%
build_table(df = df),
env = list(
df = substitute(adlb_f),
arm = "ARM",
visit = "AVISIT"
)
)The expression is valid … :
eval(rtables_expr)
## A: Drug X B: Placebo
## Value Change Value Change
## ————————————————————————————————————————————————————————————————————
## WEEK 1 DAY 8
## n 134 134 134 134
## Mean (SD) 19.3 (4.0) -0.5 (5.9) 20.2 (4.0) 0.1 (6.0)
## Median 19.2 -0.5 20.1 -0.7
## Min - Max 8.9 - 27.3 -12.5 - 19.7 5.8 - 28.8 -12.3 - 17.4… but not easily readable …:
rtables_expr
## basic_table() %>% split_cols_by("ARM", split_fun = drop_split_levels) %>%
## split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
## split_cols_by_multivar(vars = c("AVAL", "CHG"), varlabels = c("Value",
## "Change")) %>% summarize_colvars() %>% build_table(df = adlb_f)… but that can be arranged:
library(teal)
#' Stylish code
#'
#' Deparse an expression and display the code following NEST conventions.
#'
#' @param expr (`call`)\cr or possibly understood as so.
#'
styled_expr <- function(expr) {
styler:::print.vertical(
styler::style_text(text = deparse(expr)),
colored = FALSE
)
}
#'
#' @examples
styled_expr(rtables_expr)
## basic_table() %>%
## split_cols_by("ARM", split_fun = drop_split_levels) %>%
## split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
## split_cols_by_multivar(vars = c("AVAL", "CHG"), varlabels = c(
## "Value",
## "Change"
## )) %>%
## summarize_colvars() %>%
## build_table(df = adlb_f)
substitute in a function
Moving further, substitute can actually be wrapped in a
function, this way the rtables pipelines are
programmatically obtained:
rtables_expr <- function(df,
arm,
visit) {
substitute(
expr = basic_table() %>%
split_cols_by(arm, split_fun = drop_split_levels) %>%
split_rows_by(visit, split_fun = drop_split_levels) %>%
split_cols_by_multivar(
vars = c("AVAL", "CHG"),
varlabels = c("Value", "Change")
) %>%
summarize_colvars() %>%
build_table(df = df),
env = list(
df = substitute(df),
arm = arm,
visit = visit
)
)
}
result <- rtables_expr(df = adlb_f, arm = "ARM", visit = "AVISIT")
styled_expr(result)
## basic_table() %>%
## split_cols_by("ARM", split_fun = drop_split_levels) %>%
## split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
## split_cols_by_multivar(vars = c("AVAL", "CHG"), varlabels = c(
## "Value",
## "Change"
## )) %>%
## summarize_colvars() %>%
## build_table(df = adlb_f)
eval(result)
## A: Drug X B: Placebo
## Value Change Value Change
## ————————————————————————————————————————————————————————————————————
## WEEK 1 DAY 8
## n 134 134 134 134
## Mean (SD) 19.3 (4.0) -0.5 (5.9) 20.2 (4.0) 0.1 (6.0)
## Median 19.2 -0.5 20.1 -0.7
## Min - Max 8.9 - 27.3 -12.5 - 19.7 5.8 - 28.8 -12.3 - 17.4- The same results as before are obtained …
- while, fine tuning is easier.
- For instance, the variable designating the study arm and the visit can be changed, which is an expected feature in teal module encoding panel.
result <- rtables_expr(df = adlb_f, arm = "ARMCD", visit = "AVISITN")
eval(result)
## Split var [AVISITN] was not character or factor. Converting to factor
## ARM A ARM B
## Value Change Value Change
## ———————————————————————————————————————————————————————————————————
## 1
## n 134 134 134 134
## Mean (SD) 19.3 (4.0) -0.5 (5.9) 20.2 (4.0) 0.1 (6.0)
## Median 19.2 -0.5 20.1 -0.7
## Min - Max 8.9 - 27.3 -12.5 - 19.7 5.8 - 28.8 -12.3 - 17.4
styled_expr(result)
## basic_table() %>%
## split_cols_by("ARMCD", split_fun = drop_split_levels) %>%
## split_rows_by("AVISITN", split_fun = drop_split_levels) %>%
## split_cols_by_multivar(vars = c("AVAL", "CHG"), varlabels = c(
## "Value",
## "Change"
## )) %>%
## summarize_colvars() %>%
## build_table(df = adlb_f)Chain expressions in a pipeline
It is also possible to manipulate expressions, for instance, expressions might be chained in a pipeline.
#' Expressions as a pipeline
#'
#' Accepts expressions to be chained using the `magrittr` pipeline-flavor.
#' @param ... (`call`)\cr or object which can be interpreted as so.
#' (e.g. `name`)
#'
pipe_expr <- function(...) {
exprs <- unlist(list(...))
exprs <- lapply(
exprs,
function(x) {
x <- deparse(x)
paste(x, collapse = " ")
}
)
exprs <- unlist(exprs)
exprs <- paste(exprs, collapse = " %>% ")
str2lang(exprs)
}
#' @examples
result <- pipe_expr(
expr1 = substitute(df),
expr2 = substitute(head)
)
result
## df %>% head- Expressions can be arranged in a list, this way, it is possible to have conditional editing of expressions.
- In the context of
rtables, layers enclosinganalyzecall handle.statsoption. The lean expression should include the.statsoption, only when the default value is changed. - This is again an expected feature in teal module when rendering the
code with
Show R Code:
rtables_expr <- function(df,
arm,
visit,
.stats = NULL) {
# The rtables layout is decomposed into a list of expressions.
lyt <- list()
# 1. First the columns and rows:
lyt$structure <- substitute(
expr = basic_table() %>%
split_cols_by(arm, split_fun = drop_split_levels) %>%
split_rows_by(visit, split_fun = drop_split_levels) %>%
split_cols_by_multivar(
vars = c("AVAL", "CHG"),
varlabels = c("Value", "Change")
),
env = list(
arm = arm,
visit = visit
)
)
# 2. The analyze layer which depends on the use of .stats.
lyt$analyze <- if (is.null(.stats)) {
substitute(
summarize_colvars()
)
} else {
substitute(
summarize_colvars(.stats = .stats),
list(.stats = .stats)
)
}
# 3. And finishing with rtables::build_table.
lyt$build <- substitute(
build_table(df = df),
list(df = substitute(df))
)
# As previously demonstrated, expressions can be manipulated and
# chained in a pipeline.
pipe_expr(lyt)
}- First application with standard statistics:
result <- rtables_expr(df = adlb_f, arm = "ARM", visit = "AVISIT")
styled_expr(result)
## basic_table() %>%
## split_cols_by("ARM", split_fun = drop_split_levels) %>%
## split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
## split_cols_by_multivar(vars = c("AVAL", "CHG"), varlabels = c(
## "Value",
## "Change"
## )) %>%
## summarize_colvars() %>%
## build_table(df = adlb_f)
eval(result)
## A: Drug X B: Placebo
## Value Change Value Change
## ————————————————————————————————————————————————————————————————————
## WEEK 1 DAY 8
## n 134 134 134 134
## Mean (SD) 19.3 (4.0) -0.5 (5.9) 20.2 (4.0) 0.1 (6.0)
## Median 19.2 -0.5 20.1 -0.7
## Min - Max 8.9 - 27.3 -12.5 - 19.7 5.8 - 28.8 -12.3 - 17.4- Then with statistics specifications:
result <- rtables_expr(
df = adlb_f, arm = "ARM", visit = "AVISIT",
.stats = c("n", "mean_sd")
)
styled_expr(result)
## basic_table() %>%
## split_cols_by("ARM", split_fun = drop_split_levels) %>%
## split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
## split_cols_by_multivar(vars = c("AVAL", "CHG"), varlabels = c(
## "Value",
## "Change"
## )) %>%
## summarize_colvars(.stats = c("n", "mean_sd")) %>%
## build_table(df = adlb_f)
eval(result)
## A: Drug X B: Placebo
## Value Change Value Change
## ———————————————————————————————————————————————————————————————
## WEEK 1 DAY 8
## n 134 134 134 134
## Mean (SD) 19.3 (4.0) -0.5 (5.9) 20.2 (4.0) 0.1 (6.0)Including pre-processing
Finally, it would also be possible to wrap several expressions into a single function.
- For instance, the teal module generally includes a pre-processing section:
rtables_expr <- function(df,
paramcd,
arm,
visit,
.stats = NULL) {
# y is a list which will collect two expressions:
# 1. y$data with the preprocessing steps.
# 2. y$rtables the table layout and build.
y <- list()
# 1. Preprocessing ---
y$data <- substitute(
df <- df %>%
filter(
PARAMCD == paramcd &
ARMCD %in% c("ARM A", "ARM B") & AVISIT == "WEEK 1 DAY 8"
),
list(
df = substitute(df),
paramcd = paramcd
)
)
# 2. rtables layout ---
lyt <- list()
lyt$structure <- substitute(
expr = basic_table() %>%
split_cols_by(arm, split_fun = drop_split_levels) %>%
split_rows_by(visit, split_fun = drop_split_levels) %>%
split_cols_by_multivar(
vars = c("AVAL", "CHG"),
varlabels = c("Value", "Change")
),
env = list(
arm = arm,
visit = visit
)
)
lyt$analyze <- if (is.null(.stats)) {
substitute(
summarize_colvars()
)
} else {
substitute(
summarize_colvars(.stats = .stats),
list(.stats = .stats)
)
}
lyt$build <- substitute(
build_table(df = df),
list(df = substitute(df))
)
y$rtables <- pipe_expr(lyt)
# Finally returns y as a list with two expressions.
y
}It is now possible to modify the studied parameter
(PARAMCD) in addition to the study arm and visit variables
names.
adlb <- synthetic_cdisc_data("latest")$adlb
result <- rtables_expr(
df = adlb, paramcd = "CRP", arm = "ARM", visit = "AVISIT",
.stats = c("n", "mean_sd")
)The two expressions are consistent:
styled_expr(result$data)
## adlb <- adlb %>% filter(PARAMCD == "CRP" & ARMCD %in% c(
## "ARM A",
## "ARM B"
## ) & AVISIT == "WEEK 1 DAY 8")
styled_expr(result$rtables)
## basic_table() %>%
## split_cols_by("ARM", split_fun = drop_split_levels) %>%
## split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
## split_cols_by_multivar(vars = c("AVAL", "CHG"), varlabels = c(
## "Value",
## "Change"
## )) %>%
## summarize_colvars(.stats = c("n", "mean_sd")) %>%
## build_table(df = adlb)The two expressions can be executed and return the
rtables:
result_exec <- mapply(eval, result)
result_exec$rtables
## A: Drug X B: Placebo
## Value Change Value Change
## ————————————————————————————————————————————————————————————
## WEEK 1 DAY 8
## n 134 134 134 134
## Mean (SD) 1.0 (0.2) 0.1 (0.3) 1.0 (0.2) 0.0 (0.3)