Create a Custom Splitting Function
Arguments
- pre
list. Zero or more functions that operate on the incoming data and return a new data frame to be split via core_split. They are called on the data in the order they appear in the list.
- core_split
function or NULL. If not NULL, a function that accepts the same arguments as do_base_split and returns the same type of named list. Custom functions that override this behavior cannot be used in column splits.
- post
list. Zero or more functions that are called on the list output by splitting.
Details
Custom split functions can be thought of as (up to) three different types of manipulation of the splitting process:
1. Preprocessing of the incoming data to be split,
2. (Row splitting only) Customization of the core mapping of incoming data to facets, and
3. Postprocessing operations on the set of facets (groups) generated by the split.
This function provides an interface for creating custom split functions by implementing and specifying sets of operations in each of those classes of customization independently.
Preprocessing functions (1) must accept df, spl, vals, and labels, and can optionally accept .spl_context. They then manipulate df (the incoming data for the split) and return a modified data.frame. This modified data.frame must contain all columns present in the incoming data.frame, but can add columns if necessary (note, however, that these new columns cannot be used in the layout as split or analysis variables, because they will not be present when validity checking is done).
The preprocessing component is useful for manipulating factor levels, for example to trim unobserved levels or to reorder levels based on observed counts.
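As an illustration of the level-reordering use case, here is a minimal base-R sketch of a preprocessing function. The factory make_reorder_by_count() and the toy data are hypothetical. The variable name is passed in explicitly rather than read from spl, so the sketch assumes nothing about how to query a split object. Under those assumptions, the returned function could be supplied as pre = list(make_reorder_by_count("ARM")).

```r
# Hypothetical factory (not part of rtables): returns a preprocessing
# function that reorders the levels of `var` by decreasing observed count.
# `var` is passed explicitly instead of being read from `spl` (assumption).
make_reorder_by_count <- function(var) {
  function(df, spl, vals, labels, .spl_context) {
    counts <- table(df[[var]])
    # most frequent level first; zero-count levels end up last
    new_levels <- names(sort(counts, decreasing = TRUE))
    df[[var]] <- factor(df[[var]], levels = new_levels)
    # all incoming columns are kept; only the level order of `var` changes
    df
  }
}

# toy data: spl, vals and labels are not used by this sketch
toy <- data.frame(ARM = factor(c("A", "B", "B", "C", "C", "C")),
                  AGE = c(30, 41, 35, 52, 28, 44))
reorder_arm <- make_reorder_by_count("ARM")
out <- reorder_arm(toy, spl = NULL, vals = NULL, labels = NULL)
levels(out$ARM)  # "C" "B" "A"
```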
Customization of core splitting (2) is currently only supported in row splits. Core splitting functions override the fundamental splitting procedure, and are only necessary in rare cases. These must accept spl, df, vals, and labels, and can optionally accept .spl_context. They must return a named list with the following elements, all of the same length:
- datasplit, containing a list of data.frames,
- values, containing the values associated with the facets, which must be character or SplitValue objects. These values will appear in the paths of the resulting table.
- labels, containing the character labels associated with values.
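The required return structure can be sketched in base R. age_cutoff_core() is a hypothetical core split that assumes the data has a numeric AGE column and uses an arbitrary cutoff of 40. It builds the named list by hand to make its shape visible, whereas the Examples section uses make_split_result() for the same purpose.

```r
# Hypothetical core split (illustration only): facets rows by an AGE cutoff.
# The AGE column and the cutoff of 40 are assumptions of this sketch.
age_cutoff_core <- function(spl, df, vals, labels, .spl_context) {
  young <- df$AGE < 40
  list(
    datasplit = list(df[young, , drop = FALSE], df[!young, , drop = FALSE]),
    values = c("young", "older"),        # character values; appear in table paths
    labels = c("Age < 40", "Age >= 40")  # display labels, one per value
  )
}

toy <- data.frame(ARM = c("A", "B", "A", "B"), AGE = c(25, 45, 38, 60))
res <- age_cutoff_core(spl = NULL, df = toy, vals = NULL, labels = NULL)
lengths(res)                            # every element has one entry per facet
vapply(res$datasplit, nrow, integer(1)) # rows in each facet
```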
Postprocessing functions (3) must accept the result of the core split as their first argument (whose name, as of this writing, can be anything), in addition to spl and fulldf, and can optionally accept .spl_context. Each must return a modified version of the same structure specified above for core splitting.
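A minimal base-R sketch of a postprocessing function follows. drop_empty_facets() is hypothetical, not an rtables function; it assumes every element of the split result holds exactly one entry per facet and subsets all of them in parallel, so the returned list keeps the structure described above. The trailing ... absorbs .spl_context, mirroring the signature of reorder_facets in the Examples.

```r
# Hypothetical postprocessing function (not part of rtables): drops facets
# whose data subset has zero rows. Assumes each element of `splret` has one
# entry per facet, so subsetting them all by `keep` keeps lengths equal.
drop_empty_facets <- function(splret, spl, fulldf, ...) {
  keep <- vapply(splret$datasplit, nrow, integer(1)) > 0
  lapply(splret, function(el) el[keep])
}

toy <- data.frame(AGE = c(25, 45, 38))
splret <- list(
  datasplit = list(toy[toy$AGE < 40, , drop = FALSE],
                   toy[toy$AGE > 90, , drop = FALSE]),  # empty facet
  values = c("young", "very_old"),
  labels = c("Age < 40", "Age > 90")
)
trimmed <- drop_empty_facets(splret, spl = NULL, fulldf = toy)
trimmed$values  # "young"
```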
In both the pre- and post-processing cases, multiple functions can be specified. When this happens, they are applied sequentially, in the order they appear in the list passed to the relevant argument (pre and post, respectively).
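What "applied sequentially" means can be sketched with base R's Reduce(), where each function receives the previous function's output. This is a conceptual sketch, not rtables' internal implementation; apply_pre_in_order(), keep_adults() and drop_unused_arms() are hypothetical names, and the toy data is invented to show that order matters.

```r
# Conceptual sketch (not rtables internals): apply preprocessing functions
# in list order, feeding each one the data frame returned by the previous.
apply_pre_in_order <- function(pre, df, spl, vals, labels) {
  Reduce(function(d, fn) fn(d, spl, vals, labels), pre, df)
}

# two hypothetical preprocessing steps whose order matters
keep_adults <- function(df, spl, vals, labels) df[df$AGE >= 18, , drop = FALSE]
drop_unused_arms <- function(df, spl, vals, labels) {
  df$ARM <- droplevels(df$ARM)
  df
}

toy <- data.frame(ARM = factor(c("A", "B", "C")), AGE = c(30, 12, 45))
# filter first, then drop levels: arm "B" disappears from the levels
out1 <- apply_pre_in_order(list(keep_adults, drop_unused_arms), toy, NULL, NULL, NULL)
# drop levels first (all observed), then filter: "B" stays as an empty level
out2 <- apply_pre_in_order(list(drop_unused_arms, keep_adults), toy, NULL, NULL, NULL)
levels(out1$ARM)  # "A" "C"
levels(out2$ARM)  # "A" "B" "C"
```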
See also
custom_split_funs for a more detailed discussion of what custom split functions do.
Other make_custom_split: add_combo_facet(), drop_facet_levels(), make_split_result(), trim_levels_in_facets()
Examples
library(rtables)

## drop unobserved ARM levels, then add an overall "All Arms" facet
mysplitfun <- make_split_fun(pre = list(drop_facet_levels),
post = list(add_overall_facet("ALL", "All Arms")))
basic_table(show_colcounts = TRUE) %>%
split_cols_by("ARM", split_fun = mysplitfun) %>%
analyze("AGE") %>%
build_table(subset(DM, ARM %in% c("B: Placebo", "C: Combination")))
#> B: Placebo C: Combination All Arms
#> (N=106) (N=129) (N=235)
#> —————————————————————————————————————————————
#> Mean 33.02 34.57 33.87
## post (and pre) arguments can take multiple functions; here
## we add an overall facet and then reorder the facets
reorder_facets <- function(splret, spl, fulldf, ...) {
ord <- order(names(splret$values))
make_split_result(splret$values[ord],
splret$datasplit[ord],
splret$labels[ord])
}
mysplitfun2 <- make_split_fun(pre = list(drop_facet_levels),
post = list(add_overall_facet("ALL", "All Arms"),
reorder_facets))
basic_table(show_colcounts = TRUE) %>%
split_cols_by("ARM", split_fun = mysplitfun2) %>%
analyze("AGE") %>%
build_table(subset(DM, ARM %in% c("B: Placebo", "C: Combination")))
#> All Arms B: Placebo C: Combination
#> (N=235) (N=106) (N=129)
#> —————————————————————————————————————————————
#> Mean 33.87 33.02 34.57
very_stupid_core <- function(spl, df, vals, labels, .spl_context) {
make_split_result(c("stupid", "silly"),
datasplit = list(df[1:10,], df[11:30,]),
labels = c("first 10", "second 20"))
}
dumb_30_facet <- add_combo_facet("dumb",
label = "thirty patients",
levels = c("stupid", "silly"))
nonsense_splfun <- make_split_fun(core_split = very_stupid_core,
post = list(dumb_30_facet))
## recall core split overriding is not supported in column space
## currently, but we can see it in action in row space
lyt_silly <- basic_table() %>%
split_rows_by("ARM", split_fun = nonsense_splfun) %>%
summarize_row_groups() %>%
analyze("AGE")
silly_table <- build_table(lyt_silly, DM)
silly_table
#> all obs
#> ———————————————————————————
#> first 10 10 (2.8%)
#> Mean 31.10
#> second 20 20 (5.6%)
#> Mean 34.25
#> thirty patients 30 (8.4%)
#> Mean 33.20