Adding analyzed variables to our table layout defines the primary
tabulation to be performed. We do this by adding calls to analyze and/or
analyze_colvars into our layout pipeline. As with adding further splitting,
the tabulation will occur at the current/next level of nesting by default.
Arguments
- lyt
layout object pre-data used for tabulation
- vars
character vector. Multiple variable names.
- afun
function. Analysis function; must take x or df as its first parameter. Can optionally take other parameters, which will be populated by the tabulation framework. See Details in analyze.
- var_labels
character. Variable labels for 1 or more variables
- table_names
character. Names for the tables representing each atomic analysis. Defaults to var.
- format
FormatSpec. Format associated with this split. Formats can be declared via strings ("xx.x") or functions. In cases such as analyze calls, they can be character vectors or lists of functions.
- na_str
character(1). String that should be displayed when the value of x is missing. Defaults to "NA".
- nested
boolean. Should this layout instruction be applied within the existing layout structure if possible (TRUE, the default) or as a new top-level element (FALSE). Ignored if it would nest a split underneath analyses, which is not allowed.
- inclNAs
boolean. Should observations with NA in the var variable(s) be included when performing this analysis. Defaults to FALSE.
- extra_args
list. Extra arguments to be passed to the tabulation function. Element position in the list corresponds to the children of this split. Named elements in the child-specific lists are ignored if they do not match a formal argument of the tabulation function.
- show_labels
character(1). Should the variable labels corresponding to the variable(s) in vars be visible in the resulting table.
- indent_mod
numeric. Modifier for the default indent position for the structure created by this function (subtable, content table, or row) and all of that structure's children. Defaults to 0, which corresponds to the unmodified default behavior.
- section_div
character(1). String which should be repeated as a section divider after each group defined by this split instruction, or NA_character_ (the default) for no section divider.
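To make several of the arguments above concrete, here is a minimal sketch (not taken from the package documentation) that analyzes two variables in one call and sets var_labels, format and show_labels. It assumes the rtables package is installed and uses the built-in iris data; the label strings and object names are illustrative only.

```r
library(rtables)

lyt_args <- basic_table() %>%
  split_cols_by("Species") %>%
  analyze(
    vars = c("Sepal.Length", "Sepal.Width"),
    afun = mean, # first formal of mean() is x, so it receives a vector
    var_labels = c("Sepal length (cm)", "Sepal width (cm)"), # illustrative labels
    format = "xx.xx",
    show_labels = "visible"
  )

tbl_args <- build_table(lyt_args, iris)
tbl_args
```

Each variable should produce its own labelled block of rows, with the single format string applied to every generated row.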
Value
A PreDataTableLayouts object suitable for passing to further layouting
functions, and to build_table.
Details
When non-NULL, format is used to specify formats for all generated rows, and
can be a character vector, a function, or a list of functions. It will be
recycled to the number of rows once this is known during the tabulation
process, but will be overridden by formats specified within rcell calls in
afun.
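The interaction between the format argument and rcell can be sketched as follows. This is an illustrative example rather than part of the original documentation: afun_fmt is a hypothetical name, and the DM example data is assumed to be available once rtables is attached (as it is in the Examples section).

```r
library(rtables)

# Hypothetical analysis function: one plain value, one rcell with its own format
afun_fmt <- function(x) {
  list(
    "Mean" = mean(x), # no cell-level format, so the format from analyze() applies
    "Range" = rcell(range(x), format = "xx - xx") # cell-level format takes precedence
  )
}

lyt_fmt <- basic_table() %>%
  analyze("AGE", afun = afun_fmt, format = "xx.xx")

tbl_fmt <- build_table(lyt_fmt, DM)
tbl_fmt
```

The "Mean" row should be rendered with two decimals, while the "Range" row keeps the format declared in its rcell call.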
The analysis function (afun) should take as its first parameter either x or
df. Which of these the function accepts changes the behavior when tabulation
is performed.
- If afun's first parameter is x, it will receive the corresponding subset vector of data from the relevant column (from var here) of the raw data being used to build the table.
- If afun's first parameter is df, it will receive the corresponding subset data.frame (i.e. all columns) of the raw data being tabulated.
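The difference between the two calling conventions can be sketched with a pair of small analysis functions. This is an illustrative example, not part of the original page: afun_x and afun_df are hypothetical names, and the DM example data (with AGE, SEX and ARM columns) is assumed to be available once rtables is attached.

```r
library(rtables)

# x-style: receives only the values of the analyzed variable (AGE here)
afun_x <- function(x) {
  in_rows("Mean age" = rcell(mean(x), format = "xx.x"))
}

# df-style: receives the whole data.frame subset, so other columns can be used
afun_df <- function(df) {
  in_rows(
    "Mean age (females)" = rcell(mean(df$AGE[df$SEX == "F"]), format = "xx.x")
  )
}

tbl_x <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", afun = afun_x) %>%
  build_table(DM)

tbl_df <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", afun = afun_df) %>%
  build_table(DM)

tbl_x
tbl_df
```

The df form is the one to reach for when a row needs information from columns other than the analyzed variable.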
In addition to differentiation on the first argument, the analysis function can optionally accept a number of other parameters which, if and only if present in the formals, will be passed to the function by the tabulation machinery. These are as follows:
- .N_col
column-wise N (column count) for the full column being tabulated within
- .N_total
overall N (all observation count, defined as sum of column counts) for the tabulation
- .N_row
row-wise N (row group count) for the group of observations being analyzed (i.e. with no column-based subsetting)
- .df_row
data.frame for observations in the row group being analyzed (i.e. with no column-based subsetting)
- .var
variable that is analyzed
- .ref_group
data.frame or vector of subset corresponding to the ref_group column, including subsetting defined by row-splitting. Optional and only required/meaningful if a ref_group column has been defined
- .ref_full
data.frame or vector of subset corresponding to the ref_group column, without subsetting defined by row-splitting. Optional and only required/meaningful if a ref_group column has been defined
- .in_ref_col
boolean indicating whether the calculation is done for cells within the reference column
- .spl_context
data.frame, each row gives information about a previous/'ancestor' split state. See the .spl_context Details section below
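As a sketch of how these optional parameters are requested, the analysis function below simply declares .N_col among its formals and uses it as a denominator. The function name afun_pct and the row label are hypothetical, and the DM example data is assumed to be available once rtables is attached.

```r
library(rtables)

# Declaring .N_col in the formals is what causes the framework to pass it in
afun_pct <- function(x, .N_col) {
  n <- sum(!is.na(x))
  in_rows(
    "n (% of column)" = rcell(c(n, n / .N_col), format = "xx (xx.x%)")
  )
}

lyt_pct <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze("AGE", afun = afun_pct)

tbl_pct <- build_table(lyt_pct, DM)
tbl_pct
```

Because .N_col is the count for the full column, the percentages within one column sum to at most 100% across the SEX groups.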
Note
None of the arguments described in the Details section can be overridden via
extra_args or when calling make_afun. .N_col and .N_total can be overridden
via the col_counts argument to build_table. Alternative values for the others
must be calculated within afun based on a combination of extra arguments and
the unmodified values provided by the tabulation framework.
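The col_counts override mentioned in this note can be sketched as below. This is an assumption-laden illustration: it relies on build_table accepting col_counts as described on this page (other package versions may treat the argument differently), afun_ncol is a hypothetical name, and the counts of 500 are arbitrary.

```r
library(rtables)

# Hypothetical analysis function that just reports the .N_col it was given
afun_ncol <- function(x, .N_col) {
  in_rows(".N_col seen by afun" = rcell(.N_col, format = "xx"))
}

lyt_ncol <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", afun = afun_ncol)

# Default: .N_col is derived from the data
tbl_default <- build_table(lyt_ncol, DM)

# Override: one (arbitrary) count per column, in column order
tbl_override <- build_table(lyt_ncol, DM, col_counts = c(500L, 500L, 500L))

tbl_default
tbl_override
```

Comparing the two tables shows whether the value passed to afun follows the data or the supplied counts.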
.spl_context Details
The .spl_context data.frame gives information about the subsets of data
corresponding to the splits within which the current analyze action is
nested. Taken together, these correspond to the path of the resulting (set
of) rows the analysis function is creating, although the information is in a
slightly different form. Each split (which corresponds to a group of rows in
the resulting table), as well as the initial 'root' "split", is represented
via the following columns:
- split
The name of the split (often the variable being split in the simple case)
- value
The string representation of the value at that split
- full_parent_df
a dataframe containing the full data (i.e. across all columns) corresponding to the path defined by the combination of split and value of this row and all rows above this row
- all_cols_n
the number of observations corresponding to this row grouping (union of all columns)
- (row-split and analyze contexts only) <1 column for each column in the table structure>
These list columns (named the same as names(col_exprs(tab))) contain logical vectors corresponding to the subset of this row's full_parent_df corresponding to that column
- cur_col_subset
List column containing logical vectors indicating the subset of that row's full_parent_df for the column currently being created by the analysis function
- cur_col_n
integer column containing the observation counts for that split
Note: Within analysis functions that accept .spl_context, the all_cols_n and
cur_col_n columns of the dataframe will contain the 'true' observation counts
corresponding to the row-group and row-group x column subsets of the data.
These numbers will not, and currently cannot, reflect alternate column
observation counts provided by the alt_counts_df, col_counts or col_total
arguments to build_table.
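A minimal sketch of reading .spl_context inside an analysis function follows. It only uses the all_cols_n and cur_col_n columns described above; the function name afun_ctx and the row labels are hypothetical, and the DM example data is assumed to be available once rtables is attached.

```r
library(rtables)

afun_ctx <- function(x, .spl_context) {
  # The last row describes the innermost split enclosing this analysis
  innermost <- .spl_context[nrow(.spl_context), ]
  in_rows(
    "n in row group (all columns)" = rcell(innermost$all_cols_n, format = "xx"),
    "n in row group x column" = rcell(innermost$cur_col_n, format = "xx")
  )
}

lyt_ctx <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX") %>%
  analyze("AGE", afun = afun_ctx)

tbl_ctx <- build_table(lyt_ctx, DM)
tbl_ctx
```

Within each SEX group the first row should repeat the same count in every column, while the second row varies by ARM.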
Examples
library(rtables)

# Summarize AGE within each ARM column; one row per element of summary()
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", afun = list_wrap_x(summary), format = "xx.xx")
lyt
#> A Pre-data Table Layout
#>
#> Column-Split Structure:
#> ARM (lvls)
#>
#> Row-Split Structure:
#> AGE (** analysis **)
#>
tbl <- build_table(lyt, DM)
tbl
#>           A: Drug X   B: Placebo   C: Combination
#> —————————————————————————————————————————————————
#> Min.        20.00       21.00          22.00
#> 1st Qu.     29.00       29.00          30.00
#> Median      33.00       32.00          33.00
#> Mean        34.91       33.02          34.57
#> 3rd Qu.     39.00       37.00          38.00
#> Max.        60.00       55.00          53.00
lyt2 <- basic_table() %>%
  split_cols_by("Species") %>%
  analyze(head(names(iris), -1), afun = function(x) {
    list(
      "mean / sd" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"),
      "range" = rcell(diff(range(x)), format = "xx.xx")
    )
  })
lyt2
#> A Pre-data Table Layout
#>
#> Column-Split Structure:
#> Species (lvls)
#>
#> Row-Split Structure:
#> Sepal.Length:Sepal.Width:Petal.Length:Petal.Width (** multivar analysis **)
#>
tbl2 <- build_table(lyt2, iris)
tbl2
#>                  setosa      versicolor     virginica
#> ——————————————————————————————————————————————————————
#> Sepal.Length
#>   mean / sd    5.01 (0.35)   5.94 (0.52)   6.59 (0.64)
#>   range           1.50          2.10          3.00
#> Sepal.Width
#>   mean / sd    3.43 (0.38)   2.77 (0.31)   2.97 (0.32)
#>   range           2.10          1.40          1.60
#> Petal.Length
#>   mean / sd    1.46 (0.17)   4.26 (0.47)   5.55 (0.55)
#>   range           0.90          2.10          2.40
#> Petal.Width
#>   mean / sd    0.25 (0.11)   1.33 (0.20)   2.03 (0.27)
#>   range           0.50          0.80          1.10