Sorting a table at a specific path

Main sorting function to order the sub-structure of a TableTree at a particular path in the table tree.

Usage

sort_at_path(
  tt,
  path,
  scorefun,
  decreasing = NA,
  na.pos = c("omit", "last", "first"),
  .prev_path = character()
)

Arguments

tt: (TableTree or related class)
a TableTree object representing a populated table.
path: (character)
a vector path for a position within the structure of a TableTree. Each element represents a subsequent choice amongst the children of the previous choice.
scorefun: (function)
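scoring function. Should accept the type of children directly under the position at path (either VTableTree, VTableRow, or VTableNodeInfo, which covers both) and return a single score for that child, typically numeric (character scores are also handled, see decreasing), which is then used for sorting.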
decreasing: (flag)
whether the scores generated by scorefun should be sorted in decreasing order. If unset (the default of NA), it is set to TRUE if the generated scores are numeric and FALSE if they are characters.
na.pos: (string)
what should be done with children (sub-trees/rows) with NA scores. Defaults to "omit", which removes them. Other allowed values are "last" and "first", which indicate where NA scores should be placed in the order.
.prev_path: (character)
internal detail, do not set manually.
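
As a rough illustration of decreasing and na.pos, the sketch below rebuilds a table similar to the one in the Examples section (using afun = mean for brevity, which is a simplification) and sorts the STRATA1 children of the ASIAN group. The score function score_na_for_b is hypothetical and exists only to produce an NA score.

```r
library(rtables)

# Layout close to the one in the Examples section; DM ships with formatters
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("RACE", split_fun = drop_and_remove_levels("WHITE")) %>%
  summarize_row_groups() %>%
  split_rows_by("STRATA1") %>%
  summarize_row_groups() %>%
  analyze("AGE", afun = mean)
tbl <- prune_table(build_table(lyt, DM))

# Ascending order on the second column's counts instead of the numeric default
asc <- sort_at_path(tbl, c("ASIAN", "STRATA1"), cont_n_onecol(2), decreasing = FALSE)

# Hypothetical score function: NA for stratum "B", total count otherwise
score_na_for_b <- function(tt) {
  if (obj_name(tt) == "B") NA_real_ else cont_n_allcols(tt)
}

# Default na.pos = "omit" drops the NA-scored child entirely
na_omit <- sort_at_path(tbl, c("ASIAN", "STRATA1"), score_na_for_b)

# na.pos = "first" keeps it and places it before the scored children
na_first <- sort_at_path(tbl, c("ASIAN", "STRATA1"), score_na_for_b, na.pos = "first")
```

Comparing row.names() of the three results against row.names(tbl) is a quick way to see which children moved or disappeared.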

Value

A TableTree with the same structure as tt with the exception that the requested sorting has been done at path.

Details

sort_at_path, given a path, locates the (sub)table(s) described by the path (see below for handling of the "*" wildcard). For each such subtable, it then calls scorefun on each direct child of the table, using the resulting scores to determine their sorted order. tt is then modified to reflect each of these one or more sorting operations.

In path, a leading "root" element will be ignored, regardless of whether this matches the object name (and thus actual root path name) of tt. Including "root" in paths where it does not match the name of tt may mask deeper misunderstandings of how valid paths within a TableTree object correspond to the layout used to originally declare it, which we encourage users to avoid.

path can include the "wildcard" "*" as a step, which translates roughly to any node/branching element and means that each child at that step will be separately sorted based on scorefun and the remaining path entries. This can occur multiple times in a path.
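
A minimal sketch of the wildcard, assuming a table built like the one in the Examples section (with afun = mean as a simplification): the "*" step stands for each RACE level in turn, so the STRATA1 children are sorted separately within every race group.

```r
library(rtables)

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("RACE", split_fun = drop_and_remove_levels("WHITE")) %>%
  summarize_row_groups() %>%
  split_rows_by("STRATA1") %>%
  summarize_row_groups() %>%
  analyze("AGE", afun = mean)
tbl <- prune_table(build_table(lyt, DM))

# "*" matches every child of RACE; each STRATA1 subtable is sorted on its own,
# here by the content-row counts summed across all columns
sorted <- sort_at_path(tbl, c("RACE", "*", "STRATA1"), cont_n_allcols)

# Row labels after sorting, to inspect the new order within each group
rn <- row.names(sorted)
rn
```

Because every matched subtable is scored independently, the resulting strata order may differ from one race group to the next.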

A list of valid (non-wildcard) paths can be seen in the path column of the data.frame created by formatters::make_row_df() with the visible_only argument set to FALSE. It can also be inferred from the summary given by table_structure().
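
One way to list candidate paths, sketched under the assumption of a table like the one in the Examples section (afun = mean is used for brevity). The path column is a list column, so each entry is collapsed into a single string for display; the " > " separator is an arbitrary choice.

```r
library(rtables)

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("RACE", split_fun = drop_and_remove_levels("WHITE")) %>%
  summarize_row_groups() %>%
  split_rows_by("STRATA1") %>%
  summarize_row_groups() %>%
  analyze("AGE", afun = mean)
tbl <- prune_table(build_table(lyt, DM))

# One row per element of the table structure, not only the visible rows
row_df <- make_row_df(tbl, visible_only = FALSE)

# Collapse each path (a character vector) into one readable string
paths <- vapply(row_df$path, paste, character(1), collapse = " > ")
head(unique(paths))

# Text overview of the same structure
table_structure(tbl)
```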

Note that sorting requires a solid understanding of table structure in rtables. Please consider reading the related vignette (Sorting and Pruning) and exploring table structure with functions like table_structure() and row_paths_summary(). It is also very important to understand the difference between "content" rows and "data" rows. Content rows summarize the split variable as a whole and are generated with summarize_row_groups(), while data rows are commonly produced by a call to analyze().

The built-in score functions are cont_n_allcols() and cont_n_onecol(). Both work on content rows (coming from summarize_row_groups()), whereas sorting DataRows requires a custom score function. The following descriptor and accessor functions (taken from the related vignette) are useful when writing one:

cell_values() - Retrieves a named list of a TableRow or TableTree object's values.
formatters::obj_name() - Retrieves the name of an object. Note this can differ from the label that is displayed (if any is) when printing.
formatters::obj_label() - Retrieves the display label of an object. Note this can differ from the name that appears in the path.
content_table() - Retrieves a TableTree object's content table (which contains its summary rows).
tree_children() - Retrieves a TableTree object's direct children (either subtables, rows or possibly a mix thereof, though that should not happen in practice).
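
The accessors above can be combined into a custom score function. The sketch below is illustrative only: score_first_col_n is a hypothetical name, the table mirrors the Examples section with afun = mean for brevity, and the function assumes each child has a single content row whose cells hold a count followed by a percentage (the summarize_row_groups() default).

```r
library(rtables)

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("RACE", split_fun = drop_and_remove_levels("WHITE")) %>%
  summarize_row_groups() %>%
  split_rows_by("STRATA1") %>%
  summarize_row_groups() %>%
  analyze("AGE", afun = mean)
tbl <- prune_table(build_table(lyt, DM))

# Hypothetical score function: count shown in the first column of the
# child's content row
score_first_col_n <- function(tt) {
  cont_row <- tree_children(content_table(tt))[[1]] # the single content row
  vals <- row_values(cont_row)                      # one c(count, pct) per column
  vals[[1]][1]                                      # count in the first column
}

custom <- sort_at_path(tbl, c("ASIAN", "STRATA1"), score_first_col_n)

# The built-in cont_n_onecol(1) should give the same ordering
builtin <- sort_at_path(tbl, c("ASIAN", "STRATA1"), cont_n_onecol(1))
identical(row.names(custom), row.names(builtin))
```

Calling the score function directly on one child, for example inside browser(), is a convenient way to check what it returns before sorting.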

Examples

# Creating a table to sort

# Function that gives two statistics per table-tree "leaf"
more_analysis_fnc <- function(x) {
  in_rows(
    "median" = median(x),
    "mean" = mean(x),
    .formats = "xx.x"
  )
}

# Main layout of the table
raw_lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by(
    "RACE",
    split_fun = drop_and_remove_levels("WHITE") # dropping WHITE levels
  ) %>%
  summarize_row_groups() %>%
  split_rows_by("STRATA1") %>%
  summarize_row_groups() %>%
  analyze("AGE", afun = more_analysis_fnc)

# Creating the table and pruning empty rows and NAs
tbl <- build_table(raw_lyt, DM) %>%
  prune_table()

# Peek at the table structure to understand how it is built
table_structure(tbl)
#> [TableTree] RACE
#>  [TableTree] ASIAN [cont: 1 x 3]
#>   [TableTree] STRATA1
#>    [TableTree] A [cont: 1 x 3]
#>     [ElementaryTable] AGE (2 x 3)
#>    [TableTree] B [cont: 1 x 3]
#>     [ElementaryTable] AGE (2 x 3)
#>    [TableTree] C [cont: 1 x 3]
#>     [ElementaryTable] AGE (2 x 3)
#>  [TableTree] BLACK OR AFRICAN AMERICAN [cont: 1 x 3]
#>   [TableTree] STRATA1
#>    [TableTree] A [cont: 1 x 3]
#>     [ElementaryTable] AGE (2 x 3)
#>    [TableTree] B [cont: 1 x 3]
#>     [ElementaryTable] AGE (2 x 3)
#>    [TableTree] C [cont: 1 x 3]
#>     [ElementaryTable] AGE (2 x 3)

# Sorting only the ASIAN sub-table or, in other words, sorting the STRATA1
# elements within the ASIAN group/row-split. cont_n_onecol() relies on the
# content_table() accessor because it scores "ContentRow"s. In this case, we
# also base our sorting only on the second column.
sort_at_path(tbl, c("ASIAN", "STRATA1"), cont_n_onecol(2))
#>                             A: Drug X    B: Placebo   C: Combination
#> ————————————————————————————————————————————————————————————————————
#> ASIAN                       79 (65.3%)   68 (64.2%)     84 (65.1%)  
#>   B                         24 (19.8%)   29 (27.4%)     22 (17.1%)  
#>     median                     32.5         32.0           34.0     
#>     mean                       34.1         31.6           34.7     
#>   A                         27 (22.3%)   20 (18.9%)     31 (24.0%)  
#>     median                     30.0         33.0           36.0     
#>     mean                       32.2         33.9           36.8     
#>   C                         28 (23.1%)   19 (17.9%)     31 (24.0%)  
#>     median                     36.5         34.0           33.0     
#>     mean                       36.2         33.0           32.4     
#> BLACK OR AFRICAN AMERICAN   28 (23.1%)   24 (22.6%)     27 (20.9%)  
#>   A                          6 (5.0%)     7 (6.6%)       8 (6.2%)   
#>     median                     32.0         29.0           32.5     
#>     mean                       31.5         28.6           33.6     
#>   B                         10 (8.3%)     6 (5.7%)      12 (9.3%)   
#>     median                     33.0         30.0           33.5     
#>     mean                       35.6         30.8           33.7     
#>   C                         12 (9.9%)    11 (10.4%)      7 (5.4%)   
#>     median                     33.0         36.0           32.0     
#>     mean                       35.5         34.2           35.0     

# Custom scoring function that works on "DataRow"s
scorefun <- function(tt) {
  # Here we could use browser()
  sum(unlist(row_values(tt))) # Different accessor function
}
# Sorting mean and median for all the AGE leaves!
sort_at_path(tbl, c("RACE", "*", "STRATA1", "*", "AGE"), scorefun)
#>                             A: Drug X    B: Placebo   C: Combination
#> ————————————————————————————————————————————————————————————————————
#> ASIAN                       79 (65.3%)   68 (64.2%)     84 (65.1%)  
#>   A                         27 (22.3%)   20 (18.9%)     31 (24.0%)  
#>     mean                       32.2         33.9           36.8     
#>     median                     30.0         33.0           36.0     
#>   B                         24 (19.8%)   29 (27.4%)     22 (17.1%)  
#>     mean                       34.1         31.6           34.7     
#>     median                     32.5         32.0           34.0     
#>   C                         28 (23.1%)   19 (17.9%)     31 (24.0%)  
#>     median                     36.5         34.0           33.0     
#>     mean                       36.2         33.0           32.4     
#> BLACK OR AFRICAN AMERICAN   28 (23.1%)   24 (22.6%)     27 (20.9%)  
#>   A                          6 (5.0%)     7 (6.6%)       8 (6.2%)   
#>     mean                       31.5         28.6           33.6     
#>     median                     32.0         29.0           32.5     
#>   B                         10 (8.3%)     6 (5.7%)      12 (9.3%)   
#>     mean                       35.6         30.8           33.7     
#>     median                     33.0         30.0           33.5     
#>   C                         12 (9.9%)    11 (10.4%)      7 (5.4%)   
#>     mean                       35.5         34.2           35.0     
#>     median                     33.0         36.0           32.0
