[Stable]

Utility functions to get valid statistic methods for different method groups (.stats) and their associated formats (.formats), labels (.labels), and indent modifiers (.indent_mods). This utility is used across tern, but some of its working principles can be seen in analyze_vars(). See the Note section to understand why these defaults are experimental.

Usage

get_stats(
  method_groups = "analyze_vars_numeric",
  stats_in = NULL,
  add_pval = FALSE
)

get_stat_names(stat_results, stat_names_in = NULL)

get_formats_from_stats(stats, formats_in = NULL)

get_labels_from_stats(stats, labels_in = NULL, levels_per_stats = NULL)

get_indents_from_stats(stats, indents_in = NULL, row_nms = NULL)

tern_default_stats

tern_default_formats

tern_default_labels

summary_formats(type = "numeric", include_pval = FALSE)

summary_labels(type = "numeric", include_pval = FALSE)

Format

  • tern_default_stats is a named list of available statistics, with each element named for its corresponding statistical method group.

  • tern_default_formats is a named vector of available default formats, with each element named for its corresponding statistic.

  • tern_default_labels is a named character vector of available default labels, with each element named for its corresponding statistic.

Arguments

method_groups

(character)
indicates the statistical method group (tern analyze function) to retrieve default statistics for. A character vector can be used to specify more than one statistical method group.

stats_in

(character)
statistics to retrieve for the selected method group.

add_pval

(flag)
should "pval" (or "pval_counts" if method_groups contains "analyze_vars_counts") be added to the statistical methods?

stat_results

(list)
list of statistical results. It should be used close to the end of a statistical function. See examples for a structure with two statistical results and two groups.

stat_names_in

(character)
custom names that replace the default names of the statistical values (see examples).

stats

(character)
statistical methods to get defaults for.

formats_in

(named vector)
inserted formats to replace defaults. It can be a character vector from formatters::list_valid_format_labels() or a custom format function.

labels_in

(named character)
inserted labels to replace defaults.

levels_per_stats

(named list of character or NULL)
Levels of a factor or character variable, each of which the statistics in .stats will be calculated for. If this parameter is set, these variable levels will be used as the defaults, and the names of the given custom values should correspond to levels (or have format statistic.level) instead of statistics. Can also be variable names if rows correspond to different variables instead of levels. Defaults to NULL.

indents_in

(named vector)
inserted indent modifiers to replace defaults (default is 0L).

row_nms

(character)
See levels_per_stats. Deprecation cycle started.

type

(string)
"numeric" or "counts".

include_pval

(flag)
same as the add_pval argument in get_stats().

Value

  • get_stats() returns a character vector of statistical methods.

  • get_stat_names() returns a named list of character vectors, indicating the names of statistical outputs.

  • get_labels_from_stats() returns a named character vector of labels (if present in either tern_default_labels or labels_in, otherwise NULL).

  • get_indents_from_stats() returns a single indent modifier value to apply to all rows or a named numeric vector of indent modifiers (if present, otherwise NULL).

  • summary_formats() returns a named vector of default statistic formats for the given data type.

  • summary_labels() returns a named vector of default statistic labels for the given data type.

Details

The current choices for type are "counts" and "numeric", both for analyze_vars(); the choice affects the statistics retrieved through get_stats().

The summary_*() quick-get functions for labels and formats use get_stats() together with get_labels_from_stats() or get_formats_from_stats(), respectively, to retrieve the relevant information.
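
As a rough sketch of the relationship described above, the snippet below rebuilds the result of summary_formats() by hand. It assumes that the method group name is formed as "analyze_vars_" followed by type; this matches the group names used in the examples but is not stated explicitly on this page.

```r
library(tern)

# Assumption: the method group is "analyze_vars_<type>".
type <- "counts"
method_group <- paste0("analyze_vars_", type)

# Step 1: statistics for the method group, with the p-value statistic appended.
stats <- get_stats(method_group, add_pval = TRUE)

# Step 2: default formats for those statistics.
manual_formats <- get_formats_from_stats(stats)

# The quick-get function is expected to cover the same statistics.
quick_formats <- summary_formats(type = type, include_pval = TRUE)
setequal(names(manual_formats), names(quick_formats))
```

The same pattern, with get_labels_from_stats() in the second step, would be the manual counterpart of summary_labels().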

Functions

  • get_stats(): Get statistics available for a given method group (analyze function). To check available defaults see tern::tern_default_stats list.

  • get_stat_names(): Get statistical names available for a given method group (analyze function). Please use the s_* functions to get the statistical names.

  • get_formats_from_stats(): Get formats corresponding to a list of statistics. To check available defaults see tern::tern_default_formats list.

  • get_labels_from_stats(): Get labels corresponding to a list of statistics. To check for available defaults see tern::tern_default_labels list. If a statistic is not available there, its name will be used as the label.

  • get_indents_from_stats(): Format indent modifiers for a given vector/list of statistics. It defaults to 0L for all values.

  • tern_default_stats: Named list of available statistics by method group for tern.

  • tern_default_formats: Named vector of default formats for tern.

  • tern_default_labels: Named character vector of default labels for tern.

  • summary_formats(): [Stable] Quick function to retrieve default formats for summary statistics: analyze_vars() and analyze_vars_in_cols() principally.

  • summary_labels(): [Stable] Quick function to retrieve default labels for summary statistics. Returns labels of descriptive statistics which are understood by rtables. Similar to summary_formats.

Note

These defaults are experimental because we use the names of functions to retrieve the default statistics. This should be generalized into groups of methods that follow more reasonable groupings.

Formats in tern and rtables can be functions that take in the table cell value and return a string. This is well documented in vignette("custom_appearance", package = "rtables").
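
A minimal sketch of such a format function follows. The name fraction_only is hypothetical, and the assumption that the cell value for "fraction" is a vector with elements num and denom is taken from the default fraction format printed in the examples.

```r
library(tern)

# Hypothetical custom format: takes the cell value and returns a string.
# Assumes the value carries named elements "num" and "denom".
fraction_only <- function(x, ...) {
  paste0(x[["num"]], "/", x[["denom"]])
}

# Replace the default format for "fraction"; "count" keeps its default.
custom_formats <- get_formats_from_stats(
  c("count", "fraction"),
  formats_in = list(fraction = fraction_only)
)

# Calling the function directly on a made-up cell value.
fraction_only(c(num = 2, denom = 10))
```

Passing formats_in as a list rather than a character vector is what allows a function to sit alongside string formats.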

Examples

# analyze_vars is numeric
num_stats <- get_stats("analyze_vars_numeric") # also the default

# Other type
cnt_stats <- get_stats("analyze_vars_counts")

# Weirdly taking the pval from count_occurrences
only_pval <- get_stats("count_occurrences", add_pval = TRUE, stats_in = "pval")

# All count_occurrences
all_cnt_occ <- get_stats("count_occurrences")

# Multiple
get_stats(c("count_occurrences", "analyze_vars_counts"))
#> [1] "count"                   "count_fraction"         
#> [3] "count_fraction_fixed_dp" "fraction"               
#> [5] "n"                       "n_blq"                  

stat_results <- list(
  "n" = list("M" = 1, "F" = 2),
  "count_fraction" = list("M" = c(1, 0.2), "F" = c(2, 0.1))
)
get_stat_names(stat_results)
#> $n
#> [1] "M" "F"
#> 
#> $count_fraction
#> [1] "M" "F"
#> 
get_stat_names(stat_results, list("n" = "argh"))
#> $n
#> [1] "argh"
#> 
#> $count_fraction
#> [1] "M" "F"
#> 

# Default formats
get_formats_from_stats(num_stats)
#> $n
#> [1] "xx."
#> 
#> $sum
#> [1] "xx.x"
#> 
#> $mean
#> [1] "xx.x"
#> 
#> $sd
#> [1] "xx.x"
#> 
#> $se
#> [1] "xx.x"
#> 
#> $mean_sd
#> [1] "xx.x (xx.x)"
#> 
#> $mean_se
#> [1] "xx.x (xx.x)"
#> 
#> $mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_sei
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_sdi
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_pval
#> [1] "x.xxxx | (<0.0001)"
#> 
#> $median
#> [1] "xx.x"
#> 
#> $mad
#> [1] "xx.x"
#> 
#> $median_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $quantiles
#> [1] "xx.x - xx.x"
#> 
#> $iqr
#> [1] "xx.x"
#> 
#> $range
#> [1] "xx.x - xx.x"
#> 
#> $min
#> [1] "xx.x"
#> 
#> $max
#> [1] "xx.x"
#> 
#> $median_range
#> [1] "xx.x (xx.x - xx.x)"
#> 
#> $cv
#> [1] "xx.x"
#> 
#> $geom_mean
#> [1] "xx.x"
#> 
#> $geom_mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $geom_cv
#> [1] "xx.x"
#> 
#> $median_ci_3d
#> [1] "xx.xx (xx.xx - xx.xx)"
#> 
#> $mean_ci_3d
#> [1] "xx.xx (xx.xx - xx.xx)"
#> 
#> $geom_mean_ci_3d
#> [1] "xx.xx (xx.xx - xx.xx)"
#> 
get_formats_from_stats(cnt_stats)
#> $n
#> [1] "xx."
#> 
#> $count
#> [1] "xx."
#> 
#> $count_fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else {
#>     paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $count_fraction_fixed_dp
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else if (.is_equal_float(x[2], 1)) {
#>     sprintf("%d (100%%)", x[1])
#>   } else {
#>     sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#>   checkmate::assert_vector(x)
#>   checkmate::assert_count(x["num"])
#>   checkmate::assert_count(x["denom"])
#> 
#>   result <- if (x["num"] == 0) {
#>     paste0(x["num"], "/", x["denom"])
#>   } else {
#>     paste0(
#>       x["num"], "/", x["denom"],
#>       " (", sprintf("%.1f", round(x["num"] / x["denom"] * 100, 1)), "%)"
#>     )
#>   }
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $n_blq
#> [1] "xx."
#> 
get_formats_from_stats(only_pval)
#> $pval
#> [1] "x.xxxx | (<0.0001)"
#> 
get_formats_from_stats(all_cnt_occ)
#> $count
#> [1] "xx."
#> 
#> $count_fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else {
#>     paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $count_fraction_fixed_dp
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else if (.is_equal_float(x[2], 1)) {
#>     sprintf("%d (100%%)", x[1])
#>   } else {
#>     sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#>   checkmate::assert_vector(x)
#>   checkmate::assert_count(x["num"])
#>   checkmate::assert_count(x["denom"])
#> 
#>   result <- if (x["num"] == 0) {
#>     paste0(x["num"], "/", x["denom"])
#>   } else {
#>     paste0(
#>       x["num"], "/", x["denom"],
#>       " (", sprintf("%.1f", round(x["num"] / x["denom"] * 100, 1)), "%)"
#>     )
#>   }
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 

# Adding custom formats
get_formats_from_stats(all_cnt_occ, formats_in = c("fraction" = c("xx")))
#> $count
#> [1] "xx."
#> 
#> $count_fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else {
#>     paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $count_fraction_fixed_dp
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else if (.is_equal_float(x[2], 1)) {
#>     sprintf("%d (100%%)", x[1])
#>   } else {
#>     sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $fraction
#> [1] "xx"
#> 
get_formats_from_stats(all_cnt_occ, formats_in = list("fraction" = c("xx.xx", "xx")))
#> $count
#> [1] "xx."
#> 
#> $count_fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else {
#>     paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $count_fraction_fixed_dp
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else if (.is_equal_float(x[2], 1)) {
#>     sprintf("%d (100%%)", x[1])
#>   } else {
#>     sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $fraction
#> [1] "xx.xx" "xx"   
#> 

# Default labels
get_labels_from_stats(num_stats)
#>                             n                           sum 
#>                           "n"                         "Sum" 
#>                          mean                            sd 
#>                        "Mean"                          "SD" 
#>                            se                       mean_sd 
#>                          "SE"                   "Mean (SD)" 
#>                       mean_se                       mean_ci 
#>                   "Mean (SE)"                 "Mean 95% CI" 
#>                      mean_sei                      mean_sdi 
#>               "Mean -/+ 1xSE"               "Mean -/+ 1xSD" 
#>                     mean_pval                        median 
#> "Mean p-value (H0: mean = 0)"                      "Median" 
#>                           mad                     median_ci 
#>   "Median Absolute Deviation"               "Median 95% CI" 
#>                     quantiles                           iqr 
#>             "25% and 75%-ile"                         "IQR" 
#>                         range                           min 
#>                   "Min - Max"                     "Minimum" 
#>                           max                  median_range 
#>                     "Maximum"          "Median (Min - Max)" 
#>                            cv                     geom_mean 
#>                      "CV (%)"              "Geometric Mean" 
#>                  geom_mean_ci                       geom_cv 
#>       "Geometric Mean 95% CI"         "CV % Geometric Mean" 
#>                  median_ci_3d                    mean_ci_3d 
#>             "Median (95% CI)"               "Mean (95% CI)" 
#>               geom_mean_ci_3d 
#>     "Geometric Mean (95% CI)" 
get_labels_from_stats(cnt_stats)
#>                         n                     count            count_fraction 
#>                       "n"                   "count"          "count_fraction" 
#>   count_fraction_fixed_dp                  fraction                     n_blq 
#> "count_fraction_fixed_dp"                "fraction"                   "n_blq" 
get_labels_from_stats(only_pval)
#>               pval 
#> "p-value (t-test)" 
get_labels_from_stats(all_cnt_occ)
#>                     count            count_fraction   count_fraction_fixed_dp 
#>                   "count"          "count_fraction" "count_fraction_fixed_dp" 
#>                  fraction 
#>                "fraction" 

# Adding custom labels
get_labels_from_stats(all_cnt_occ, labels_in = c("fraction" = "Fraction"))
#>                     count            count_fraction   count_fraction_fixed_dp 
#>                   "count"          "count_fraction" "count_fraction_fixed_dp" 
#>                  fraction 
#>                "Fraction" 
get_labels_from_stats(all_cnt_occ, labels_in = list("fraction" = c("Some more fractions")))
#>                     count            count_fraction   count_fraction_fixed_dp 
#>                   "count"          "count_fraction" "count_fraction_fixed_dp" 
#>                  fraction 
#>     "Some more fractions" 

get_indents_from_stats(all_cnt_occ, indents_in = 3L)
#> [1] 3 3 3 3
get_indents_from_stats(all_cnt_occ, indents_in = list(count = 2L, count_fraction = 5L))
#> $count
#> [1] 2
#> 
#> $count_fraction
#> [1] 5
#> 
#> $count_fraction_fixed_dp
#> [1] 0
#> 
#> $fraction
#> [1] 0
#> 
get_indents_from_stats(
  all_cnt_occ,
  indents_in = list(a = 2L, count.a = 1L, count.b = 5L), row_nms = c("a", "b")
)
#> $count.a
#> [1] 1
#> 
#> $count.b
#> [1] 5
#> 
#> $count_fraction.a
#> [1] 2
#> 
#> $count_fraction.b
#> [1] 0
#> 
#> $count_fraction_fixed_dp.a
#> [1] 2
#> 
#> $count_fraction_fixed_dp.b
#> [1] 0
#> 
#> $fraction.a
#> [1] 2
#> 
#> $fraction.b
#> [1] 0
#> 

summary_formats()
#> $n
#> [1] "xx."
#> 
#> $sum
#> [1] "xx.x"
#> 
#> $mean
#> [1] "xx.x"
#> 
#> $sd
#> [1] "xx.x"
#> 
#> $se
#> [1] "xx.x"
#> 
#> $mean_sd
#> [1] "xx.x (xx.x)"
#> 
#> $mean_se
#> [1] "xx.x (xx.x)"
#> 
#> $mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_sei
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_sdi
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_pval
#> [1] "x.xxxx | (<0.0001)"
#> 
#> $median
#> [1] "xx.x"
#> 
#> $mad
#> [1] "xx.x"
#> 
#> $median_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $quantiles
#> [1] "xx.x - xx.x"
#> 
#> $iqr
#> [1] "xx.x"
#> 
#> $range
#> [1] "xx.x - xx.x"
#> 
#> $min
#> [1] "xx.x"
#> 
#> $max
#> [1] "xx.x"
#> 
#> $median_range
#> [1] "xx.x (xx.x - xx.x)"
#> 
#> $cv
#> [1] "xx.x"
#> 
#> $geom_mean
#> [1] "xx.x"
#> 
#> $geom_mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $geom_cv
#> [1] "xx.x"
#> 
#> $median_ci_3d
#> [1] "xx.xx (xx.xx - xx.xx)"
#> 
#> $mean_ci_3d
#> [1] "xx.xx (xx.xx - xx.xx)"
#> 
#> $geom_mean_ci_3d
#> [1] "xx.xx (xx.xx - xx.xx)"
#> 
summary_formats(type = "counts", include_pval = TRUE)
#> $n
#> [1] "xx."
#> 
#> $count
#> [1] "xx."
#> 
#> $count_fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else {
#>     paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $count_fraction_fixed_dp
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else if (.is_equal_float(x[2], 1)) {
#>     sprintf("%d (100%%)", x[1])
#>   } else {
#>     sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#>   checkmate::assert_vector(x)
#>   checkmate::assert_count(x["num"])
#>   checkmate::assert_count(x["denom"])
#> 
#>   result <- if (x["num"] == 0) {
#>     paste0(x["num"], "/", x["denom"])
#>   } else {
#>     paste0(
#>       x["num"], "/", x["denom"],
#>       " (", sprintf("%.1f", round(x["num"] / x["denom"] * 100, 1)), "%)"
#>     )
#>   }
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $n_blq
#> [1] "xx."
#> 
#> $pval_counts
#> [1] "x.xxxx | (<0.0001)"
#> 

summary_labels()
#>                             n                           sum 
#>                           "n"                         "Sum" 
#>                          mean                            sd 
#>                        "Mean"                          "SD" 
#>                            se                       mean_sd 
#>                          "SE"                   "Mean (SD)" 
#>                       mean_se                       mean_ci 
#>                   "Mean (SE)"                 "Mean 95% CI" 
#>                      mean_sei                      mean_sdi 
#>               "Mean -/+ 1xSE"               "Mean -/+ 1xSD" 
#>                     mean_pval                        median 
#> "Mean p-value (H0: mean = 0)"                      "Median" 
#>                           mad                     median_ci 
#>   "Median Absolute Deviation"               "Median 95% CI" 
#>                     quantiles                           iqr 
#>             "25% and 75%-ile"                         "IQR" 
#>                         range                           min 
#>                   "Min - Max"                     "Minimum" 
#>                           max                  median_range 
#>                     "Maximum"          "Median (Min - Max)" 
#>                            cv                     geom_mean 
#>                      "CV (%)"              "Geometric Mean" 
#>                  geom_mean_ci                       geom_cv 
#>       "Geometric Mean 95% CI"         "CV % Geometric Mean" 
#>                  median_ci_3d                    mean_ci_3d 
#>             "Median (95% CI)"               "Mean (95% CI)" 
#>               geom_mean_ci_3d 
#>     "Geometric Mean (95% CI)" 
summary_labels(type = "counts", include_pval = TRUE)
#>                            n                        count 
#>                          "n"                      "count" 
#>               count_fraction      count_fraction_fixed_dp 
#>             "count_fraction"    "count_fraction_fixed_dp" 
#>                     fraction                        n_blq 
#>                   "fraction"                      "n_blq" 
#>                  pval_counts 
#> "p-value (chi-squared test)"