[Experimental]

Utility functions to get valid statistic methods for different method groups (.stats) and their associated formats (.formats) and labels (.labels). This utility is used across tern, but some of its working principles can be seen in analyze_vars(). See the Note section to understand why these functions are experimental.

Usage

get_stats(
  method_groups = "analyze_vars_numeric",
  stats_in = NULL,
  add_pval = FALSE
)

get_formats_from_stats(stats, formats_in = NULL)

get_labels_from_stats(stats, labels_in = NULL)

tern_default_formats

tern_default_labels

summary_formats(type = "numeric", include_pval = FALSE)

summary_labels(type = "numeric", include_pval = FALSE)

summary_custom(
  type = "numeric",
  include_pval = FALSE,
  stats_custom = NULL,
  formats_custom = NULL,
  labels_custom = NULL,
  indent_mods_custom = NULL
)

Format

  • tern_default_formats is a list of available formats, named after their relevant statistic.

  • tern_default_labels is a character vector of available labels, named after their relevant statistic.

Arguments

method_groups

(character)
indicates the group of statistical methods that we need the defaults from. A character vector can be used to collect more than one group of statistical methods.

stats_in

(character)
desired stats to be picked out from the selected method group.

add_pval

(flag)
should "pval" or "pval_counts" (if method_groups contains "analyze_vars_counts") be added to the statistical methods?

stats

(character)
statistical methods to get default formats or labels for.

formats_in

(named vector)
inserted formats to replace defaults. It can be a character vector from formatters::list_valid_format_labels() or a custom format function.

labels_in

(named vector)
inserted labels to replace defaults.

type

(string)
"numeric" or "counts".

include_pval

(flag)
deprecated parameter. Same as add_pval.

stats_custom

(named vector of character)
vector of statistics to include if not the defaults. This argument overrides include_pval and other custom value arguments such that only settings for these statistics will be returned.

formats_custom

(named vector of character)
vector of custom statistics formats to use in place of the defaults defined in summary_formats(). Names should be a subset of the statistics defined in stats_custom (or default statistics if this is NULL).

labels_custom

(named vector of character)
vector of custom statistics labels to use in place of the defaults defined in summary_labels(). Names should be a subset of the statistics defined in stats_custom (or default statistics if this is NULL).

indent_mods_custom

(integer or named vector of integer)
vector of custom indentation modifiers for statistics to use instead of the default of 0L for all statistics. Names should be a subset of the statistics defined in stats_custom (or default statistics if this is NULL). Alternatively, the same indentation modifier can be applied to all statistics by setting indent_mods_custom to a single integer value.
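
The name-based overrides described for formats_custom, labels_custom, and indent_mods_custom can be sketched in base R. This is only an illustration of the documented behavior, not tern's implementation: override_by_name() is a hypothetical helper, and the default labels are copied from the label output shown on this page.

```r
# Illustrative defaults (a subset of the labels shown on this page).
default_labels <- c(n = "n", mean = "Mean", sd = "SD")

# Hypothetical helper, not part of tern: replace defaults by matching names.
override_by_name <- function(defaults, custom = NULL) {
  if (is.null(custom)) {
    return(defaults)
  }
  # Names in `custom` should be a subset of the statistics in `defaults`.
  stopifnot(all(names(custom) %in% names(defaults)))
  defaults[names(custom)] <- custom
  defaults
}

labels <- override_by_name(default_labels, c(sd = "Std. Dev."))

# A single integer is recycled to every statistic, as described for
# indent_mods_custom.
indent_mods <- stats::setNames(rep(3L, length(labels)), names(labels))
```

Only the entries named in the custom vector change; every other statistic keeps its default.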

Value

  • get_stats() returns a character vector with all default statistical methods.

  • get_labels_from_stats() returns a named character vector of default labels (if present, otherwise NULL).

  • summary_formats() returns a named vector of default statistic formats for the given data type.

  • summary_labels() returns a named vector of default statistic labels for the given data type.

  • summary_custom() returns a list of 4 named elements: stats, formats, labels, and indent_mods.

Details

Current choices for type are "counts" and "numeric" for analyze_vars(); the choice determines which statistics get_stats() returns.

Functions

  • get_stats(): Get default statistical methods for different groups of methods.

  • get_formats_from_stats(): Get formats from a vector of statistical methods. If not present, NULL is returned.

  • get_labels_from_stats(): Get labels from vector of statistical methods.

  • tern_default_formats: Named list of default formats for tern.

  • tern_default_labels: Character vector that contains default labels for tern.

  • summary_formats(): Quick function to retrieve default formats for summary statistics: analyze_vars() and analyze_vars_in_cols() principally.

  • summary_labels(): Quick function to retrieve default labels for summary statistics. Returns labels of descriptive statistics which are understood by rtables. Similar to summary_formats().

  • summary_custom(): [Deprecated] Function to configure settings for default or custom summary statistics for a given data type. In addition to selecting a custom subset of statistics, the user can also set custom formats, labels, and indent modifiers for any of these statistics.
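
Since summary_custom() is deprecated in favor of the three getters, a call to it could be replaced along the following lines. This is a minimal sketch that assumes tern is installed. The getter signatures are taken from the Usage section above, but building indent_mods by hand with setNames() is an assumption of this sketch, not something the page prescribes.

```r
library(tern)

# Select statistics by name; add_pval = TRUE makes "pval" available.
stats <- get_stats(
  "analyze_vars_numeric",
  stats_in = c("n", "mean", "sd", "pval"),
  add_pval = TRUE
)

# Default formats, and default labels with one custom override.
formats <- get_formats_from_stats(stats)
labels <- get_labels_from_stats(stats, labels_in = c(sd = "Std. Dev."))

# Assumption: one shared indentation modifier, recycled to every statistic.
indent_mods <- stats::setNames(rep(3L, length(stats)), stats)
```

The four objects correspond to the stats, formats, labels, and indent_mods elements that summary_custom() returns.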

Note

These defaults are experimental because the default statistics are retrieved using the names of functions. The method groups should eventually be generalized into more reasonable groupings.

Formats in tern and rtables can be functions that take in the table cell value and return a string. This is well documented in vignette("custom_appearance", package = "rtables").
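
As a rough illustration of such a format function, the base R sketch below takes a cell value and returns a string. The name format_count_pct and the assumption that the cell holds a count followed by a proportion are hypothetical choices for this example. A function of this shape could be supplied through formats_in, which accepts custom format functions.

```r
# Hypothetical custom format function: takes the cell value, returns a string.
# Assumes x is c(count, proportion), similar to the count_fraction statistic.
format_count_pct <- function(x, ...) {
  sprintf("%d (%.0f%%)", as.integer(x[1]), x[2] * 100)
}

format_count_pct(c(12, 0.25))
#> [1] "12 (25%)"
```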

Examples

# analyze_vars is numeric
num_stats <- get_stats("analyze_vars_numeric") # also the default

# Other type
cnt_stats <- get_stats("analyze_vars_counts")

# Weirdly taking the pval from count_occurrences
only_pval <- get_stats("count_occurrences", add_pval = TRUE, stats_in = "pval")

# All count_occurrences
all_cnt_occ <- get_stats("count_occurrences")

# Multiple
get_stats(c("count_occurrences", "analyze_vars_counts"))
#> [1] "count"                   "count_fraction_fixed_dp"
#> [3] "fraction"                "n"                      
#> [5] "count_fraction"          "n_blq"                  

# Default formats
get_formats_from_stats(num_stats)
#> $n
#> [1] "xx."
#> 
#> $sum
#> [1] "xx.x"
#> 
#> $mean
#> [1] "xx.x"
#> 
#> $sd
#> [1] "xx.x"
#> 
#> $se
#> [1] "xx.x"
#> 
#> $mean_sd
#> [1] "xx.x (xx.x)"
#> 
#> $mean_se
#> [1] "xx.x (xx.x)"
#> 
#> $mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_sei
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_sdi
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_pval
#> [1] "xx.xx"
#> 
#> $median
#> [1] "xx.x"
#> 
#> $mad
#> [1] "xx.x"
#> 
#> $median_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $quantiles
#> [1] "xx.x - xx.x"
#> 
#> $iqr
#> [1] "xx.x"
#> 
#> $range
#> [1] "xx.x - xx.x"
#> 
#> $min
#> [1] "xx.x"
#> 
#> $max
#> [1] "xx.x"
#> 
#> $median_range
#> [1] "xx.x (xx.x - xx.x)"
#> 
#> $cv
#> [1] "xx.x"
#> 
#> $geom_mean
#> [1] "xx.x"
#> 
#> $geom_mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $geom_cv
#> [1] "xx.x"
#> 
get_formats_from_stats(cnt_stats)
#> $n
#> [1] "xx."
#> 
#> $count
#> [1] "xx."
#> 
#> $count_fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else {
#>     paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $n_blq
#> [1] "xx."
#> 
get_formats_from_stats(only_pval)
#> $pval
#> [1] "x.xxxx | (<0.0001)"
#> 
get_formats_from_stats(all_cnt_occ)
#> $count
#> [1] "xx."
#> 
#> $count_fraction_fixed_dp
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else if (x[2] == 1) {
#>     sprintf("%d (100%%)", x[1])
#>   } else {
#>     sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#>   checkmate::assert_vector(x)
#>   checkmate::assert_count(x["num"])
#>   checkmate::assert_count(x["denom"])
#> 
#>   result <- if (x["num"] == 0) {
#>     paste0(x["num"], "/", x["denom"])
#>   } else {
#>     paste0(
#>       x["num"], "/", x["denom"],
#>       " (", sprintf("%.1f", round(x["num"] / x["denom"] * 100, 1)), "%)"
#>     )
#>   }
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 

# Addition of customs
get_formats_from_stats(all_cnt_occ, formats_in = c("fraction" = c("xx")))
#> $count
#> [1] "xx."
#> 
#> $count_fraction_fixed_dp
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else if (x[2] == 1) {
#>     sprintf("%d (100%%)", x[1])
#>   } else {
#>     sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $fraction
#> [1] "xx"
#> 
get_formats_from_stats(all_cnt_occ, formats_in = list("fraction" = c("xx.xx", "xx")))
#> $count
#> [1] "xx."
#> 
#> $count_fraction_fixed_dp
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else if (x[2] == 1) {
#>     sprintf("%d (100%%)", x[1])
#>   } else {
#>     sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $fraction
#> [1] "xx.xx" "xx"   
#> 

# Default labels
get_labels_from_stats(num_stats)
#>                             n                           sum 
#>                           "n"                         "Sum" 
#>                          mean                            sd 
#>                        "Mean"                          "SD" 
#>                            se                       mean_sd 
#>                          "SE"                   "Mean (SD)" 
#>                       mean_se                       mean_ci 
#>                   "Mean (SE)"                 "Mean 95% CI" 
#>                      mean_sei                      mean_sdi 
#>               "Mean -/+ 1xSE"               "Mean -/+ 1xSD" 
#>                     mean_pval                        median 
#> "Mean p-value (H0: mean = 0)"                      "Median" 
#>                           mad                     median_ci 
#>   "Median Absolute Deviation"               "Median 95% CI" 
#>                     quantiles                           iqr 
#>             "25% and 75%-ile"                         "IQR" 
#>                         range                           min 
#>                   "Min - Max"                     "Minimum" 
#>                           max                  median_range 
#>                     "Maximum"          "Median (Min - Max)" 
#>                            cv                     geom_mean 
#>                      "CV (%)"              "Geometric Mean" 
#>                  geom_mean_ci                       geom_cv 
#>       "Geometric Mean 95% CI"         "CV % Geometric Mean" 
get_labels_from_stats(cnt_stats)
#>                n            count   count_fraction            n_blq 
#>              "n"          "count" "count_fraction"          "n_blq" 
get_labels_from_stats(only_pval)
#>               pval 
#> "p-value (t-test)" 
get_labels_from_stats(all_cnt_occ)
#>                   count count_fraction_fixed_dp                fraction 
#>                 "count"                      ""                      "" 

# Addition of customs
get_labels_from_stats(all_cnt_occ, labels_in = c("fraction" = "Fraction"))
#>                   count count_fraction_fixed_dp                fraction 
#>                 "count"                      ""              "Fraction" 
get_labels_from_stats(all_cnt_occ, labels_in = list("fraction" = c("Some more fractions")))
#> $count
#> [1] "count"
#> 
#> $count_fraction_fixed_dp
#> [1] ""
#> 
#> $fraction
#> [1] "Some more fractions"
#> 

summary_formats()
#> $n
#> [1] "xx."
#> 
#> $sum
#> [1] "xx.x"
#> 
#> $mean
#> [1] "xx.x"
#> 
#> $sd
#> [1] "xx.x"
#> 
#> $se
#> [1] "xx.x"
#> 
#> $mean_sd
#> [1] "xx.x (xx.x)"
#> 
#> $mean_se
#> [1] "xx.x (xx.x)"
#> 
#> $mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_sei
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_sdi
#> [1] "(xx.xx, xx.xx)"
#> 
#> $mean_pval
#> [1] "xx.xx"
#> 
#> $median
#> [1] "xx.x"
#> 
#> $mad
#> [1] "xx.x"
#> 
#> $median_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $quantiles
#> [1] "xx.x - xx.x"
#> 
#> $iqr
#> [1] "xx.x"
#> 
#> $range
#> [1] "xx.x - xx.x"
#> 
#> $min
#> [1] "xx.x"
#> 
#> $max
#> [1] "xx.x"
#> 
#> $median_range
#> [1] "xx.x (xx.x - xx.x)"
#> 
#> $cv
#> [1] "xx.x"
#> 
#> $geom_mean
#> [1] "xx.x"
#> 
#> $geom_mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $geom_cv
#> [1] "xx.x"
#> 
summary_formats(type = "counts", include_pval = TRUE)
#> $n
#> [1] "xx."
#> 
#> $count
#> [1] "xx."
#> 
#> $count_fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else {
#>     paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $n_blq
#> [1] "xx."
#> 
#> $pval_counts
#> [1] "x.xxxx | (<0.0001)"
#> 

summary_labels()
#>                             n                           sum 
#>                           "n"                         "Sum" 
#>                          mean                            sd 
#>                        "Mean"                          "SD" 
#>                            se                       mean_sd 
#>                          "SE"                   "Mean (SD)" 
#>                       mean_se                       mean_ci 
#>                   "Mean (SE)"                 "Mean 95% CI" 
#>                      mean_sei                      mean_sdi 
#>               "Mean -/+ 1xSE"               "Mean -/+ 1xSD" 
#>                     mean_pval                        median 
#> "Mean p-value (H0: mean = 0)"                      "Median" 
#>                           mad                     median_ci 
#>   "Median Absolute Deviation"               "Median 95% CI" 
#>                     quantiles                           iqr 
#>             "25% and 75%-ile"                         "IQR" 
#>                         range                           min 
#>                   "Min - Max"                     "Minimum" 
#>                           max                  median_range 
#>                     "Maximum"          "Median (Min - Max)" 
#>                            cv                     geom_mean 
#>                      "CV (%)"              "Geometric Mean" 
#>                  geom_mean_ci                       geom_cv 
#>       "Geometric Mean 95% CI"         "CV % Geometric Mean" 
summary_labels(type = "counts", include_pval = TRUE)
#>                            n                        count 
#>                          "n"                      "count" 
#>               count_fraction                        n_blq 
#>             "count_fraction"                      "n_blq" 
#>                  pval_counts 
#> "p-value (chi-squared test)" 

summary_custom()
#> Warning: `summary_custom()` was deprecated in tern 0.9.0.9001.
#>  Please use `get_stats`, `get_formats_from_stats`, and `get_labels_from_stats`
#>   directly instead.
#> $stats
#>  [1] "n"            "sum"          "mean"         "sd"           "se"          
#>  [6] "mean_sd"      "mean_se"      "mean_ci"      "mean_sei"     "mean_sdi"    
#> [11] "mean_pval"    "median"       "mad"          "median_ci"    "quantiles"   
#> [16] "iqr"          "range"        "min"          "max"          "median_range"
#> [21] "cv"           "geom_mean"    "geom_mean_ci" "geom_cv"     
#> 
#> $formats
#> $formats$n
#> [1] "xx."
#> 
#> $formats$sum
#> [1] "xx.x"
#> 
#> $formats$mean
#> [1] "xx.x"
#> 
#> $formats$sd
#> [1] "xx.x"
#> 
#> $formats$se
#> [1] "xx.x"
#> 
#> $formats$mean_sd
#> [1] "xx.x (xx.x)"
#> 
#> $formats$mean_se
#> [1] "xx.x (xx.x)"
#> 
#> $formats$mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $formats$mean_sei
#> [1] "(xx.xx, xx.xx)"
#> 
#> $formats$mean_sdi
#> [1] "(xx.xx, xx.xx)"
#> 
#> $formats$mean_pval
#> [1] "xx.xx"
#> 
#> $formats$median
#> [1] "xx.x"
#> 
#> $formats$mad
#> [1] "xx.x"
#> 
#> $formats$median_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $formats$quantiles
#> [1] "xx.x - xx.x"
#> 
#> $formats$iqr
#> [1] "xx.x"
#> 
#> $formats$range
#> [1] "xx.x - xx.x"
#> 
#> $formats$min
#> [1] "xx.x"
#> 
#> $formats$max
#> [1] "xx.x"
#> 
#> $formats$median_range
#> [1] "xx.x (xx.x - xx.x)"
#> 
#> $formats$cv
#> [1] "xx.x"
#> 
#> $formats$geom_mean
#> [1] "xx.x"
#> 
#> $formats$geom_mean_ci
#> [1] "(xx.xx, xx.xx)"
#> 
#> $formats$geom_cv
#> [1] "xx.x"
#> 
#> 
#> $labels
#>                             n                           sum 
#>                           "n"                         "Sum" 
#>                          mean                            sd 
#>                        "Mean"                          "SD" 
#>                            se                       mean_sd 
#>                          "SE"                   "Mean (SD)" 
#>                       mean_se                       mean_ci 
#>                   "Mean (SE)"                 "Mean 95% CI" 
#>                      mean_sei                      mean_sdi 
#>               "Mean -/+ 1xSE"               "Mean -/+ 1xSD" 
#>                     mean_pval                        median 
#> "Mean p-value (H0: mean = 0)"                      "Median" 
#>                           mad                     median_ci 
#>   "Median Absolute Deviation"               "Median 95% CI" 
#>                     quantiles                           iqr 
#>             "25% and 75%-ile"                         "IQR" 
#>                         range                           min 
#>                   "Min - Max"                     "Minimum" 
#>                           max                  median_range 
#>                     "Maximum"          "Median (Min - Max)" 
#>                            cv                     geom_mean 
#>                      "CV (%)"              "Geometric Mean" 
#>                  geom_mean_ci                       geom_cv 
#>       "Geometric Mean 95% CI"         "CV % Geometric Mean" 
#> 
#> $indent_mods
#>            n          sum         mean           sd           se      mean_sd 
#>            0            0            0            0            0            0 
#>      mean_se      mean_ci     mean_sei     mean_sdi    mean_pval       median 
#>            0            0            0            0            0            0 
#>          mad    median_ci    quantiles          iqr        range          min 
#>            0            0            0            0            0            0 
#>          max median_range           cv    geom_mean geom_mean_ci      geom_cv 
#>            0            0            0            0            0            0 
#> 
summary_custom(type = "counts", include_pval = TRUE)
#> $stats
#> [1] "n"              "count"          "count_fraction" "n_blq"         
#> [5] "pval_counts"   
#> 
#> $formats
#> $formats$n
#> [1] "xx."
#> 
#> $formats$count
#> [1] "xx."
#> 
#> $formats$count_fraction
#> function(x, ...) {
#>   attr(x, "label") <- NULL
#> 
#>   if (any(is.na(x))) {
#>     return("NA")
#>   }
#> 
#>   checkmate::assert_vector(x)
#>   checkmate::assert_integerish(x[1])
#>   assert_proportion_value(x[2], include_boundaries = TRUE)
#> 
#>   result <- if (x[1] == 0) {
#>     "0"
#>   } else {
#>     paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#>   }
#> 
#>   return(result)
#> }
#> <environment: namespace:tern>
#> 
#> $formats$n_blq
#> [1] "xx."
#> 
#> $formats$pval_counts
#> [1] "x.xxxx | (<0.0001)"
#> 
#> 
#> $labels
#>                            n                        count 
#>                          "n"                      "count" 
#>               count_fraction                        n_blq 
#>             "count_fraction"                      "n_blq" 
#>                  pval_counts 
#> "p-value (chi-squared test)" 
#> 
#> $indent_mods
#>              n          count count_fraction          n_blq    pval_counts 
#>              0              0              0              0              0 
#> 
summary_custom(
  include_pval = TRUE, stats_custom = c("n", "mean", "sd", "pval"),
  labels_custom = c(sd = "Std. Dev."), indent_mods_custom = 3L
)
#> $stats
#> [1] "n"    "mean" "sd"   "pval"
#> 
#> $formats
#> $formats$n
#> [1] "xx."
#> 
#> $formats$mean
#> [1] "xx.x"
#> 
#> $formats$sd
#> [1] "xx.x"
#> 
#> $formats$pval
#> [1] "x.xxxx | (<0.0001)"
#> 
#> 
#> $labels
#>                  n               mean                 sd               pval 
#>                "n"             "Mean"        "Std. Dev." "p-value (t-test)" 
#> 
#> $indent_mods
#>    n mean   sd pval 
#>    3    3    3    3 
#>