The analyze function compare_vars() creates a layout element to summarize and compare one or more variables, using the S3 generic function s_summary() to calculate a list of summary statistics. A list of all available statistics for numeric variables can be viewed by running get_stats("analyze_vars_numeric", add_pval = TRUE) and for non-numeric variables by running get_stats("analyze_vars_counts", add_pval = TRUE). Use the .stats parameter to specify the statistics to include in your output summary table.

Before using this function in your table layout, you must use rtables::split_cols_by() to create a column split on the variable to be used in comparisons, and specify a reference group via its ref_group parameter. Each group (column) can then be compared against the specified reference group by including the p-value statistic in .stats.
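
As a rough mental model of what the p-value rows contain, the base R sketch below computes, for one numeric variable, a p-value for every column level except the reference group, which is left empty. It is not tern's implementation (compare_vars() builds its rows through rtables::analyze()). The data frame `demo` and the helper `pvals_vs_ref()` are hypothetical, and the sketch assumes that the numeric comparison corresponds to stats::t.test() with its default settings and that no p-value is produced unless both groups have more than one non-missing value.

```r
# Hypothetical demo data (not tern_ex_adsl): one numeric variable, three arms.
set.seed(123)
demo <- data.frame(
  ARMCD = rep(c("ARM A", "ARM B", "ARM C"), times = c(8, 10, 6)),
  AGE = round(rnorm(24, mean = 35, sd = 7))
)

# Hypothetical helper, not part of tern: one p-value per column level,
# NA for the reference group (which is never compared with itself).
pvals_vs_ref <- function(df, var, col_var, ref_group) {
  ref_values <- df[[var]][df[[col_var]] == ref_group]
  col_levels <- unique(df[[col_var]])
  vapply(col_levels, function(lvl) {
    if (lvl == ref_group) {
      return(NA_real_)
    }
    x <- df[[var]][df[[col_var]] == lvl]
    # Assumed rule: skip the test unless both groups have > 1 non-missing value.
    if (sum(!is.na(x)) <= 1 || sum(!is.na(ref_values)) <= 1) {
      return(NA_real_)
    }
    # stats::t.test() defaults to the Welch (unequal variances) version.
    stats::t.test(x, ref_values)$p.value
  }, numeric(1))
}

p <- pvals_vs_ref(demo, "AGE", "ARMCD", ref_group = "ARM B")
print(p)
```

Leaving the reference entry empty mirrors the blank reference column in tables produced by compare_vars().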
Usage
compare_vars(
lyt,
vars,
var_labels = vars,
na_str = default_na_str(),
nested = TRUE,
...,
na.rm = TRUE,
show_labels = "default",
table_names = vars,
section_div = NA_character_,
.stats = c("n", "mean_sd", "count_fraction", "pval"),
.formats = NULL,
.labels = NULL,
.indent_mods = NULL
)
s_compare(x, .ref_group, .in_ref_col, ...)
# S3 method for class 'numeric'
s_compare(x, .ref_group, .in_ref_col, ...)
# S3 method for class 'factor'
s_compare(x, .ref_group, .in_ref_col, denom = "n", na.rm = TRUE, ...)
# S3 method for class 'character'
s_compare(
x,
.ref_group,
.in_ref_col,
denom = "n",
na.rm = TRUE,
.var,
verbose = TRUE,
...
)
# S3 method for class 'logical'
s_compare(x, .ref_group, .in_ref_col, na.rm = TRUE, denom = "n", ...)
Arguments
- lyt
  (PreDataTableLayouts) layout that analyses will be added to.
- vars
  (character) variable names for the primary analysis variable to be iterated over.
- var_labels
  (character) variable labels.
- na_str
  (string) string used to replace all NA or empty values in the output.
- nested
  (flag) whether this layout instruction should be applied within the existing layout structure if possible (TRUE, the default) or as a new top-level element (FALSE). Ignored if it would nest a split underneath analyses, which is not allowed.
- ...
  arguments passed to s_compare().
- na.rm
  (flag) whether NA values should be removed from x prior to analysis.
- show_labels
  (string) label visibility: one of "default", "visible" and "hidden".
- table_names
  (character) this can be customized in the case that the same vars are analyzed multiple times, to avoid warnings from rtables.
- section_div
  (string) string which should be repeated as a section divider after each group defined by this split instruction, or NA_character_ (the default) for no section divider.
- .stats
  (character) statistics to select for the table.
  Options for numeric variables are:
  'n', 'sum', 'mean', 'sd', 'se', 'mean_sd', 'mean_se', 'mean_ci', 'mean_sei', 'mean_sdi', 'mean_pval', 'median', 'mad', 'median_ci', 'quantiles', 'iqr', 'range', 'min', 'max', 'median_range', 'cv', 'geom_mean', 'geom_mean_ci', 'geom_cv', 'median_ci_3d', 'mean_ci_3d', 'geom_mean_ci_3d', 'pval'
  Options for non-numeric variables are:
  'n', 'count', 'count_fraction', 'count_fraction_fixed_dp', 'fraction', 'n_blq', 'pval_counts'
- .formats
  (named character or list) formats for the statistics. See Details in analyze_vars for more information on the "auto" setting.
- .labels
  (named character) labels for the statistics (without indent).
- .indent_mods
  (named integer) indent modifiers for the labels. Each element of the vector should be a name-value pair with name corresponding to a statistic specified in .stats and value the indentation for that statistic's row label.
- x
  (numeric) vector of numbers we want to analyze; the factor, character and logical methods accept vectors of those types instead.
- .ref_group
  (data.frame or vector) the data corresponding to the reference group.
- .in_ref_col
  (flag) TRUE when working with the reference level, FALSE otherwise.
- denom
  (string) choice of denominator for factor proportions; can only be n (number of values in this row and column intersection).
- .var
  (string) single variable name that is passed by rtables when requested by a statistics function.
- verbose
  (flag) whether warnings and messages should be printed. Mainly used to print out information about factor casting. Defaults to TRUE.
Value
compare_vars() returns a layout object suitable for passing to further layouting functions, or to rtables::build_table(). Adding this function to an rtable layout will add formatted rows containing the statistics from s_compare() to the table layout.

s_compare() returns the output of s_summary() together with comparisons versus the reference group in the form of p-values.
Functions
- compare_vars(): Layout-creating function which can take statistics function arguments and additional format arguments. This function is a wrapper for rtables::analyze().
- s_compare(): S3 generic function to produce a comparison summary.
- s_compare(numeric): Method for numeric class. This uses the standard t-test to calculate the p-value.
- s_compare(factor): Method for factor class. This uses the chi-squared test to calculate the p-value.
- s_compare(character): Method for character class. This makes an automatic conversion to factor (with a warning) and then forwards to the method for factors.
- s_compare(logical): Method for logical class. A chi-squared test is used. If missing values are not removed, then they are counted as FALSE.
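
The logical method's behaviour can be checked with a base R sketch that reproduces the p-values printed in the s_compare.logical examples. The helper `logical_pval_vs_ref()` is hypothetical, and the use of a chi-squared test without Yates' continuity correction (correct = FALSE) is an assumption inferred from the fact that it matches those printed values, not something taken from tern's source.

```r
# Hypothetical helper, not part of tern: p-value comparing the TRUE/FALSE
# counts of `x` with those of the reference group `ref`.
logical_pval_vs_ref <- function(x, ref, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]
    ref <- ref[!is.na(ref)]
  } else {
    # Documented rule: missing values that are not removed count as FALSE.
    x[is.na(x)] <- FALSE
    ref[is.na(ref)] <- FALSE
  }
  if (length(x) == 0 || length(ref) == 0) {
    return(character())
  }
  # Fixed levels guarantee a 2 x 2 table even if one value is absent.
  tab <- rbind(
    table(factor(x, levels = c(TRUE, FALSE))),
    table(factor(ref, levels = c(TRUE, FALSE)))
  )
  # Assumed: chi-squared test without Yates' continuity correction.
  suppressWarnings(stats::chisq.test(tab, correct = FALSE)$p.value)
}

logical_pval_vs_ref(c(TRUE, FALSE, TRUE, TRUE), c(FALSE, FALSE, TRUE))
logical_pval_vs_ref(c(NA, TRUE, FALSE), c(NA, NA, NA, NA, FALSE), na.rm = FALSE)
```

With na.rm = FALSE the reference group becomes five FALSE values, which is why its p-value differs from the na.rm = TRUE call in the examples.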
Note
For factor variables, denom for factor proportions can only be n, since the purpose is to compare proportions between columns; a row-based proportion would therefore not make sense. A proportion based on N_col would be difficult because counts are used for the chi-squared test statistic, so missing values should be accounted for as explicit factor levels.

If factor variables contain NA, these NA values are excluded by default. To include NA values, set na.rm = FALSE and missing values will be displayed as an NA level. Alternatively, an explicit factor level can be defined for NA values during pre-processing via df_explicit_na(); the default na_level ("<Missing>") will also be excluded when na.rm is set to TRUE.

For character variables, automatic conversion to factor does not guarantee that the table will be generated correctly. In particular, it is very likely to fail for sparse tables. It is therefore always better to convert character variables to factors manually during pre-processing.

For compare_vars(), the column split must define a reference group via ref_group so that the comparison is well defined.
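
The missing-value handling described in this note can be sketched in base R as well. The hypothetical helper `chisq_pval_vs_ref()` (not part of tern) either drops missing values, including an explicit "<Missing>" level, or keeps them as a "<Missing>" level, and then runs stats::chisq.test() on the two-row table of level counts. Treating this as the underlying method is an assumption based on it reproducing the p-values printed in the s_compare.factor and s_compare.character examples. For a two-level variable, stats::chisq.test() applies Yates' continuity correction by default, and this sketch does not establish whether tern does the same.

```r
# Hypothetical helper, not part of tern: chi-squared p-value comparing the
# level counts of `x` with those of the reference group `ref`.
chisq_pval_vs_ref <- function(x, ref, na.rm = TRUE, na_level = "<Missing>") {
  # Work on character labels so factors and character vectors behave alike.
  x <- as.character(x)
  ref <- as.character(ref)
  if (na.rm) {
    # Drop both raw NA and an explicit missing level.
    x <- x[!is.na(x) & x != na_level]
    ref <- ref[!is.na(ref) & ref != na_level]
  } else {
    # Keep missing values as their own level.
    x[is.na(x)] <- na_level
    ref[is.na(ref)] <- na_level
  }
  if (length(x) == 0 || length(ref) == 0) {
    return(character())
  }
  # Shared levels keep both rows aligned, including levels absent from one group.
  lev <- sort(unique(c(x, ref)))
  tab <- rbind(
    table(factor(x, levels = lev)),
    table(factor(ref, levels = lev))
  )
  suppressWarnings(stats::chisq.test(tab)$p.value)
}

x <- factor(c("a", "a", "b", "c", "a", NA, NA))
y <- factor(c("a", "b", "c", NA))
chisq_pval_vs_ref(x, y, na.rm = TRUE)
chisq_pval_vs_ref(x, y, na.rm = FALSE)
```

The second call keeps two missing values in x and one in y as an extra level, which changes the table and hence the p-value.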
See also
s_summary(), which is used internally to compute a summary within s_compare(), and a_summary(), which is used (with compare = TRUE) as the analysis function for compare_vars().
Examples
# `compare_vars()` in `rtables` pipelines
## Default output within a `rtables` pipeline.
lyt <- basic_table() %>%
split_cols_by("ARMCD", ref_group = "ARM B") %>%
compare_vars(c("AGE", "SEX"))
build_table(lyt, tern_ex_adsl)
#> ARM A ARM B ARM C
#> ———————————————————————————————————————————————————————————————————
#> AGE
#> n 69 73 58
#> Mean (SD) 34.1 (6.8) 35.8 (7.1) 36.1 (7.4)
#> p-value (t-test) 0.1446 0.8212
#> SEX
#> n 69 73 58
#> F 38 (55.1%) 40 (54.8%) 32 (55.2%)
#> M 31 (44.9%) 33 (45.2%) 26 (44.8%)
#> p-value (chi-squared test) 1.0000 1.0000
## Select and format statistics output.
lyt <- basic_table() %>%
split_cols_by("ARMCD", ref_group = "ARM C") %>%
compare_vars(
vars = "AGE",
.stats = c("mean_sd", "pval"),
.formats = c(mean_sd = "xx.x, xx.x"),
.labels = c(mean_sd = "Mean, SD")
)
build_table(lyt, df = tern_ex_adsl)
#> ARM A ARM B ARM C
#> ————————————————————————————————————————————————————
#> Mean, SD 34.1, 6.8 35.8, 7.1 36.1, 7.4
#> p-value (t-test) 0.1176 0.8212
# `s_compare.numeric`
## Usual case where both this group and the reference group have more than 1 value.
s_compare(rnorm(10, 5, 1), .ref_group = rnorm(5, -5, 1), .in_ref_col = FALSE)
#> $n
#> n
#> 10
#>
#> $sum
#> sum
#> 51.27191
#>
#> $mean
#> mean
#> 5.127191
#>
#> $sd
#> sd
#> 1.226119
#>
#> $se
#> se
#> 0.387733
#>
#> $mean_sd
#> mean sd
#> 5.127191 1.226119
#>
#> $mean_se
#> mean se
#> 5.127191 0.387733
#>
#> $mean_ci
#> mean_ci_lwr mean_ci_upr
#> 4.250078 6.004304
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $mean_sei
#> mean_sei_lwr mean_sei_upr
#> 4.739458 5.514924
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> 3.901071 6.353310
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $mean_ci_3d
#> mean mean_ci_lwr mean_ci_upr
#> 5.127191 4.250078 6.004304
#> attr(,"label")
#> [1] "Mean (95% CI)"
#>
#> $mean_pval
#> p_value
#> 3.353908e-07
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $median
#> median
#> 5.024369
#>
#> $mad
#> mad
#> -4.440892e-16
#>
#> $median_ci
#> median_ci_lwr median_ci_upr
#> 4.638779 5.862086
#> attr(,"conf_level")
#> [1] 0.9785156
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $median_ci_3d
#> median median_ci_lwr median_ci_upr
#> 5.024369 4.638779 5.862086
#> attr(,"label")
#> [1] "Median (95% CI)"
#>
#> $quantiles
#> quantile_0.25 quantile_0.75
#> 4.756763 5.549828
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $iqr
#> iqr
#> 0.7930643
#>
#> $range
#> min max
#> 2.725885 7.682557
#>
#> $min
#> min
#> 2.725885
#>
#> $max
#> max
#> 7.682557
#>
#> $median_range
#> median min max
#> 5.024369 2.725885 7.682557
#> attr(,"label")
#> [1] "Median (Min - Max)"
#>
#> $cv
#> cv
#> 23.91406
#>
#> $geom_mean
#> geom_mean
#> 4.985435
#>
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> 4.144463 5.997052
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $geom_cv
#> geom_cv
#> 26.26258
#>
#> $geom_mean_ci_3d
#> geom_mean mean_ci_lwr mean_ci_upr
#> 4.985435 4.144463 5.997052
#> attr(,"label")
#> [1] "Geometric Mean (95% CI)"
#>
#> $pval
#> [1] 2.25779e-08
#>
## If either group has no more than 1 value, the p-value is not calculated.
s_compare(rnorm(10, 5, 1), .ref_group = 1, .in_ref_col = FALSE)
#> $n
#> n
#> 10
#>
#> $sum
#> sum
#> 50.71578
#>
#> $mean
#> mean
#> 5.071578
#>
#> $sd
#> sd
#> 1.105832
#>
#> $se
#> se
#> 0.3496948
#>
#> $mean_sd
#> mean sd
#> 5.071578 1.105832
#>
#> $mean_se
#> mean se
#> 5.0715780 0.3496948
#>
#> $mean_ci
#> mean_ci_lwr mean_ci_upr
#> 4.280513 5.862643
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $mean_sei
#> mean_sei_lwr mean_sei_upr
#> 4.721883 5.421273
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> 3.965746 6.177410
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $mean_ci_3d
#> mean mean_ci_lwr mean_ci_upr
#> 5.071578 4.280513 5.862643
#> attr(,"label")
#> [1] "Mean (95% CI)"
#>
#> $mean_pval
#> p_value
#> 1.511204e-07
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $median
#> median
#> 5.260423
#>
#> $mad
#> mad
#> 0
#>
#> $median_ci
#> median_ci_lwr median_ci_upr
#> 3.529264 6.318293
#> attr(,"conf_level")
#> [1] 0.9785156
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $median_ci_3d
#> median median_ci_lwr median_ci_upr
#> 5.260423 3.529264 6.318293
#> attr(,"label")
#> [1] "Median (95% CI)"
#>
#> $quantiles
#> quantile_0.25 quantile_0.75
#> 4.024149 6.065057
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $iqr
#> iqr
#> 2.040908
#>
#> $range
#> min max
#> 3.300549 6.337320
#>
#> $min
#> min
#> 3.300549
#>
#> $max
#> max
#> 6.33732
#>
#> $median_range
#> median min max
#> 5.260423 3.300549 6.337320
#> attr(,"label")
#> [1] "Median (Min - Max)"
#>
#> $cv
#> cv
#> 21.8045
#>
#> $geom_mean
#> geom_mean
#> 4.952266
#>
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> 4.181833 5.864639
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $geom_cv
#> geom_cv
#> 23.97201
#>
#> $geom_mean_ci_3d
#> geom_mean mean_ci_lwr mean_ci_upr
#> 4.952266 4.181833 5.864639
#> attr(,"label")
#> [1] "Geometric Mean (95% CI)"
#>
#> $pval
#> character(0)
#>
## An empty numeric vector does not fail; it returns NA-filled items and no p-value.
s_compare(numeric(), .ref_group = numeric(), .in_ref_col = FALSE)
#> $n
#> n
#> 0
#>
#> $sum
#> sum
#> NA
#>
#> $mean
#> mean
#> NA
#>
#> $sd
#> sd
#> NA
#>
#> $se
#> se
#> NA
#>
#> $mean_sd
#> mean sd
#> NA NA
#>
#> $mean_se
#> mean se
#> NA NA
#>
#> $mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $mean_sei
#> mean_sei_lwr mean_sei_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $mean_ci_3d
#> mean mean_ci_lwr mean_ci_upr
#> NA NA NA
#> attr(,"label")
#> [1] "Mean (95% CI)"
#>
#> $mean_pval
#> p_value
#> NA
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $median
#> median
#> NA
#>
#> $mad
#> mad
#> NA
#>
#> $median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $median_ci_3d
#> median median_ci_lwr median_ci_upr
#> NA NA NA
#> attr(,"label")
#> [1] "Median (95% CI)"
#>
#> $quantiles
#> quantile_0.25 quantile_0.75
#> NA NA
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $iqr
#> iqr
#> NA
#>
#> $range
#> min max
#> NA NA
#>
#> $min
#> min
#> NA
#>
#> $max
#> max
#> NA
#>
#> $median_range
#> median min max
#> NA NA NA
#> attr(,"label")
#> [1] "Median (Min - Max)"
#>
#> $cv
#> cv
#> NA
#>
#> $geom_mean
#> geom_mean
#> NaN
#>
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $geom_cv
#> geom_cv
#> NA
#>
#> $geom_mean_ci_3d
#> geom_mean mean_ci_lwr mean_ci_upr
#> NaN NA NA
#> attr(,"label")
#> [1] "Geometric Mean (95% CI)"
#>
#> $pval
#> character(0)
#>
# `s_compare.factor`
## Basic usage:
x <- factor(c("a", "a", "b", "c", "a"))
y <- factor(c("a", "b", "c"))
s_compare(x = x, .ref_group = y, .in_ref_col = FALSE)
#> $n
#> [1] 5
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#>
#> $count_fraction$b
#> [1] 1.0 0.2
#>
#> $count_fraction$c
#> [1] 1.0 0.2
#>
#>
#> $fraction
#> $fraction$a
#> num denom
#> 3 5
#>
#> $fraction$b
#> num denom
#> 1 5
#>
#> $fraction$c
#> num denom
#> 1 5
#>
#>
#> $n_blq
#> [1] 0
#>
#> $pval_counts
#> [1] 0.7659283
#>
## Management of NA values.
x <- explicit_na(factor(c("a", "a", "b", "c", "a", NA, NA)))
y <- explicit_na(factor(c("a", "b", "c", NA)))
s_compare(x = x, .ref_group = y, .in_ref_col = FALSE, na.rm = TRUE)
#> $n
#> [1] 5
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#>
#> $count_fraction$b
#> [1] 1.0 0.2
#>
#> $count_fraction$c
#> [1] 1.0 0.2
#>
#>
#> $fraction
#> $fraction$a
#> num denom
#> 3 5
#>
#> $fraction$b
#> num denom
#> 1 5
#>
#> $fraction$c
#> num denom
#> 1 5
#>
#>
#> $n_blq
#> [1] 0
#>
#> $pval_counts
#> [1] 0.7659283
#>
s_compare(x = x, .ref_group = y, .in_ref_col = FALSE, na.rm = FALSE)
#> $n
#> [1] 7
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#> $count$`<Missing>`
#> [1] 2
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0000000 0.4285714
#>
#> $count_fraction$b
#> [1] 1.0000000 0.1428571
#>
#> $count_fraction$c
#> [1] 1.0000000 0.1428571
#>
#> $count_fraction$`<Missing>`
#> [1] 2.0000000 0.2857143
#>
#>
#> $fraction
#> $fraction$a
#> num denom
#> 3 7
#>
#> $fraction$b
#> num denom
#> 1 7
#>
#> $fraction$c
#> num denom
#> 1 7
#>
#> $fraction$`<Missing>`
#> num denom
#> 2 7
#>
#>
#> $n_blq
#> [1] 0
#>
#> $pval_counts
#> [1] 0.9063036
#>
# `s_compare.character`
## Basic usage:
x <- c("a", "a", "b", "c", "a")
y <- c("a", "b", "c")
s_compare(x, .ref_group = y, .in_ref_col = FALSE, .var = "x", verbose = FALSE)
#> $n
#> [1] 5
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#>
#> $count_fraction$b
#> [1] 1.0 0.2
#>
#> $count_fraction$c
#> [1] 1.0 0.2
#>
#>
#> $fraction
#> $fraction$a
#> num denom
#> 3 5
#>
#> $fraction$b
#> num denom
#> 1 5
#>
#> $fraction$c
#> num denom
#> 1 5
#>
#>
#> $n_blq
#> [1] 0
#>
#> $pval_counts
#> [1] 0.7659283
#>
## Note that the handling of missing values can make a large difference:
x <- c("a", "a", "b", "c", "a", NA)
y <- c("a", "b", "c", rep(NA, 20))
s_compare(x,
.ref_group = y, .in_ref_col = FALSE,
.var = "x", verbose = FALSE
)
#> $n
#> [1] 5
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#>
#> $count_fraction$b
#> [1] 1.0 0.2
#>
#> $count_fraction$c
#> [1] 1.0 0.2
#>
#>
#> $fraction
#> $fraction$a
#> num denom
#> 3 5
#>
#> $fraction$b
#> num denom
#> 1 5
#>
#> $fraction$c
#> num denom
#> 1 5
#>
#>
#> $n_blq
#> [1] 0
#>
#> $pval_counts
#> [1] 0.7659283
#>
s_compare(x,
.ref_group = y, .in_ref_col = FALSE, .var = "x",
na.rm = FALSE, verbose = FALSE
)
#> $n
#> [1] 6
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#> $count$`<Missing>`
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.5
#>
#> $count_fraction$b
#> [1] 1.0000000 0.1666667
#>
#> $count_fraction$c
#> [1] 1.0000000 0.1666667
#>
#> $count_fraction$`<Missing>`
#> [1] 1.0000000 0.1666667
#>
#>
#> $fraction
#> $fraction$a
#> num denom
#> 3 6
#>
#> $fraction$b
#> num denom
#> 1 6
#>
#> $fraction$c
#> num denom
#> 1 6
#>
#> $fraction$`<Missing>`
#> num denom
#> 1 6
#>
#>
#> $n_blq
#> [1] 0
#>
#> $pval_counts
#> [1] 0.005768471
#>
# `s_compare.logical`
## Basic usage:
x <- c(TRUE, FALSE, TRUE, TRUE)
y <- c(FALSE, FALSE, TRUE)
s_compare(x, .ref_group = y, .in_ref_col = FALSE)
#> $n
#> [1] 4
#>
#> $count
#> [1] 3
#>
#> $count_fraction
#> [1] 3.00 0.75
#>
#> $n_blq
#> [1] 0
#>
#> $pval_counts
#> [1] 0.2702894
#>
## Management of NA values.
x <- c(NA, TRUE, FALSE)
y <- c(NA, NA, NA, NA, FALSE)
s_compare(x, .ref_group = y, .in_ref_col = FALSE, na.rm = TRUE)
#> $n
#> [1] 2
#>
#> $count
#> [1] 1
#>
#> $count_fraction
#> [1] 1.0 0.5
#>
#> $n_blq
#> [1] 0
#>
#> $pval_counts
#> [1] 0.3864762
#>
s_compare(x, .ref_group = y, .in_ref_col = FALSE, na.rm = FALSE)
#> $n
#> [1] 3
#>
#> $count
#> [1] 1
#>
#> $count_fraction
#> [1] 1.0000000 0.3333333
#>
#> $n_blq
#> [1] 0
#>
#> $pval_counts
#> [1] 0.1675463
#>