The analyze function count_occurrences() creates a layout element to calculate occurrence counts for patients.
This function analyzes the variable(s) supplied to vars and returns a table of occurrence counts for each unique value (or level) of the variable(s). The variable(s) must be non-numeric. The id variable identifies unique subjects (defaults to USUBJID).
If a patient has multiple occurrences of the same value recorded, the value is counted only once for that patient.
The summarize function summarize_occurrences() performs the same function as count_occurrences(), except that it creates content rows, not data rows, to summarize the current table row/column context. It operates on the level of the latest row split, or on the root of the table if no row splits have occurred.
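The once-per-patient counting rule can be illustrated with a minimal base R sketch. This is not the tern implementation, only an illustration of the rule; the data frame `occ` and its values are made up for this example.

```r
# Hypothetical data: patient "1" reports "MH1" twice.
occ <- data.frame(
  USUBJID = c("1", "1", "1", "2", "3"),
  MHDECOD = c("MH1", "MH1", "MH2", "MH1", "MH2"),
  stringsAsFactors = FALSE
)

# Keep one row per (patient, value) pair so repeats within a patient collapse.
occ_unique <- unique(occ[, c("USUBJID", "MHDECOD")])

# Number of distinct patients per value.
n_patients <- table(occ_unique$MHDECOD)
n_patients
```

Here "MH1" appears in three rows but is counted for two patients, because the repeated record for patient "1" contributes only once.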
Usage
count_occurrences(
  lyt,
  vars,
  id = "USUBJID",
  drop = TRUE,
  var_labels = vars,
  show_labels = "hidden",
  riskdiff = FALSE,
  na_str = default_na_str(),
  nested = TRUE,
  ...,
  table_names = vars,
  .stats = "count_fraction_fixed_dp",
  .formats = NULL,
  .labels = NULL,
  .indent_mods = NULL
)

summarize_occurrences(
  lyt,
  var,
  id = "USUBJID",
  drop = TRUE,
  riskdiff = FALSE,
  na_str = default_na_str(),
  ...,
  .stats = "count_fraction_fixed_dp",
  .formats = NULL,
  .indent_mods = NULL,
  .labels = NULL
)

s_count_occurrences(
  df,
  denom = c("N_col", "n", "N_row"),
  .N_col,
  .N_row,
  .df_row,
  drop = TRUE,
  .var = "MHDECOD",
  id = "USUBJID"
)

a_count_occurrences(
  df,
  labelstr = "",
  id = "USUBJID",
  denom = c("N_col", "n", "N_row"),
  drop = TRUE,
  .N_col,
  .N_row,
  .var = NULL,
  .df_row = NULL,
  .stats = NULL,
  .formats = NULL,
  .labels = NULL,
  .indent_mods = NULL,
  na_str = default_na_str()
)
Arguments
- lyt
  (PreDataTableLayouts) layout that analyses will be added to.
- vars
  (character) variable names for the primary analysis variable to be iterated over.
- id
  (string) subject variable name.
- drop
  (flag) whether non-appearing occurrence levels should be dropped from the resulting table. Note that in that case the remaining occurrence levels in the table are sorted alphabetically.
- var_labels
  (character) variable labels.
- show_labels
  (string) label visibility: one of "default", "visible" and "hidden".
- riskdiff
  (flag) whether a risk difference column is present. When set to TRUE, add_riskdiff() must be used as split_fun in the prior column split of the table layout, specifying which columns should be compared. See stat_propdiff_ci() for details on risk difference calculation.
- na_str
  (string) string used to replace all NA or empty values in the output.
- nested
  (flag) whether this layout instruction should be applied within the existing layout structure if possible (TRUE, the default) or as a new top-level element (FALSE). Ignored if it would nest a split underneath analyses, which is not allowed.
- ...
  additional arguments for the lower level functions.
- table_names
  (character) this can be customized in the case that the same vars are analyzed multiple times, to avoid warnings from rtables.
- .stats
  (character) statistics to select for the table. Options are: 'count', 'count_fraction', 'count_fraction_fixed_dp', 'fraction'
- .formats
  (named character or list) formats for the statistics. See Details in analyze_vars for more information on the "auto" setting.
- .labels
  (named character) labels for the statistics (without indent).
- .indent_mods
  (named integer) indent modifiers for the labels. Defaults to 0, which corresponds to the unmodified default behavior. Can be negative.
- df
  (data.frame) data set containing all analysis variables.
- denom
  (string) choice of denominator for proportion. Options are:
  - N_col: total number of patients in this column across rows.
  - n: number of patients with any occurrences.
  - N_row: total number of patients in this row across columns.
- .N_col
  (integer(1)) column-wise N (column count) for the full column being analyzed that is typically passed by rtables.
- .N_row
  (integer(1)) row-wise N (row group count) for the group of observations being analyzed (i.e. with no column-based subsetting) that is typically passed by rtables.
- .df_row
  (data.frame) data frame across all of the columns for the given row split.
- .var, var
  (string) single variable name that is passed by rtables when requested by a statistics function.
- labelstr
  (string) label of the level of the parent split currently being summarized (must be present as second argument in Content Row Functions). See rtables::summarize_row_groups() for more information.
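To make the three denom options concrete, the following base R sketch computes the numerator and each candidate denominator by hand for one column (arm "A") and one occurrence value. It only mirrors the definitions listed above. The data, the object names, and the choice of row group (all occurrence records, as if no row split had been applied) are assumptions for illustration, and tern may treat edge cases differently.

```r
# Hypothetical subject-level data: 3 patients per arm.
adsl <- data.frame(
  USUBJID = as.character(1:6),
  ARM = rep(c("A", "B"), each = 3),
  stringsAsFactors = FALSE
)

# Hypothetical occurrence records; patients "3" and "6" have none.
occ <- data.frame(
  USUBJID = c("1", "1", "2", "4", "5"),
  MHDECOD = c("MH1", "MH2", "MH1", "MH1", "MH2"),
  stringsAsFactors = FALSE
)
occ <- merge(occ, adsl, by = "USUBJID")

# Occurrence records for the column being analyzed.
occ_a <- occ[occ$ARM == "A", ]

# Numerator: distinct patients in arm A reporting "MH1".
num <- length(unique(occ_a$USUBJID[occ_a$MHDECOD == "MH1"]))

denoms <- c(
  N_col = sum(adsl$ARM == "A"),          # all patients in the column
  n     = length(unique(occ_a$USUBJID)), # patients in the column with any occurrence
  N_row = length(unique(occ$USUBJID))    # patients in the row group across columns (sketch assumption)
)

fractions <- num / denoms
fractions
```

With these made-up data the same numerator of 2 patients gives 2/3 against N_col, 2/2 against n, and 2/4 against N_row, which shows how strongly the choice of denominator affects the reported proportion.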
Value
- count_occurrences() returns a layout object suitable for passing to further layouting functions, or to rtables::build_table(). Adding this function to an rtable layout will add formatted rows containing the statistics from s_count_occurrences() to the table layout.
- summarize_occurrences() returns a layout object suitable for passing to further layouting functions, or to rtables::build_table(). Adding this function to an rtable layout will add formatted content rows containing the statistics from s_count_occurrences() to the table layout.
- s_count_occurrences() returns a list with:
  - count: list of counts with one element per occurrence.
  - count_fraction: list of counts and fractions with one element per occurrence.
  - fraction: list of numerators and denominators with one element per occurrence.
- a_count_occurrences() returns the corresponding list with formatted rtables::CellValue().
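The shape of the list returned by s_count_occurrences() can be mimicked in base R. The sketch below is not the tern code. It rebuilds the three components by hand from the patient counts and the denominator of 4 that appear in the s_count_occurrences() example output in the Examples section; the object names `counts`, `denom` and `res` are chosen for illustration.

```r
# Patient counts and denominator taken from the s_count_occurrences() example output.
counts <- c(MH1 = 4, MH2 = 3, MH3 = 2, MH4 = 1)
denom <- 4

res <- list(
  # One count per occurrence.
  count = as.list(counts),
  # Count and fraction per occurrence.
  count_fraction = lapply(counts, function(x) c(x, x / denom)),
  # Numerator and denominator per occurrence.
  fraction = lapply(counts, function(x) c(num = x, denom = denom))
)

str(res, max.level = 1)
res$count_fraction$MH2
res$fraction$MH3
```

Each component is a named list with one element per occurrence, so a single statistic for a single occurrence can be pulled out with `$`, as in the last two lines.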
Functions
- count_occurrences(): Layout-creating function which can take statistics function arguments and additional format arguments. This function is a wrapper for rtables::analyze().
- summarize_occurrences(): Layout-creating function which can take content function arguments and additional format arguments. This function is a wrapper for rtables::summarize_row_groups().
- s_count_occurrences(): Statistics function which counts number of patients that report an occurrence.
- a_count_occurrences(): Formatted analysis function which is used as afun in count_occurrences().
Note
By default, occurrences which don't appear in a given row split are dropped from the table and the occurrences in the table are sorted alphabetically per row split. Therefore, the corresponding layout needs to use split_fun = drop_split_levels in the split_rows_by calls. Use drop = FALSE if you would like to show all occurrences.
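The difference between dropping and keeping non-appearing occurrences can be pictured with a plain factor. This base R sketch is only an analogy for drop = TRUE versus drop = FALSE, not the code path tern uses; the factor `mhdecod` and its levels are hypothetical.

```r
# Hypothetical occurrence variable with a declared level "MH9" that never appears.
mhdecod <- factor(
  c("MH3", "MH1", "MH3"),
  levels = c("MH3", "MH1", "MH9")
)

# Analogous to drop = FALSE: every declared level is kept, in its declared order.
levels_all <- levels(mhdecod)

# Analogous to drop = TRUE: only observed values remain, sorted alphabetically.
levels_dropped <- sort(unique(as.character(mhdecod)))

levels_all
levels_dropped
```

In the analogy, the dropped version loses "MH9" and reorders the remaining values alphabetically, while the full version keeps the original level order including the unused level.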
Examples
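library(dplyr)
library(tern) # also attaches rtables, which provides basic_table() and build_table()

# Occurrence-level data: one row per recorded medical history term
df <- data.frame(
  USUBJID = as.character(c(
    1, 1, 2, 4, 4, 4,
    6, 6, 6, 7, 7, 8
  )),
  MHDECOD = c(
    "MH1", "MH2", "MH1", "MH1", "MH1", "MH3",
    "MH2", "MH2", "MH3", "MH1", "MH2", "MH4"
  ),
  ARM = rep(c("A", "B"), each = 6),
  SEX = c("F", "F", "M", "M", "M", "M", "F", "F", "F", "M", "M", "F")
)

# Subject-level data: one row per patient, used for the column counts
df_adsl <- df %>%
  select(USUBJID, ARM) %>%
  unique()
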
# Create table layout
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  add_colcounts() %>%
  count_occurrences(vars = "MHDECOD", .stats = c("count_fraction"))

# Apply table layout to data and produce `rtable` object
tbl <- lyt %>%
  build_table(df, alt_counts_df = df_adsl) %>%
  prune_table()

tbl
#>           A           B
#>         (N=3)       (N=3)
#> ———————————————————————————
#> MH1    3 (100%)   1 (33.3%)
#> MH2   1 (33.3%)   2 (66.7%)
#> MH3   1 (33.3%)   1 (33.3%)
#> MH4       0       1 (33.3%)

# Layout creating function with custom format.
basic_table() %>%
  add_colcounts() %>%
  split_rows_by("SEX", child_labels = "visible") %>%
  summarize_occurrences(
    var = "MHDECOD",
    .formats = c("count_fraction" = "xx.xx (xx.xx%)")
  ) %>%
  build_table(df, alt_counts_df = df_adsl)
#>          all obs
#>           (N=6)
#> —————————————————
#> F
#>   MH1   1 (16.7%)
#>   MH2   2 (33.3%)
#>   MH3   1 (16.7%)
#>   MH4   1 (16.7%)
#> M
#>   MH1   3 (50.0%)
#>   MH2   1 (16.7%)
#>   MH3   1 (16.7%)

# Count unique occurrences per subject.
s_count_occurrences(
  df,
  .N_col = 4L,
  .N_row = 4L,
  .df_row = df,
  .var = "MHDECOD",
  id = "USUBJID"
)
#> $count
#> $count$MH1
#> [1] 4
#>
#> $count$MH2
#> [1] 3
#>
#> $count$MH3
#> [1] 2
#>
#> $count$MH4
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$MH1
#> [1] 4 1
#>
#> $count_fraction$MH2
#> [1] 3.00 0.75
#>
#> $count_fraction$MH3
#> [1] 2.0 0.5
#>
#> $count_fraction$MH4
#> [1] 1.00 0.25
#>
#>
#> $fraction
#> $fraction$MH1
#>   num denom
#>     4     4
#>
#> $fraction$MH2
#>   num denom
#>     3     4
#>
#> $fraction$MH3
#>   num denom
#>     2     4
#>
#> $fraction$MH4
#>   num denom
#>     1     4
#>
#>
a_count_occurrences(
  df,
  .N_col = 4L,
  .df_row = df,
  .var = "MHDECOD",
  id = "USUBJID"
)
#> RowsVerticalSection (in_rows) object print method:
#> ----------------------------
#>    row_name formatted_cell indent_mod row_label
#> 1       MH1              4          0       MH1
#> 2       MH2              3          0       MH2
#> 3       MH3              2          0       MH3
#> 4       MH4              1          0       MH4
#> 5       MH1       4 (100%)          0       MH1
#> 6       MH2        3 (75%)          0       MH2
#> 7       MH3        2 (50%)          0       MH3
#> 8       MH4        1 (25%)          0       MH4
#> 9       MH1       4 (100%)          0       MH1
#> 10      MH2      3 (75.0%)          0       MH2
#> 11      MH3      2 (50.0%)          0       MH3
#> 12      MH4      1 (25.0%)          0       MH4
#> 13      MH1   4/4 (100.0%)          0       MH1
#> 14      MH2    3/4 (75.0%)          0       MH2
#> 15      MH3    2/4 (50.0%)          0       MH3
#> 16      MH4    1/4 (25.0%)          0       MH4