Stacked Hierarchical ARD Statistics — ard_stack

Use these functions to calculate multiple summaries of nested or hierarchical data in a single call.

ard_stack_hierarchical(): Calculates rates of events (e.g. adverse events) utilizing the denominator and id arguments to identify the rows in data to include in each rate calculation.
ard_stack_hierarchical_count(): Calculates counts of events utilizing all rows for each tabulation.
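The difference between the two functions can be sketched in base R with a small hypothetical data frame (toy_ae, invented for illustration) in which subject 01 reports HEADACHE twice. Tallying every row, as ard_stack_hierarchical_count() does, counts that event twice. Keeping one row per subject and term before tallying, which is the idea behind the id-based subsetting in ard_stack_hierarchical(), counts the subject once. This only illustrates the concept and is not the {cards} implementation.

```r
# Hypothetical toy adverse-event data: subject "01" reports HEADACHE twice
toy_ae <- data.frame(
  USUBJID = c("01", "01", "02", "03"),
  AEDECOD = c("HEADACHE", "HEADACHE", "HEADACHE", "NAUSEA")
)

# Event counts: every row is tallied (the ard_stack_hierarchical_count() idea)
event_counts <- table(toy_ae$AEDECOD)

# Subject counts: each subject is tallied at most once per term
# (the idea behind the id-based subsetting in ard_stack_hierarchical())
one_row_per_subject <- unique(toy_ae[, c("USUBJID", "AEDECOD")])
subject_counts <- table(one_row_per_subject$AEDECOD)

print(event_counts)
print(subject_counts)
```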

Usage

ard_stack_hierarchical(
  data,
  variables,
  by = dplyr::group_vars(data),
  id,
  denominator,
  include = everything(),
  statistic = everything() ~ c("n", "N", "p"),
  overall = FALSE,
  over_variables = FALSE,
  attributes = FALSE,
  total_n = FALSE,
  shuffle = FALSE
)

ard_stack_hierarchical_count(
  data,
  variables,
  by = dplyr::group_vars(data),
  denominator = NULL,
  include = everything(),
  overall = FALSE,
  over_variables = FALSE,
  attributes = FALSE,
  total_n = FALSE,
  shuffle = FALSE
)

Arguments

data

(data.frame)
a data frame

variables

(tidy-select)
Specifies the nested/hierarchical structure of the data. The variables that are specified here and in the include argument will have summary statistics calculated.

by

(tidy-select)
variables to perform tabulations by. All combinations of the variables specified here appear in results. Default is dplyr::group_vars(data).

id

(tidy-select)
identifier column(s), e.g. USUBJID, used to subset data to the rows included in each event rate calculation in ard_stack_hierarchical(). See the Subsetting Data for Rate Calculations section below.

denominator

(data.frame, integer)
used to define the denominator and enhance the output. The argument is required for ard_stack_hierarchical() and optional for ard_stack_hierarchical_count().

When a data frame is passed, the univariate tabulations of the by variables are calculated from denominator, e.g. the treatment assignment counts that may appear in the header of a table.
The denominator argument must be specified when id is used to calculate event rates.
If total_n = TRUE, the denominator argument is used to return the total N.

include

(tidy-select)
Specifies the subset of columns indicated in the variables argument for which summary statistics will be returned. Default is everything().

statistic

(formula-list-selector)
a named list, a list of formulas, or a single formula, where each list element (or formula RHS) is a character vector of one or more of c("n", "N", "p", "n_cum", "p_cum").

overall

(scalar logical)
logical indicating whether overall statistics should be calculated (i.e., the operations are repeated with by = NULL in most cases; see the Overall Argument section below for details). Default is FALSE.

over_variables

(scalar logical)
logical indicating whether summary statistics should be calculated over or across the columns listed in the variables argument. Default is FALSE.

attributes

(scalar logical)
logical indicating whether to include the results of ard_attributes() for all variables represented in the ARD. Default is FALSE.

total_n

(scalar logical)
logical indicating whether to include the results of ard_total_n(denominator) in the returned ARD. Default is FALSE.

shuffle

(scalar logical)
logical indicating whether to perform shuffle_ard() on the final result. Default is FALSE.

Value

an ARD data frame of class 'card'

Subsetting Data for Rate Calculations

To calculate event rates, the ard_stack_hierarchical() function identifies rows to include in the calculation. First, the primary data frame is sorted by the columns identified in the id, by, and variables arguments.

As the function cycles over the variables specified in the variables argument, the data frame is grouped by id, intersect(by, names(denominator)), and the variables up to and including the one being summarized, and only the last row within each group is retained.

For example, if the call is ard_stack_hierarchical(data = ADAE, variables = c(AESOC, AEDECOD), id = USUBJID), we would first subset ADAE to one row per combination of c(USUBJID, AESOC, AEDECOD) to calculate the event rates for 'AEDECOD'. We would then repeat this, subsetting ADAE to one row per combination of c(USUBJID, AESOC), to calculate the event rates for 'AESOC'.
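The subsetting steps above can be sketched in base R as follows. The data frames toy_adae and toy_adsl and the helper last_row_per_group() are hypothetical, invented for illustration. The sketch assumes a single treatment arm, so N is simply the number of rows in the denominator. It is one reading of the description above, not the {cards} source.

```r
# Hypothetical toy data, invented for illustration (not the ADAE/ADSL datasets)
toy_adae <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02"),
  TRTA    = "A",
  AESOC   = c("GI", "GI", "NERV", "GI", "GI"),
  AEDECOD = c("NAUSEA", "NAUSEA", "HEADACHE", "NAUSEA", "VOMITING")
)
toy_adsl <- data.frame(USUBJID = c("01", "02", "03", "04"), TRTA = "A")

# Hypothetical helper: sort by `sort_cols`, then keep only the last row
# within each combination of `keys`
last_row_per_group <- function(data, sort_cols, keys) {
  sorted <- data[do.call(order, unname(as.list(data[sort_cols]))), , drop = FALSE]
  sorted[!duplicated(sorted[keys], fromLast = TRUE), , drop = FALSE]
}

# Sort by id, by, and variables, as described above
sort_cols <- c("USUBJID", "TRTA", "AESOC", "AEDECOD")

# Rates for AEDECOD: one row per subject / arm / SOC / preferred term
pt_rows <- last_row_per_group(toy_adae, sort_cols, c("USUBJID", "TRTA", "AESOC", "AEDECOD"))
# Rates for AESOC: one row per subject / arm / SOC
soc_rows <- last_row_per_group(toy_adae, sort_cols, c("USUBJID", "TRTA", "AESOC"))

N     <- nrow(toy_adsl)             # denominator: subjects in arm "A"
n_pt  <- table(pt_rows$AEDECOD)     # subjects with each preferred term
n_soc <- table(soc_rows$AESOC)      # subjects with each SOC
p_pt  <- n_pt / N
p_soc <- n_soc / N
```

With this toy data, NAUSEA counts 2 subjects even though subject 01 reports it twice, and the GI system organ class counts subjects 01 and 02 once each, giving p = 2 / 4 = 0.5.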

Overall Argument

When overall = TRUE, the calculations are re-run with the stratifying columns removed. For example, the results of the code below would also include the results of re-running it with by = NULL.

ard_stack_hierarchical(
  data = ADAE,
  variables = c(AESOC, AEDECOD),
  by = TRTA,
  denominator = ADSL |> dplyr::rename(TRTA = ARM),
  id = USUBJID,
  overall = TRUE
)

There is another case to be aware of: the by argument may include columns that are not present in the denominator, for example when tabulating results by AE grade or severity in addition to treatment assignment. In the example below, results are tabulated by treatment assignment and AE severity. With overall = TRUE, the calculations are re-run to get results with by = AESEV and again with by = NULL.

ard_stack_hierarchical(
  data = ADAE,
  variables = c(AESOC, AEDECOD),
  by = c(TRTA, AESEV),
  denominator = ADSL |> dplyr::rename(TRTA = ARM),
  id = USUBJID,
  overall = TRUE
)
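The set of by specifications described above can be sketched with a hypothetical helper, overall_by_sets(), which takes the by column names and the column names of the denominator. It assumes the rule stated in this section: when some by columns are absent from the denominator, re-run with only those columns, then pool everything with by = NULL. The helper illustrates the described behavior and is not part of {cards}.

```r
# Hypothetical helper (not part of {cards}): list the `by` sets summarized
# when overall = TRUE, following the description in this section
overall_by_sets <- function(by, denominator_cols) {
  # by columns that cannot be matched to the denominator (e.g. AESEV)
  not_in_denom <- setdiff(by, denominator_cols)
  # 1. the original, fully stratified request
  sets <- list(by)
  # 2. only the columns absent from the denominator, when that differs from `by`
  if (length(not_in_denom) > 0 && !identical(not_in_denom, by)) {
    sets <- c(sets, list(not_in_denom))
  }
  # 3. fully pooled results
  c(sets, list(NULL))
}

# Hypothetical subset of the renamed ADSL column names
denom_cols <- c("USUBJID", "TRTA")
by_trt     <- overall_by_sets("TRTA", denom_cols)               # TRTA, then NULL
by_trt_sev <- overall_by_sets(c("TRTA", "AESEV"), denom_cols)   # TRTA + AESEV, AESEV, NULL
str(by_trt_sev)
```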

Examples

ard_stack_hierarchical(
  ADAE,
  variables = c(AESOC, AEDECOD),
  by = TRTA,
  denominator = ADSL |> dplyr::rename(TRTA = ARM),
  id = USUBJID
)
#> {cards} data frame: 2394 x 13
#>    group1 group1_level group2 group2_level variable variable_level stat_name
#> 1    TRTA      Placebo   <NA>                 AESOC      CARDIAC …         n
#> 2    TRTA      Placebo   <NA>                 AESOC      CARDIAC …         N
#> 3    TRTA      Placebo   <NA>                 AESOC      CARDIAC …         p
#> 4    TRTA      Placebo   <NA>                 AESOC      CONGENIT…         n
#> 5    TRTA      Placebo   <NA>                 AESOC      CONGENIT…         N
#> 6    TRTA      Placebo   <NA>                 AESOC      CONGENIT…         p
#> 7    TRTA      Placebo   <NA>                 AESOC      EAR AND …         n
#> 8    TRTA      Placebo   <NA>                 AESOC      EAR AND …         N
#> 9    TRTA      Placebo   <NA>                 AESOC      EAR AND …         p
#> 10   TRTA      Placebo   <NA>                 AESOC      EYE DISO…         n
#>    stat_label  stat
#> 1           n    13
#> 2           N    86
#> 3           % 0.151
#> 4           n     0
#> 5           N    86
#> 6           %     0
#> 7           n     1
#> 8           N    86
#> 9           % 0.012
#> 10          n     4
#> ℹ 2384 more rows
#> ℹ Use `print(n = ...)` to see more rows
#> ℹ 4 more variables: context, fmt_fn, warning, error
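The p statistic in the output above is the proportion n / N. A quick base R check, using the n and N values copied from the first three AESOC levels printed for Placebo, shows that the printed values agree with n / N rounded to three decimal places.

```r
# n and N copied from the first three Placebo AESOC levels printed above
n <- c(13, 0, 1)
N <- 86
p <- n / N
round(p, 3)
# 0.151 0.000 0.012
```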

ard_stack_hierarchical_count(
  ADAE,
  variables = c(AESOC, AEDECOD),
  by = TRTA,
  denominator = ADSL |> dplyr::rename(TRTA = ARM)
)
#> {cards} data frame: 804 x 13
#>    group1 group1_level group2 group2_level variable variable_level stat_name
#> 1    TRTA      Placebo   <NA>                 AESOC      CARDIAC …         n
#> 2    TRTA      Placebo   <NA>                 AESOC      CONGENIT…         n
#> 3    TRTA      Placebo   <NA>                 AESOC      EAR AND …         n
#> 4    TRTA      Placebo   <NA>                 AESOC      EYE DISO…         n
#> 5    TRTA      Placebo   <NA>                 AESOC      GASTROIN…         n
#> 6    TRTA      Placebo   <NA>                 AESOC      GENERAL …         n
#> 7    TRTA      Placebo   <NA>                 AESOC      HEPATOBI…         n
#> 8    TRTA      Placebo   <NA>                 AESOC      IMMUNE S…         n
#> 9    TRTA      Placebo   <NA>                 AESOC      INFECTIO…         n
#> 10   TRTA      Placebo   <NA>                 AESOC      INJURY, …         n
#>    stat_label stat
#> 1           n   27
#> 2           n    0
#> 3           n    2
#> 4           n    8
#> 5           n   26
#> 6           n   48
#> 7           n    1
#> 8           n    0
#> 9           n   35
#> 10          n    9
#> ℹ 794 more rows
#> ℹ Use `print(n = ...)` to see more rows
#> ℹ 4 more variables: context, fmt_fn, warning, error