vignettes/articles/hierarchical-and-nested-tables.Rmd
hierarchical-and-nested-tables.Rmd
Using functions from the {gtsummary} package we can produce nested tables and hierarchical tables commonly used in clinical reporting.
tbl_hierarchical()
To begin, let’s create a basic tbl_hierarchical()
adverse event table summarizing adverse events (AEs) within System Organ
Class (SOC).
ADAE_subset <- cards::ADAE |>
dplyr::filter(AEBODSYS %in% c("SKIN AND SUBCUTANEOUS TISSUE DISORDERS", "EAR AND LABYRINTH DISORDERS")) |>
dplyr::filter(.by = AEBODSYS, dplyr::row_number() < 20)
tbl <-
tbl_hierarchical(
data = ADAE_subset,
variables = c(AEBODSYS, AEDECOD),
by = TRTA,
denominator = cards::ADSL,
id = USUBJID,
overall_row = TRUE
)
tbl
Body System or Organ Class Dictionary-Derived Term |
Placebo N = 861 |
Xanomeline High Dose N = 841 |
Xanomeline Low Dose N = 841 |
---|---|---|---|
Number of patients with event | 3 (3.5%) | 3 (3.6%) | 5 (6.0%) |
EAR AND LABYRINTH DISORDERS | 1 (1.2%) | 1 (1.2%) | 2 (2.4%) |
CERUMEN IMPACTION | 0 (0%) | 0 (0%) | 1 (1.2%) |
EAR PAIN | 1 (1.2%) | 0 (0%) | 0 (0%) |
TINNITUS | 0 (0%) | 0 (0%) | 1 (1.2%) |
VERTIGO | 0 (0%) | 1 (1.2%) | 1 (1.2%) |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 2 (2.3%) | 2 (2.4%) | 3 (3.6%) |
ACTINIC KERATOSIS | 0 (0%) | 1 (1.2%) | 0 (0%) |
ERYTHEMA | 1 (1.2%) | 0 (0%) | 3 (3.6%) |
PRURITUS | 1 (1.2%) | 0 (0%) | 2 (2.4%) |
PRURITUS GENERALISED | 0 (0%) | 0 (0%) | 1 (1.2%) |
RASH | 0 (0%) | 1 (1.2%) | 0 (0%) |
1 n (%) |
The gtsummary::tbl_hierarchical()
function was created
to build hierarchy tables, which are most commonly seen when reporting
rates of adverse events. .
The variables
argument takes a vector of variable names,
where the first corresponds to the outermost hierarchy level and the
last corresponds to the innermost hierarchy level.
AE rates are calculated within each hierarchy for each level of the innermost variable.
By default, summary AE rates are also calculated on label rows at
each outer level of the hierarchy. This setting is controlled by the
include
argument, such that any variable
not
in include
will not include summary AE counts. For example,
the table above includes summaries for both AE and SOC, but had we
specified include = AEDECOD
the resulting table would
include rows for both AEs and SOCs, but only AE rows would have summary
statistics.
The by
parameter can be used to set a single variable on
which the table will be split column-wise. To include an overall row
where AE rates are calculated for the overall dataset at the top of the
table, set overall_row = TRUE
.
If multiple AEs are recorded for a subject, only one of these events
is counted once for that subject. Unique subjects are identified via the
variable supplied using the required id
argument.
The dataset specified via the required denominator
argument is used to calculate the denominators in percentage
calculations as well as any N
counts included in the table
header.
Note that only observed combinations of hierarchy levels are included in the table—if all event rates within a given row are zero the row will be excluded.
As is typical of most tbl_*()
functions in {gtsummary},
the tbl_hierarchical()
function also has the
statistic
, label
, and digits
arguments to further customize hierarchy tables. Valid statistics
include n
, N
, and p
. The
label
argument can specify the top-left labels for each
variable as well as the label used for the “overall” row (if
overall_row = TRUE
).
See below the same example with further customization applied.
tbl <-
tbl_hierarchical(
data = ADAE_subset,
variables = c(AEBODSYS, AEDECOD),
by = TRTA,
denominator = cards::ADSL,
id = USUBJID,
include = AEDECOD,
digits = everything() ~ list(p = 0),
overall_row = TRUE,
label = list(..ard_hierarchical_overall.. = "Any Adverse Event")
) |>
add_overall()
tbl
Body System or Organ Class Dictionary-Derived Term |
Overall N = 2541 |
Placebo N = 861 |
Xanomeline High Dose N = 841 |
Xanomeline Low Dose N = 841 |
---|---|---|---|---|
Any Adverse Event | 11 (4%) | 3 (3%) | 3 (4%) | 5 (6%) |
EAR AND LABYRINTH DISORDERS | ||||
CERUMEN IMPACTION | 1 (0%) | 0 (0%) | 0 (0%) | 1 (1%) |
EAR PAIN | 1 (0%) | 1 (1%) | 0 (0%) | 0 (0%) |
TINNITUS | 1 (0%) | 0 (0%) | 0 (0%) | 1 (1%) |
VERTIGO | 2 (1%) | 0 (0%) | 1 (1%) | 1 (1%) |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | ||||
ACTINIC KERATOSIS | 1 (0%) | 0 (0%) | 1 (1%) | 0 (0%) |
ERYTHEMA | 4 (2%) | 1 (1%) | 0 (0%) | 3 (4%) |
PRURITUS | 3 (1%) | 1 (1%) | 0 (0%) | 2 (2%) |
PRURITUS GENERALISED | 1 (0%) | 0 (0%) | 0 (0%) | 1 (1%) |
RASH | 1 (0%) | 0 (0%) | 1 (1%) | 0 (0%) |
1 n (%) |
tbl_hierarchical_count()
Similar to the tbl_hierarchical()
function, the
tbl_hierarchical_count()
operates on nested data
structures. Instead of event rates this function calculates
event counts, with no option to specify an id
parameter. This means that events from every row of data
will be counted, regardless of whether they occur multiple times per
subject. If denominator
is specified, it is only used to
calculate N
values in the table header.
Only the n
statistic is available when using this
function.
ADAE_subset <- cards::ADAE |>
dplyr::filter(AEBODSYS %in% c("SKIN AND SUBCUTANEOUS TISSUE DISORDERS", "EAR AND LABYRINTH DISORDERS")) |>
dplyr::filter(.by = AEBODSYS, dplyr::row_number() < 20)
tbl <-
tbl_hierarchical_count(
data = ADAE_subset,
variables = c(AEBODSYS, AEDECOD),
by = TRTA,
denominator = cards::ADSL,
overall_row = TRUE
)
tbl
Body System or Organ Class Dictionary-Derived Term |
Placebo N = 861 |
Xanomeline High Dose N = 841 |
Xanomeline Low Dose N = 841 |
---|---|---|---|
Total number of events | 6 | 4 | 15 |
EAR AND LABYRINTH DISORDERS | 2 | 1 | 3 |
CERUMEN IMPACTION | 0 | 0 | 1 |
EAR PAIN | 2 | 0 | 0 |
TINNITUS | 0 | 0 | 1 |
VERTIGO | 0 | 1 | 1 |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 4 | 3 | 12 |
ACTINIC KERATOSIS | 0 | 1 | 0 |
ERYTHEMA | 3 | 0 | 5 |
PRURITUS | 1 | 0 | 3 |
PRURITUS GENERALISED | 0 | 0 | 4 |
RASH | 0 | 2 | 0 |
1 n |
In addition to the standard implementation of
tbl_hierarchical()
to create tables of event rates, this
function can also be used to generate tables of event rates by highest
severity.
This means that for each subject in data
only one
event—the event with highest severity—will be reported in the table. To
do so, the innermost hierarchy variable (the last variable in
variable
) must be converted to a factor variable. For
example, consider the following table:
ADAE_subset <- cards::ADAE |>
dplyr::filter(
AEBODSYS %in% c("SKIN AND SUBCUTANEOUS TISSUE DISORDERS", "EAR AND LABYRINTH DISORDERS", "CARDIAC DISORDERS")
) |>
dplyr::filter(.by = AEBODSYS, dplyr::row_number() < 20) |>
dplyr::mutate(AESEV = factor(AESEV, ordered = TRUE, levels = c("MILD", "MODERATE", "SEVERE")))
tbl_hierarchical(
data = ADAE_subset,
variables = c(AESOC, AESEV),
id = USUBJID,
denominator = cards::ADSL,
by = TRTA,
overall_row = TRUE,
label = list(AESEV = "Highest Severity")
)
#> ℹ Denominator set by "TRTA" column in `denominator` data
#> frame.
Primary System Organ Class Highest Severity |
Placebo N = 861 |
Xanomeline High Dose N = 841 |
Xanomeline Low Dose N = 841 |
---|---|---|---|
Number of patients with event | 6 (7.0%) | 8 (9.5%) | 5 (6.0%) |
CARDIAC DISORDERS | 4 (4.7%) | 5 (6.0%) | 1 (1.2%) |
MILD | 2 (2.3%) | 4 (4.8%) | 0 (0%) |
MODERATE | 1 (1.2%) | 1 (1.2%) | 1 (1.2%) |
SEVERE | 1 (1.2%) | 0 (0%) | 0 (0%) |
EAR AND LABYRINTH DISORDERS | 1 (1.2%) | 1 (1.2%) | 2 (2.4%) |
MILD | 1 (1.2%) | 0 (0%) | 0 (0%) |
MODERATE | 0 (0%) | 1 (1.2%) | 2 (2.4%) |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 2 (2.3%) | 2 (2.4%) | 3 (3.6%) |
MILD | 1 (1.2%) | 2 (2.4%) | 1 (1.2%) |
MODERATE | 1 (1.2%) | 0 (0%) | 2 (2.4%) |
1 n (%) |
Since each subject has exactly one event recorded for this type of table, the numbers within each hierarchy section will always add up to the numbers in the preceding summary row. For example, see in the table above that the total number of subjects with treatment Placebo in the CARDIAC DISORDERS section is 4, and the numbers within the section below, for all severity levels, also add up to 4.
tbl_ard_hierarchical()
In addition to the above two functions which use a data frame-first
approach to creating hierarchy tables, an ARD-first approach can be
taken using the gtsummary::tbl_ard_hierarchical()
function.
This function takes a hierarchical ARD object of class
"card"
and converts it to a hierarchical table.
Hierarchical ARDs can be constructed using the
cards::ard_stack_hierarchical()
(for event rates) and
cards::ard_stack_hierarchical_count()
(for event counts)
functions.
For example:
# Build the ARD
ard <-
cards::ard_stack_hierarchical(
data = ADAE_subset,
variables = c(AESOC, AETERM),
by = TRTA,
denominator = cards::ADSL,
id = USUBJID
)
# Build the table from the ARD
tbl <-
tbl_ard_hierarchical(
cards = ard,
variables = c(AESOC, AETERM),
by = TRTA,
label = list(AESOC = "Body System or Organ Class", AETERM = "Dictionary-Derived Term")
)
tbl
Body System or Organ Class Dictionary-Derived Term |
Placebo N = 861 |
Xanomeline High Dose N = 841 |
Xanomeline Low Dose N = 841 |
---|---|---|---|
CARDIAC DISORDERS | 4 (4.7%) | 5 (6.0%) | 1 (1.2%) |
ATRIAL FIBRILLATION | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
ATRIAL FLUTTER | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
ATRIOVENTRICULAR BLOCK SECOND DEGREE | 2 (2.3%) | 1 (1.2%) | 0 (0.0%) |
BUNDLE BRANCH BLOCK LEFT | 1 (1.2%) | 0 (0.0%) | 0 (0.0%) |
CARDIAC DISORDER | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
MYOCARDIAL INFARCTION | 1 (1.2%) | 1 (1.2%) | 0 (0.0%) |
PALPITATIONS | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
SINUS BRADYCARDIA | 0 (0.0%) | 2 (2.4%) | 0 (0.0%) |
TACHYCARDIA | 1 (1.2%) | 0 (0.0%) | 0 (0.0%) |
WOLFF-PARKINSON-WHITE SYNDROME | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
EAR AND LABYRINTH DISORDERS | 1 (1.2%) | 1 (1.2%) | 2 (2.4%) |
CERUMEN IMPACTION | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
EAR PAIN | 1 (1.2%) | 0 (0.0%) | 0 (0.0%) |
TINNITUS | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
VERTIGO | 0 (0.0%) | 1 (1.2%) | 1 (1.2%) |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 2 (2.3%) | 2 (2.4%) | 3 (3.6%) |
ACTINIC KERATOSIS | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
ERYTHEMA | 1 (1.2%) | 0 (0.0%) | 3 (3.6%) |
PRURITUS | 1 (1.2%) | 0 (0.0%) | 2 (2.4%) |
PRURITUS GENERALISED | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
RASH | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
1 n (%) |
We can stack hierarchical table results with results from another
table if additional rows are needed. For example, if we wanted to add an
“Any AE” row which calculates the number of unique subjects for which
any AE was observed to the previous hierarchical table, we could do so
as follows using gtsummary::tbl_stack()
:
# Calculate subjects with any AE across TRTA and create a single-line table for this result
any_ae <-
ADAE_subset |>
distinct(USUBJID, TRTA) |>
mutate(AE_ANY = TRUE) |>
cards::ard_tabulate_value(
by = TRTA,
variables = AE_ANY,
denominator = cards::ADSL
) |>
tbl_ard_summary(by = TRTA, label = list(AE_ANY = "Any AE"))
any_ae
Characteristic | Placebo1 | Xanomeline High Dose1 | Xanomeline Low Dose1 |
---|---|---|---|
Any AE | 6 (7.0%) | 8 (9.5%) | 5 (6.0%) |
1 n (%) |
The “Any AE” table header differs from the previous table header as
it does not include labels or the column N values, so we specify
attr_order = 2
in the following tbl_stack()
call to keep the table header from the second table in the stack and not
the first (and quiet = TRUE
to silence the message about
differing headers).
# Stack tables so that the "Any AE" row is added at the top of the previous table
tbl_stack(
tbls = list(any_ae, tbl),
attr_order = 2,
quiet = TRUE
)
Body System or Organ Class Dictionary-Derived Term |
Placebo N = 861 |
Xanomeline High Dose N = 841 |
Xanomeline Low Dose N = 841 |
---|---|---|---|
Any AE | 6 (7.0%) | 8 (9.5%) | 5 (6.0%) |
CARDIAC DISORDERS | 4 (4.7%) | 5 (6.0%) | 1 (1.2%) |
ATRIAL FIBRILLATION | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
ATRIAL FLUTTER | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
ATRIOVENTRICULAR BLOCK SECOND DEGREE | 2 (2.3%) | 1 (1.2%) | 0 (0.0%) |
BUNDLE BRANCH BLOCK LEFT | 1 (1.2%) | 0 (0.0%) | 0 (0.0%) |
CARDIAC DISORDER | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
MYOCARDIAL INFARCTION | 1 (1.2%) | 1 (1.2%) | 0 (0.0%) |
PALPITATIONS | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
SINUS BRADYCARDIA | 0 (0.0%) | 2 (2.4%) | 0 (0.0%) |
TACHYCARDIA | 1 (1.2%) | 0 (0.0%) | 0 (0.0%) |
WOLFF-PARKINSON-WHITE SYNDROME | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
EAR AND LABYRINTH DISORDERS | 1 (1.2%) | 1 (1.2%) | 2 (2.4%) |
CERUMEN IMPACTION | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
EAR PAIN | 1 (1.2%) | 0 (0.0%) | 0 (0.0%) |
TINNITUS | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
VERTIGO | 0 (0.0%) | 1 (1.2%) | 1 (1.2%) |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 2 (2.3%) | 2 (2.4%) | 3 (3.6%) |
ACTINIC KERATOSIS | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
ERYTHEMA | 1 (1.2%) | 0 (0.0%) | 3 (3.6%) |
PRURITUS | 1 (1.2%) | 0 (0.0%) | 2 (2.4%) |
PRURITUS GENERALISED | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) |
RASH | 0 (0.0%) | 1 (1.2%) | 0 (0.0%) |
1 n (%) |
Hierarchical tables can be further customized after creation by
sorting and filtering them using the
gtsummary::sort_hierarchical()
and
gtsummary::filter_hierarchical()
functions,
respectively.
By default, hierarchical tables are sorted alphanumerically at each level of the hierarchy.
By applying the sort_hierarchical()
function to an
existing hierarchical table the sort order can be changed to
“descending” - sorting rows by descending event rate/count sum at each
level of the hierarchy such that the most prevalent events will be
listed first.
In addition to sorting, hierarchical tables can also be filtered
using filter_hierarchical()
, which uses a given expression
to filter out rows of the table.
For examples of possible filters please see the documentation for the
filter_hierarchical()
function.
Filtering is only applied at the innermost hierarchy level, with all
summary rows from outer hierarchy levels kept. If all rows of a section
are filtered out then the summary row for that section is also removed
unless keep_empty = TRUE
is specified.
If the table contains an overall column then this column is ignored when filtering.
See the following example where a hierarchy table is sorted in descending order and filtered so that only rows with more than one event recorded across all treatment groups are kept.
tbl <-
tbl_hierarchical(
data = ADAE_subset,
variables = c(AEBODSYS, AEDECOD),
by = TRTA,
denominator = cards::ADSL,
id = USUBJID,
overall_row = TRUE
) |>
add_overall()
tbl |>
sort_hierarchical("descending") |>
filter_hierarchical(filter = sum(n) > 1)
Body System or Organ Class Dictionary-Derived Term |
Overall N = 2541 |
Placebo N = 861 |
Xanomeline High Dose N = 841 |
Xanomeline Low Dose N = 841 |
---|---|---|---|---|
Number of patients with event | 19 (7.5%) | 6 (7.0%) | 8 (9.5%) | 5 (6.0%) |
CARDIAC DISORDERS | 10 (3.9%) | 4 (4.7%) | 5 (6.0%) | 1 (1.2%) |
ATRIOVENTRICULAR BLOCK SECOND DEGREE | 3 (1.2%) | 2 (2.3%) | 1 (1.2%) | 0 (0%) |
MYOCARDIAL INFARCTION | 2 (0.8%) | 1 (1.2%) | 1 (1.2%) | 0 (0%) |
SINUS BRADYCARDIA | 2 (0.8%) | 0 (0%) | 2 (2.4%) | 0 (0%) |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 7 (2.8%) | 2 (2.3%) | 2 (2.4%) | 3 (3.6%) |
ERYTHEMA | 4 (1.6%) | 1 (1.2%) | 0 (0%) | 3 (3.6%) |
PRURITUS | 3 (1.2%) | 1 (1.2%) | 0 (0%) | 2 (2.4%) |
EAR AND LABYRINTH DISORDERS | 4 (1.6%) | 1 (1.2%) | 1 (1.2%) | 2 (2.4%) |
VERTIGO | 2 (0.8%) | 0 (0%) | 1 (1.2%) | 1 (1.2%) |
1 n (%) |
tbl_strata_nested_stack()
Oftentimes in clinical trial reporting we want a similar nested structure to hierarchical tables but don’t want to calculate event rates or counts.
For these calculations, we use the
gtsummary::tbl_strata_nested_stack()
function so that
instead of treating the variables provided as a hierarchy they are used
as nested stratifying variables. Stratified sections are then stacked to
produce the full table.
This function is similar to tbl_strata()
but nests and
indents each stratified section. Take for example the standard
laboratory results table which reports summary statistics of the
AVAL
variable stratified by the PARAM
and
AVISIT
variables:
ADLB_subset <- pharmaverseadam::adlb |>
dplyr::filter(
PARAMCD %in% c("ALB", "ALKPH"),
AVISIT %in% c("Baseline", "Week 2", "Week 4")
)
tbl_strata_nested_stack(
ADLB_subset,
strata = c(PARAM, AVISIT),
.tbl_fun = ~ .x |>
crane::tbl_roche_summary(
include = AVAL,
by = TRT01A,
nonmissing = "always",
type = list(AVAL = "continuous2"),
statistic = list(AVAL = c("{mean} ({sd})", "{median}", "{min} - {max}"))
) |>
modify_header(all_stat_cols() ~ "**{level}**")
)
Placebo | Xanomeline High Dose | Xanomeline Low Dose | |
---|---|---|---|
Albumin (g/L) | |||
Baseline | |||
Analysis Value | |||
n | 86 | 72 | 94 |
Mean (SD) | 39.84 (2.81) | 40.40 (2.74) | 39.74 (2.67) |
Median | 40.00 | 40.00 | 40.00 |
Min - Max | 32.00 - 46.00 | 35.00 - 49.00 | 32.00 - 46.00 |
Week 2 | |||
Analysis Value | |||
n | 83 | 70 | 88 |
Mean (SD) | 38.9 (3.1) | 39.0 (2.8) | 38.7 (3.1) |
Median | 39.0 | 39.0 | 39.0 |
Min - Max | 31.0 - 46.0 | 33.0 - 46.0 | 31.0 - 46.0 |
Week 4 | |||
Analysis Value | |||
n | 79 | 72 | 72 |
Mean (SD) | 38.8 (3.3) | 39.1 (3.0) | 38.6 (2.8) |
Median | 39.0 | 39.0 | 38.5 |
Min - Max | 29.0 - 47.0 | 30.0 - 48.0 | 31.0 - 45.0 |
Alkaline Phosphatase (U/L) | |||
Baseline | |||
Analysis Value | |||
n | 86 | 72 | 92 |
Mean (SD) | 78 (58) | 72 (41) | 72 (20) |
Median | 70 | 66 | 71 |
Min - Max | 28 - 565 | 33 - 386 | 31 - 155 |
Week 2 | |||
Analysis Value | |||
n | 84 | 70 | 88 |
Mean (SD) | 78 (70) | 72 (42) | 73 (22) |
Median | 67 | 66 | 69 |
Min - Max | 32 - 672 | 34 - 390 | 28 - 146 |
Week 4 | |||
Analysis Value | |||
n | 82 | 72 | 72 |
Mean (SD) | 78 (69) | 71 (41) | 73 (23) |
Median | 68 | 65 | 71 |
Min - Max | 31 - 657 | 34 - 382 | 26 - 166 |