DMT01 - Demographics

Demographics and Baseline Characteristics (DMT01) QC Workflow

# load libraries
library(cards)

1. Generate a table using {chevron}

# Create a table using the chevron package
tlg_dmt01 <- chevron::dmt01_main(chevron::syn_data, summaryvars = c("AGE", "SEX"))
head(tlg_dmt01, n = 15)

              A: Drug X    B: Placebo   C: Combination   All Patients
                (N=15)       (N=15)         (N=15)          (N=45)   
—————————————————————————————————————————————————————————————————————
Age                                                                  
  n               15           15             15              45     
  Mean (SD)   31.3 (5.3)   35.1 (9.0)     36.6 (6.4)      34.3 (7.3) 
  Median         31.0         35.0           35.0            34.0    
  Min - Max    24 - 40      24 - 57        24 - 49         24 - 57   
Sex                                                                  
  n               15           15             15              45     
  F           12 (80.0%)   8 (53.3%)      10 (66.7%)      30 (66.7%) 
  M           3 (20.0%)    7 (46.7%)      5 (33.3%)       15 (33.3%)

2. Flatten the table into a data.frame

An {rtables}-based output can be flattened into a data.frame using the as_result_df() function from the {rtables} package. Setting the make_ard argument to TRUE formats the data similarly to the output generated by the {cards} package.

rtables_result <- rtables::as_result_df(tlg_dmt01, make_ard = TRUE, add_tbl_str_decimals = FALSE)
rtables_result[1:10, c("group1_level", "variable", "variable_level", "stat_name", "stat")]

group1_level	variable	variable_level	stat_name	stat
A: Drug X	AGE	n	n	15.000000
A: Drug X	AGE	mean_sd	mean	31.333333
A: Drug X	AGE	mean_sd	sd	5.259911
A: Drug X	AGE	median	median	31.000000
A: Drug X	AGE	range	min	24.000000
A: Drug X	AGE	range	max	40.000000
A: Drug X	SEX	n.n	n	15.000000
A: Drug X	SEX	count_fraction.F	count	12.000000
A: Drug X	SEX	count_fraction.F	p	0.800000
A: Drug X	SEX	count_fraction.M	count	3.000000

3. Create a comparable ARD

Using the {cards} package, we stack ard_continuous() for the continuous variables, ard_categorical() for the categorical variables, and ard_missing() for the observation counts. The statistics requested here mirror those in the table; they can be adapted for bespoke analyses.

If any variable manipulation was done to the data prior to running the {citril}/{chevron}/{tern}/{rtables} commands, we suggest supplying the same data to these ARD functions, or running the same pre-processing steps on your data before creating ARDs, to ensure variable names and levels match.

# build ARDs that calculate relevant statistics for continuous and categorical variables.
ard_result <-
  ard_stack(
    chevron::syn_data$adsl,
    ard_continuous(
      variables = c(AGE),
      statistic = ~ continuous_summary_fns(c("N", "mean", "sd", "median", "min", "max"))
    ),
    ard_categorical(variables = c(SEX), statistic = everything() ~ c("n", "p")),
    ard_missing(variables = c(SEX), statistic = everything() ~ c("N_obs")),
    .by = "ARM",
    .overall = TRUE
  ) |>
  apply_fmt_fn() |>
  unlist_ard_columns()

ard_result[1:10, c("group1_level", "variable", "variable_level", "stat_name", "stat")]

group1_level	variable	variable_level	stat_name	stat
A: Drug X	AGE	NA	N	15.000000
A: Drug X	AGE	NA	mean	31.333333
A: Drug X	AGE	NA	sd	5.259911
A: Drug X	AGE	NA	median	31.000000
A: Drug X	AGE	NA	min	24.000000
A: Drug X	AGE	NA	max	40.000000
A: Drug X	SEX	F	n	12.000000
A: Drug X	SEX	F	p	0.800000
A: Drug X	SEX	M	n	3.000000
A: Drug X	SEX	M	p	0.200000

4. Statistics comparison

{rtables} reformat

To compare the two data.frames programmatically, some identifying variables must align so they can be used as “key columns”. Below are some data wrangling steps used to match the statistics for comparison. Note the {rtables} output:

tail(rtables_result)

	group1	group1_level	variable	variable_level	variable_label	stat_name	stat
39	ARM	All Patients	AGE	range	Min - Max	max	57.0000000
40	ARM	All Patients	SEX	n.n	n	n	45.0000000
41	ARM	All Patients	SEX	count_fraction.F	F	count	30.0000000
42	ARM	All Patients	SEX	count_fraction.F	F	p	0.6666667
43	ARM	All Patients	SEX	count_fraction.M	M	count	15.0000000
44	ARM	All Patients	SEX	count_fraction.M	M	p	0.3333333

The variable_level leads with the statistic name, followed by "." and then the actual level that matches the variable_level in the ARD object. We will mutate the level to match the ARD object. Similarly, the total number of observations for a group is labelled "N" in the ARD object, while it is named "n" in the {rtables} object. The following manipulations are completed below:

Set the variable_level to NA in the {rtables} result for continuous data summaries (variable levels don’t apply there, and the value is missing in the ARD object).
Remove the statistic prefix (e.g. “count_fraction.”) that precedes the variable level.
Rename the stat_name values: “n” becomes “N” and “count” becomes “n”.
Remove columns we know won’t be in the ARD data.frame, for simplicity (i.e. variable_label).

rtables_result <- rtables_result |>
  dplyr::mutate(
    variable_level = dplyr::case_when(
      variable == "AGE" & variable_level %in% c("mean_sd", "median", "range", "n") ~ NA_character_,
      TRUE ~ variable_level
    ),
    variable_level = sub("^[^.]*\\.", "", variable_level), # drop the statistic prefix up to and including the first "."
    stat_name = dplyr::recode(stat_name, "n" = "N", "count" = "n")
  ) |>
  dplyr::select(-c("variable_label"))

head(rtables_result, n = 10)

group1	group1_level	variable	variable_level	stat_name	stat
ARM	A: Drug X	AGE	NA	N	15.000000
ARM	A: Drug X	AGE	NA	mean	31.333333
ARM	A: Drug X	AGE	NA	sd	5.259911
ARM	A: Drug X	AGE	NA	median	31.000000
ARM	A: Drug X	AGE	NA	min	24.000000
ARM	A: Drug X	AGE	NA	max	40.000000
ARM	A: Drug X	SEX	n	N	15.000000
ARM	A: Drug X	SEX	F	n	12.000000
ARM	A: Drug X	SEX	F	p	0.800000
ARM	A: Drug X	SEX	M	n	3.000000
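
The prefix-stripping pattern used above can be checked in isolation. The sketch below is a minimal base R illustration on a hypothetical vector of variable_level values (made up for this example to mirror the values shown in the {rtables} output); it is not part of the original workflow.

```r
# Hypothetical variable_level values mirroring the {rtables} output above
raw_levels <- c("n.n", "count_fraction.F", "count_fraction.M", "median", NA)

# "^[^.]*\\." matches everything from the start of the string up to and
# including the first "."; sub() replaces that match with an empty string.
clean_levels <- sub("^[^.]*\\.", "", raw_levels)

print(clean_levels)
```

Values without a "." (such as "median") and NA pass through unchanged, which is why the continuous AGE rows are set to NA explicitly in case_when() before the pattern is applied.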

ARD reformat

A reformatting step is also necessary for the ARD output to complete the comparison. We’ll add the string “ARM” to any missing (NA) values in the group1 column to match the {rtables} result, add the “All Patients” label to the missing group1_level values, and rename the “N_obs” statistic to “N”.

ard_result <- ard_result |>
  dplyr::mutate(
    group1 = dplyr::coalesce(group1, "ARM"),
    group1_level = dplyr::coalesce(group1_level, "All Patients"),
    stat_name = dplyr::recode(stat_name, "N_obs" = "N")
  ) |>
  dplyr::select(c("group1_level", "group1", "variable", "variable_level", "stat_name", "stat"))
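
As a minimal sketch of what this step does, the toy example below applies the same dplyr::coalesce() and dplyr::recode() calls to a small, hypothetical data.frame (the toy_ard object and its values are assumptions for illustration, not output from {cards}).

```r
library(dplyr)

# Hypothetical ARD-like rows: the last row stands in for an overall summary,
# where the grouping columns are missing (NA).
toy_ard <- data.frame(
  group1 = c("ARM", "ARM", NA),
  group1_level = c("A: Drug X", "B: Placebo", NA),
  stat_name = c("N_obs", "n", "N_obs"),
  stringsAsFactors = FALSE
)

toy_fixed <- toy_ard |>
  mutate(
    # coalesce() keeps existing values and fills only the NA entries
    group1 = coalesce(group1, "ARM"),
    group1_level = coalesce(group1_level, "All Patients"),
    # recode() renames matching values and leaves all others unchanged
    stat_name = recode(stat_name, "N_obs" = "N")
  )

print(toy_fixed)
```

Only the missing entries are filled, so rows that already carry an arm label are left as they were.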

Note that the ARD result is larger than the {rtables} result. When ard_stack() is given a .by variable, a univariate tabulation is also run for that variable (here, "ARM"). We can remove those statistics as they are not in rtables_result. We also set variable_level to "n" for the SEX observation count so that it matches the corresponding {rtables} row.

ard_result <- ard_result |>
  dplyr::filter(
    !((variable == "ARM")) | is.na(variable_level)
  ) |>
  dplyr::mutate(
    variable_level = dplyr::if_else(
      stat_name == "N" & is.na(variable_level) & variable != "AGE",
      "n",
      variable_level
    )
  )

Compare programmatically

Here we propose using the {diffdf} package to compare the statistics produced by the two table engines. {diffdf} is designed to compare two data.frames and report any differences/inconsistencies to the user.

diffdf::diffdf(rtables_result,
  ard_result,
  keys = c("group1_level", "group1", "variable", "variable_level", "stat_name"),
  suppress_warnings = TRUE
)

No issues were found!

If there are any differences you wish to explore, the above code can be assigned to an object which will collect the reported differences (comparison based on key columns, see dplyr::anti_join()).
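
A minimal sketch of that idea follows, using two small, hypothetical data.frames (base_df and compare_df are invented for illustration) rather than the real results. It assumes the {diffdf} helpers diffdf_has_issues() and diffdf_issuerows() are available in the installed version.

```r
library(diffdf)

# Hypothetical statistics from two table engines; the "sd" value differs
base_df <- data.frame(
  variable = c("AGE", "AGE"),
  stat_name = c("mean", "sd"),
  stat = c(31.33, 5.26),
  stringsAsFactors = FALSE
)
compare_df <- base_df
compare_df$stat[2] <- 5.30

# Store the comparison instead of only printing it
diff_obj <- diffdf(
  base_df,
  compare_df,
  keys = c("variable", "stat_name"),
  suppress_warnings = TRUE
)

# TRUE when at least one difference was found
has_issues <- diffdf_has_issues(diff_obj)

# Rows of base_df that are involved in a reported difference
issue_rows <- diffdf_issuerows(base_df, diff_obj)
print(issue_rows)
```

In a QC script, has_issues could be used to stop the run or flag the table for review, while issue_rows points to the specific statistics to investigate.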