The basic functions offered by {cards} can create a wide variety of ARDs. However, sometimes we may need to include the outputs from more complicated statistical methods in our ARDs. In this article we’ll look at a few different ways to include the output from these statistical methods.
{cardx}
The {cardx} package is an extension of {cards}. The idea is that {cards} provides the core functions to create ARDs, while {cardx} implements a wide range of commonly used statistical methods, including (but not limited to):
- Regression models
- ANOVA
- Chi-squared Test
- t-test
- LS Mean Difference
- Survival Estimates and Differences
When looking to include the output from a statistical method, your first port of call should be to check whether it has already been implemented in {cardx}. The full list of available functions is given in the {cardx} package documentation.
Consider a simple t-test comparing the mean age (AGE) across two treatment arms (ARM). In {cardx} we have the function cardx::ard_stats_t_test() for this:
cards::ADSL |>
dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
cardx::ard_stats_t_test(by = ARM, variables = AGE)
#> {cards} data frame: 14 x 9
#> group1 variable context stat_name stat_label stat
#> 1 ARM AGE stats_t_… estimate Mean Dif… -1.286
#> 2 ARM AGE stats_t_… estimate1 Group 1 … 74.381
#> 3 ARM AGE stats_t_… estimate2 Group 2 … 75.667
#> 4 ARM AGE stats_t_… statistic t Statis… -1.03
#> 5 ARM AGE stats_t_… p.value p-value 0.304
#> 6 ARM AGE stats_t_… parameter Degrees … 165.595
#> 7 ARM AGE stats_t_… conf.low CI Lower… -3.75
#> 8 ARM AGE stats_t_… conf.high CI Upper… 1.179
#> 9 ARM AGE stats_t_… method method Welch Tw…
#> 10 ARM AGE stats_t_… alternative alternat… two.sided
#> 11 ARM AGE stats_t_… mu H0 Mean 0
#> 12 ARM AGE stats_t_… paired Paired t… FALSE
#> 13 ARM AGE stats_t_… var.equal Equal Va… FALSE
#> 14 ARM AGE stats_t_… conf.level CI Confi… 0.95
#> ℹ 3 more variables: fmt_fn, warning, error
In the output, we see the results of the t-test: the mean difference, confidence interval limits and p-value. It’s also useful to see the function’s inputs; for example, we can see that we did not assume equal variances, as the stat is FALSE for the stat_name var.equal. This is useful for re-use: if we need to run the test again, we can use the ARD to see which options are needed to recreate the result.
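As a small sketch of that re-use, the rows recording the inputs can be pulled back out of the ARD by stat_name. This assumes the {cards}, {cardx}, {dplyr} and {broom} packages are installed, and that stat is a list column, as the printed output above suggests. The object names (t_test_ard, t_test_args) are illustrative only.

```r
# Build the same ARD as above and keep it in an object (name is illustrative)
t_test_ard <-
  cards::ADSL |>
  dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
  cardx::ard_stats_t_test(by = ARM, variables = AGE)

# Keep only the rows that record the inputs of the test
arg_names <- c("mu", "paired", "var.equal", "conf.level")
t_test_args <- t_test_ard[t_test_ard$stat_name %in% arg_names, ]

# Extract single values; `stat` is assumed to be a list column
var_equal <- t_test_args$stat[t_test_args$stat_name == "var.equal"][[1]]
conf_level <- t_test_args$stat[t_test_args$stat_name == "conf.level"][[1]]
```

The recovered values could then be supplied as arguments when re-running the test.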
Create a new ard_*() function
But what do we do if the statistical method that we want to use hasn’t been implemented already in {cardx}?
Implementing a new function to create an ARD for a statistical method is often simple; all we need to do is write a function that outputs the results as a named list!
We’ll first look at how broom::tidy() can make this even easier for us, then provide an example of how to implement one from scratch.
Using broom::tidy()
Typically, a user will pass a function which returns a scalar value to the statistic argument of cards::ard_continuous(). However, the argument also allows for functions that return named lists, where the names from the list will then be used as the statistic names. Since data frames or tibbles are just named lists with a little more formatting, we can pass a function to the statistic argument which returns a single-row data frame or tibble and the behavior will be the same. Each column of the table gives the name of the statistic via its column name, and then the corresponding value.
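The equivalence between a single-row data frame and a named list can be checked in base R. The sketch below uses made-up ages and builds the data frame by hand rather than with any {cards} function.

```r
# Illustrative data; any numeric vector would do
x <- c(71, 68, 80, 77, 74, 69)
res <- t.test(x)

# A hand-built, single-row data frame holding two statistics
stats_df <- data.frame(
  estimate = unname(res$estimate),
  p.value = res$p.value
)

# A data frame is a list whose elements are its columns
is.list(stats_df)

# Converting to a plain list keeps the column names as the element names
stats_list <- as.list(stats_df)
names(stats_list)
```

The element names (estimate, p.value) are what would become the statistic names.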
The output of many commonly used statistical methods can be passed through broom::tidy(), which converts it into a tibble. We can then pass the output of broom::tidy() to the statistic argument of cards::ard_continuous(), or of whichever ARD function we wish to use. This leads to an ARD like the one in the example above, with one row per relevant input or output of the statistical method.
Please note that we can only use the broom::tidy() output directly, as we see in the below example, when the output is a tibble with a single row.
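One way to guard against multi-row results is to check the row count before returning the table. The helper below, assert_single_row(), is hypothetical and written in base R; it is not part of {cards} or {broom}. A regression coefficient table is used as the multi-row case because it has one row per model term.

```r
# Hypothetical helper: fail early unless `df` is a one-row data frame
assert_single_row <- function(df) {
  if (!is.data.frame(df) || nrow(df) != 1L) {
    stop("Expected a data frame with exactly one row.")
  }
  df
}

# A one-sample t-test summarised by hand gives one row, so it passes
res <- t.test(mtcars$mpg)
one_row <- data.frame(estimate = unname(res$estimate), p.value = res$p.value)
assert_single_row(one_row)

# A coefficient table has one row per term (intercept and wt), so it is rejected
fit <- lm(mpg ~ wt, data = mtcars)
coef_df <- as.data.frame(coef(summary(fit)))
check <- try(assert_single_row(coef_df), silent = TRUE)
inherits(check, "try-error")
```

For multi-row results, one option is to reduce the table to the single row of interest before returning it.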
Let’s extend our t-test example from above. This time we want to carry out a one-sample t-test. We can run the one-sample t-test and pass its output through broom::tidy() inside the statistic argument, like so:
cards::ADSL |>
dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
cards::ard_continuous(
variables = AGE,
statistic = everything() ~ list(t_test = \(x) t.test(x) |> broom::tidy())
) |>
dplyr::mutate(context = "t_test_one_sample")
#> {cards} data frame: 8 x 8
#> variable context stat_name stat_label stat fmt_fn
#> 1 AGE t_test_o… estimate estimate 75.024 1
#> 2 AGE t_test_o… statistic statistic 120.2 1
#> 3 AGE t_test_o… p.value p.value 0 1
#> 4 AGE t_test_o… parameter parameter 167 1
#> 5 AGE t_test_o… conf.low conf.low 73.792 1
#> 6 AGE t_test_o… conf.high conf.high 76.256 1
#> 7 AGE t_test_o… method method One Samp… <fn>
#> 8 AGE t_test_o… alternative alternat… two.sided <fn>
#> ℹ 2 more variables: warning, error
In the above chunk of code, if we focus on what we pass to the statistic argument:
- everything() means we want to run this test on all columns passed to the variables argument.
- t.test(x) is the function which carries out the statistical test, in this case with the default arguments.
- We pipe the output of t.test(x) into broom::tidy(), which converts the output of the test into a tibble (which, remember, is just a named list!).
- The function we’ve defined (\(x) t.test(x) |> broom::tidy()) itself needs to be included in a named list, where here we’ve chosen to use the label 't_test'.
Over 100 different statistical methods implemented in R are able to be ‘tidied’ using broom::tidy(). However, the method you aim to use might not be, or the current broom::tidy() implementation might not contain the information that you need in your ARD. In that case we’ll have to format the output ourselves.
Without broom::tidy()
As mentioned above, we need to define a function which carries out our required statistical method and outputs a named list of the information we wish to include in the ARD.
As an example, let’s write a function which carries out a Wilcoxon signed rank test over one variable using the function wilcox.test(). As an output we just want to record the method and the p-value.
wilcox_one_var <- \(x) wilcox.test(x)[c("method", "p.value")]
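Before using the function in an ARD, it can help to call it on a plain vector and confirm that it returns a named list with the expected names. The input values below are made up for illustration.

```r
wilcox_one_var <- \(x) wilcox.test(x)[c("method", "p.value")]

# Illustrative values with no ties and no zeros
x <- c(1.2, -0.4, 2.3, 0.9, 1.7, -0.1, 3.1)
res <- wilcox_one_var(x)

# The result is a list with one element per statistic we asked for
is.list(res)
names(res)
```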
Let’s now use this function when creating an ARD with {cards}.
Remember we just need the statistic to be a named list, so we’ll place our function inside a named list. We also don’t need to specify any arguments: the function’s single argument x receives the data we are testing, in this case AGE within each individual treatment arm.
cards::ADSL |>
cards::ard_continuous(
variables = AGE,
by = ARM,
statistic = ~ list(wilcox = wilcox_one_var)
)
#> {cards} data frame: 6 x 10
#> group1 group1_level variable stat_name stat_label stat
#> 1 ARM Placebo AGE method method Wilcoxon…
#> 2 ARM Placebo AGE p.value p.value 0
#> 3 ARM Xanomeli… AGE method method Wilcoxon…
#> 4 ARM Xanomeli… AGE p.value p.value 0
#> 5 ARM Xanomeli… AGE method method Wilcoxon…
#> 6 ARM Xanomeli… AGE p.value p.value 0
#> ℹ 4 more variables: context, fmt_fn, warning, error
We see here that we get an output of 6 rows: 2 rows (one for the method, and one for the p-value) for each of the 3 treatment arms.
Complex inputs
The examples above are great to illustrate a simple case, but it is perhaps a rare scenario where we are implementing a statistical method with a single vector as its input. How would we update the code above if we need to implement a two-sample t-test?
The cards::ard_complex() function is similar to cards::ard_continuous(), but allows for more complex inputs in the function passed in the statistic argument. In cards::ard_continuous(), the functions passed must accept a single vector, e.g. \(x) t.test(x). But in cards::ard_complex(), in addition to the vector being passed, the data subset, the full_data, character by, and character strata are also passed (see the cards::ard_complex() documentation for a full description). Your function does not need to utilize each of these elements, but each will be passed to your function. As a result, we recommend your function accept the triple dots (...) to handle unused arguments.
An implementation of a two-sample t-test may look like this:
ttest_two_sample <- \(x, data, ...) t.test(x ~ data[["ARM"]]) |> broom::tidy()
cards::ADSL |>
dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
cards::ard_complex(
variables = AGE,
statistic = everything() ~ list(t_test = ttest_two_sample)
) |>
dplyr::mutate(context = "t_test_two_sample")
#> {cards} data frame: 10 x 8
#> variable context stat_name stat_label stat fmt_fn
#> 1 AGE t_test_t… estimate estimate -1.286 1
#> 2 AGE t_test_t… estimate1 estimate1 74.381 1
#> 3 AGE t_test_t… estimate2 estimate2 75.667 1
#> 4 AGE t_test_t… statistic statistic -1.03 1
#> 5 AGE t_test_t… p.value p.value 0.304 1
#> 6 AGE t_test_t… parameter parameter 165.595 1
#> 7 AGE t_test_t… conf.low conf.low -3.75 1
#> 8 AGE t_test_t… conf.high conf.high 1.179 1
#> 9 AGE t_test_t… method method Welch Tw… <fn>
#> 10 AGE t_test_t… alternative alternat… two.sided <fn>
#> ℹ 2 more variables: warning, error
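To see why the triple dots matter, the sketch below calls two functions by hand with extra named arguments. The calls only mimic additional inputs being supplied and are not how {cards} invokes the function internally. The toy data and the names with_dots and without_dots are made up for illustration.

```r
# Made-up data for illustration only
toy <- data.frame(
  ARM = rep(c("A", "B"), each = 4),
  AGE = c(70, 74, 68, 77, 72, 80, 75, 79)
)

# Accepts the extra inputs via `...` and simply ignores them
with_dots <- \(x, data, ...) {
  res <- t.test(x ~ data[["ARM"]])
  list(
    estimate = unname(res$estimate[1] - res$estimate[2]),
    p.value = res$p.value
  )
}

# No `...`: any extra named argument is an "unused argument" error
without_dots <- \(x, data) t.test(x ~ data[["ARM"]])$p.value

# Hand-written calls that mimic extra arguments being supplied
ok <- with_dots(
  toy$AGE,
  data = toy, full_data = toy, by = character(0), strata = character(0)
)
bad <- try(without_dots(toy$AGE, data = toy, full_data = toy), silent = TRUE)
inherits(bad, "try-error")
```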
Handling errors
Let’s consider what happens when we encounter an error in our statistical method.
wilcox_one_var_error <- function(x) {
stop("AN ERROR!")
wilcox.test(x)[c("method", "p.value")]
}
cards::ADSL |>
cards::ard_continuous(
variables = AGE,
by = ARM,
statistic = ~ list(wilcox = wilcox_one_var_error)
)
#> {cards} data frame: 3 x 10
#> group1 group1_level variable stat_name stat_label stat error
#> 1 ARM Placebo AGE wilcox wilcox AN ERROR!
#> 2 ARM Xanomeli… AGE wilcox wilcox AN ERROR!
#> 3 ARM Xanomeli… AGE wilcox wilcox AN ERROR!
#> ℹ 3 more variables: context, fmt_fn, warning
In the output we see that we only get 3 rows of output. The error has been stored in the error column, but stat_name and stat_label now just take the list name “wilcox” that we defined in the statistic argument. This could have unintended effects in downstream code: we may be relying on stat_name and stat_label having values of “method” and “p.value”, or just on the output having 2 rows per treatment arm.
To handle this we can specify the expected results from our function, so that even if we encounter an error during the code run we can be assured that the output will be of a consistent format so as not to impact downstream code.
Here’s an example of how to specify the expected output using cards::as_cards_fn():
wilcox_one_var_error <- cards::as_cards_fn(
wilcox_one_var_error,
stat_names = c("method", "p.value")
)
cards::ADSL |>
cards::ard_continuous(
variables = AGE,
by = ARM,
statistic = ~ list(wilcox = wilcox_one_var_error)
)
#> {cards} data frame: 6 x 10
#> group1 group1_level variable stat_name stat_label stat error
#> 1 ARM Placebo AGE method method AN ERROR!
#> 2 ARM Placebo AGE p.value p.value AN ERROR!
#> 3 ARM Xanomeli… AGE method method AN ERROR!
#> 4 ARM Xanomeli… AGE p.value p.value AN ERROR!
#> 5 ARM Xanomeli… AGE method method AN ERROR!
#> 6 ARM Xanomeli… AGE p.value p.value AN ERROR!
#> ℹ 3 more variables: context, fmt_fn, warning
Our function becomes the first argument to cards::as_cards_fn(), then the second argument is stat_names, where we specify the expected names of the output list.
In the output shown here, the error column is still populated with the error. However, now we have the expected 6 rows, and we can see that stat_name and stat_label match the values specified in the stat_names argument of as_cards_fn(), helping us to avoid problems in code that relies on this output.
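The benefit of declaring the names up front can be illustrated with a purely conceptual base R sketch. The helper with_stat_names() is hypothetical. It is not part of {cards} and makes no claim about how cards::as_cards_fn() works internally; it only shows how knowing the expected names lets a failure keep the same shape as a success.

```r
# Hypothetical helper, NOT part of {cards}: wrap `fn` so that a failure
# still yields one entry per expected statistic name
with_stat_names <- function(fn, stat_names) {
  function(x) {
    tryCatch(
      fn(x)[stat_names],
      error = function(e) {
        # One NULL placeholder per expected statistic; keep the message
        out <- stats::setNames(vector("list", length(stat_names)), stat_names)
        attr(out, "error") <- conditionMessage(e)
        out
      }
    )
  }
}

# A function that always fails, wrapped with its expected output names
failing <- function(x) stop("AN ERROR!")
safe <- with_stat_names(failing, c("method", "p.value"))

res <- safe(1:10)
names(res)
attr(res, "error")
```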
Formalizing Your Function
If you are writing a function that will be used multiple times, for example adding it to a package, you may want to include the function’s arguments in the returned ARD. Returning the arguments improves the traceability of the ARD, and requires combining the function’s default arguments (the formals) and any arguments passed by the user. The ard_formals() function helps combine these results.
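The idea of combining defaults with user-supplied values can be sketched in base R. This is only an illustration of the concept, not a description of how ard_formals() is implemented, and the object names are made up.

```r
# Defaults of the t.test() default method, restricted to arguments of interest
t_test_default <- asNamespace("stats")[["t.test.default"]]
arg_names <- c("mu", "paired", "var.equal", "conf.level")
default_args <- as.list(formals(t_test_default))[arg_names]

# Arguments a user might pass through `...`
passed_args <- list(var.equal = TRUE, conf.level = 0.90)

# User-supplied values override the defaults; untouched defaults are kept
final_args <- utils::modifyList(default_args, passed_args)
str(final_args)
```

Here mu and paired keep their default values, while var.equal and conf.level take the values supplied by the user.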
In the example below, we expand our one-sample t-test example to include the argument values passed to t.test(...):
my_ard_one_sample_t_test <- function(data, variable, ...) {
# define function to calculate results
t_test_fun <- function(x) t.test(x, ...) |> broom::tidy()
t_test_fun <-
cards::as_cards_fn(
t_test_fun,
c(
"estimate", "statistic", "p.value", "parameter",
"conf.low", "conf.high", "method", "alternative"
)
)
# create the ARD of results
ard_results <-
cards::ard_continuous(
data = data,
variables = {{ variable }},
statistic = everything() ~ list(t_test = t_test_fun)
) |>
dplyr::mutate(context = "t_test_one_sample")
# ard of argument values
ard_arguments <-
cards::ard_formals(
fun = asNamespace("stats")[["t.test.default"]],
arg_names = c("mu", "paired", "var.equal", "conf.level"),
passed_args = rlang::dots_list(...)
)
# combine ARDs and fill arguments with missing information
dplyr::bind_rows(ard_results, ard_arguments) |>
dplyr::mutate(dplyr::across(c(variable, context), dplyr::first))
}
cards::ADSL |>
dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
my_ard_one_sample_t_test(
variable = "AGE",
var.equal = TRUE,
conf.level = 0.90
)
#> {cards} data frame: 12 x 8
#> variable context stat_name stat_label stat fmt_fn
#> 1 AGE t_test_o… estimate estimate 75.024 1
#> 2 AGE t_test_o… statistic statistic 120.2 1
#> 3 AGE t_test_o… p.value p.value 0 1
#> 4 AGE t_test_o… parameter parameter 167 1
#> 5 AGE t_test_o… conf.low conf.low 73.991 1
#> 6 AGE t_test_o… conf.high conf.high 76.056 1
#> 7 AGE t_test_o… method method One Samp… <fn>
#> 8 AGE t_test_o… alternative alternat… two.sided <fn>
#> 9 AGE t_test_o… mu mu 0 NULL
#> 10 AGE t_test_o… paired paired FALSE NULL
#> 11 AGE t_test_o… var.equal var.equal TRUE NULL
#> 12 AGE t_test_o… conf.level conf.lev… 0.9 NULL
#> ℹ 2 more variables: warning, error