The analysis function estimate_proportion_diff() creates a layout element to estimate the difference in proportion
of responders within a studied population. The primary analysis variable, vars, is a logical variable indicating
whether a response has occurred for each record. See the method parameter for options of methods to use when
constructing the confidence interval of the proportion difference. A stratification variable can be supplied via the
strata element of the variables argument.
Usage
estimate_proportion_diff(
lyt,
vars,
variables = list(strata = NULL),
conf_level = 0.95,
method = c("waldcc", "wald", "cmh", "cmh_sato", "cmh_mn", "ha", "newcombe",
"newcombecc", "strat_newcombe", "strat_newcombecc"),
weights_method = "cmh",
var_labels = vars,
na_str = default_na_str(),
nested = TRUE,
show_labels = "hidden",
table_names = vars,
section_div = NA_character_,
...,
na_rm = TRUE,
.stats = c("diff", "diff_ci"),
.stat_names = NULL,
.formats = c(diff = "xx.x", diff_ci = "(xx.x, xx.x)"),
.labels = NULL,
.indent_mods = c(diff = 0L, diff_ci = 1L)
)
s_proportion_diff(
df,
.var,
.ref_group,
.in_ref_col,
variables = list(strata = NULL),
conf_level = 0.95,
method = c("waldcc", "wald", "cmh", "cmh_sato", "cmh_mn", "ha", "newcombe",
"newcombecc", "strat_newcombe", "strat_newcombecc"),
weights_method = "cmh",
...
)
a_proportion_diff(
df,
...,
.stats = NULL,
.stat_names = NULL,
.formats = NULL,
.labels = NULL,
.indent_mods = NULL
)Arguments
- lyt
(
PreDataTableLayouts)
layout that analyses will be added to.- vars
(
character)
variable names for the primary analysis variable to be iterated over.- variables
(named
listofstring)
list of additional analysis variables.- conf_level
(
proportion)
confidence level of the interval.- method
(
string)
the method used for the confidence interval estimation.- weights_method
(
string)
weights method. Can be either"cmh"or"heuristic"and directs the way weights are estimated.- var_labels
(
character)
variable labels.- na_str
(
string)
string used to replace allNAor empty values in the output.- nested
(
flag)
whether this layout instruction should be applied within the existing layout structure _if possible (TRUE, the default) or as a new top-level element (FALSE). Ignored if it would nest a split. underneath analyses, which is not allowed.- show_labels
(
string)
label visibility: one of "default", "visible" and "hidden".- table_names
(
character)
this can be customized in the case that the samevarsare analyzed multiple times, to avoid warnings fromrtables.- section_div
(
string)
string which should be repeated as a section divider after each group defined by this split instruction, orNA_character_(the default) for no section divider.- ...
additional arguments for the lower level functions.
- na_rm
(
flag)
whetherNAvalues should be removed fromxprior to analysis.- .stats
-
(
character)
statistics to select for the table.Options are:
'diff', 'diff_ci' - .stat_names
(
character)
names of the statistics that are passed directly to name single statistics (.stats). This option is visible when producingrtables::as_result_df()withmake_ard = TRUE.- .formats
(named
characterorlist)
formats for the statistics. See Details inanalyze_varsfor more information on the"auto"setting.- .labels
(named
character)
labels for the statistics (without indent).- .indent_mods
(named
integer)
indent modifiers for the labels. Defaults to 0, which corresponds to the unmodified default behavior. Can be negative.- df
(
data.frame)
data set containing all analysis variables.- .var
(
string)
single variable name that is passed byrtableswhen requested by a statistics function.- .ref_group
(
data.frameorvector)
the data corresponding to the reference group.- .in_ref_col
(
flag)TRUEwhen working with the reference level,FALSEotherwise.
Value
estimate_proportion_diff()returns a layout object suitable for passing to further layouting functions, or tortables::build_table(). Adding this function to anrtablelayout will add formatted rows containing the statistics froms_proportion_diff()to the table layout.
s_proportion_diff()returns a named list of elementsdiffanddiff_ci.
a_proportion_diff()returns the corresponding list with formattedrtables::CellValue().
Details
The possible methods are:
"waldcc": Wald confidence interval with continuity correction (Agresti and Coull 1998) ."wald": Wald confidence interval without continuity correction (Agresti and Coull 1998) ."cmh": Cochran-Mantel-Haenszel (CMH) confidence interval (Mantel and Haenszel 1959) ."cmh_sato": CMH confidence interval with Sato variance estimator (Sato et al. 1989) ."cmh_mn": CMH confidence interval with Miettinen and Nurminen confidence interval (Miettinen and Nurminen 1985) ."ha": Anderson-Hauck confidence interval (Hauck and Anderson 1986) ."newcombe": Newcombe confidence interval without continuity correction (Newcombe 1998) ."newcombecc": Newcombe confidence interval with continuity correction (Newcombe 1998) ."strat_newcombe": Stratified Newcombe confidence interval without continuity correction (Yan and Su 2010) ."strat_newcombecc": Stratified Newcombe confidence interval with continuity correction (Yan and Su 2010) .
Functions
estimate_proportion_diff(): Layout-creating function which can take statistics function arguments and additional format arguments. This function is a wrapper forrtables::analyze().s_proportion_diff(): Statistics function estimating the difference in terms of responder proportion.a_proportion_diff(): Formatted analysis function which is used asafuninestimate_proportion_diff().
Note
When performing an unstratified analysis, methods "cmh", "cmh_sato", "strat_newcombe",
and "strat_newcombecc" are not permitted.
References
Agresti A, Coull BA (1998).
“Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions.”
The American Statistician, 52(2), 119–126.
doi:10.1080/00031305.1998.10480550
.
Hauck WW, Anderson S (1986).
“A Comparison of Large-Sample Confidence Interval Methods for the Difference of Two Binomial Probabilities.”
The American Statistician, 40(4), 318–322.
doi:10.2307/2684618
, 2025-12-08.
Mantel N, Haenszel W (1959).
“Statistical aspects of the analysis of data from retrospective studies of disease.”
Journal of the National Cancer Institute, 22(4), 719–748.
Miettinen OS, Nurminen M (1985).
“Comparative analysis of two rates.”
Statistics in Medicine, 4(2), 213–226.
doi:10.1002/sim.4780040211
.
Newcombe RG (1998).
“Interval estimation for the difference between independent proportions: comparison of eleven methods.”
Statistics in Medicine, 17(8), 873-890.
doi:10.1002/(SICI)1097-0258(19980430)17:8<873::AID-SIM779>3.0.CO;2-I
.
Sato T, Greenland S, Robins JM (1989).
“On the variance estimator for the Mantel-Haenszel Risk Difference.”
Biometrics, 45(4), 1323–1324.
http://www.jstor.org/stable/2531784.
Yan X, Su XG (2010).
“Stratified Wilson and Newcombe Confidence Intervals for Multiple Binomial Proportions.”
Stat. Biopharm. Res., 2(3), 329–335.
Examples
## "Mid" case: 4/4 respond in group A, 1/2 respond in group B.
nex <- 100 # Number of example rows
dta <- data.frame(
"rsp" = sample(c(TRUE, FALSE), nex, TRUE),
"grp" = sample(c("A", "B"), nex, TRUE),
"f1" = sample(c("a1", "a2"), nex, TRUE),
"f2" = sample(c("x", "y", "z"), nex, TRUE),
stringsAsFactors = TRUE
)
l <- basic_table() %>%
split_cols_by(var = "grp", ref_group = "B") %>%
estimate_proportion_diff(
vars = "rsp",
conf_level = 0.90,
method = "ha"
)
build_table(l, df = dta)
#> A B
#> ——————————————————————————————————————————————————
#> Difference in Response rate (%) 12.0
#> 90% CI (Anderson-Hauck) (-5.4, 29.4)
s_proportion_diff(
df = subset(dta, grp == "A"),
.var = "rsp",
.ref_group = subset(dta, grp == "B"),
.in_ref_col = FALSE,
conf_level = 0.90,
method = "ha"
)
#> $diff
#> diff_ha
#> 12
#> attr(,"label")
#> [1] "Difference in Response rate (%)"
#>
#> $diff_ci
#> diff_ci_ha_l diff_ci_ha_u
#> -5.374519 29.374519
#> attr(,"label")
#> [1] "90% CI (Anderson-Hauck)"
#>
# CMH example with strata
s_proportion_diff(
df = subset(dta, grp == "A"),
.var = "rsp",
.ref_group = subset(dta, grp == "B"),
.in_ref_col = FALSE,
variables = list(strata = c("f1", "f2")),
conf_level = 0.90,
method = "cmh"
)
#> $diff
#> diff_cmh
#> 12.27932
#> attr(,"label")
#> [1] "Difference in Response rate (%)"
#>
#> $diff_ci
#> diff_ci_cmh_l diff_ci_cmh_u
#> -2.657093 27.215725
#> attr(,"label")
#> [1] "90% CI (CMH, without correction)"
#>
a_proportion_diff(
df = subset(dta, grp == "A"),
.stats = c("diff"),
.var = "rsp",
.ref_group = subset(dta, grp == "B"),
.in_ref_col = FALSE,
conf_level = 0.90,
method = "ha"
)
#> RowsVerticalSection (in_rows) object print method:
#> ----------------------------
#> row_name formatted_cell indent_mod row_label
#> 1 diff 12 0 Difference in Response rate (%)
