DG2
Boxplots of a Numeric Variable by Categorical Variables
The graphs in this section show the distribution of a continuous biomarker variable as boxplots with overlaid data points, grouped by one or more categorical clinical variables.
We will use the cadsl data set from the random.cdisc.data package to illustrate the graph and will select the biomarker evaluable population with BEP01FL. The column BMRKR1 contains the biomarker values on a continuous scale. We will use STRATA2 and ARM as categorical clinical variables.
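The data selection described above could look roughly like the following sketch. The object name adsl and the coding of BEP01FL as "Y" for biomarker evaluable patients are assumptions made for illustration, not details taken from this page.

```r
library(dplyr)

# Keep only the biomarker evaluable population.
# Assumption: BEP01FL is coded "Y" for evaluable patients.
adsl <- random.cdisc.data::cadsl %>%
  filter(BEP01FL == "Y")

# Quick look at the three variables used in the plots.
adsl %>%
  select(BMRKR1, STRATA2, ARM) %>%
  summary()
```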
We start with a simple boxplot with the outliers displayed. Note that you may see warning messages when producing the graph if the variable you want to analyze contains NAs. To avoid these warnings, you can use the drop_na() function from tidyr in the data manipulation step to remove the rows with NAs from the data set (e.g. drop_na(BMRKR1)).
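A minimal sketch of such a boxplot is shown here, with BMRKR1 plotted by STRATA2. The object names adsl and graph and the "Y" coding of BEP01FL are illustrative assumptions. The drop_na() call is optional and only matters if BMRKR1 contains missing values.

```r
library(dplyr)
library(ggplot2)

# Assumption: BEP01FL == "Y" marks the biomarker evaluable population.
adsl <- random.cdisc.data::cadsl %>%
  filter(BEP01FL == "Y") %>%
  tidyr::drop_na(BMRKR1) # optional: drop rows with a missing biomarker value

# By default geom_boxplot() draws outliers as individual points.
graph <- ggplot(adsl, aes(x = STRATA2, y = BMRKR1)) +
  geom_boxplot()

graph
```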
Now we overlay the original data points, and remove the display of the outliers to avoid duplicate points.
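One way to do this, sketched under the same assumptions as before (adsl and graph are illustrative names, BEP01FL coded "Y"), is to suppress the outlier symbols with outlier.shape = NA and add the raw values with geom_jitter(). Horizontal jitter is a choice made here to reduce overplotting. A plain geom_point() layer would also work.

```r
library(dplyr)
library(ggplot2)

# Assumption: BEP01FL == "Y" marks the biomarker evaluable population.
adsl <- random.cdisc.data::cadsl %>%
  filter(BEP01FL == "Y")

graph <- ggplot(adsl, aes(x = STRATA2, y = BMRKR1)) +
  # outlier.shape = NA hides the outlier symbols drawn by the boxplot layer,
  # so each observation is drawn only once, by the point layer below.
  geom_boxplot(outlier.shape = NA) +
  # height = 0 keeps the biomarker values exact; only x positions are jittered.
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)

graph
```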
We can customize the labels of the axes.
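A sketch of axis label customization with labs() follows. The label texts are placeholders chosen for illustration, and adsl and graph are assumed object names.

```r
library(dplyr)
library(ggplot2)

# Assumption: BEP01FL == "Y" marks the biomarker evaluable population.
adsl <- random.cdisc.data::cadsl %>%
  filter(BEP01FL == "Y")

graph <- ggplot(adsl, aes(x = STRATA2, y = BMRKR1)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  # Placeholder label texts; replace with study-specific wording.
  labs(x = "Stratification factor (STRATA2)", y = "Biomarker (BMRKR1)")

graph
```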
We can add the group sizes as annotations.
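One possible way to annotate group sizes is stat_n_text() from the EnvStats package, which appears among the loaded namespaces in the session information. Whether the original page calls it this way is an assumption. adsl, graph and n_by_group are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Assumption: BEP01FL == "Y" marks the biomarker evaluable population.
adsl <- random.cdisc.data::cadsl %>%
  filter(BEP01FL == "Y")

graph <- ggplot(adsl, aes(x = STRATA2, y = BMRKR1)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  # Adds the number of observations per x group as a text annotation.
  EnvStats::stat_n_text()

# The same counts as a table, useful for cross-checking the annotation.
n_by_group <- count(adsl, STRATA2)

graph
```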
We can also display the biomarker by a further categorical variable with the facet_grid() layer.
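As a sketch, faceting by ARM places one panel per treatment arm side by side. Putting ARM in columns rather than rows is a choice made here, and adsl and graph are assumed names.

```r
library(dplyr)
library(ggplot2)

# Assumption: BEP01FL == "Y" marks the biomarker evaluable population.
adsl <- random.cdisc.data::cadsl %>%
  filter(BEP01FL == "Y")

graph <- ggplot(adsl, aes(x = STRATA2, y = BMRKR1)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  # One panel per treatment arm, arranged in columns.
  facet_grid(cols = vars(ARM))

graph
```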
This example shows how to display the biomarker axis on a log scale.
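A log scale on the biomarker axis can be sketched with scale_y_log10(). This assumes all BMRKR1 values are strictly positive, since zero or negative values cannot be shown on a log axis. adsl and graph are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Assumption: BEP01FL == "Y" marks the biomarker evaluable population.
adsl <- random.cdisc.data::cadsl %>%
  filter(BEP01FL == "Y")

graph <- ggplot(adsl, aes(x = STRATA2, y = BMRKR1)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  # The transformation is applied before the boxplot statistics are computed,
  # so quartiles and whiskers refer to the log10-transformed values.
  scale_y_log10()

graph
```

Because the scale transforms the data before the statistics are calculated, the boxes can differ from those obtained by transforming only the axis with coord_trans(y = "log10").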
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.1.4 ggplot2.utils_0.3.2 ggplot2_3.5.1
[4] tern_0.9.5 rtables_0.6.9 magrittr_2.0.3
[7] formatters_0.5.9
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 tidyr_1.3.1
[4] EnvStats_3.0.0 stringi_1.8.4 lattice_0.22-6
[7] digest_0.6.37 evaluate_0.24.0 grid_4.4.1
[10] fastmap_1.2.0 jsonlite_1.8.8 Matrix_1.7-0
[13] backports_1.5.0 survival_3.7-0 purrr_1.0.2
[16] fansi_1.0.6 scales_1.3.0 codetools_0.2-20
[19] Rdpack_2.6.1 cli_3.6.3 ggpp_0.5.8-1
[22] rlang_1.1.4 rbibutils_2.2.16 munsell_0.5.1
[25] splines_4.4.1 withr_3.0.1 yaml_2.3.10
[28] tools_4.4.1 polynom_1.4-1 checkmate_2.3.2
[31] colorspace_2.1-1 forcats_1.0.0 ggstats_0.6.0
[34] broom_1.0.6 vctrs_0.6.5 R6_2.5.1
[37] lifecycle_1.0.4 stringr_1.5.1 htmlwidgets_1.6.4
[40] MASS_7.3-61 pkgconfig_2.0.3 pillar_1.9.0
[43] gtable_0.3.5 glue_1.7.0 xfun_0.47
[46] tibble_3.2.1 tidyselect_1.2.1 knitr_1.48
[49] farver_2.1.2 htmltools_0.5.8.1 labeling_0.4.3
[52] rmarkdown_2.28 random.cdisc.data_0.3.15 compiler_4.4.1