Biomarker Analysis Catalog - Stable
DG2

Boxplots of a Numeric Variable by Categorical Variables


The graph below plots the distribution of a biomarker variable (on a continuous scale) as a boxplot by one or more categorical clinical variables with overlaid points.

We will use the cadsl dataset from the random.cdisc.data package to illustrate the graph and will select the biomarker evaluable population by keeping the rows where BEP01FL equals "Y". The column BMRKR1 contains the biomarker values on a continuous scale. We will use STRATA2 and ARM as categorical clinical variables.

Code
library(tern)
library(ggplot2.utils)
library(dplyr)

adsl <- random.cdisc.data::cadsl %>%
  df_explicit_na() %>%
  filter(BEP01FL == "Y")

Below is the code for a simple boxplot with the outliers displayed. Note that you may see warning messages when producing the graph if the variable you want to analyze contains NAs. To avoid these warnings, you can use the drop_na() function from tidyr in the data manipulation step above to remove the rows with NAs from the dataset (e.g., drop_na(BMRKR1)).

Code
graph <- ggplot(adsl, aes(x = STRATA2, y = BMRKR1)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar")

graph
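
The NA handling mentioned above can be sketched on a small made-up data frame. The values in toy are hypothetical; only the column names follow the dataset used here. The sketch uses base R subsetting, which should keep the same rows as drop_na(BMRKR1) from tidyr.

```r
# Hypothetical toy data: two of the six biomarker values are missing.
toy <- data.frame(
  STRATA2 = c("S1", "S1", "S2", "S2", "S1", "S2"),
  BMRKR1 = c(4.2, NA, 7.9, 3.1, NA, 5.6)
)

# Keep only the rows with an observed biomarker value.
toy_complete <- toy[!is.na(toy$BMRKR1), ]

# Number of rows removed because the biomarker value was missing.
n_dropped <- nrow(toy) - nrow(toy_complete)
n_dropped
```

Reporting n_dropped alongside the graph makes it explicit how many patients are not shown. Passing na.rm = TRUE to the plotting layers also silences the warning, but it hides that count.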

Now we overlay the original data points and hide the boxplot's outlier markers so that no point is drawn twice.

Code
graph <- ggplot(adsl, aes(x = STRATA2, y = BMRKR1)) +
  geom_boxplot(outlier.shape = NA) +
  stat_boxplot(geom = "errorbar") +
  geom_point(
    position = position_jitter(width = 0.2),
    alpha = 1 / 4
  )

graph
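
Because position_jitter() draws random offsets, the point positions change each time the graph is rendered, and by default a small amount of vertical noise is added as well. The sketch below, on made-up data, fixes a seed so the plot is reproducible and sets height = 0 so that points are only moved horizontally. The data frame toy and the seed value 123 are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for adsl.
toy <- data.frame(
  STRATA2 = rep(c("S1", "S2"), each = 5),
  BMRKR1 = c(2.1, 3.4, 3.9, 5.0, 12.5, 1.8, 2.2, 4.1, 4.4, 6.3)
)

graph_toy <- ggplot(toy, aes(x = STRATA2, y = BMRKR1)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(
    # seed makes the jitter reproducible; height = 0 leaves the y-values untouched.
    position = position_jitter(width = 0.2, height = 0, seed = 123),
    alpha = 1 / 4
  )

# Layer 2 is the point layer; extract the positions that would be drawn.
points_drawn <- layer_data(graph_toy, 2)
```

With height = 0, every plotted point sits exactly at its observed biomarker value, which matters when readers compare points against the box and whiskers.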

We can customize the labels of the axes.

Code
graph +
  scale_x_discrete(
    breaks = c("S1", "S2"),
    labels = c("Stratum 1", "Stratum 2"),
    name = "Strata"
  ) +
  scale_y_continuous(name = "Biomarker (Units)")

We can add the group sizes as annotations.

Code
graph +
  stat_n_text(text.box = TRUE)
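
To cross-check the annotated group sizes, or to place them in the axis labels instead, the counts can be computed directly. This is a minimal sketch in base R on made-up data; toy and the label format are assumptions and are not part of ggplot2.utils.

```r
# Hypothetical toy data with one missing biomarker value.
toy <- data.frame(
  STRATA2 = c("S1", "S1", "S1", "S2", "S2", "S2"),
  BMRKR1 = c(4.2, 5.1, 3.3, 7.9, 6.0, NA)
)

# Count the non-missing biomarker values in each stratum.
n_by_group <- tapply(toy$BMRKR1, toy$STRATA2, function(x) sum(!is.na(x)))

# Build one label per stratum in the form "<stratum> (n = <count>)".
n_labels <- paste0(names(n_by_group), " (n = ", n_by_group, ")")
n_labels
```

Counting only non-missing values keeps the labels consistent with the points that are actually drawn.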

We can also display the biomarker by a further categorical variable with the facet_grid() layer.

Code
graph +
  facet_grid(. ~ ARM)
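
Faceting splits the data into one box per combination of the two variables, so some boxes may be based on very few patients. A quick cross-tabulation before plotting shows how many patients fall into each combination. The sketch below uses base R on made-up data; the arm names in toy are hypothetical.

```r
# Hypothetical toy data: one row per patient.
toy <- data.frame(
  STRATA2 = c("S1", "S1", "S2", "S2", "S2", "S1"),
  ARM = c("Arm A", "Arm B", "Arm A", "Arm A", "Arm B", "Arm A")
)

# Rows correspond to the boxes within a panel (STRATA2),
# columns correspond to the panels (ARM).
cell_counts <- xtabs(~ STRATA2 + ARM, data = toy)
cell_counts

# Smallest number of patients behind any single box.
smallest_cell <- min(cell_counts)
```

A box built from only a handful of values can look precise while carrying little information, so the smallest cell count is worth checking before interpreting the facets.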

This example shows how to display the biomarker axis on a log scale.

Code
graph +
  scale_y_log10(name = "Biomarker (Log(Units))")
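
A log scale requires strictly positive values. In addition, ggplot2 applies scale transformations before computing statistics, so the boxes and whiskers are computed on the log-transformed values. The sketch below uses boxplot.stats() from base R on made-up values to show that the 1.5 × IQR outlier rule can flag different points on the two scales. boxplot.stats() computes its hinges slightly differently from ggplot2, so this illustrates the principle rather than reproducing the graph exactly.

```r
# Hypothetical biomarker values spanning several orders of magnitude.
x <- c(1, 2, 4, 8, 16, 32, 64, 128, 1024)

# A log scale is only defined for strictly positive values.
stopifnot(all(x > 0))

# Values lying more than 1.5 * IQR beyond the hinges, on each scale.
out_raw <- boxplot.stats(x)$out
out_log <- boxplot.stats(log10(x))$out

out_raw          # values flagged on the original scale
length(out_log)  # number of values flagged on the log10 scale
```

A value that looks extreme on the original scale may therefore fall within the whiskers once the axis is log-transformed.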

Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.4         ggplot2.utils_0.3.2 ggplot2_3.5.1      
[4] tern_0.9.7          rtables_0.6.11      magrittr_2.0.3     
[7] formatters_0.5.10  

loaded via a namespace (and not attached):
 [1] generics_0.1.3           tidyr_1.3.1              EnvStats_3.0.0          
 [4] stringi_1.8.4            lattice_0.22-6           digest_0.6.37           
 [7] evaluate_1.0.3           grid_4.4.2               fastmap_1.2.0           
[10] jsonlite_1.9.0           Matrix_1.7-2             backports_1.5.0         
[13] survival_3.8-3           purrr_1.0.4              scales_1.3.0            
[16] codetools_0.2-20         Rdpack_2.6.2             cli_3.6.4               
[19] ggpp_0.5.8-1             nestcolor_0.1.3          rlang_1.1.5             
[22] rbibutils_2.3            munsell_0.5.1            splines_4.4.2           
[25] withr_3.0.2              yaml_2.3.10              tools_4.4.2             
[28] polynom_1.4-1            checkmate_2.3.2          colorspace_2.1-1        
[31] forcats_1.0.0            ggstats_0.8.0            broom_1.0.7             
[34] vctrs_0.6.5              R6_2.6.1                 lifecycle_1.0.4         
[37] stringr_1.5.1            htmlwidgets_1.6.4        MASS_7.3-64             
[40] pkgconfig_2.0.3          pillar_1.10.1            gtable_0.3.6            
[43] glue_1.8.0               xfun_0.51                tibble_3.2.1            
[46] tidyselect_1.2.1         knitr_1.49               farver_2.1.2            
[49] htmltools_0.5.8.1        labeling_0.4.3           rmarkdown_2.29          
[52] random.cdisc.data_0.3.16 compiler_4.4.2          

Reuse

Copyright 2023, Hoffmann-La Roche Ltd.

Made with ❤️ by the Statistical Engineering Team
