Biomarker Analysis Catalog - Stable

DG3

Barplots of Categorical Variables

The graphs below summarize the distribution of a categorical biomarker variable as barplots, either in the overall population or by one or more categorical clinical variables.

We will use the cadsl data set from the random.cdisc.data package to illustrate the graphs, restricted to the biomarker evaluable population via the flag BEP01FL. The column BMRKR2 contains the biomarker values on a categorical scale. We will use ARM as the categorical clinical variable.

Code
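library(tern) # provides df_explicit_na()
library(ggplot2.utils) # attaches ggplot2
library(dplyr)

adsl <- random.cdisc.data::cadsl %>%
  df_explicit_na() %>% # make missing values an explicit level
  filter(BEP01FL == "Y") # keep the biomarker evaluable population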

The code below produces a simple barplot showing the count in each category.

Code
graph <- ggplot(adsl, aes(BMRKR2)) +
  geom_bar()

graph

We can customize the labels of the axes.

Code
graph +
  scale_x_discrete(
    breaks = c("LOW", "MEDIUM", "HIGH"),
    labels = c("Low", "Medium", "High"),
    name = "Biomarker"
  ) +
  scale_y_continuous(name = "Count")
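
The `breaks` and `labels` arguments above change the displayed text only; the order of the bars still follows the levels of the plotted variable. As a minimal sketch, assuming a small made-up data frame `toy` standing in for `adsl` (its values are illustrative and not taken from cadsl), the `limits` argument of `scale_x_discrete()` can be used to set the order explicitly:

```r
library(ggplot2)

# Hypothetical stand-in for adsl. BMRKR2 is stored as character here,
# so without `limits` the bars would be ordered alphabetically.
toy <- data.frame(
  BMRKR2 = c("LOW", "HIGH", "MEDIUM", "LOW", "HIGH", "LOW"),
  stringsAsFactors = FALSE
)

ordered_graph <- ggplot(toy, aes(BMRKR2)) +
  geom_bar() +
  scale_x_discrete(
    limits = c("LOW", "MEDIUM", "HIGH"), # order of the bars
    labels = c("Low", "Medium", "High"), # displayed text, matched to limits
    name = "Biomarker"
  )

ordered_graph
```

If the variable is already a factor with the desired level order, as is typically the case after `df_explicit_na()`, this step may be unnecessary.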

We can also add the absolute count above each of the columns.

Code
graph +
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    vjust = -.5
  )
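
Depending on the size of the plot, a label placed above the tallest bar may end up close to or beyond the top of the panel. One possible remedy, sketched here on a hypothetical data frame `toy` rather than on `adsl`, is to add some headroom to the y axis with `expansion()`:

```r
library(ggplot2)

# Hypothetical stand-in for adsl with the three biomarker categories.
toy <- data.frame(
  BMRKR2 = factor(
    c("LOW", "LOW", "LOW", "MEDIUM", "HIGH", "HIGH"),
    levels = c("LOW", "MEDIUM", "HIGH")
  )
)

labelled_graph <- ggplot(toy, aes(BMRKR2)) +
  geom_bar() +
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    vjust = -0.5
  ) +
  # No padding below the bars, 10% extra space above them for the labels.
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

labelled_graph
```

The 10% value is an arbitrary choice for illustration; larger text sizes may need more room.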

To display percentages instead of counts, we can map the y aesthetic to the proportion of each category and define a matching percent label:

Code
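graph <- ggplot(
  adsl,
  aes(
    x = BMRKR2,
    # Proportion of each category among all patients in the plot.
    y = prop.table(after_stat(count)),
    # Percent-formatted text, picked up later by geom_text().
    label = scales::percent(prop.table(after_stat(count)))
  )
) +
  geom_bar()

graph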
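
To make the `prop.table(after_stat(count))` expression less opaque, the same arithmetic can be reproduced in base R. The sketch below uses a hypothetical vector `bmrkr2` rather than the cadsl data, and formats the labels with `sprintf()` instead of `scales::percent()` to stay dependency-free:

```r
# Hypothetical biomarker categories, not taken from cadsl.
bmrkr2 <- factor(
  c("LOW", "LOW", "MEDIUM", "HIGH"),
  levels = c("LOW", "MEDIUM", "HIGH")
)

counts <- table(bmrkr2) # one count per category, like after_stat(count)
props <- prop.table(counts) # each count divided by the total count
pct_labels <- sprintf("%.0f%%", 100 * as.numeric(props))

props
pct_labels
```

Here `LOW` accounts for 2 of 4 values, so its proportion is 0.5 and its label is "50%"; the proportions always sum to 1.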

We can customize the axes.

Code
graph +
  scale_y_continuous(
    labels = scales::percent_format(),
    name = "Proportion (%)"
  ) +
  scale_x_discrete(
    breaks = c("LOW", "MEDIUM", "HIGH"),
    labels = c("Low", "Medium", "High"),
    name = "Biomarker"
  )

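We can add the percentages above each of the columns. No aesthetics are needed in geom_text() because it inherits the y position and the label defined in ggplot() above.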

Code
graph +
  geom_text(
    stat = "count",
    vjust = -0.5
  )
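
When the barplot is split by a categorical clinical variable such as ARM, it is usually the percentage within each arm that is of interest, which calls for an explicit per-arm denominator. The following is a minimal sketch of one way to do this, not the catalog's own method: it precomputes the proportions with dplyr and draws them with `geom_col()`. The data frame `toy` and its arm names are hypothetical stand-ins for `adsl`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for adsl with two treatment arms.
toy <- data.frame(
  ARM = c("A", "A", "A", "A", "B", "B"),
  BMRKR2 = factor(
    c("LOW", "LOW", "MEDIUM", "HIGH", "LOW", "HIGH"),
    levels = c("LOW", "MEDIUM", "HIGH")
  )
)

# Count each category per arm, then divide by the arm total.
pct_by_arm <- toy %>%
  count(ARM, BMRKR2) %>%
  group_by(ARM) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

arm_graph <- ggplot(pct_by_arm, aes(x = BMRKR2, y = pct)) +
  geom_col() +
  geom_text(aes(label = scales::percent(pct)), vjust = -0.5) +
  facet_wrap(~ARM) + # one panel per arm
  scale_y_continuous(
    labels = scales::percent_format(),
    name = "Proportion (%)"
  )

arm_graph
```

Precomputing the table keeps the denominator visible and easy to check; note that `count()` drops category and arm combinations with no patients by default, so empty categories would not get a bar in this sketch.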

Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.4         ggplot2.utils_0.3.2 ggplot2_3.5.1      
[4] tern_0.9.7          rtables_0.6.11      magrittr_2.0.3     
[7] formatters_0.5.10  

loaded via a namespace (and not attached):
 [1] generics_0.1.3           tidyr_1.3.1              EnvStats_3.0.0          
 [4] stringi_1.8.4            lattice_0.22-6           digest_0.6.37           
 [7] evaluate_1.0.3           grid_4.4.2               fastmap_1.2.0           
[10] jsonlite_1.9.0           Matrix_1.7-2             backports_1.5.0         
[13] survival_3.8-3           purrr_1.0.4              scales_1.3.0            
[16] codetools_0.2-20         Rdpack_2.6.2             cli_3.6.4               
[19] ggpp_0.5.8-1             nestcolor_0.1.3          rlang_1.1.5             
[22] rbibutils_2.3            munsell_0.5.1            splines_4.4.2           
[25] withr_3.0.2              yaml_2.3.10              tools_4.4.2             
[28] polynom_1.4-1            checkmate_2.3.2          colorspace_2.1-1        
[31] forcats_1.0.0            ggstats_0.8.0            broom_1.0.7             
[34] vctrs_0.6.5              R6_2.6.1                 lifecycle_1.0.4         
[37] stringr_1.5.1            htmlwidgets_1.6.4        MASS_7.3-64             
[40] pkgconfig_2.0.3          pillar_1.10.1            gtable_0.3.6            
[43] glue_1.8.0               xfun_0.51                tibble_3.2.1            
[46] tidyselect_1.2.1         knitr_1.49               farver_2.1.2            
[49] htmltools_0.5.8.1        labeling_0.4.3           rmarkdown_2.29          
[52] random.cdisc.data_0.3.16 compiler_4.4.2          

Reuse

Copyright 2023, Hoffmann-La Roche Ltd.
Made with ❤️ by the Statistical Engineering Team
