AG1

Utilities for All Graphs

This page collects standard plotting utilities that can, in principle, be applied to all graphs, so that we do not need to explain them again on each of the other graph pages. It therefore serves as a kind of cheat sheet for ggplot2. General introductions to ggplot2 and a PDF cheat sheet are also recommended; both are linked at https://ggplot2.tidyverse.org/.

We start by creating a graph. Note that we load our NEST package ggplot2.utils instead of ggplot2, so that we benefit from additional utilities selected from the ggplot2 extension package ecosystem. Because ggplot2.utils also loads ggplot2 automatically, we do not need to do that manually.

We also typically apply the df_explicit_na() function to the incoming dataset. It converts character variables to factors and codes missing values as an explicit factor level, which avoids downstream problems.

Note that you may still run into warning messages after producing some of the graphs if a continuous variable you want to analyze contains NAs. To avoid these warning messages, you can use the drop_na() function from tidyr in the data manipulation step below to remove the rows containing NAs for the specific numeric column (e.g. drop_na(BMRKR1) to remove rows where BMRKR1 is missing).

Code

library(tern)
library(tidyr)
library(dplyr)
library(ggplot2.utils)

adsl <- random.cdisc.data::cadsl %>%
  df_explicit_na() %>%
  drop_na(BMRKR1)

graph <- ggplot(adsl, aes(BMRKR1)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_density(aes(y = after_stat(density)))

graph
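To make the effect of drop_na() with a column argument concrete, here is a minimal sketch on a toy data frame. The data frame toy and its values are hypothetical and only serve as illustration; the column names mimic ADSL variables.

```r
library(tidyr)

# Hypothetical toy data: BMRKR1 and AGE each contain one missing value.
toy <- data.frame(
  USUBJID = c("S1", "S2", "S3", "S4"),
  BMRKR1 = c(2.1, NA, 5.7, 8.3),
  AGE = c(34, 41, NA, 29)
)

# Drop rows only where BMRKR1 is missing; the row with a missing AGE is kept.
toy_bmrkr1 <- drop_na(toy, BMRKR1)
nrow(toy_bmrkr1)

# Without column arguments, drop_na() removes every row that contains any NA.
toy_complete <- drop_na(toy)
nrow(toy_complete)
```

Naming the column keeps as many rows as possible for the variable that is actually plotted.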

Another possibility is to add the na.rm = TRUE option to the geom layers, which then drop missing values silently (ggplot() itself does not use this argument). Alternatively, select() the relevant variables, delete any rows with missing values with na.omit(), and finally pipe the result to ggplot().

Code

# na.rm = TRUE is a layer argument, so it is set on each geom
ggplot(adsl, aes(BMRKR1)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, na.rm = TRUE) +
  geom_density(aes(y = after_stat(density)), na.rm = TRUE)

Code

adsl %>%
  select(BMRKR1) %>%
  na.omit() %>%
  ggplot(aes(BMRKR1)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_density(aes(y = after_stat(density)))
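To check how na.rm = TRUE behaves when it is set on a layer, the following sketch counts the warnings raised while building a histogram with and without the option. It uses the built-in airquality dataset (not part of the example above), whose Ozone column contains missing values. count_warnings() is a hypothetical helper written only for this illustration.

```r
library(ggplot2)

# Hypothetical helper: build the plot and count the warnings raised on the way.
count_warnings <- function(plot) {
  n <- 0
  withCallingHandlers(
    ggplot_build(plot),
    warning = function(w) {
      n <<- n + 1
      invokeRestart("muffleWarning")
    }
  )
  n
}

# Ozone contains NAs, so the default layer warns about removed rows.
p_default <- ggplot(airquality, aes(Ozone)) +
  geom_histogram(bins = 30)

# With na.rm = TRUE on the layer, the same rows are removed silently.
p_na_rm <- ggplot(airquality, aes(Ozone)) +
  geom_histogram(bins = 30, na.rm = TRUE)

n_default <- count_warnings(p_default)
n_na_rm <- count_warnings(p_na_rm)
```

In both cases the missing values are excluded from the histogram; the option only controls whether a warning is issued.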

Title, subtitle, axis, and caption labels can be added with the labs() layer. Alternatively, specific tern functions can be used; see [Titles and Footnotes] below.

Code

graph + labs(
  x = "Baseline Biomarker",
  y = "Density",
  title = "Distribution of the Baseline Biomarker",
  subtitle = "Histogram and Density Plot",
  caption = "Note: No outliers have been removed here."
)

We can change the coordinate system with coord_*() layers, e.g. to rotate the plot.

Code

graph + coord_flip()

We can set the limits of the coordinate axes with coord_cartesian(). This performs a real zoom into the plot, whereas xlim() and lims() replace values outside the limits with NA, so that they no longer enter the statistics shown in the plot. coord_cartesian() is therefore preferred.

Code

graph +
  coord_cartesian(xlim = c(0, 15), ylim = c(0, 0.3)) +
  labs(caption = "Note: biomarker values greater than 15 are not shown on this plot.")
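The difference between zooming and censoring can be checked on the data that ggplot2 computes for a layer. The sketch below uses the built-in mtcars dataset and arbitrary limits, both chosen only for illustration, and compares how many observations enter the histogram bins in each case.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 10)

# coord_cartesian() only zooms: all observations still enter the binning.
zoomed <- p + coord_cartesian(xlim = c(15, 25))
n_zoomed <- sum(layer_data(zoomed)$count)

# xlim() sets values outside the limits to NA before binning;
# suppressWarnings() hides the resulting warning about removed rows.
clipped <- p + xlim(15, 25)
n_clipped <- suppressWarnings(sum(layer_data(clipped)$count))

c(zoomed = n_zoomed, clipped = n_clipped)
```

For a density or a smoother, the same mechanism means that xlim() changes the fitted curve itself, not just the visible window.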

The general plot theme can be specified with theme_*() functions. The default theme is theme_gray(). In publications, other themes such as theme_classic() might be preferred.

Code

graph + theme_classic()

Many different scales, which map data values to the visual values of an aesthetic, are available as scale_*() functions.

We can easily change a position scale, for example to show x on a log scale:

Code

graph + scale_x_log10()
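Note that a scale transformation is applied to the data before any statistics are computed, so with scale_x_log10() the histogram bins and the density are calculated on the log10 scale. A minimal sketch with hypothetical toy values shows the transformed data that the layer receives:

```r
library(ggplot2)

# Hypothetical toy data with x values that are exact powers of ten.
toy <- data.frame(x = c(1, 10, 100), y = c(1, 2, 3))

p_log <- ggplot(toy, aes(x, y)) +
  geom_point() +
  scale_x_log10()

# The built layer data holds x on the log10 scale.
x_built <- layer_data(p_log)$x
x_built
```

Values that are zero or negative cannot be shown on a log scale, so the variable should be checked for them first.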

Faceting is an elegant way to create the same graph separately for each level of one or more other factors. It can simply be added as an additional layer to the existing graph. For example, we can show the distribution of the biomarker for each level of SEX.

Code

graph + facet_grid(~SEX)
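The formula ~SEX places the panels in columns. As a sketch of faceting by more than one factor, the example below uses the rows and cols arguments of facet_grid() with vars(). The built-in mtcars dataset and its variables am and cyl are stand-ins chosen only for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 10)

# One column of panels per level of cyl.
p_cols <- p + facet_grid(cols = vars(cyl))

# Rows by am and columns by cyl: one panel per combination.
p_grid <- p + facet_grid(rows = vars(am), cols = vars(cyl))

# Count the panels via the PANEL column of the built layer data.
n_panels_cols <- length(unique(layer_data(p_cols)$PANEL))
n_panels_grid <- length(unique(layer_data(p_grid)$PANEL))
c(cols = n_panels_cols, grid = n_panels_grid)
```

Printing p_grid shows the resulting grid of histograms.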

NEST provides a function to add titles, footnotes, and page numbers to grob objects (read: graphical objects) with tern::decorate_grob().

First, we need to prepare the pieces: the graph and the titles/footnotes. A ggplot object must be converted to a grob. We can use ggplot2::ggplotGrob() to accomplish this easily.

Titles and footnotes can be defined as vectors where each element is a new line. Tip: The paste() function can be helpful to split long sentences across multiple lines.

Code

grob_graph <- ggplotGrob(graph)

titles <- c(
  "Distribution of the Baseline Biomarker 1",
  "Biomarker Evaluable Patients",
  "Protocol: AB12345 (Data Cut: 01 JAN 2021)"
)

footnotes <- c(
  "Biomarker 1 = Gene ABC",
  "Data Cut-off: 01 JAN 2021; RAVE Data Extracted: 15 JAN 2021",
  "Program: biomark1_analysis.R"
)
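As a sketch of the paste() tip, assuming it refers to breaking a long line across several lines of source code: paste() joins the pieces into a single vector element, which is then treated as one line of the title or footnote. The footnote wording below is hypothetical.

```r
# Each element of the vector is one footnote line in the output.
long_footnotes <- c(
  # paste() joins the pieces with a space into a single element.
  paste(
    "Biomarker 1 = Gene ABC;",
    "hypothetical additional explanation that would make the line too long",
    "to be written comfortably as a single string in the source code."
  ),
  "Program: biomark1_analysis.R"
)

length(long_footnotes)
```

The vector still has two elements, so two footnote lines are passed on.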

Now that the pieces are ready, we can put them together. The grid package allows us to manipulate and draw grobs. We apply tern::decorate_grob() to our grob to add the titles and footnotes, and draw the result with grid::grid.draw().

Code

library(grid)

grid.newpage()
grid.draw(
  decorate_grob(
    grob = grob_graph,
    titles = titles,
    footnotes = footnotes,
    page = "Page 6 of 129"
  )
)
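To save the decorated page instead of drawing it on the screen, one option is to draw it on a file device. The following is a minimal sketch, assuming a PDF output in a temporary file. The plot, the texts, and the page size are hypothetical stand-ins.

```r
library(ggplot2)
library(grid)
library(tern)

# Stand-in plot on a built-in dataset (illustration only).
p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 10)

decorated <- decorate_grob(
  grob = ggplotGrob(p),
  titles = "Hypothetical title",
  footnotes = "Hypothetical footnote",
  page = "Page 1 of 1"
)

# Open a PDF device, draw the decorated grob, and close the device again.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 11, height = 8.5)
grid.draw(decorated)
invisible(dev.off())

file.exists(out_file)
```

Other graphics devices, such as png(), can be used in the same way.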

Code

sessionInfo()

R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] ggplot2.utils_0.3.2 ggplot2_3.5.1       dplyr_1.1.4        
[4] tidyr_1.3.1         tern_0.9.7          rtables_0.6.11     
[7] magrittr_2.0.3      formatters_0.5.10  

loaded via a namespace (and not attached):
 [1] generics_0.1.3           EnvStats_3.0.0           stringi_1.8.4           
 [4] lattice_0.22-6           digest_0.6.37            evaluate_1.0.3          
 [7] fastmap_1.2.0            jsonlite_1.9.0           Matrix_1.7-2            
[10] backports_1.5.0          survival_3.8-3           purrr_1.0.4             
[13] scales_1.3.0             codetools_0.2-20         Rdpack_2.6.2            
[16] cli_3.6.4                ggpp_0.5.8-1             nestcolor_0.1.3         
[19] rlang_1.1.5              rbibutils_2.3            munsell_0.5.1           
[22] splines_4.4.2            withr_3.0.2              yaml_2.3.10             
[25] tools_4.4.2              polynom_1.4-1            checkmate_2.3.2         
[28] colorspace_2.1-1         forcats_1.0.0            ggstats_0.8.0           
[31] broom_1.0.7              vctrs_0.6.5              R6_2.6.1                
[34] lifecycle_1.0.4          stringr_1.5.1            htmlwidgets_1.6.4       
[37] MASS_7.3-64              pkgconfig_2.0.3          pillar_1.10.1           
[40] gtable_0.3.6             glue_1.8.0               xfun_0.51               
[43] tibble_3.2.1             tidyselect_1.2.1         knitr_1.49              
[46] farver_2.1.2             htmltools_0.5.8.1        labeling_0.4.3          
[49] rmarkdown_2.29           random.cdisc.data_0.3.16 compiler_4.4.2
