Histograms of Numeric Variables
We will use the cadsl data set from the random.cdisc.data package and ggplot2 to create the plots. In this example, we will plot histograms of one or multiple numeric variables. We start by selecting the biomarker evaluable population with the flag variable BEP01FL and then populating a new continuous biomarker variable, BMRKR3.
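The data preparation step could look like the sketch below. The source does not say how BMRKR3 is populated, so simulating it with rnorm() (including the seed, mean, and standard deviation shown) is purely an assumption for illustration.

```r
library(dplyr)

# cadsl ships with the random.cdisc.data package.
adsl <- random.cdisc.data::cadsl %>%
  # Keep only the biomarker evaluable population.
  filter(BEP01FL == "Y")

# Assumption: BMRKR3 is simulated here; the distribution and its
# parameters are illustrative, not taken from the source.
set.seed(1)
adsl <- adsl %>%
  mutate(BMRKR3 = rnorm(n(), mean = 5, sd = 2))
```

Fixing the seed only makes the simulated values reproducible between runs; any other way of deriving a continuous biomarker would work just as well for the plots.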
In this example, we will create a combined histogram/density graph of a continuous biomarker variable. Note that you may see warning messages after producing the graph if the variable you want to analyze contains NAs. To avoid these warnings, you can use the drop_na() function from tidyr in the data manipulation step above to remove the rows with NAs from the dataset (e.g., drop_na(BMRKR1)).
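A minimal sketch of such a combined histogram/density graph is shown here. The number of bins, the colours, and the object name graph are assumptions chosen for illustration. The histogram is drawn on the density scale with after_stat(density) so that both layers share the same y axis.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

adsl <- random.cdisc.data::cadsl %>%
  filter(BEP01FL == "Y") %>%
  # Remove rows with a missing BMRKR1 to avoid warnings when plotting.
  drop_na(BMRKR1)

graph <- ggplot(adsl, aes(x = BMRKR1)) +
  # Histogram scaled to density instead of counts.
  geom_histogram(
    aes(y = after_stat(density)),
    bins = 30, fill = "grey80", colour = "black"
  ) +
  # Kernel density estimate drawn on top of the histogram.
  geom_density(colour = "blue")
```

Printing graph (or calling print(graph)) renders the plot.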
We can also calculate some descriptive statistics and populate a table that we can overlay on top of the plot. The tibble() function is used to build a data frame, data_tb, with three variables. The x and y variables represent the coordinates on the plot at which the statistics are shown and can be modified based on preference. For example, x = 1 and y = 1 will put the statistics table in the top right corner of the graph, while x = 0 and y = 0 will put it in the bottom left corner. The tb variable contains the statistics to be shown on the plot, in the form of a nested list column built from the original statistics tibble, orig_tb. Finally, we can use the geom_table_npc() layer function to process the data_tb input and add the statistics table to the existing graph.
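library(tibble) # tribble(), tibble()
library(ggpp) # geom_table_npc()

# Descriptive statistics of BMRKR1, one row per statistic.
orig_tb <- with(adsl, tribble(
  ~Statistic, ~Value,
  "N", length(BMRKR1),
  "SD", sd(BMRKR1),
  "Median", median(BMRKR1),
  "Min.", min(BMRKR1),
  "Max.", max(BMRKR1)
))

# Wrap the statistics table in a list column; x and y are npc coordinates.
data_tb <- tibble(x = 1, y = 1, tb = list(orig_tb))

# Overlay the table in the top right corner of the existing plot.
graph <- graph +
  geom_table_npc(data = data_tb, aes(npcx = x, npcy = y, label = tb))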
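As a variation, the following self-contained sketch places the table in the bottom left corner (x = 0, y = 0) and rounds the statistics for display. The rounding to two decimals, the plain histogram used as the base plot, and the name graph_bl are assumptions made for this illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(tibble)
library(ggpp) # provides geom_table_npc()

adsl <- random.cdisc.data::cadsl %>%
  filter(BEP01FL == "Y") %>%
  drop_na(BMRKR1)

# Base histogram (bin count chosen arbitrarily for this sketch).
graph <- ggplot(adsl, aes(x = BMRKR1)) +
  geom_histogram(bins = 30)

# Same statistics as above, rounded to two decimals for display.
orig_tb <- with(adsl, tribble(
  ~Statistic, ~Value,
  "N", length(BMRKR1),
  "SD", round(sd(BMRKR1), 2),
  "Median", round(median(BMRKR1), 2),
  "Min.", round(min(BMRKR1), 2),
  "Max.", round(max(BMRKR1), 2)
))

# x = 0 and y = 0 anchor the table at the bottom left corner.
data_tb <- tibble(x = 0, y = 0, tb = list(orig_tb))

graph_bl <- graph +
  geom_table_npc(data = data_tb, aes(npcx = x, npcy = y, label = tb))
```

Because npc coordinates run from 0 to 1 regardless of the data range, the same data_tb positions can be reused across plots of different variables.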
