Printing Machinery
Davide Garolini
2024-09-23
Source:vignettes/dev-guide/dg_printing.Rmd
dg_printing.Rmd
Disclaimer
In comparison to other entries of the developer guide, this is
intended to keep track of the general concepts and processing pipeline
behind the printing machinery. It is not intended to be a complete
documentation of the machinery itself, but rather a collection of notes
that can be used to understand the machinery and its internals. Hence,
be aware that this is a working document that captures a snapshot of the
machinery at a certain point in time. It is not meant to be fully
maintained, but it can be used as a starting point for one. Compared to
other parts of the developer guide, this will contain the current state
of rlistings
’ printing machinery, which is often a
simplified version of the machinery that is used in
rtables
.
How print
works
Lets track down what is going under the hood when a standard table is printed. The following is the code that is executed when a table is printed:
# Loading required package: formatters
#
# Attaching package: 'formatters'
# The following object is masked from 'package:base':
#
# %||%
# Loading required package: magrittr
#
# Attaching package: 'rtables'
# The following object is masked from 'package:utils':
#
# str
#
# Attaching package: 'dplyr'
# The following objects are masked from 'package:stats':
#
# filter, lag
# The following objects are masked from 'package:base':
#
# intersect, setdiff, setequal, union
lyt <- basic_table() %>%
split_rows_by("SEX", split_fun = keep_split_levels(c("F", "M"))) %>%
split_cols_by("ARM") %>%
analyze("BMRKR1") %>%
print()
# A Pre-data Table Layout
#
# Column-Split Structure:
# ARM (lvls)
#
# Row-Split Structure:
# SEX (lvls) -> BMRKR1 (** analysis **)
tbl <- build_table(lyt, ex_adsl) %>%
print()
# A: Drug X B: Placebo C: Combination
# ————————————————————————————————————————————————
# F
# Mean 5.75 5.59 5.68
# M
# Mean 6.27 5.87 5.34
We see that also a layout object (PreDataTableLayouts
)
is created and printed. This is because print
is a generic
function that dispatches to different methods depending on the class of
the object. In this case, the S4 class of the object is
PreDataTableLayouts
and the method that is called is
print
. In the case of {rtables} the method is dispatched
towards the show
method of the class
PreDataTableLayouts
. It can be found by searching
A Pre-data Table Layout
into {rtables} source code. I think
R dispatcher for print
methods looks for show
S4 methods instead if there are no S3 or S4 print
methods
available. Indeed, this is the code that is executed:
setMethod(
"show", "PreDataTableLayouts",
function(object) {
cat("A Pre-data Table Layout\n")
cat("\nColumn-Split Structure:\n")
docat_predataxis(object@col_layout)
cat("\nRow-Split Structure:\n")
docat_predataxis(object@row_layout)
cat("\n")
invisible(object)
}
)
This was evident if we searched for methods associated with the class
PreDataTableLayouts
, where only show
is
connected to a sort of printing machinery:
methods(class = "PreDataTableLayouts")
# [1] .add_row_summary clayout clayout<-
# [4] col_exprs colcount_format colcount_format<-
# [7] coltree header_section_div header_section_div<-
# [10] main_footer main_footer<- main_title
# [13] main_title<- prov_footer prov_footer<-
# [16] show subtitles subtitles<-
# [19] table_inset table_inset<- top_left
# [22] top_left<- top_level_section_div top_level_section_div<-
# [25] vars_in_layout
# see '?methods' for accessing help and source code
Now, lets see the same for our result table tbl
:
# [1] "TableTree"
# attr(,"package")
# [1] "rtables"
# Class "TableTree" [package "rtables"]
#
# Slots:
#
# Name: content page_title_prefix children
# Class: ElementaryTable character list
#
# Name: rowspans labelrow page_titles
# Class: data.frame LabelRow character
#
# Name: horizontal_sep header_section_div trailing_section_div
# Class: character character character
#
# Name: col_info format na_str
# Class: InstantiatedColumnInfo FormatSpec character
#
# Name: indent_modifier table_inset level
# Class: integer integer integer
#
# Name: name main_title subtitles
# Class: character character character
#
# Name: main_footer provenance_footer
# Class: character character
#
# Extends:
# Class "VTableTree", directly
# Class "VTableNodeInfo", by class "VTableTree", distance 2
# Class "VTree", by class "VTableTree", distance 2
# Class "VTitleFooter", by class "VTableTree", distance 2
# Class "VNodeInfo", by class "VTableTree", distance 3
# [1] [ [<- as.vector
# [4] cell_footnotes cell_values clayout
# [7] clear_indent_mods col_counts col_counts<-
# [10] col_footnotes col_info col_info<-
# [13] col_total col_total<- colcount_format
# [16] colcount_format<- colcount_na_str colcount_na_str<-
# [19] colcount_visible colcount_visible<- collect_leaves
# [22] coltree content_table content_table<-
# [25] dim do_forced_paginate facet_colcount
# [28] facet_colcount<- fnotes_at_path<- get_formatted_cells
# [31] head header_section_div header_section_div<-
# [34] horizontal_sep horizontal_sep<- indent_mod
# [37] indent_mod<- insert_row_at_path main_footer
# [40] main_footer<- main_title main_title<-
# [43] make_row_df matrix_form names
# [46] ncol no_colinfo nrow
# [49] obj_format obj_format<- obj_label
# [52] obj_label<- obj_na_str obj_na_str<-
# [55] obj_name obj_name<- page_titles
# [58] page_titles<- prov_footer prov_footer<-
# [61] rbind rbind2 rm_all_colcounts
# [64] row_footnotes row.names section_div
# [67] section_div<- show str
# [70] subtitles subtitles<- table_inset
# [73] table_inset<- tail top_left
# [76] top_left<- toString tree_children
# [79] tree_children<- tt_at_path tt_at_path<-
# [82] value_at value_formats
# see '?methods' for accessing help and source code
Again, the class itself has only the show
method.
Nonetheless, if you search for VTableTree"
you will find
the print
method for the TableTree
class. This
is because VTableTree
is a virtual class that is inherited
by TableTree
and is almost identical to the
show
method for TableTree
objects. All the
different statements in this case (show
or
print
) do the same thing, i.e. they call
toString
and cat
on the object. Hence, we know
that every table is printed by toString
with
\n
as separator for different lines so that
cat
renders it in its final format.
From matrix_form
to toString
If we have source code of formatters
,
rtables
, and rlistings
in our local we can
search for "toString"
S4 method definition across these
source folders. We will find generics in formatters
and
three different setMethod(...)
. toString
is
properly defined in formatters
, but it is also present in
rlistings
andrtables
. Let’s take a look at the
latter first.
setMethod("toString", "VTableTree", function(x,
widths = NULL,
col_gap = 3,
hsep = horizontal_sep(x),
indent_size = 2,
tf_wrap = FALSE,
max_width = NULL) {
toString(
matrix_form(x,
indent_rownames = TRUE,
indent_size = indent_size # Only modifies the rownames in matrix_form
),
widths = widths, col_gap = col_gap,
hsep = hsep,
tf_wrap = tf_wrap,
max_width = max_width
)
})
This is only a wrapper/dispatcher to the core toString
function in formatters
, beside the indent_size
specification. This is based on the “rendering-ready” class
MatrixPrintForm
that is produced by
matrix_form
. The latter is the first core transformation
that we need to know to understand the printing process. All exporters
and printers are based on MatrixPrintForm
objects, hence
any bug or problem needs to tracked down to this function or
toString
. If we take a look at toString
for
"listing_df"
in rlistings
, we will find a
shallow wrapper that dispatches to MatrixPrintForm
objects:
setMethod("toString", "listing_df", function(x, ...) {
toString(matrix_form(x), ...)
})
Hence lets take a look at "matrix_form"
(if there are
quotes, it is an S4 function from now on). Beside generics and self
calls
(setMethod("matrix_form", "MatrixPrintForm", [...] obj)
),
rlistings
and rtables
have their own
“constructor” of MatrixPrintForm
(the real one can be found
in formatters
). Let’s start with the latter
"matrix_form"
which is dispatched when dealing with
VTableTree
s.
# Entering matrix_form for VTableTree
trace("matrix_form", signature = "VTableTree", tracer = browser, exit = browser)
matrix_form(tbl)
untrace("matrix_form", signature = "VTableTree")
Now lets see the newly commented code for matrix_form
.
With #->
I will comment some suggestions for further
understandings.
setMethod(
"matrix_form", "VTableTree",
function(obj,
indent_rownames = FALSE,
expand_newlines = TRUE,
indent_size = 2) {
stopifnot(is(obj, "VTableTree"))
#-> Read .tbl_header_mat and subfunctions (based largely on cinfo) it can help for understanding
# column structure and how it is printed (we can add a description of this process xxx)
# Note: it contains the display of column counts directives and specifics
header_content <- .tbl_header_mat(obj) # first col are for row.names or topleft info
nr_header <- nrow(header_content$body) # colcounts were added in .tbl_header_mat
#-> As before, reading this function can help understanding how the content of the table is transformed
# in row content and how the structure of the table is preserved in a compact manner. It is complex
# function as it is a recursive one with the different dispatcher but following how different section_div
# are printed (with the dedicated assignment function) can help understanding the table structure and its
# row-wise transformation.
# Summary of row contents - reprint_inds specifies which rows to reprint (hence the grouping)
sr <- make_row_df(obj)
# With get_formatted_cells we get relevant information inside the table tree
body_content_strings <- if (NROW(sr) == 0) {
character()
} else {
#-> get_formatted_cells is an interesting function to understand the structure of the table as
# it is design to extract only the "data" of the table as strings. Note how the label rows are
# taken from make_row_df instead. Check shell = TRUE afterwards to see how the format are retrieved.
cbind(as.character(sr$label), get_formatted_cells(obj))
}
formats_strings <- if (NROW(sr) == 0) {
character()
} else {
cbind("", get_formatted_cells(obj, shell = TRUE))
}
#-> Here spans are extracted for each row. Spans are rarely modified beyond its standard values.
# Takes the flatten spans for each row and repeats them according to the number elements
tsptmp <- lapply(collect_leaves(obj, TRUE, TRUE), function(rr) {
sp <- row_cspans(rr)
rep(sp, times = sp)
})
## the 1 is for row labels
body_spans <- if (nrow(obj) > 0) {
cbind(1L, do.call(rbind, tsptmp))
} else {
matrix(1, nrow = 0, ncol = ncol(obj) + 1)
}
body_aligns <- if (NROW(sr) == 0) {
character()
} else {
cbind("left", get_cell_aligns(obj)) #-> extracts align values for each cell
}
body <- rbind(header_content$body, body_content_strings)
# Init column format for header (empty if not for column counts)
hdr_fmt_blank <- matrix("",
nrow = nrow(header_content$body),
ncol = ncol(header_content$body)
)
# If column counts are displayed, add column count format
if (disp_ccounts(obj)) {
hdr_fmt_blank[nrow(hdr_fmt_blank), ] <- c("", rep(colcount_format(obj), ncol(obj)))
}
formats <- rbind(hdr_fmt_blank, formats_strings)
spans <- rbind(header_content$span, body_spans)
row.names(spans) <- NULL
aligns <- rbind(
matrix(rep("center", length(header_content$body)),
nrow = nrow(header_content$body)
),
body_aligns
)
aligns[, 1] <- "left" # row names and topleft (still needed for topleft)
# Main indentation of the table rownames #-> Main indentation facility
if (indent_rownames) {
body[, 1] <- indent_string(body[, 1], c(rep(0, nr_header), sr$indent),
incr = indent_size
)
formats[, 1] <- indent_string(formats[, 1], c(rep(0, nr_header), sr$indent),
incr = indent_size
)
}
#-> referential strings are added to the table. get_ref_matrix is the core of this process
# along with format_fnote_ref that in this case is used to format the reference string and their
# indices. Note that the footnotes for the header is taken from the output of .tbl_header_mat
# Handling of references in header and body
col_ref_strs <- matrix(vapply(header_content$footnotes, function(x) {
if (length(x) == 0) {
""
} else {
paste(vapply(x, format_fnote_ref, ""), collapse = " ")
}
}, ""), ncol = ncol(body))
body_ref_strs <- get_ref_matrix(obj)
body <- matrix(
paste0(
body,
rbind(
col_ref_strs, #-> col_ref_strs are added to the body as a separate section
body_ref_strs
)
),
nrow = nrow(body),
ncol = ncol(body)
)
# Solve \n in titles # This is something that is relevant in toString - NO MORE USED HERE
# if (any(grepl("\n", all_titles(obj)))) {
# if (any(grepl("\n", main_title(obj)))) {
# tmp_title_vec <- .quick_handle_nl(main_title(obj))
# main_title(obj) <- tmp_title_vec[1]
# subtitles(obj) <- c(tmp_title_vec[-1], .quick_handle_nl(subtitles(obj)))
# } else {
# subtitles(obj) <- .quick_handle_nl(subtitles(obj))
# }
# }
#
# # Solve \n in footers
# main_footer(obj) <- .quick_handle_nl(main_footer(obj))
# prov_footer(obj) <- .quick_handle_nl(prov_footer(obj))
#-> this is still under development as indicated by xxx. The idea is to allow \n also in peculiar
# cases, such as page titles and referential footnotes. The latter are resolved in toString (pagination
# will not count them as more than one line each), while for the former we do not have any coverage yet.
# xxx \n in page titles are not working atm (I think)
# ref_fnotes <- strsplit(get_formatted_fnotes(obj), "\n", fixed = TRUE)
ref_fnotes <- get_formatted_fnotes(obj) # pagination will not count extra lines coming from here
pag_titles <- page_titles(obj)
MatrixPrintForm(
strings = body, #-> FUNDAMENTAL: this is the matrix that contains all the cell strings
spans = spans,
aligns = aligns,
formats = formats,
## display = display, purely a function of spans, handled in constructor now
row_info = sr, #-> FUNDAMENTAL: this is the data.frame that contains all the information about the rows
# it is the most complex data brought forward into toString
## line_grouping handled internally now line_grouping = 1:nrow(body),
ref_fnotes = ref_fnotes,
nlines_header = nr_header, ## this is fixed internally
nrow_header = nr_header,
expand_newlines = expand_newlines,
has_rowlabs = TRUE,
has_topleft = TRUE, #-> I think topleft material is handled later in toString
main_title = main_title(obj),
subtitles = subtitles(obj),
page_titles = pag_titles,
main_footer = main_footer(obj),
prov_footer = prov_footer(obj),
table_inset = table_inset(obj),
header_section_div = header_section_div(obj),
horizontal_sep = horizontal_sep(obj),
indent_size = indent_size
)
}
)
Now lets see the matrix_form
in
rlistings
:
library(rlistings)
lsting <- as_listing(mtcars)
trace("matrix_form", signature = "listing_df", tracer = browser, exit = browser)
mf <- matrix_form(lsting)
untrace("matrix_form", signature = "listing_df")
setMethod(
"matrix_form", "listing_df",
rix_form <- function(obj, indent_rownames = FALSE) { #-> I have no idea why here there is an assignment xxx
## we intentionally silently ignore indent_rownames because listings have
## no rownames, but formatters::vert_pag_indices calls matrix_form(obj, TRUE)
## unconditionally.
# Keeping only displayed columns
cols <- attr(obj, "listing_dispcols") # this is the list of columns to be displayed
listing <- obj[, cols]
atts <- attributes(obj)
atts$names <- cols
attributes(listing) <- atts
keycols <- get_keycols(listing)
bodymat <- matrix("",
nrow = nrow(listing),
ncol = ncol(listing)
)
colnames(bodymat) <- names(listing)
# Print only first appearer of key columns if repeated
curkey <- ""
for (i in seq_along(keycols)) {
kcol <- keycols[i]
kcolvec <- listing[[kcol]]
#-> format_value transforms the values of the column into strings
kcolvec <- vapply(kcolvec, format_value, "", format = obj_format(kcolvec), na_str = obj_na_str(kcolvec))
curkey <- paste0(curkey, kcolvec)
disp <- c(TRUE, tail(curkey, -1) != head(curkey, -1)) #-> This condition only show the first appearer of a key
bodymat[disp, kcol] <- kcolvec[disp]
}
# Print all other columns directly
nonkeycols <- setdiff(names(listing), keycols)
if (length(nonkeycols) > 0) {
for (nonk in nonkeycols) {
vec <- listing[[nonk]]
vec <- vapply(vec, format_value, "", format = obj_format(vec), na_str = obj_na_str(vec))
bodymat[, nonk] <- vec
}
}
fullmat <- rbind(
var_labels(listing, fill = TRUE), # Extracts the variable labels
bodymat
)
colaligns <- rbind(
rep("center", length(cols)), # Col names are always centered?
matrix(sapply(listing, obj_align),
ncol = length(cols),
nrow = nrow(fullmat) - 1,
byrow = TRUE
)
)
MatrixPrintForm(
strings = fullmat,
spans = matrix(1,
nrow = nrow(fullmat),
ncol = ncol(fullmat)
),
ref_fnotes = list(),
aligns = colaligns,
formats = matrix(1,
nrow = nrow(fullmat),
ncol = ncol(fullmat)
),
row_info = make_row_df(obj),
nlines_header = 1, ## XXX this is probably wrong!!!
nrow_header = 1,
has_topleft = FALSE,
has_rowlabs = FALSE,
expand_newlines = TRUE, # Always expand newlines, but this happens later!! XXX to fix
main_title = main_title(obj),
subtitles = subtitles(obj),
page_titles = page_titles(obj),
main_footer = main_footer(obj),
prov_footer = prov_footer(obj)
)
}
)
We device here the good developer to search and understand the
various methods associated with MatrixPrintForm
objects. It
is relevant to remember how this printed form is meant to
# Example quick table
summary_list <- function(x, ...) as.list(summary(x))
a_table <- qtable(ex_adsl, row_vars = "SEX", col_vars = "ARM", avar = "AGE", afun = summary_list)
tbl_methods <- methods(class = class(a_table))
mpf_methods <- methods(class = class(matrix_form(a_table))[1]) # it is a list of values
# Cleaning values
tbl_methods <- unique(sapply(strsplit(tbl_methods, ","), function(x) x[1]))
mpf_methods <- unique(sapply(strsplit(mpf_methods, ","), function(x) x[1]))
setdiff(tbl_methods, mpf_methods)
# [1] "[" "as.vector" "cell_footnotes"
# [4] "cell_values" "clayout" "clear_indent_mods"
# [7] "col_counts" "col_counts<-" "col_footnotes"
# [10] "col_info" "col_info<-" "col_total"
# [13] "col_total<-" "colcount_format" "colcount_format<-"
# [16] "colcount_na_str" "colcount_na_str<-" "colcount_visible"
# [19] "colcount_visible<-" "collect_leaves" "coltree"
# [22] "content_table" "content_table<-" "dim"
# [25] "do_forced_paginate" "facet_colcount" "facet_colcount<-"
# [28] "fnotes_at_path<-" "get_formatted_cells" "head"
# [31] "header_section_div" "header_section_div<-" "horizontal_sep"
# [34] "horizontal_sep<-" "indent_mod" "indent_mod<-"
# [37] "insert_row_at_path" "names" "no_colinfo"
# [40] "nrow" "obj_format" "obj_format<-"
# [43] "obj_label" "obj_label<-" "obj_na_str"
# [46] "obj_na_str<-" "obj_name" "obj_name<-"
# [49] "rbind" "rbind2" "rm_all_colcounts"
# [52] "row_footnotes" "row.names" "show"
# [55] "str" "tail" "top_left"
# [58] "top_left<-" "tree_children" "tree_children<-"
# [61] "tt_at_path" "tt_at_path<-" "value_at"
# [64] "value_formats"
setdiff(mpf_methods, tbl_methods) # much less unique methods
# [1] "coerce" "coerce<-" "nlines" "num_rep_cols"
# [5] "num_rep_cols<-" "Ops" "rawvalues" "value_names"
intersect(tbl_methods, mpf_methods) # interesting to discover the different behaviors of same functions
# [1] "[<-" "main_footer" "main_footer<-" "main_title"
# [5] "main_title<-" "make_row_df" "matrix_form" "ncol"
# [9] "page_titles" "page_titles<-" "prov_footer" "prov_footer<-"
# [13] "section_div" "section_div<-" "subtitles" "subtitles<-"
# [17] "table_inset" "table_inset<-" "toString"
Let’s now take a look at the final function of all this:
toString
from formatters
:
setMethod("toString", "MatrixPrintForm", function(x,
widths = NULL,
tf_wrap = FALSE,
max_width = NULL,
col_gap = mf_colgap(x),
hsep = NULL,
fontspec = font_spec(),
ttype_ok = FALSE) {
# part 1: checks and widths/max width estimation for columns - propose_column_widths and .handle_max_width
#
# part 2: wrapping for the table - do_cell_fnotes_wrap
#
# part 3: column gap and cell widths calculations (after wrapping) - .calc_cell_widths
#
# part 4: collapse text body and wrapping titles/footers
#
# part 5: final cat()
})
We rely on the future developer to fill in the blanks in the above description and to follow up the various functions to their core mechanics.