Skip to contents

Disclaimer

In comparison to other entries of the developer guide, this is intended to keep track of the general concepts and processing pipeline behind the printing machinery. It is not intended to be a complete documentation of the machinery itself, but rather a collection of notes that can be used to understand the machinery and its internals. Hence, be aware that this is a working document that captures a snapshot of the machinery at a certain point in time. It is not meant to be fully maintained, but it can be used as a starting point for one. Compared to other parts of the developer guide, this will contain the current state of rlistings’ printing machinery, which is often a simplified version of the machinery that is used in rtables.

How print works

Lets track down what is going under the hood when a standard table is printed. The following is the code that is executed when a table is printed:

# Loading required package: formatters
# 
# Attaching package: 'formatters'
# The following object is masked from 'package:base':
# 
#     %||%
# Loading required package: magrittr
# 
# Attaching package: 'rtables'
# The following object is masked from 'package:utils':
# 
#     str
# 
# Attaching package: 'dplyr'
# The following objects are masked from 'package:stats':
# 
#     filter, lag
# The following objects are masked from 'package:base':
# 
#     intersect, setdiff, setequal, union
lyt <- basic_table() %>%
  split_rows_by("SEX", split_fun = keep_split_levels(c("F", "M"))) %>%
  split_cols_by("ARM") %>%
  analyze("BMRKR1") %>%
  print()
# A Pre-data Table Layout
# 
# Column-Split Structure:
# ARM (lvls) 
# 
# Row-Split Structure:
# SEX (lvls) -> BMRKR1 (** analysis **)
tbl <- build_table(lyt, ex_adsl) %>%
  print()
#          A: Drug X   B: Placebo   C: Combination
# ————————————————————————————————————————————————
# F                                               
#   Mean     5.75         5.59           5.68     
# M                                               
#   Mean     6.27         5.87           5.34

We see that also a layout object (PreDataTableLayouts) is created and printed. This is because print is a generic function that dispatches to different methods depending on the class of the object. In this case, the S4 class of the object is PreDataTableLayouts and the method that is called is print. In the case of {rtables} the method is dispatched towards the show method of the class PreDataTableLayouts. It can be found by searching A Pre-data Table Layout into {rtables} source code. I think R dispatcher for print methods looks for show S4 methods instead if there are no S3 or S4 print methods available. Indeed, this is the code that is executed:

setMethod(
  "show", "PreDataTableLayouts",
  function(object) {
    cat("A Pre-data Table Layout\n")
    cat("\nColumn-Split Structure:\n")
    docat_predataxis(object@col_layout)
    cat("\nRow-Split Structure:\n")
    docat_predataxis(object@row_layout)
    cat("\n")
    invisible(object)
  }
)

This was evident if we searched for methods associated with the class PreDataTableLayouts, where only show is connected to a sort of printing machinery:

methods(class = "PreDataTableLayouts")
#  [1] .add_row_summary        clayout                 clayout<-              
#  [4] col_exprs               colcount_format         colcount_format<-      
#  [7] coltree                 header_section_div      header_section_div<-   
# [10] main_footer             main_footer<-           main_title             
# [13] main_title<-            prov_footer             prov_footer<-          
# [16] show                    subtitles               subtitles<-            
# [19] table_inset             table_inset<-           top_left               
# [22] top_left<-              top_level_section_div   top_level_section_div<-
# [25] vars_in_layout         
# see '?methods' for accessing help and source code

Now, lets see the same for our result table tbl:

class(tbl) %>% print()
# [1] "TableTree"
# attr(,"package")
# [1] "rtables"
getClass("TableTree") %>% print() # Main object representing a table in {rtables}
# Class "TableTree" [package "rtables"]
# 
# Slots:
#                                                                            
# Name:                 content      page_title_prefix               children
# Class:        ElementaryTable              character                   list
#                                                                            
# Name:                rowspans               labelrow            page_titles
# Class:             data.frame               LabelRow              character
#                                                                            
# Name:          horizontal_sep     header_section_div   trailing_section_div
# Class:              character              character              character
#                                                                            
# Name:                col_info                 format                 na_str
# Class: InstantiatedColumnInfo             FormatSpec              character
#                                                                            
# Name:         indent_modifier            table_inset                  level
# Class:                integer                integer                integer
#                                                                            
# Name:                    name             main_title              subtitles
# Class:              character              character              character
#                                                     
# Name:             main_footer      provenance_footer
# Class:              character              character
# 
# Extends: 
# Class "VTableTree", directly
# Class "VTableNodeInfo", by class "VTableTree", distance 2
# Class "VTree", by class "VTableTree", distance 2
# Class "VTitleFooter", by class "VTableTree", distance 2
# Class "VNodeInfo", by class "VTableTree", distance 3
methods(class = "TableTree") %>% print() # more than 70 methods but no print method
#  [1] [                    [<-                  as.vector           
#  [4] cell_footnotes       cell_values          clayout             
#  [7] clear_indent_mods    col_counts           col_counts<-        
# [10] col_footnotes        col_info             col_info<-          
# [13] col_total            col_total<-          colcount_format     
# [16] colcount_format<-    colcount_na_str      colcount_na_str<-   
# [19] colcount_visible     colcount_visible<-   collect_leaves      
# [22] coltree              content_table        content_table<-     
# [25] dim                  do_forced_paginate   facet_colcount      
# [28] facet_colcount<-     fnotes_at_path<-     get_formatted_cells 
# [31] head                 header_section_div   header_section_div<-
# [34] horizontal_sep       horizontal_sep<-     indent_mod          
# [37] indent_mod<-         insert_row_at_path   main_footer         
# [40] main_footer<-        main_title           main_title<-        
# [43] make_row_df          matrix_form          names               
# [46] ncol                 no_colinfo           nrow                
# [49] obj_format           obj_format<-         obj_label           
# [52] obj_label<-          obj_na_str           obj_na_str<-        
# [55] obj_name             obj_name<-           page_titles         
# [58] page_titles<-        prov_footer          prov_footer<-       
# [61] rbind                rbind2               rm_all_colcounts    
# [64] row_footnotes        row.names            section_div         
# [67] section_div<-        show                 str                 
# [70] subtitles            subtitles<-          table_inset         
# [73] table_inset<-        tail                 top_left            
# [76] top_left<-           toString             tree_children       
# [79] tree_children<-      tt_at_path           tt_at_path<-        
# [82] value_at             value_formats       
# see '?methods' for accessing help and source code

Again, the class itself has only the show method. Nonetheless, if you search for VTableTree" you will find the print method for the TableTree class. This is because VTableTree is a virtual class that is inherited by TableTree and is almost identical to the show method for TableTree objects. All the different statements in this case (show or print) do the same thing, i.e. they call toString and cat on the object. Hence, we know that every table is printed by toString with \n as separator for different lines so that cat renders it in its final format.

From matrix_form to toString

If we have source code of formatters, rtables, and rlistings in our local we can search for "toString" S4 method definition across these source folders. We will find generics in formatters and three different setMethod(...). toString is properly defined in formatters, but it is also present in rlistings andrtables. Let’s take a look at the latter first.

setMethod("toString", "VTableTree", function(x,
                                             widths = NULL,
                                             col_gap = 3,
                                             hsep = horizontal_sep(x),
                                             indent_size = 2,
                                             tf_wrap = FALSE,
                                             max_width = NULL) {
  toString(
    matrix_form(x,
      indent_rownames = TRUE,
      indent_size = indent_size # Only modifies the rownames in matrix_form
    ),
    widths = widths, col_gap = col_gap,
    hsep = hsep,
    tf_wrap = tf_wrap,
    max_width = max_width
  )
})

This is only a wrapper/dispatcher to the core toString function in formatters, beside the indent_size specification. This is based on the “rendering-ready” class MatrixPrintForm that is produced by matrix_form. The latter is the first core transformation that we need to know to understand the printing process. All exporters and printers are based on MatrixPrintForm objects, hence any bug or problem needs to tracked down to this function or toString. If we take a look at toString for "listing_df" in rlistings, we will find a shallow wrapper that dispatches to MatrixPrintForm objects:

setMethod("toString", "listing_df", function(x, ...) {
  toString(matrix_form(x), ...)
})

Hence lets take a look at "matrix_form" (if there are quotes, it is an S4 function from now on). Beside generics and self calls (setMethod("matrix_form", "MatrixPrintForm", [...] obj)), rlistings and rtables have their own “constructor” of MatrixPrintForm (the real one can be found in formatters). Let’s start with the latter "matrix_form" which is dispatched when dealing with VTableTrees.

# Entering matrix_form for VTableTree
trace("matrix_form", signature = "VTableTree", tracer = browser, exit = browser)
matrix_form(tbl)
untrace("matrix_form", signature = "VTableTree")

Now lets see the newly commented code for matrix_form. With #-> I will comment some suggestions for further understandings.

setMethod(
  "matrix_form", "VTableTree",
  function(obj,
           indent_rownames = FALSE,
           expand_newlines = TRUE,
           indent_size = 2) {
    stopifnot(is(obj, "VTableTree"))

    #-> Read .tbl_header_mat and subfunctions (based largely on cinfo) it can help for understanding
    #   column structure and how it is printed (we can add a description of this process xxx)
    #   Note: it contains the display of column counts directives and specifics
    header_content <- .tbl_header_mat(obj) # first col are for row.names or topleft info
    nr_header <- nrow(header_content$body) # colcounts were added in .tbl_header_mat

    #-> As before, reading this function can help understanding how the content of the table is transformed
    #   in row content and how the structure of the table is preserved in a compact manner. It is complex
    #   function as it is a recursive one with the different dispatcher but following how different section_div
    #   are printed (with the dedicated assignment function) can help understanding the table structure and its
    #   row-wise transformation.
    # Summary of row contents - reprint_inds specifies which rows to reprint (hence the grouping)
    sr <- make_row_df(obj)

    # With get_formatted_cells we get relevant information inside the table tree
    body_content_strings <- if (NROW(sr) == 0) {
      character()
    } else {
      #-> get_formatted_cells is an interesting function to understand the structure of the table as
      #   it is design to extract only the "data" of the table as strings. Note how the label rows are
      #   taken from make_row_df instead. Check shell = TRUE afterwards to see how the format are retrieved.
      cbind(as.character(sr$label), get_formatted_cells(obj))
    }

    formats_strings <- if (NROW(sr) == 0) {
      character()
    } else {
      cbind("", get_formatted_cells(obj, shell = TRUE))
    }

    #-> Here spans are extracted for each row. Spans are rarely modified beyond its standard values.
    # Takes the flatten spans for each row and repeats them according to the number elements
    tsptmp <- lapply(collect_leaves(obj, TRUE, TRUE), function(rr) {
      sp <- row_cspans(rr)
      rep(sp, times = sp)
    })

    ## the 1 is for row labels
    body_spans <- if (nrow(obj) > 0) {
      cbind(1L, do.call(rbind, tsptmp))
    } else {
      matrix(1, nrow = 0, ncol = ncol(obj) + 1)
    }

    body_aligns <- if (NROW(sr) == 0) {
      character()
    } else {
      cbind("left", get_cell_aligns(obj)) #-> extracts align values for each cell
    }

    body <- rbind(header_content$body, body_content_strings)

    # Init column format for header (empty if not for column counts)
    hdr_fmt_blank <- matrix("",
      nrow = nrow(header_content$body),
      ncol = ncol(header_content$body)
    )
    # If column counts are displayed, add column count format
    if (disp_ccounts(obj)) {
      hdr_fmt_blank[nrow(hdr_fmt_blank), ] <- c("", rep(colcount_format(obj), ncol(obj)))
    }

    formats <- rbind(hdr_fmt_blank, formats_strings)

    spans <- rbind(header_content$span, body_spans)
    row.names(spans) <- NULL

    aligns <- rbind(
      matrix(rep("center", length(header_content$body)),
        nrow = nrow(header_content$body)
      ),
      body_aligns
    )

    aligns[, 1] <- "left" # row names and topleft (still needed for topleft)

    # Main indentation of the table rownames #-> Main indentation facility
    if (indent_rownames) {
      body[, 1] <- indent_string(body[, 1], c(rep(0, nr_header), sr$indent),
        incr = indent_size
      )
      formats[, 1] <- indent_string(formats[, 1], c(rep(0, nr_header), sr$indent),
        incr = indent_size
      )
    }

    #-> referential strings are added to the table. get_ref_matrix is the core of this process
    #   along with format_fnote_ref that in this case is used to format the reference string and their
    #   indices. Note that the footnotes for the header is taken from the output of .tbl_header_mat
    # Handling of references in header and body
    col_ref_strs <- matrix(vapply(header_content$footnotes, function(x) {
      if (length(x) == 0) {
        ""
      } else {
        paste(vapply(x, format_fnote_ref, ""), collapse = " ")
      }
    }, ""), ncol = ncol(body))
    body_ref_strs <- get_ref_matrix(obj)
    body <- matrix(
      paste0(
        body,
        rbind(
          col_ref_strs, #-> col_ref_strs are added to the body as a separate section
          body_ref_strs
        )
      ),
      nrow = nrow(body),
      ncol = ncol(body)
    )

    # Solve \n in titles # This is something that is relevant in toString - NO MORE USED HERE
    # if (any(grepl("\n", all_titles(obj)))) {
    #   if (any(grepl("\n", main_title(obj)))) {
    #     tmp_title_vec <- .quick_handle_nl(main_title(obj))
    #     main_title(obj) <- tmp_title_vec[1]
    #     subtitles(obj) <- c(tmp_title_vec[-1], .quick_handle_nl(subtitles(obj)))
    #   } else {
    #     subtitles(obj) <- .quick_handle_nl(subtitles(obj))
    #   }
    # }
    #
    # # Solve \n in footers
    # main_footer(obj) <- .quick_handle_nl(main_footer(obj))
    # prov_footer(obj) <- .quick_handle_nl(prov_footer(obj))

    #-> this is still under development as indicated by xxx. The idea is to allow \n also in peculiar
    #   cases, such as page titles and referential footnotes. The latter are resolved in toString (pagination
    #   will not count them as more than one line each), while for the former we do not have any coverage yet.
    # xxx \n in page titles are not working atm (I think)
    # ref_fnotes <- strsplit(get_formatted_fnotes(obj), "\n", fixed = TRUE)
    ref_fnotes <- get_formatted_fnotes(obj) # pagination will not count extra lines coming from here
    pag_titles <- page_titles(obj)

    MatrixPrintForm(
      strings = body, #-> FUNDAMENTAL: this is the matrix that contains all the cell strings
      spans = spans,
      aligns = aligns,
      formats = formats,
      ## display = display, purely a function of spans, handled in constructor now
      row_info = sr, #-> FUNDAMENTAL: this is the data.frame that contains all the information about the rows
      #   it is the most complex data brought forward into toString
      ## line_grouping handled internally now line_grouping = 1:nrow(body),
      ref_fnotes = ref_fnotes,
      nlines_header = nr_header, ## this is fixed internally
      nrow_header = nr_header,
      expand_newlines = expand_newlines,
      has_rowlabs = TRUE,
      has_topleft = TRUE, #-> I think topleft material is handled later in toString
      main_title = main_title(obj),
      subtitles = subtitles(obj),
      page_titles = pag_titles,
      main_footer = main_footer(obj),
      prov_footer = prov_footer(obj),
      table_inset = table_inset(obj),
      header_section_div = header_section_div(obj),
      horizontal_sep = horizontal_sep(obj),
      indent_size = indent_size
    )
  }
)

Now lets see the matrix_form in rlistings:

library(rlistings)
lsting <- as_listing(mtcars)
trace("matrix_form", signature = "listing_df", tracer = browser, exit = browser)
mf <- matrix_form(lsting)
untrace("matrix_form", signature = "listing_df")
setMethod(
  "matrix_form", "listing_df",
  rix_form <- function(obj, indent_rownames = FALSE) { #-> I have no idea why here there is an assignment xxx
    ##  we intentionally silently ignore indent_rownames because listings have
    ## no rownames, but formatters::vert_pag_indices calls matrix_form(obj, TRUE)
    ## unconditionally.

    # Keeping only displayed columns
    cols <- attr(obj, "listing_dispcols") # this is the list of columns to be displayed
    listing <- obj[, cols]
    atts <- attributes(obj)
    atts$names <- cols
    attributes(listing) <- atts
    keycols <- get_keycols(listing)

    bodymat <- matrix("",
      nrow = nrow(listing),
      ncol = ncol(listing)
    )

    colnames(bodymat) <- names(listing)

    # Print only first appearer of key columns if repeated
    curkey <- ""
    for (i in seq_along(keycols)) {
      kcol <- keycols[i]
      kcolvec <- listing[[kcol]]
      #-> format_value transforms the values of the column into strings
      kcolvec <- vapply(kcolvec, format_value, "", format = obj_format(kcolvec), na_str = obj_na_str(kcolvec))
      curkey <- paste0(curkey, kcolvec)
      disp <- c(TRUE, tail(curkey, -1) != head(curkey, -1)) #-> This condition only show the first appearer of a key
      bodymat[disp, kcol] <- kcolvec[disp]
    }

    # Print all other columns directly
    nonkeycols <- setdiff(names(listing), keycols)
    if (length(nonkeycols) > 0) {
      for (nonk in nonkeycols) {
        vec <- listing[[nonk]]
        vec <- vapply(vec, format_value, "", format = obj_format(vec), na_str = obj_na_str(vec))
        bodymat[, nonk] <- vec
      }
    }

    fullmat <- rbind(
      var_labels(listing, fill = TRUE), # Extracts the variable labels
      bodymat
    )

    colaligns <- rbind(
      rep("center", length(cols)), # Col names are always centered?
      matrix(sapply(listing, obj_align),
        ncol = length(cols),
        nrow = nrow(fullmat) - 1,
        byrow = TRUE
      )
    )

    MatrixPrintForm(
      strings = fullmat,
      spans = matrix(1,
        nrow = nrow(fullmat),
        ncol = ncol(fullmat)
      ),
      ref_fnotes = list(),
      aligns = colaligns,
      formats = matrix(1,
        nrow = nrow(fullmat),
        ncol = ncol(fullmat)
      ),
      row_info = make_row_df(obj),
      nlines_header = 1, ## XXX this is probably wrong!!!
      nrow_header = 1,
      has_topleft = FALSE,
      has_rowlabs = FALSE,
      expand_newlines = TRUE, # Always expand newlines, but this happens later!! XXX to fix
      main_title = main_title(obj),
      subtitles = subtitles(obj),
      page_titles = page_titles(obj),
      main_footer = main_footer(obj),
      prov_footer = prov_footer(obj)
    )
  }
)

We device here the good developer to search and understand the various methods associated with MatrixPrintForm objects. It is relevant to remember how this printed form is meant to

# Example quick table
summary_list <- function(x, ...) as.list(summary(x))
a_table <- qtable(ex_adsl, row_vars = "SEX", col_vars = "ARM", avar = "AGE", afun = summary_list)
tbl_methods <- methods(class = class(a_table))
mpf_methods <- methods(class = class(matrix_form(a_table))[1]) # it is a list of values

# Cleaning values
tbl_methods <- unique(sapply(strsplit(tbl_methods, ","), function(x) x[1]))
mpf_methods <- unique(sapply(strsplit(mpf_methods, ","), function(x) x[1]))
setdiff(tbl_methods, mpf_methods)
#  [1] "["                    "as.vector"            "cell_footnotes"      
#  [4] "cell_values"          "clayout"              "clear_indent_mods"   
#  [7] "col_counts"           "col_counts<-"         "col_footnotes"       
# [10] "col_info"             "col_info<-"           "col_total"           
# [13] "col_total<-"          "colcount_format"      "colcount_format<-"   
# [16] "colcount_na_str"      "colcount_na_str<-"    "colcount_visible"    
# [19] "colcount_visible<-"   "collect_leaves"       "coltree"             
# [22] "content_table"        "content_table<-"      "dim"                 
# [25] "do_forced_paginate"   "facet_colcount"       "facet_colcount<-"    
# [28] "fnotes_at_path<-"     "get_formatted_cells"  "head"                
# [31] "header_section_div"   "header_section_div<-" "horizontal_sep"      
# [34] "horizontal_sep<-"     "indent_mod"           "indent_mod<-"        
# [37] "insert_row_at_path"   "names"                "no_colinfo"          
# [40] "nrow"                 "obj_format"           "obj_format<-"        
# [43] "obj_label"            "obj_label<-"          "obj_na_str"          
# [46] "obj_na_str<-"         "obj_name"             "obj_name<-"          
# [49] "rbind"                "rbind2"               "rm_all_colcounts"    
# [52] "row_footnotes"        "row.names"            "show"                
# [55] "str"                  "tail"                 "top_left"            
# [58] "top_left<-"           "tree_children"        "tree_children<-"     
# [61] "tt_at_path"           "tt_at_path<-"         "value_at"            
# [64] "value_formats"
setdiff(mpf_methods, tbl_methods) # much less unique methods
# [1] "coerce"         "coerce<-"       "nlines"         "num_rep_cols"  
# [5] "num_rep_cols<-" "Ops"            "rawvalues"      "value_names"
intersect(tbl_methods, mpf_methods) # interesting to discover the different behaviors of same functions
#  [1] "[<-"           "main_footer"   "main_footer<-" "main_title"   
#  [5] "main_title<-"  "make_row_df"   "matrix_form"   "ncol"         
#  [9] "page_titles"   "page_titles<-" "prov_footer"   "prov_footer<-"
# [13] "section_div"   "section_div<-" "subtitles"     "subtitles<-"  
# [17] "table_inset"   "table_inset<-" "toString"

Let’s now take a look at the final function of all this: toString from formatters:

setMethod("toString", "MatrixPrintForm", function(x,
                                                  widths = NULL,
                                                  tf_wrap = FALSE,
                                                  max_width = NULL,
                                                  col_gap = mf_colgap(x),
                                                  hsep = NULL,
                                                  fontspec = font_spec(),
                                                  ttype_ok = FALSE) {
  # part 1: checks and widths/max width estimation for columns - propose_column_widths and .handle_max_width
  #
  # part 2: wrapping for the table - do_cell_fnotes_wrap
  #
  # part 3: column gap and cell widths calculations (after wrapping) - .calc_cell_widths
  #
  # part 4: collapse text body and wrapping titles/footers
  #
  # part 5: final cat()
})

We rely on the future developer to fill in the blanks in the above description and to follow up the various functions to their core mechanics.