
Introduction

Reformatting in dunlin consists of replacing predetermined values with new ones in particular variables of selected tables of a stored data set.

This is performed in two steps:

  1. A Reformatting Map (rule object) is created, which specifies the correspondence between the old and the new values.

  2. The reformatting itself is performed with the dunlin::reformat() function.

The Reformatting Map Structure

The Reformatting Map is a rule object inheriting from character. Its names are the new values to be used, and its values are the old values to be replaced.

rule(A = "a", B = c("c", "d"))
#> Mapping of:
#> A  <-  a 
#> B  <-  c 
#> B  <-  d

This rule will replace “a” with “A”, and replace “c” or “d” with “B”.
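The name/value layout described above can be illustrated without dunlin. The following is a minimal base R sketch that uses a plain named character vector as a stand-in for a rule; the names `mapping`, `idx` and `recoded` are hypothetical, and this is not how dunlin implements the replacement internally.

```r
# Stand-in for a rule: names are the new values, values are the old ones.
mapping <- c(A = "a", B = "c", B = "d")

x <- c("a", "c", "d", NA)

# Locate each element among the old values; unmatched elements give NA.
idx <- match(x, mapping)

# Use the new value where an old value matched, otherwise keep the input.
recoded <- ifelse(is.na(idx), x, names(mapping)[idx])
recoded
#> [1] "A" "B" "B" NA
```

Storing the new value as the name lets several old values share one new value, which is why B appears twice in the printed rule.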

Calling reformat

reformat is a generic that supports reformatting of character and factor variables. Reformatting other types of variables is not meaningful. reformat also preserves the attributes of the original data, e.g. the data type or labels will be unchanged.

An example of reformatting a character vector:

r <- rule(A = "a", B = c("c", "d"))
reformat(c("a", "c", "d", NA), r)
#> [1] A    B    B    <NA>
#> Levels: A B

We can see that the NA values are not changed. The Levels line shows that the result printed here is a factor.

Now we test the factor reformatting:

r <- rule(A = "a", B = c("c", "d"))
reformat(factor(c("a", "c", "d", NA)), r)
#> [1] A    B    B    <NA>
#> Levels: A B

The NA values are also not changed. However, if the rule includes a mapping for NA, the result is different:

r <- rule(A = "a", B = c("c", "d"), C = NA)
reformat(factor(c("a", "c", "d", NA)), r)
#> [1] A B B C
#> Levels: A B C

Please note that a new level that is mapped only from NA is always placed last.
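For comparison, the same result can be reproduced by hand in base R. This is only a sketch of the behaviour described above, under the assumption that merging levels and then making NA explicit is an acceptable stand-in; it does not show dunlin's own implementation.

```r
f <- factor(c("a", "c", "d", NA))

# Merge old levels into new ones; the list maps new level = old level(s).
levels(f) <- list(A = "a", B = c("c", "d"))

# Make missing values an explicit level; addNA() appends it after the others.
f <- addNA(f)

# Rename the NA level to "C".
levels(f)[is.na(levels(f))] <- "C"

f
#> [1] A B B C
#> Levels: A B C
```

Because the explicit NA level is appended after the existing ones, it ends up last, which matches the ordering in the reformat output.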

For dm objects, the format argument is a nested list of rule objects. The first layer gives the table names, and the second layer gives the variables in each table.

The All keyword (in lower or mixed case) can be used in the first layer instead of a table name to indicate that a particular variable should be changed in every table where it appears.

Example of Reformatting Map

my_map <- list(
  # This is the Table Name.
  airlines = list(
    # This is the Variable Name.
    name = rule(
      "AE" = c("American Airlines Inc."),
      "Alaska and Hawaiian Airlines" = c("Alaska Airlines Inc.", "Hawaiian Airlines Inc.")
    )
  ),
  planes = list(
    manufacturer = rule(
      "Airbus" = "AIRBUS INDUSTRIE",
      "New Level" = "new_level",
      "<Missing>" = NA
    ),
    model = rule(
      "EMB-145" = c("EMB-145XR"),
      "Other 737" = c("737-824", "737-724", "737-732")
    )
  )
)

db <- dm::dm_nycflights13()

db_formatted <- reformat(db, my_map)
head(db_formatted$planes$model)
#> [1] EMB-145   A320-214  EMB-145LR A320-214  A320-214  EMB-145  
#> 55 Levels: 150 172N 65-A90 717-200 737-3H4 737-401 737-524 ... R66
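The All keyword described earlier can be sketched in the same way. The map below is a hypothetical illustration: `all_map`, `db_all` and the new value "American" are assumptions, chosen because the carrier variable appears in both the airlines and flights tables of this data set. No output is shown, since it depends on the installed versions of dunlin and dm.

```r
library(dunlin)

# Hypothetical map: recode the carrier code "AA" in every table
# that contains a `carrier` variable.
all_map <- list(
  All = list(
    carrier = rule("American" = "AA")
  )
)

db <- dm::dm_nycflights13()
db_all <- reformat(db, all_map)

# Inspect the variable in two tables that both contain `carrier`.
head(db_all$airlines$carrier)
head(db_all$flights$carrier)
```

Using All keeps the mapping in one place so the same recoding does not have to be repeated under each table name.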