Join Keys

Overview

teal.data gives the possibility to define individual keys per dataset and to define the relations to other datasets.

Each table can have a set of keys that inform its structure and relation to other tables by specifying:

columns consisting the primary key
merge keys, a concept similar to SQLs foreign key.

Usually, an application developer needs to specify the keys manually, but in case of datasets named according to the ADaM standard, teal can assign the keys on its own. Refer to vignette("including-adam-data-in-teal", package = "teal") for more information.

Primary key

Using the keys argument to the dataset (or for DDL a dataset_connector) function, we can specify column(s) of the dataset which together uniquely identify rows in the dataset.

library(teal.data)
library(dplyr)

iris_with_keys <- mutate(iris, id = factor(row_number()))

ds <- dataset(
  "IRIS",
  iris_with_keys,
  code = "IRIS <- iris %>% mutate(id = factor(row_number()))",
  keys = c("id")
)

ds$get_keys()

## [1] "id"

Merge keys

When passing multiple datasets to the cdisc_data function, dataset relationship are set using join_keys and join_key and these are used to merge datasets together within teal apps. For users familiar with SQL database schema, these relationships are symmetric and not as strict as SQL foreign key relationships as teal does not validate whether the values inserted into foreign key columns are present in the parent table.

For example:

library(teal.data)

data_1 <- data.frame(X = factor(1:10), Y = 21:30, Z = 1:10)
data_2 <- data.frame(W = factor(10:1), V = factor(5:14), M = rep(1:5, 2))
data_3 <- data.frame(V = factor(5:14), T = 4)

data <- teal_data(
  dataset("D1", data_1, code = "D1 <- data.frame(X = factor(1:10), Y = 21:30, Z = 1:10)"),
  dataset("D2", data_2, code = "D2 <- data.frame(W = factor(10:1), V = factor(5:14), M = rep(1:5, 2))"),
  dataset("D3", data_3, code = "D3 <- data.frame(V = factor(5:14), T = 4)"),
  join_keys = join_keys(
    join_key("D1", "D2", c("X" = "W")),
    join_key("D2", "D3", c("V" = "V"))
  )
)

The join_keys function is used to specify keys:

join_keys is a collection of multiple join_key entries
join_key specifies the relation between two datasets:
- dataset_1, dataset_2 - name of two datasets
- key - (optionally) named vector of column names

Note that it is assumed that join keys are symmetric, i.e. join_key("x", "y", "x_col" = "y_col") will enable merge from “x” to “y” and vice versa.

NEST CoreDev

2022-07-26

Overview

Primary key

Merge keys