Join Keys
NEST CoreDev
2022-07-26
join-keys.RmdOverview
teal.data gives the possibility to define individual
keys per dataset and to define the relations to other datasets.
Each table can have a set of keys that inform its structure and relation to other tables by specifying:
- columns consisting the primary key
- merge keys, a concept similar to
SQLs foreign key.
Usually, an application developer needs to specify the keys manually,
but in case of datasets named according to the ADaM
standard, teal can assign the keys on its own. Refer to
vignette("including-adam-data-in-teal", package = "teal")
for more information.
Primary key
Using the keys argument to the dataset (or
for DDL a dataset_connector) function, we can
specify column(s) of the dataset which together uniquely identify rows
in the dataset.
library(teal.data)
library(dplyr)
iris_with_keys <- mutate(iris, id = factor(row_number()))
ds <- dataset(
"IRIS",
iris_with_keys,
code = "IRIS <- iris %>% mutate(id = factor(row_number()))",
keys = c("id")
)
ds$get_keys()## [1] "id"
Merge keys
When passing multiple datasets to the cdisc_data
function, dataset relationship are set using join_keys and
join_key and these are used to merge datasets together
within teal apps. For users familiar with SQL
database schema, these relationships are symmetric and not as strict as
SQL foreign key relationships as teal does not
validate whether the values inserted into foreign key columns are
present in the parent table.
For example:
library(teal.data)
data_1 <- data.frame(X = factor(1:10), Y = 21:30, Z = 1:10)
data_2 <- data.frame(W = factor(10:1), V = factor(5:14), M = rep(1:5, 2))
data_3 <- data.frame(V = factor(5:14), T = 4)
data <- teal_data(
dataset("D1", data_1, code = "D1 <- data.frame(X = factor(1:10), Y = 21:30, Z = 1:10)"),
dataset("D2", data_2, code = "D2 <- data.frame(W = factor(10:1), V = factor(5:14), M = rep(1:5, 2))"),
dataset("D3", data_3, code = "D3 <- data.frame(V = factor(5:14), T = 4)"),
join_keys = join_keys(
join_key("D1", "D2", c("X" = "W")),
join_key("D2", "D3", c("V" = "V"))
)
)The join_keys function is used to specify keys:
-
join_keysis a collection of multiplejoin_keyentries -
join_keyspecifies the relation between two datasets:-
dataset_1,dataset_2- name of two datasets -
key- (optionally) named vector of column names
-
Note that it is assumed that join keys are symmetric,
i.e. join_key("x", "y", "x_col" = "y_col") will enable
merge from “x” to “y” and vice versa.