S3 generic for creating an information summary about the duplicate key values in a dataset

Usage

get_key_duplicates(dataset, keys = NULL)

# S3 method for TealDataset
get_key_duplicates(dataset, keys = NULL)

# S3 method for data.frame
get_key_duplicates(dataset, keys = NULL)

Arguments

dataset: TealDataset or data.frame; the dataset to test for duplicate key values
keys: character vector of the variable names in dataset that make up the key, or a keys object with a primary element holding such a vector of variable names. Optional, default: NULL

Value

a tibble with the variables that make up the key, plus a rows column (comma-separated row numbers of the duplicated entries) and an n column (number of duplicates)

Details

For each duplicated key value, the summary lists the row numbers at which it occurs and the number of times it occurs.
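
To make the shape of this summary concrete, here is a minimal base R sketch of how such a table could be built for a plain data.frame. The helper name summarise_key_duplicates is hypothetical and not part of the package, and the column names rows and n are taken from the example output on this page. The package's actual implementation may differ, and this sketch returns a data.frame rather than a tibble.

```r
# Hypothetical helper (not part of the package): one output row per
# duplicated key value, with the affected row numbers and their count.
summarise_key_duplicates <- function(df, keys) {
  stopifnot(is.data.frame(df), is.character(keys), all(keys %in% names(df)))
  # Build one string id per row from the key columns.
  key_id <- do.call(paste, c(unname(df[keys]), sep = "\r"))
  # Group row numbers by key value, in order of first appearance.
  groups <- split(seq_len(nrow(df)), factor(key_id, levels = unique(key_id)))
  # Keep only key values that occur more than once.
  groups <- groups[lengths(groups) > 1L]
  first_rows <- vapply(groups, function(idx) idx[1L], integer(1))
  out <- df[first_rows, keys, drop = FALSE]
  out$rows <- unname(vapply(groups, paste, character(1), collapse = ","))
  out$n <- unname(lengths(groups))
  rownames(out) <- NULL
  out
}

df <- data.frame(
  a = c("a", "a", "b", "b", "c"),
  b = c(1, 2, 3, 3, 4),
  c = c(1, 2, 3, 4, 5)
)
# The key (a = "b", b = 3) appears in rows 3 and 4, so the result has
# a single row with rows = "3,4" and n = 2.
res <- summarise_key_duplicates(df, keys = c("a", "b"))
```

Storing the row numbers as a comma-separated string keeps the result to one row per duplicated key value, which matches the rows <chr> column type shown in the example output.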

Note

Raises an exception when this function cannot determine the primary key columns of the tested object.

Examples

library(scda)

adsl <- synthetic_cdisc_data("latest")$adsl
# create a TealDataset with default keys
rel_adsl <- cdisc_dataset("ADSL", adsl)
get_key_duplicates(rel_adsl)
#> # A tibble: 0 × 4
#> # ℹ 4 variables: STUDYID <chr>, USUBJID <chr>, rows <chr>, n <int>

df <- as.data.frame(
  list(a = c("a", "a", "b", "b", "c"), b = c(1, 2, 3, 3, 4), c = c(1, 2, 3, 4, 5))
)
res <- get_key_duplicates(df, keys = c("a", "b")) # duplicated keys are in rows 3 and 4
print(res) # prints a tibble
#>   a b rows n
#> 1 b 3  3,4 2
if (FALSE) {
get_key_duplicates(df) # raises an exception, because keys are missing with no default
}