Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) in long format, with a column of data to be cut and a column specifying the group of each observation.

col_data

(character) the column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) providing, for each parameter that should be analyzed categorically, a list of three elements: the name of the parameter (character), a series of breakpoints (numeric), where the first breakpoint is typically -Inf and the last Inf, and a series of names (character) describing each category. With n breakpoints there are n - 1 categories, so n - 1 names are expected.

cat_col

(character) the name of the new column in which the cut labels should be stored.

Value

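data.frame with an additional column, named by cat_col, containing the category label of each observation. Observations whose group is not listed in group get NA.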

Details

Function used to categorize numeric data stored in long format depending on their group. Intervals are closed on the right (and open on the left).
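
To illustrate what "closed on the right" means for values that fall exactly on a breakpoint, here is a minimal sketch using base R's cut() with right = TRUE. It assumes cut_by_group() treats boundaries the same way as cut(); the page does not show the internals, so read this as an illustration of the interval convention only. The heights vector is made up for the example.

```r
# Base R cut() with right = TRUE (its default) builds intervals (a, b]:
# a value equal to a breakpoint belongs to the interval ending at it.
breaks <- c(-Inf, 150, 170, Inf)
labels <- c("=<150", "150-170", ">170")

# Illustrative values sitting on and around the breakpoints.
heights <- c(149.9, 150, 150.1, 170, 170.1)

categories <- cut(heights, breaks = breaks, labels = labels, right = TRUE)
print(data.frame(heights, categories))
```

Under this convention, 150 is labelled "=<150" and 170 is labelled "150-170", which matches the labels used in the Examples section.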

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM         AVAL index my_new_categories
#> 1        a Height 172.54095606     1              >170
#> 2        b Height 173.00441614     2              >170
#> 3        c Height 168.59870554     3           150-170
#> 4        d Height 175.03595377     4              >170
#> 5        e Height 155.74592289     5           150-170
#> 6        f Height 175.52129490     6              >170
#> 7        g Height 179.97879427     7              >170
#> 8        h Height 184.83633513     8              >170
#> 9        i Height 134.50891177     9             =<150
#> 10       j Height 153.90255271    10           150-170
#> 11       a Weight  65.91158622    11               >65
#> 12       b Weight  62.46653355    12              =<65
#> 13       c Weight  66.18495009    13               >65
#> 14       d Weight  59.16965256    14              =<65
#> 15       e Weight  67.84969429    15               >65
#> 16       f Weight  60.82964463    16              =<65
#> 17       g Weight  56.90450390    17              =<65
#> 18       h Weight  63.52874795    18              =<65
#> 19       i Weight  64.79320247    19              =<65
#> 20       j Weight  65.18088195    20               >65
#> 21       a    Age  32.33849592    21               >31
#> 22       b    Age  24.94500545    22              =<31
#> 23       c    Age  32.85870900    23               >31
#> 24       d    Age  59.24493613    24               >31
#> 25       e    Age  53.13586305    25               >31
#> 26       f    Age  35.91108528    26               >31
#> 27       g    Age  24.75263667    27              =<31
#> 28       h    Age  33.64844490    28               >31
#> 29       i    Age  19.98112240    29              =<31
#> 30       j    Age  52.74132380    30               >31
#> 31       a  other   1.45235509    31              <NA>
#> 32       b  other  -0.59722374    32              <NA>
#> 33       c  other  -0.56014382    33              <NA>
#> 34       d  other   2.11658061    34              <NA>
#> 35       e  other   0.21469778    35              <NA>
#> 36       f  other   2.38031665    36              <NA>
#> 37       g  other   0.09243205    37              <NA>
#> 38       h  other  -0.49495036    38              <NA>
#> 39       i  other  -0.44873173    39              <NA>
#> 40       j  other   1.29003551    40              <NA>
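
The group-wise behaviour shown above can be approximated in base R. The following is a rough sketch only, not the package's implementation: cut_by_group_sketch, toy and toy_group are hypothetical names introduced for illustration, and the sketch stores labels as character, whereas the column type returned by cut_by_group() is not stated on this page.

```r
# Hypothetical helper: cut each group's values with that group's own
# breakpoints and labels; rows of unlisted groups keep NA.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  df[[cat_col]] <- NA_character_
  for (spec in group) {
    # spec[[1]]: group name, spec[[2]]: breakpoints, spec[[3]]: labels
    rows <- which(df[[col_group]] == spec[[1]])
    if (length(rows) == 0) next
    df[[cat_col]][rows] <- as.character(
      cut(df[[col_data]][rows],
          breaks = spec[[2]], labels = spec[[3]], right = TRUE)
    )
  }
  df
}

# Small deterministic input (made up for this sketch).
toy <- data.frame(
  PARAM = c("Height", "Height", "Weight", "other"),
  AVAL = c(150, 171, 70, 0.5)
)
toy_group <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")),
  list("Weight", c(-Inf, 65, Inf), c("=<65", ">65"))
)

res <- cut_by_group_sketch(toy, "AVAL", "PARAM", toy_group, "category")
print(res)
```

In this sketch the "other" row keeps NA because no entry of toy_group names it, mirroring rows 31 to 40 of the output above.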