Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) containing a column of data to be cut and a column specifying the group of each observation.

col_data

(character) the column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) providing, for each parameter value that should be analyzed categorically: the name of the parameter (character), a series of breakpoints (numeric), where the first breakpoint is typically -Inf and the last Inf, and a series of names describing each category (character).

cat_col

(character) the name of the new column in which the cut label should be stored.

Value

data.frame with an additional column, named by cat_col, containing the categorical values. Rows whose group has no entry in group receive NA.

Details

Function used to categorize numeric data stored in long format depending on their group. Intervals are closed on the right (and open on the left).
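
The right-closed convention can be illustrated with base R's cut(), which uses the same interval rule when right = TRUE. This is only a sketch of the convention; whether cut_by_group() calls cut() internally is an assumption, and the vector x is made up for illustration.

```r
# Illustrative values sitting on and around the breakpoints (hypothetical data).
x <- c(149.9, 150, 150.1, 170, 170.1)

# right = TRUE gives intervals (-Inf, 150], (150, 170], (170, Inf):
# a value equal to a breakpoint falls into the lower category.
categories <- cut(
  x,
  breaks = c(-Inf, 150, 170, Inf),
  labels = c("=<150", "150-170", ">170"),
  right = TRUE
)

as.character(categories)
#> [1] "=<150"   "=<150"   "150-170" "150-170" ">170"
```

A value of exactly 150 is therefore labelled "=<150", and exactly 170 is labelled "150-170".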

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM         AVAL index my_new_categories
#> 1        a Height 143.99934725     1             =<150
#> 2        b Height 168.82975582     2           150-170
#> 3        c Height 128.44104583     3             =<150
#> 4        d Height 164.91643070     4           150-170
#> 5        e Height 174.32329082     5              >170
#> 6        f Height 182.22617409     6              >170
#> 7        g Height 137.67273509     7             =<150
#> 8        h Height 161.29012047     8           150-170
#> 9        i Height 161.33700590     9           150-170
#> 10       j Height 160.75941827    10           150-170
#> 11       a Weight  62.23150308    11              =<65
#> 12       b Weight  68.14491021    12               >65
#> 13       c Weight  75.32512448    13               >65
#> 14       d Weight  56.84505299    14              =<65
#> 15       e Weight  67.56213475    15               >65
#> 16       f Weight  55.68494254    16              =<65
#> 17       g Weight  62.38993743    17              =<65
#> 18       h Weight  64.73699045    18              =<65
#> 19       i Weight  67.71498171    19               >65
#> 20       j Weight  60.42962586    20              =<65
#> 21       a    Age  49.96765713    21               >31
#> 22       b    Age  41.44574370    22               >31
#> 23       c    Age  48.15892938    23               >31
#> 24       d    Age  49.03336441    24               >31
#> 25       e    Age  22.51313543    25              =<31
#> 26       f    Age  53.98320771    26               >31
#> 27       g    Age  54.17471580    27               >31
#> 28       h    Age  64.56347868    28               >31
#> 29       i    Age  63.61448244    29               >31
#> 30       j    Age  36.29158975    30               >31
#> 31       a  other  -0.09744510    31              <NA>
#> 32       b  other  -0.93584735    32              <NA>
#> 33       c  other  -0.01595031    33              <NA>
#> 34       d  other  -0.82678895    34              <NA>
#> 35       e  other  -1.51239965    35              <NA>
#> 36       f  other   0.93536319    36              <NA>
#> 37       g  other   0.17648861    37              <NA>
#> 38       h  other   0.24368546    38              <NA>
#> 39       i  other   1.62354888    39              <NA>
#> 40       j  other   0.11203808    40              <NA>
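
The output above, including the NA labels for the "other" group that has no entry in group, can be approximated in base R. The helper cut_by_group_sketch() below is hypothetical and is not the package's implementation; it stores labels as character, which may differ from the column type the real function returns.

```r
# Hypothetical re-implementation for illustration only.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  # Start with NA everywhere; groups without an entry in `group` stay NA.
  df[[cat_col]] <- NA_character_
  for (g in group) {
    # g[[1]]: group name, g[[2]]: breakpoints, g[[3]]: category labels.
    rows <- which(df[[col_group]] == g[[1]])
    df[[cat_col]][rows] <- as.character(
      cut(df[[col_data]][rows], breaks = g[[2]], labels = g[[3]], right = TRUE)
    )
  }
  df
}

# Small made-up data set (not the one from the example above).
toy <- data.frame(
  PARAM = c("Height", "Height", "other"),
  AVAL = c(150, 171, 0)
)
toy_group <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170"))
)

res <- cut_by_group_sketch(toy, "AVAL", "PARAM", toy_group, "my_new_categories")
res$my_new_categories
#> [1] "=<150" ">170"  NA
```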