Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) with a column of data to be cut and a column specifying the group of each observation.

col_data

(character) the column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) providing, for each parameter value that should be analyzed categorically: the name of the parameter (character), a series of breakpoints (numeric), where the first breakpoint is typically -Inf and the last Inf, and a series of names (character) describing each category.

cat_col

(character) the name of the new column in which the cut label should be stored.

Value

data.frame with a column containing categorical values.

Details

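Categorizes numeric data stored in long format according to the group of each observation. Intervals are closed on the right and open on the left, so a value equal to a breakpoint falls into the lower category. Observations whose group is not listed in group receive NA, as the "other" rows in the example output show.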
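
To illustrate what "closed on the right" means for values that sit exactly on a breakpoint, here is a minimal sketch using base R's cut() with right = TRUE. It assumes that cut_by_group() treats boundaries the same way as cut(); that is an assumption made for illustration, not something this page confirms. The heights vector is made up.

```r
# Assumption: boundary handling mirrors base R cut() with right = TRUE,
# i.e. intervals of the form (a, b].
breaks <- c(-Inf, 150, 170, Inf)
labels <- c("=<150", "150-170", ">170")

# Illustrative values placed on and around the breakpoints.
heights <- c(149.9, 150, 150.1, 170, 170.1)

res <- as.character(cut(heights, breaks = breaks, labels = labels, right = TRUE))
print(res)
# 150 lands in "=<150" and 170 lands in "150-170",
# because each interval includes its upper bound.
```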

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM         AVAL index my_new_categories
#> 1        a Height 150.16181269     1           150-170
#> 2        b Height 160.83339608     2           150-170
#> 3        c Height 186.79351841     3              >170
#> 4        d Height 172.69350372     4              >170
#> 5        e Height 183.77809080     5              >170
#> 6        f Height 168.14531483     6           150-170
#> 7        g Height 146.19756317     7             =<150
#> 8        h Height 129.26675938     8             =<150
#> 9        i Height 209.74442996     9              >170
#> 10       j Height 152.85820981    10           150-170
#> 11       a Weight  64.06046664    11              =<65
#> 12       b Weight  63.43253866    12              =<65
#> 13       c Weight  66.39741127    13               >65
#> 14       d Weight  54.07526968    14              =<65
#> 15       e Weight  73.17783724    15               >65
#> 16       f Weight  63.29666004    16              =<65
#> 17       g Weight  64.07823378    17              =<65
#> 18       h Weight  69.47104528    18               >65
#> 19       i Weight  65.99411509    19               >65
#> 20       j Weight  68.19927414    20               >65
#> 21       a    Age  34.16067579    21               >31
#> 22       b    Age  59.77740839    22               >31
#> 23       c    Age  30.63245624    23              =<31
#> 24       d    Age  37.50739942    24               >31
#> 25       e    Age  60.43900822    25               >31
#> 26       f    Age  35.06835120    26               >31
#> 27       g    Age  40.31808378    27               >31
#> 28       h    Age  29.95390546    28              =<31
#> 29       i    Age  57.16675317    29               >31
#> 30       j    Age  44.83090528    30               >31
#> 31       a  other  -0.06248035    31              <NA>
#> 32       b  other  -0.54276481    32              <NA>
#> 33       c  other   0.70858711    33              <NA>
#> 34       d  other   0.06619230    34              <NA>
#> 35       e  other  -2.79775409    35              <NA>
#> 36       f  other  -0.67024602    36              <NA>
#> 37       g  other  -2.80488323    37              <NA>
#> 38       h  other  -0.51134810    38              <NA>
#> 39       i  other   0.56904403    39              <NA>
#> 40       j  other  -0.36327872    40              <NA>
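
The behaviour seen above (one set of breakpoints per group, NA for groups without a definition) can be approximated in base R. The following is only a sketch under stated assumptions: cut_by_group_sketch, toy and toy_group are hypothetical names introduced here, and the code is not the package's actual implementation.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  # Start with NA everywhere; rows of groups not listed in `group` stay NA.
  df[[cat_col]] <- rep(NA_character_, nrow(df))
  for (g in group) {
    # g[[1]]: group name, g[[2]]: breakpoints, g[[3]]: category labels
    rows <- which(df[[col_group]] == g[[1]])
    if (length(rows) > 0) {
      df[[cat_col]][rows] <- as.character(
        cut(df[[col_data]][rows], breaks = g[[2]], labels = g[[3]], right = TRUE)
      )
    }
  }
  df
}

# Small deterministic data set (made up for this sketch).
toy <- data.frame(
  PARAM = c("Height", "Height", "Weight", "other"),
  AVAL = c(150, 171, 70, 0)
)
toy_group <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")),
  list("Weight", c(-Inf, 65, Inf), c("=<65", ">65"))
)

out <- cut_by_group_sketch(toy, "AVAL", "PARAM", toy_group, "CAT")
print(out)
```

Looping over the group definitions keeps each set of breakpoints tied to its own parameter, which is why long-format data with mixed units can be categorized in a single call.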