
Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) containing a column of numeric data to be cut and a column specifying the group of each observation.

col_data

(character) the name of the numeric column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) one element per group (parameter value) that should be analyzed in a categorical way. Each element is a list of three items: the name of the group (character); a vector of breakpoints (numeric), where the first breakpoint is typically -Inf and the last Inf; and a vector of labels describing each category (character), one label per interval, i.e. one fewer than the number of breakpoints.

cat_col

(character) the name of the new column in which the category labels should be stored.
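The expected shape of `group` can be checked before calling the function. The sketch below uses a hypothetical helper, `check_group_spec()`, which is not part of the package; it assumes each entry is a three-element list (name, strictly increasing breakpoints, labels) with one label per interval.

```r
# Hypothetical helper, not part of the package: validates the shape of a
# `group` specification as described in the Arguments section.
check_group_spec <- function(group) {
  for (entry in group) {
    stopifnot(
      is.list(entry),
      length(entry) == 3,
      # 1st item: the group name, a single string
      is.character(entry[[1]]), length(entry[[1]]) == 1,
      # 2nd item: numeric breakpoints in strictly increasing order
      is.numeric(entry[[2]]), !is.unsorted(entry[[2]], strictly = TRUE),
      # 3rd item: labels; n breakpoints define n - 1 intervals
      is.character(entry[[3]]),
      length(entry[[3]]) == length(entry[[2]]) - 1
    )
  }
  invisible(TRUE)
}

group <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")),
  list("Weight", c(-Inf, 65, Inf), c("=<65", ">65"))
)
check_group_spec(group)
```

A specification with a missing label, e.g. `list(list("A", c(-Inf, 1, Inf), "x"))`, makes `check_group_spec()` stop with an error.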

Value

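A data.frame: df with an additional column, named after cat_col, holding the category label of each observation. Observations whose group has no entry in group get NA.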

Details

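Function used to categorize numeric data stored in long format according to the group of each observation: each group is cut with its own breakpoints and labels. Intervals are closed on the right and open on the left, so a value equal to a breakpoint falls into the lower category.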
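The right-closed convention can be illustrated with base R's `cut()`. This is a sketch that assumes `cut_by_group()` follows the same semantics as `cut(..., right = TRUE)`; the Details only state that intervals are closed on the right. Values equal to a breakpoint land in the lower category:

```r
# Breakpoints c(-Inf, 150, 170, Inf) with right = TRUE give the intervals
# (-Inf, 150], (150, 170] and (170, Inf).
x <- c(149.9, 150, 150.1, 170, 170.1)
cats <- cut(x,
            breaks = c(-Inf, 150, 170, Inf),
            labels = c("=<150", "150-170", ">170"),
            right = TRUE)
as.character(cats)
# "=<150" "=<150" "150-170" "150-170" ">170"
```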

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM        AVAL index my_new_categories
#> 1        a Height 150.2474996     1           150-170
#> 2        b Height 160.2438199     2           150-170
#> 3        c Height 159.8346757     3           150-170
#> 4        d Height 161.4761335     4           150-170
#> 5        e Height 152.2026377     5           150-170
#> 6        f Height 177.0372322     6              >170
#> 7        g Height 162.1142851     7           150-170
#> 8        h Height 153.4093223     8           150-170
#> 9        i Height 143.3413229     9             =<150
#> 10       j Height 155.7109455    10           150-170
#> 11       a Weight  63.5581479    11              =<65
#> 12       b Weight  60.5120494    12              =<65
#> 13       c Weight  69.5858252    13               >65
#> 14       d Weight  56.6772157    14              =<65
#> 15       e Weight  71.2238393    15               >65
#> 16       f Weight  61.0294284    16              =<65
#> 17       g Weight  63.4576631    17              =<65
#> 18       h Weight  53.1406687    18              =<65
#> 19       i Weight  64.2872708    19              =<65
#> 20       j Weight  61.7442239    20              =<65
#> 21       a    Age  37.2859665    21               >31
#> 22       b    Age  24.5437195    22              =<31
#> 23       c    Age  34.7337781    23               >31
#> 24       d    Age  62.9851289    24               >31
#> 25       e    Age  56.6243635    25               >31
#> 26       f    Age  60.6574147    26               >31
#> 27       g    Age  24.9621549    27              =<31
#> 28       h    Age  49.4233013    28               >31
#> 29       i    Age  20.1213169    29              =<31
#> 30       j    Age  46.4139592    30               >31
#> 31       a  other   0.1677149    31              <NA>
#> 32       b  other  -0.1914269    32              <NA>
#> 33       c  other  -0.8308468    33              <NA>
#> 34       d  other   0.9193673    34              <NA>
#> 35       e  other  -0.2474007    35              <NA>
#> 36       f  other  -1.1968478    36              <NA>
#> 37       g  other  -0.8537118    37              <NA>
#> 38       h  other   1.7215138    38              <NA>
#> 39       i  other  -0.1913612    39              <NA>
#> 40       j  other   0.8258927    40              <NA>
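The mechanics can be approximated in base R. `cut_by_group_sketch()` below is hypothetical and is not the package's implementation; it assumes that rows whose group has no entry in `group` get `NA` (as for `other` in the output above) and that the new column is a character vector, whereas the package may return a factor.

```r
# Hypothetical base-R sketch, not the package's implementation.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  # Default label: NA for rows whose group is not listed in `group`
  out <- rep(NA_character_, nrow(df))
  for (entry in group) {
    name   <- entry[[1]]  # group name
    breaks <- entry[[2]]  # numeric breakpoints
    labels <- entry[[3]]  # one label per interval
    rows <- which(df[[col_group]] == name)
    if (length(rows) > 0) {
      # right = TRUE: intervals closed on the right, open on the left
      out[rows] <- as.character(
        cut(df[[col_data]][rows], breaks = breaks, labels = labels, right = TRUE)
      )
    }
  }
  df[[cat_col]] <- out
  df
}

spec <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")),
  list("Weight", c(-Inf, 65, Inf), c("=<65", ">65"))
)
df <- data.frame(
  PARAM = c("Height", "Height", "Weight", "other"),
  AVAL  = c(150, 171, 70, 0.5)
)
res <- cut_by_group_sketch(df, "AVAL", "PARAM", spec, "cat")
res$cat
```

Looping over the entries of `group` rather than over the rows keeps one `cut()` call per group, so each group is cut with its own breakpoints and labels while the original row order is preserved.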