Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) containing a column of data to be cut and a column specifying the group of each observation.

col_data

(character) the column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) providing, for each parameter value that should be analyzed categorically: the name of the parameter (character), a series of breakpoints (numeric) where the first breakpoint is typically -Inf and the last Inf, and a series of names describing each category (character).
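
To make the expected shape of one `group` entry concrete, here is a minimal sketch. The helper `check_entry()` is hypothetical and not part of the package; it assumes, as with base R `cut()`, that an entry needs exactly one label fewer than it has breakpoints and that the breakpoints are sorted in increasing order.

```r
# One entry of the `group` argument: parameter name, breakpoints, labels.
entry <- list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170"))

# Hypothetical helper (not provided by the package): checks the shape of an entry.
# Assumption: n breakpoints define n - 1 intervals, so n - 1 labels are needed.
check_entry <- function(entry) {
  is.character(entry[[1]]) && length(entry[[1]]) == 1 &&
    is.numeric(entry[[2]]) && !is.unsorted(entry[[2]]) &&
    length(entry[[3]]) == length(entry[[2]]) - 1
}

check_entry(entry)
```

Running such a check over every entry before calling the function may help catch a mismatched number of labels early.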

cat_col

(character) the name of the new column in which the cut label should be stored.

Value

The input data.frame with an additional column, named by cat_col, containing the categorical values.

Details

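Categorizes numeric data stored in long format, using a separate set of breakpoints for each group. Intervals are closed on the right and open on the left, so a value equal to a breakpoint falls into the lower category. Observations whose group is not listed in group receive NA.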
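
The behavior described above can be sketched with base R `cut()` and `right = TRUE`. This is an illustration only, not the package's implementation: the function name `cut_by_group_sketch`, the `toy` data, and the `spec` list are all hypothetical.

```r
# Hypothetical re-implementation for illustration; not the package's own code.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  # Start with NA so that rows from unlisted groups stay uncategorized.
  df[[cat_col]] <- NA_character_
  for (g in group) {
    # g[[1]] = group name, g[[2]] = breakpoints, g[[3]] = labels.
    rows <- which(df[[col_group]] == g[[1]])
    if (length(rows) == 0) next
    # right = TRUE gives intervals of the form (a, b].
    df[[cat_col]][rows] <- as.character(
      cut(df[[col_data]][rows], breaks = g[[2]], labels = g[[3]], right = TRUE)
    )
  }
  df
}

toy <- data.frame(
  PARAM = c("Height", "Height", "Height", "Weight", "other"),
  AVAL  = c(150, 150.1, 171, 65, 0)
)
spec <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")),
  list("Weight", c(-Inf, 65, Inf), c("=<65", ">65"))
)

res <- cut_by_group_sketch(toy, "AVAL", "PARAM", spec, "category")
res$category
# "=<150" "150-170" ">170" "=<65" NA
```

A height of exactly 150 lands in "=<150" and a weight of exactly 65 lands in "=<65", because each interval includes its upper bound. The "other" row has no matching entry in `spec`, so it stays NA.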

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM        AVAL index my_new_categories
#> 1        a Height 159.0732619     1           150-170
#> 2        b Height 159.2284416     2           150-170
#> 3        c Height 163.7512350     3           150-170
#> 4        d Height 172.3045039     4              >170
#> 5        e Height 188.4124760     5              >170
#> 6        f Height 187.4076603     6              >170
#> 7        g Height 189.8510632     7              >170
#> 8        h Height 161.1792696     8           150-170
#> 9        i Height 154.4218622     9           150-170
#> 10       j Height 163.4081554    10           150-170
#> 11       a Weight  61.1412696    11              =<65
#> 12       b Weight  63.1367702    12              =<65
#> 13       c Weight  61.4507167    13              =<65
#> 14       d Weight  63.0961367    14              =<65
#> 15       e Weight  62.3649034    15              =<65
#> 16       f Weight  61.7513039    16              =<65
#> 17       g Weight  66.0458990    17               >65
#> 18       h Weight  62.2540664    18              =<65
#> 19       i Weight  61.0092254    19              =<65
#> 20       j Weight  61.6235686    20              =<65
#> 21       a    Age  52.5372001    21               >31
#> 22       b    Age  36.8052489    22               >31
#> 23       c    Age  59.2945774    23               >31
#> 24       d    Age  39.6242842    24               >31
#> 25       e    Age  49.0555705    25               >31
#> 26       f    Age  64.9218613    26               >31
#> 27       g    Age  33.2100829    27               >31
#> 28       h    Age  52.1568582    28               >31
#> 29       i    Age  40.7964513    29               >31
#> 30       j    Age  24.9316137    30              =<31
#> 31       a  other  -0.5728756    31              <NA>
#> 32       b  other  -1.2852988    32              <NA>
#> 33       c  other  -0.5605073    33              <NA>
#> 34       d  other   0.8669373    34              <NA>
#> 35       e  other   1.3836194    35              <NA>
#> 36       f  other  -1.0536847    36              <NA>
#> 37       g  other   0.1168282    37              <NA>
#> 38       h  other   1.1334157    38              <NA>
#> 39       i  other   1.8814619    39              <NA>
#> 40       j  other  -1.1017715    40              <NA>