Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) containing a column of data to be cut and a column specifying the group of each observation.

col_data

(character) the column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) providing, for each parameter value that should be analyzed categorically: the name of the parameter (character), a series of breakpoints (numeric) where the first breakpoint is typically -Inf and the last Inf, and a series of names (character) describing each category.

cat_col

(character) the name of the new column in which the cut label should be stored.

Value

The input data.frame with an additional column, named by cat_col, containing the category labels. Rows whose group is not listed in group receive NA.

Details

Function used to categorize numeric data stored in long format depending on their group. Intervals are closed on the right (and open on the left).
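
The right-closed convention can be illustrated with base R's cut(), which uses right = TRUE by default. This is only a sketch of the interval behaviour; whether cut_by_group() calls cut() internally is an assumption, and the values in x are made up for illustration.

```r
# Breakpoints and labels in the style of a "Height" entry of `group`
breaks <- c(-Inf, 150, 170, Inf)
labels <- c("=<150", "150-170", ">170")

# Illustrative values placed on and around the breakpoints
x <- c(149.9, 150, 150.1, 170, 170.1)

# right = TRUE: intervals are (a, b], so 150 falls in "=<150"
# and 170 falls in "150-170"
res <- as.character(cut(x, breaks = breaks, labels = labels, right = TRUE))
res
#> [1] "=<150"   "=<150"   "150-170" "150-170" ">170"
```

A value exactly equal to a breakpoint is therefore assigned to the lower category.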

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM         AVAL index my_new_categories
#> 1        a Height 156.80992083     1           150-170
#> 2        b Height 168.09901667     2           150-170
#> 3        c Height 174.96288988     3              >170
#> 4        d Height 183.09872831     4              >170
#> 5        e Height 175.11321985     5              >170
#> 6        f Height 165.69507018     6           150-170
#> 7        g Height 182.61657428     7              >170
#> 8        h Height 157.62087456     8           150-170
#> 9        i Height 161.69401328     9           150-170
#> 10       j Height 152.59079859    10           150-170
#> 11       a Weight  69.44600995    11               >65
#> 12       b Weight  60.67105442    12              =<65
#> 13       c Weight  62.26869371    13              =<65
#> 14       d Weight  62.47673470    14              =<65
#> 15       e Weight  62.54373579    15              =<65
#> 16       f Weight  62.51389555    16              =<65
#> 17       g Weight  73.72075375    17               >65
#> 18       h Weight  61.11199155    18              =<65
#> 19       i Weight  74.42327549    19               >65
#> 20       j Weight  64.86844826    20              =<65
#> 21       a    Age  24.28488057    21              =<31
#> 22       b    Age  63.50856034    22               >31
#> 23       c    Age  28.90032557    23              =<31
#> 24       d    Age  60.67899483    24               >31
#> 25       e    Age  21.40036283    25              =<31
#> 26       f    Age  34.10295123    26               >31
#> 27       g    Age  44.95555310    27               >31
#> 28       h    Age  45.00940858    28               >31
#> 29       i    Age  31.03620707    29               >31
#> 30       j    Age  40.26751096    30               >31
#> 31       a  other   0.61225707    31              <NA>
#> 32       b  other  -0.65437781    32              <NA>
#> 33       c  other  -0.69169274    33              <NA>
#> 34       d  other   1.30543121    34              <NA>
#> 35       e  other   0.18819548    35              <NA>
#> 36       f  other  -1.14279149    36              <NA>
#> 37       g  other  -0.92554354    37              <NA>
#> 38       h  other   0.89415139    38              <NA>
#> 39       i  other  -0.07845406    39              <NA>
#> 40       j  other  -0.44835361    40              <NA>
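
For readers who want to see the idea without the package, the following is a minimal base R sketch of per-group cutting. The helper cut_by_group_sketch() and the toy data are hypothetical and are not the package's implementation; in particular, the sketch stores labels as character, which may differ from the column type returned by cut_by_group().

```r
# Hypothetical helper, not part of the package: applies cut() group by group.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  # Start with NA so that rows from unlisted groups stay NA
  df[[cat_col]] <- NA_character_
  for (g in group) {
    # g[[1]]: group name, g[[2]]: breakpoints, g[[3]]: labels
    rows <- df[[col_group]] %in% g[[1]]
    df[[cat_col]][rows] <- as.character(
      cut(df[[col_data]][rows], breaks = g[[2]], labels = g[[3]], right = TRUE)
    )
  }
  df
}

# Small made-up data set in long format
toy <- data.frame(
  PARAM = c("Height", "Height", "Weight", "other"),
  AVAL = c(150, 171, 66, 0)
)
toy_group <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")),
  list("Weight", c(-Inf, 65, Inf), c("=<65", ">65"))
)

out <- cut_by_group_sketch(toy, "AVAL", "PARAM", toy_group, "cat")
out$cat
#> [1] "=<150" ">170"  ">65"   NA
```

As in the example output, the row whose group ("other") has no entry in the nested list is left as NA.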