Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) containing a column of data to be cut and a column specifying the group of each observation.

col_data

(character) the column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) providing, for each parameter value that should be analyzed categorically: the name of the parameter (character), a series of breakpoints (numeric), where the first breakpoint is typically -Inf and the last Inf, and a series of names describing each category (character).

cat_col

(character) the name of the new column in which the cut label should be stored.

Value

data.frame with a column containing categorical values.

Details

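Categorizes numeric data stored in long format, applying a separate set of breakpoints to each group. Intervals are closed on the right and open on the left. As the example output shows, rows whose group has no entry in group receive NA in the new column.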
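
To illustrate what right-closed intervals mean for values that fall exactly on a breakpoint, here is a minimal sketch using base R's cut() with right = TRUE. It assumes cut_by_group follows the same convention as described above; it is not taken from the package's source, and the vector heights is made up for illustration.

```r
# Illustrative values only: 150 and 170 sit exactly on breakpoints.
heights <- c(150, 150.1, 170, 171)
breaks  <- c(-Inf, 150, 170, Inf)
labels  <- c("=<150", "150-170", ">170")

# right = TRUE gives intervals (-Inf, 150], (150, 170], (170, Inf]
res <- as.character(cut(heights, breaks = breaks, labels = labels, right = TRUE))
print(res)
```

A value equal to a breakpoint therefore lands in the lower category: 150 is labelled "=<150" and 170 is labelled "150-170".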

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM        AVAL index my_new_categories
#> 1        a Height 158.0256321     1           150-170
#> 2        b Height 134.8375770     2             =<150
#> 3        c Height 163.8269248     3           150-170
#> 4        d Height 181.1809658     4              >170
#> 5        e Height 158.2002697     5           150-170
#> 6        f Height 180.0857037     6              >170
#> 7        g Height 185.1244417     7              >170
#> 8        h Height 162.2552601     8           150-170
#> 9        i Height 179.1550724     9              >170
#> 10       j Height 177.0174581    10              >170
#> 11       a Weight  71.5736921    11               >65
#> 12       b Weight  71.3025166    12               >65
#> 13       c Weight  61.4930877    13              =<65
#> 14       d Weight  64.5069865    14              =<65
#> 15       e Weight  61.4106998    15              =<65
#> 16       f Weight  63.6462025    16              =<65
#> 17       g Weight  52.9578114    17              =<65
#> 18       h Weight  60.7915086    18              =<65
#> 19       i Weight  59.6057811    19              =<65
#> 20       j Weight  69.0566868    20               >65
#> 21       a    Age  33.3679636    21               >31
#> 22       b    Age  57.0906721    22               >31
#> 23       c    Age  32.2697663    23               >31
#> 24       d    Age  52.5997950    24               >31
#> 25       e    Age  19.8774086    25              =<31
#> 26       f    Age  28.4293008    26              =<31
#> 27       g    Age  44.2549951    27               >31
#> 28       h    Age  52.5585256    28               >31
#> 29       i    Age  55.2896501    29               >31
#> 30       j    Age  39.5293510    30               >31
#> 31       a  other   3.0825757    31              <NA>
#> 32       b  other   0.8884903    32              <NA>
#> 33       c  other  -0.8847469    33              <NA>
#> 34       d  other   0.4442252    34              <NA>
#> 35       e  other  -0.1880123    35              <NA>
#> 36       f  other   0.3394190    36              <NA>
#> 37       g  other   0.3010045    37              <NA>
#> 38       h  other   1.4338486    38              <NA>
#> 39       i  other   1.2479989    39              <NA>
#> 40       j  other   0.6556551    40              <NA>
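
The behaviour shown in the output above can be approximated in base R. The following is a rough sketch under stated assumptions, not the package's implementation: cut_by_group_sketch is a hypothetical helper, it stores labels as character (the real function may return a factor), and it assumes each entry of group is ordered as name, breakpoints, labels.

```r
# Hypothetical helper, not part of the package: mimics the documented behaviour.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  # Start with NA everywhere; groups without a specification keep NA.
  df[[cat_col]] <- NA_character_
  for (spec in group) {
    # spec[[1]] = group name, spec[[2]] = breakpoints, spec[[3]] = labels
    rows <- which(df[[col_group]] == spec[[1]])
    if (length(rows) == 0) next  # specification with no matching rows
    df[[cat_col]][rows] <- as.character(
      cut(df[[col_data]][rows], breaks = spec[[2]], labels = spec[[3]], right = TRUE)
    )
  }
  df
}

# Toy data, made up for illustration.
toy <- data.frame(PARAM = c("Height", "Height", "other"), AVAL = c(150, 172, 0.5))
spec <- list(list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")))
out <- cut_by_group_sketch(toy, "AVAL", "PARAM", spec, "category")
print(out)
```

Looping over the specifications rather than over the rows keeps each call to cut() tied to a single set of breakpoints, which is why the "other" row stays NA here.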