Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) containing a column of data to be cut and a column specifying the group of each observation.

col_data

(character) the column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) with one element per parameter value that should be analyzed categorically. Each element contains, in order: the name of the parameter (character), a series of breakpoints (numeric), where the first breakpoint is typically -Inf and the last is Inf, and a series of names describing each category (character).

cat_col

(character) the name of the new column in which the cut label should be stored.
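
The shape of a single group element can be sketched as follows. The helper is_valid_entry() is hypothetical and not part of the package; it only encodes the observation that n breakpoints define n - 1 intervals, so one label per interval is expected (the same constraint base::cut() places on its labels argument).

```r
# One element of `group`: parameter name, breakpoints, labels (in that order).
entry <- list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170"))

# Hypothetical helper (an assumption, not a package function):
# checks the structure described for the `group` argument.
is_valid_entry <- function(entry) {
  length(entry) == 3 &&
    is.character(entry[[1]]) && length(entry[[1]]) == 1 &&
    is.numeric(entry[[2]]) && !is.unsorted(entry[[2]]) &&
    is.character(entry[[3]]) &&
    length(entry[[3]]) == length(entry[[2]]) - 1
}

entry_ok <- is_valid_entry(entry)

# Too few labels for the three intervals: the check fails.
bad_entry <- list("Height", c(-Inf, 150, 170, Inf), c("low", "high"))
bad_entry_ok <- is_valid_entry(bad_entry)
```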

Value

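The input data.frame with an additional column, named by cat_col, containing the category labels. Observations whose group has no entry in group receive NA.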

Details

Categorizes numeric data stored in long format, applying to each observation the breakpoints defined for its group. Intervals are closed on the right and open on the left, so a value equal to a breakpoint falls into the lower category.
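
The right-closed convention can be illustrated with base::cut(). This is only a sketch of the interval rule; whether cut_by_group() calls cut() internally is an assumption, and the values in heights are made up for the illustration.

```r
# Breakpoints and labels mirroring the "Height" entry used in the Examples.
breaks <- c(-Inf, 150, 170, Inf)
labels <- c("=<150", "150-170", ">170")

# Illustrative values placed on and around the breakpoints.
heights <- c(149.9, 150, 150.1, 170, 170.1)

# right = TRUE gives intervals of the form (a, b]:
# 150 belongs to "=<150" and 170 belongs to "150-170".
categories <- cut(heights, breaks = breaks, labels = labels, right = TRUE)
as.character(categories)
#> [1] "=<150"   "=<150"   "150-170" "150-170" ">170"
```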

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM         AVAL index my_new_categories
#> 1        a Height 180.04954412     1              >170
#> 2        b Height 155.29243981     2           150-170
#> 3        c Height 166.54698950     3           150-170
#> 4        d Height 158.18088597     4           150-170
#> 5        e Height 168.53865910     5           150-170
#> 6        f Height 169.75450171     6           150-170
#> 7        g Height 165.73316267     7           150-170
#> 8        h Height 185.06664963     8              >170
#> 9        i Height 142.53363801     9             =<150
#> 10       j Height 174.38478556    10              >170
#> 11       a Weight  64.87352209    11              =<65
#> 12       b Weight  68.44561770    12               >65
#> 13       c Weight  60.05176862    13              =<65
#> 14       d Weight  79.01475087    14               >65
#> 15       e Weight  70.85947986    15               >65
#> 16       f Weight  56.69117397    16              =<65
#> 17       g Weight  71.87304174    17               >65
#> 18       h Weight  71.76482809    18               >65
#> 19       i Weight  70.85120618    19               >65
#> 20       j Weight  58.78234229    20              =<65
#> 21       a    Age  42.98636198    21               >31
#> 22       b    Age  42.48290801    22               >31
#> 23       c    Age  51.12243954    23               >31
#> 24       d    Age  56.81777262    24               >31
#> 25       e    Age  60.45912947    25               >31
#> 26       f    Age  49.85953382    26               >31
#> 27       g    Age  30.84972786    27              =<31
#> 28       h    Age  19.70439206    28              =<31
#> 29       i    Age  27.66488205    29              =<31
#> 30       j    Age  59.39475082    30               >31
#> 31       a  other  -1.40973774    31              <NA>
#> 32       b  other  -1.03981800    32              <NA>
#> 33       c  other  -0.15204715    33              <NA>
#> 34       d  other   0.17790018    34              <NA>
#> 35       e  other  -0.05329684    35              <NA>
#> 36       f  other  -0.12565224    36              <NA>
#> 37       g  other   0.04966566    37              <NA>
#> 38       h  other   0.80916028    38              <NA>
#> 39       i  other   1.40301514    39              <NA>
#> 40       j  other  -1.30005815    40              <NA>
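
For readers who want to see the mechanics, the behavior shown above can be approximated in base R. The function cut_by_group_sketch() below is a hypothetical re-implementation written for illustration, not the package's code; in particular, storing the labels as character rather than factor is an assumption.

```r
# Hypothetical sketch, not the package implementation.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  # Start with NA so rows whose group has no entry stay uncategorized.
  df[[cat_col]] <- NA_character_
  for (entry in group) {
    name <- entry[[1]]
    breaks <- entry[[2]]
    labels <- entry[[3]]
    # Rows belonging to the current group.
    rows <- which(df[[col_group]] == name)
    if (length(rows) == 0) next
    # Right-closed intervals, as stated in Details.
    df[[cat_col]][rows] <- as.character(
      cut(df[[col_data]][rows], breaks = breaks, labels = labels, right = TRUE)
    )
  }
  df
}

# Small deterministic input (made-up values) so the result is easy to check.
toy <- data.frame(
  PARAM = c("Height", "Height", "Weight", "other"),
  AVAL = c(150, 171, 65.5, 0)
)
toy_group <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")),
  list("Weight", c(-Inf, 65, Inf), c("=<65", ">65"))
)
result <- cut_by_group_sketch(toy, "AVAL", "PARAM", toy_group, "category")
result$category
#> [1] "=<150" ">170"  ">65"   NA
```

Looping over the entries of group, rather than over rows, keeps each call to cut() vectorized within a group and leaves unlisted groups such as "other" as NA.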