Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) containing a column of data to be cut and a column specifying the group of each observation.

col_data

(character) the column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) with one entry for each parameter value that should be analyzed categorically. Each entry provides: the name of the parameter (character), a series of breakpoints (numeric), where the first breakpoint is typically -Inf and the last Inf, and a series of names describing each category (character).

cat_col

(character) the name of the new column in which the cut label should be stored.
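
The structure of one entry of group, as described above, can be sketched as follows. The variable names (height_entry, param_name, breakpoints, cat_labels) are illustrative only, and the length check assumes the usual convention that n breakpoints delimit n - 1 intervals; it is not taken from the function's source.

```r
# One entry of `group`: parameter name, breakpoints, category labels (in this order).
height_entry <- list(
  "Height",
  c(-Inf, 150, 170, Inf),
  c("=<150", "150-170", ">170")
)

param_name  <- height_entry[[1]]
breakpoints <- height_entry[[2]]
cat_labels  <- height_entry[[3]]

# n breakpoints delimit n - 1 intervals, so one label is needed per interval.
stopifnot(
  is.character(param_name),
  !is.unsorted(breakpoints),
  length(cat_labels) == length(breakpoints) - 1
)
```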

Value

data.frame with an additional column, named by cat_col, containing the categorical values. Observations whose group has no entry in group are set to NA.

Details

Function used to categorize numeric data stored in long format depending on their group. Intervals are closed on the right (and open on the left).
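
The right-closed convention can be illustrated with base R's cut(), which uses the same convention when right = TRUE. This is only a sketch of the interval rule, using the "Height" breakpoints from the Examples section; whether cut_by_group() calls cut() internally is an assumption. The heights vector is made up for illustration.

```r
breaks <- c(-Inf, 150, 170, Inf)
labels <- c("=<150", "150-170", ">170")

# Illustrative values placed on and around the breakpoints
heights <- c(149.9, 150, 150.1, 170, 170.1)

# right = TRUE gives intervals of the form (a, b]:
# a value equal to a breakpoint falls into the lower category.
categories <- as.character(
  cut(heights, breaks = breaks, labels = labels, right = TRUE)
)
categories
#> [1] "=<150"   "=<150"   "150-170" "150-170" ">170"
```

A value of exactly 150 is therefore labelled "=<150", and a value of exactly 170 is labelled "150-170".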

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM         AVAL index my_new_categories
#> 1        a Height 145.48597641     1             =<150
#> 2        b Height 174.21891941     2              >170
#> 3        c Height 186.11868366     3              >170
#> 4        d Height 194.53476080     4              >170
#> 5        e Height 155.52932086     5           150-170
#> 6        f Height 159.80521395     6           150-170
#> 7        g Height 163.52569436     7           150-170
#> 8        h Height 169.27582627     8           150-170
#> 9        i Height 147.81475309     9             =<150
#> 10       j Height 158.37823725    10           150-170
#> 11       a Weight  62.38998980    11              =<65
#> 12       b Weight  61.61703175    12              =<65
#> 13       c Weight  69.08751683    13               >65
#> 14       d Weight  66.45896808    14               >65
#> 15       e Weight  70.56025768    15               >65
#> 16       f Weight  60.54845382    16              =<65
#> 17       g Weight  67.11466440    17               >65
#> 18       h Weight  65.84288898    18               >65
#> 19       i Weight  53.35591931    19              =<65
#> 20       j Weight  56.51948755    20              =<65
#> 21       a    Age  24.81214519    21              =<31
#> 22       b    Age  40.97783573    22               >31
#> 23       c    Age  61.04688822    23               >31
#> 24       d    Age  60.17100342    24               >31
#> 25       e    Age  47.91894905    25               >31
#> 26       f    Age  41.12210567    26               >31
#> 27       g    Age  49.12015713    27               >31
#> 28       h    Age  48.49803269    28               >31
#> 29       i    Age  36.90812458    29               >31
#> 30       j    Age  60.94854401    30               >31
#> 31       a  other  -1.32959941    31              <NA>
#> 32       b  other   0.92192641    32              <NA>
#> 33       c  other   0.51913123    33              <NA>
#> 34       d  other   0.73632896    34              <NA>
#> 35       e  other   1.22389050    35              <NA>
#> 36       f  other  -1.48207695    36              <NA>
#> 37       g  other   0.50114428    37              <NA>
#> 38       h  other  -0.59718282    38              <NA>
#> 39       i  other   0.19599642    39              <NA>
#> 40       j  other  -0.06521908    40              <NA>
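
In the output above, rows of the "other" group receive NA because group has no entry for that parameter, and the "PreCondition" entry has no effect because no row belongs to it. A minimal base R sketch of this behaviour is given below. cut_by_group_sketch() is a hypothetical helper written for illustration, not the package's implementation, and storing the labels as character is an assumption.

```r
# Hypothetical re-implementation for illustration only.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  # Start with NA so that rows whose group has no entry stay uncategorized
  df[[cat_col]] <- NA_character_
  for (entry in group) {
    # entry[[1]]: parameter name, entry[[2]]: breakpoints, entry[[3]]: labels
    rows <- which(df[[col_group]] == entry[[1]])
    df[[cat_col]][rows] <- as.character(
      cut(df[[col_data]][rows], breaks = entry[[2]], labels = entry[[3]], right = TRUE)
    )
  }
  df
}

small_group <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")),
  list("Weight", c(-Inf, 65, Inf), c("=<65", ">65"))
)
small_data <- data.frame(
  PARAM = c("Height", "Height", "Weight", "other"),
  AVAL = c(150, 171, 70, 0)
)

result <- cut_by_group_sketch(small_data, "AVAL", "PARAM", small_group, "category")
result$category
#> [1] "=<150" ">170"  ">65"   NA
```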