Cutting data by group

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(data.frame) A data frame with a column of data to be cut and a column specifying the group of each observation.

col_data

(character) The name of the column containing the numeric data to be cut.

col_group

(character) The name of the column containing the group of each observation. Its values are matched against the parameter names given in `group`.

group

(nested list) One entry for each parameter that should be analyzed categorically. Each entry is a list containing, in order: the name of the parameter (character); a vector of breakpoints (numeric), where the first breakpoint is typically -Inf and the last Inf; and a vector of labels describing each category (character), with one label per interval, i.e. one fewer label than breakpoints.
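To make the expected shape of `group` concrete, the sketch below validates a specification before it is passed to `cut_by_group()`. The helper `check_group_spec()` is hypothetical and not part of the package. It assumes each entry holds exactly the three components described above, in that order, and that breakpoints must be strictly increasing.

```r
# Hypothetical helper (not part of the package): checks that each entry of a
# `group` specification has the shape described in the Arguments section.
check_group_spec <- function(group) {
  for (entry in group) {
    name   <- entry[[1]]  # parameter name, e.g. "Height"
    breaks <- entry[[2]]  # numeric breakpoints, typically from -Inf to Inf
    labels <- entry[[3]]  # one label per interval
    stopifnot(
      is.character(name), length(name) == 1,
      is.numeric(breaks), !is.unsorted(breaks, strictly = TRUE),
      is.character(labels), length(labels) == length(breaks) - 1
    )
  }
  invisible(TRUE)
}

spec <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170"))
)
check_group_spec(spec)  # passes silently
```

A specification with, say, three breakpoints but only one label would stop with an error, since three breakpoints define two intervals.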

cat_col

(character) The name of the new column in which the category labels should be stored.

Value

A data.frame: `df` with an additional column, named after `cat_col`, containing the category labels.

Details

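Categorizes numeric data stored in long format, using a separate set of breakpoints for each group. Intervals are closed on the right and open on the left, so a value equal to a breakpoint falls into the lower category. Observations whose group has no entry in `group` receive NA, as for the "other" rows in the example below.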
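The per-group logic can be illustrated with base R's `cut()`. The function `cut_by_group_sketch()` below is a hypothetical re-implementation for illustration only, not the package's code. It assumes that `cut_by_group()` behaves like `cut(..., right = TRUE)` within each group and that unmatched groups are left as NA; unlike the package function, it always stores the labels as a character column.

```r
# Hypothetical sketch (not the package implementation): cut each group's values
# with that group's own breakpoints and labels, using right-closed intervals.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  out <- rep(NA_character_, nrow(df))  # groups without a spec stay NA
  for (entry in group) {
    rows <- df[[col_group]] == entry[[1]]
    rows[is.na(rows)] <- FALSE          # ignore rows with a missing group
    out[rows] <- as.character(cut(
      df[[col_data]][rows],
      breaks = entry[[2]],
      labels = entry[[3]],
      right  = TRUE                     # intervals (a, b]: closed on the right
    ))
  }
  df[[cat_col]] <- out
  df
}

df <- data.frame(
  PARAM = c("Height", "Height", "Height", "other"),
  AVAL  = c(150, 150.5, 170, 0)
)
spec <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170"))
)
cut_by_group_sketch(df, "AVAL", "PARAM", spec, "cat")
```

Here 150 is labelled "=<150" and 170 is labelled "150-170", because each value equal to a breakpoint belongs to the interval that ends there; the "other" row has no specification and gets NA.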

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM        AVAL index my_new_categories
#> 1        a Height 187.1341491     1              >170
#> 2        b Height 153.3078723     2           150-170
#> 3        c Height 151.2048361     3           150-170
#> 4        d Height 155.0123587     4           150-170
#> 5        e Height 175.4797333     5              >170
#> 6        f Height 147.7682185     6             =<150
#> 7        g Height 138.0417551     7             =<150
#> 8        h Height 155.9934056     8           150-170
#> 9        i Height 179.0973148     9              >170
#> 10       j Height 179.3535136    10              >170
#> 11       a Weight  63.9618528    11              =<65
#> 12       b Weight  58.2993279    12              =<65
#> 13       c Weight  59.3171582    13              =<65
#> 14       d Weight  76.4275890    14               >65
#> 15       e Weight  58.9629390    15              =<65
#> 16       f Weight  61.9759158    16              =<65
#> 17       g Weight  69.7670461    17               >65
#> 18       h Weight  65.8997693    18               >65
#> 19       i Weight  63.6614369    19              =<65
#> 20       j Weight  66.4422921    20               >65
#> 21       a    Age  62.1932062    21               >31
#> 22       b    Age  63.6104956    22               >31
#> 23       c    Age  38.1670269    23               >31
#> 24       d    Age  20.4320762    24              =<31
#> 25       e    Age  45.9518090    25               >31
#> 26       f    Age  39.1018662    26               >31
#> 27       g    Age  38.4221379    27               >31
#> 28       h    Age  58.1394862    28               >31
#> 29       i    Age  27.9229415    29              =<31
#> 30       j    Age  29.7921841    30              =<31
#> 31       a  other   0.3999131    31              <NA>
#> 32       b  other   0.5777239    32              <NA>
#> 33       c  other  -0.2881940    33              <NA>
#> 34       d  other   0.6064985    34              <NA>
#> 35       e  other   0.7049552    35              <NA>
#> 36       f  other  -0.7548462    36              <NA>
#> 37       g  other  -3.1139111    37              <NA>
#> 38       h  other   0.3055656    38              <NA>
#> 39       i  other   0.3374558    39              <NA>
#> 40       j  other  -1.1842521    40              <NA>