Cutting data by group

Usage

cut_by_group(df, col_data, col_group, group, cat_col)

Arguments

df

(dataframe) with a column of data to be cut and a column specifying the group of each observation.

col_data

(character) the column containing the data to be cut.

col_group

(character) the column containing the names of the groups according to which the data should be split.

group

(nested list) one inner list for each parameter that should be analyzed categorically. Each inner list contains, in order: the name of the parameter (character), a series of breakpoints (numeric) where the first breakpoint is typically -Inf and the last Inf, and a series of names (character) describing each category. There must be one name fewer than there are breakpoints.

cat_col

(character) the name of the new column in which the category label should be stored.

Value

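The input data.frame with an additional column, named by cat_col, containing the category label of each observation. As the example output shows, observations whose group is not listed in group receive NA.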

Details

Function used to categorize numeric data stored in long format depending on their group. Intervals are closed on the right (and open on the left).
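
The boundary behaviour described here can be illustrated with base R's `cut()`, whose default `right = TRUE` produces the same right-closed intervals. This is only a sketch: whether `cut_by_group()` calls `cut()` internally is an assumption, and the values in `x` are made up to sit on and around the breakpoints.

```r
# Illustrative values chosen to fall on and next to the breakpoints.
x <- c(149.9, 150, 150.1, 170, 170.1)

# right = TRUE (the default) makes each interval (lower, upper]:
# a value equal to a breakpoint belongs to the interval that ends there.
categories <- cut(
  x,
  breaks = c(-Inf, 150, 170, Inf),
  labels = c("=<150", "150-170", ">170"),
  right = TRUE
)

as.character(categories)
# 150 falls in "=<150" and 170 falls in "150-170":
# "=<150" "=<150" "150-170" "150-170" ">170"
```

A value exactly equal to a breakpoint is therefore assigned to the lower category, which is why the labels in the examples use "=<" for the first interval.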

Examples

group <- list(
  list(
    "Height",
    c(-Inf, 150, 170, Inf),
    c("=<150", "150-170", ">170")
  ),
  list(
    "Weight",
    c(-Inf, 65, Inf),
    c("=<65", ">65")
  ),
  list(
    "Age",
    c(-Inf, 31, Inf),
    c("=<31", ">31")
  ),
  list(
    "PreCondition",
    c(-Inf, 1, Inf),
    c("=<1", ">1")
  )
)
data <- data.frame(
  SUBJECT = rep(letters[1:10], 4),
  PARAM = rep(c("Height", "Weight", "Age", "other"), each = 10),
  AVAL = c(rnorm(10, 165, 15), rnorm(10, 65, 5), runif(10, 18, 65), rnorm(10, 0, 1)),
  index = 1:40
)

cut_by_group(data, "AVAL", "PARAM", group, "my_new_categories")
#>    SUBJECT  PARAM        AVAL index my_new_categories
#> 1        a Height 176.9146244     1              >170
#> 2        b Height 153.3026581     2           150-170
#> 3        c Height 154.5514753     3           150-170
#> 4        d Height 159.5682450     4           150-170
#> 5        e Height 182.8683123     5              >170
#> 6        f Height 150.2017331     6           150-170
#> 7        g Height 189.8739772     7              >170
#> 8        h Height 151.5200707     8           150-170
#> 9        i Height 153.2656339     9           150-170
#> 10       j Height 163.3621539    10           150-170
#> 11       a Weight  62.3106882    11              =<65
#> 12       b Weight  70.0034530    12               >65
#> 13       c Weight  65.7835296    13               >65
#> 14       d Weight  65.7323850    14               >65
#> 15       e Weight  56.4889786    15              =<65
#> 16       f Weight  60.7749554    16              =<65
#> 17       g Weight  63.0684203    17              =<65
#> 18       h Weight  51.0406078    18              =<65
#> 19       i Weight  70.5875707    19               >65
#> 20       j Weight  60.8885458    20              =<65
#> 21       a    Age  55.2364083    21               >31
#> 22       b    Age  63.2807238    22               >31
#> 23       c    Age  54.0174946    23               >31
#> 24       d    Age  59.6837275    24               >31
#> 25       e    Age  64.5872272    25               >31
#> 26       f    Age  53.2529663    26               >31
#> 27       g    Age  25.1464671    27              =<31
#> 28       h    Age  46.2596395    28               >31
#> 29       i    Age  59.7235524    29               >31
#> 30       j    Age  40.7569123    30               >31
#> 31       a  other  -3.0773942    31              <NA>
#> 32       b  other   1.1704720    32              <NA>
#> 33       c  other  -0.4694520    33              <NA>
#> 34       d  other  -0.8953286    34              <NA>
#> 35       e  other  -0.6432770    35              <NA>
#> 36       f  other  -0.2394497    36              <NA>
#> 37       g  other  -1.8775131    37              <NA>
#> 38       h  other  -0.8395763    38              <NA>
#> 39       i  other  -0.4741488    39              <NA>
#> 40       j  other  -0.4906048    40              <NA>
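
To make the behaviour in the output above easier to follow, here is a minimal base-R sketch of the same idea: each group's values are cut with that group's own breakpoints, and groups without a specification stay NA. The function `cut_by_group_sketch()` and the data in `toy` are hypothetical and written for illustration only; this is not the package's implementation, and the choice of a character (rather than factor) result column is an assumption.

```r
# Hypothetical re-implementation for illustration; not the package's code.
cut_by_group_sketch <- function(df, col_data, col_group, group, cat_col) {
  # Start with NA everywhere so unlisted groups remain NA.
  df[[cat_col]] <- NA_character_
  for (spec in group) {
    # spec[[1]]: group name, spec[[2]]: breakpoints, spec[[3]]: labels
    rows <- which(df[[col_group]] == spec[[1]])
    df[[cat_col]][rows] <- as.character(
      cut(df[[col_data]][rows], breaks = spec[[2]], labels = spec[[3]], right = TRUE)
    )
  }
  df
}

# Small deterministic data set (made up for this sketch).
toy <- data.frame(
  PARAM = c("Height", "Height", "Weight", "other"),
  AVAL = c(150, 171, 65.5, 0.2)
)
toy_group <- list(
  list("Height", c(-Inf, 150, 170, Inf), c("=<150", "150-170", ">170")),
  list("Weight", c(-Inf, 65, Inf), c("=<65", ">65"))
)

res <- cut_by_group_sketch(toy, "AVAL", "PARAM", toy_group, "category")
res$category
# "=<150", ">170", ">65", NA  (the "other" row has no specification)
```

Specifications for groups that do not occur in the data, such as "PreCondition" in the example above, simply match no rows and leave the result unchanged.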