[Stable]

The function add_quality_flags() adds quality flag information to an AnyHermesData object:

  • low_expression_flag: for each gene, counts how many samples do not reach a minimum expression threshold, measured in Counts per Million (CPM). If too many samples fall below it, the gene is flagged as a "low expression" gene.

  • tech_failure_flag: first calculates the Pearson correlation matrix of the sample-wise CPM values, which measures the correlation between samples. It then compares the average correlation per sample with a threshold: if the average is too low, the sample is flagged as a "technical failure" sample.

  • low_depth_flag: computes the library size (total number of counts) per sample. If this number is too low, the sample is flagged as a "low depth" sample.
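The three flags described above can be illustrated with a minimal base R sketch on a made-up count matrix. This is not the hermes implementation: the toy data, the threshold values, the reading of min_cpm_prop as the minimum proportion of samples that must reach min_cpm, and the inclusion of the diagonal in the average correlation are all assumptions made for illustration.

```r
# Toy count matrix: 4 genes (rows) x 5 samples (columns). All values are made up.
counts <- matrix(
  c(
    500, 480, 520, 510,  5,
    300, 310, 290, 305,  3,
      0,   1,   0,   2,  0,
    200, 210, 190, 185, 12
  ),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:5))
)

# Hypothetical thresholds, chosen only to suit this toy data.
min_cpm <- 1000      # minimum CPM for a gene to count as expressed in a sample
min_cpm_prop <- 0.5  # assumed: minimum proportion of samples reaching min_cpm
min_corr <- 0.5      # minimum average Pearson correlation per sample
min_depth <- 100     # minimum library size per sample

# Library size (total counts) per sample, and counts per million.
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Low expression: too few samples reach the CPM threshold for this gene.
low_expression <- rowMeans(cpm >= min_cpm) < min_cpm_prop

# Technical failure: average correlation of a sample with all samples is too low.
corr_matrix <- cor(cpm, method = "pearson")
tech_failure <- colMeans(corr_matrix) < min_corr

# Low depth: library size is too small.
low_depth <- lib_size < min_depth

low_expression
tech_failure
low_depth
```

In this toy data, gene3 is barely expressed in any sample, and sample5 has both a very small library size and an expression profile unlike the other samples, so these are the only entries flagged.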

Separate helper functions are used internally to create the flags, and separate getter functions give easy access to the quality control flags stored in an object.

Usage

add_quality_flags(object, control = control_quality(), overwrite = FALSE)

h_low_expression_flag(object, control = control_quality())

h_low_depth_flag(object, control = control_quality())

h_tech_failure_flag(object, control = control_quality())

get_tech_failure(object)

get_low_depth(object)

get_low_expression(object)

Arguments

object

(AnyHermesData)
input.

control

(list)
list of settings (thresholds etc.) used to compute the quality control flags, produced by control_quality().

overwrite

(flag)
whether previously added flags may be overwritten.

Value

The input object with added quality flags.

Details

While object already has the variables mentioned above as part of the rowData and colData (as this is enforced by the validation method for AnyHermesData), they are usually still NA after the initial object creation.

Functions

  • h_low_expression_flag(): creates the low expression flag for genes given control settings.

  • h_low_depth_flag(): creates the low depth (library size) flag for samples given control settings.

  • h_tech_failure_flag(): creates the technical failure flag for samples given control settings.

  • get_tech_failure(): gets the technical failure flags for all samples.

  • get_low_depth(): gets the low depth flags for all samples.

  • get_low_expression(): gets the low expression flags for all genes.

Examples

# Adding default quality flags to `AnyHermesData` object.
object <- hermes_data
result <- add_quality_flags(object)
which(get_tech_failure(result) != get_tech_failure(object))
#> 06520046C0018R 
#>             10 
head(get_low_expression(result))
#>     GeneID:11185     GeneID:10677 GeneID:101928428 GeneID:100422835 
#>            FALSE            FALSE             TRUE             TRUE 
#> GeneID:102466731     GeneID:64881 
#>             TRUE             TRUE 
head(get_tech_failure(result))
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R 
#>          FALSE          FALSE          FALSE          FALSE          FALSE 
#> 06520103C0017R 
#>          FALSE 
head(get_low_depth(result))
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R 
#>          FALSE          FALSE          FALSE          FALSE          FALSE 
#> 06520103C0017R 
#>          FALSE 

# It is possible to overwrite flags if needed, which will trigger a message.
result2 <- add_quality_flags(result, control_quality(min_cpm = 1000), overwrite = TRUE)
#> previously have added quality flags, but overwriting now

# Separate calculation of low expression flag.
low_expr_flag <- h_low_expression_flag(
  object,
  control_quality(min_cpm = 500, min_cpm_prop = 0.9)
)
length(low_expr_flag) == nrow(object)
#> [1] TRUE
head(low_expr_flag)
#>     GeneID:11185     GeneID:10677 GeneID:101928428 GeneID:100422835 
#>             TRUE             TRUE             TRUE             TRUE 
#> GeneID:102466731     GeneID:64881 
#>             TRUE             TRUE 

# Separate calculation of low depth flag.
low_depth_flag <- h_low_depth_flag(object, control_quality(min_depth = 5))
length(low_depth_flag) == ncol(object)
#> [1] TRUE
head(low_depth_flag)
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R 
#>          FALSE          FALSE          FALSE          FALSE          FALSE 
#> 06520103C0017R 
#>          FALSE 

# Separate calculation of technical failure flag.
tech_failure_flag <- h_tech_failure_flag(object, control_quality(min_corr = 0.35))
length(tech_failure_flag) == ncol(object)
#> [1] TRUE
head(tech_failure_flag)
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R 
#>          FALSE          FALSE          FALSE          FALSE          FALSE 
#> 06520103C0017R 
#>          FALSE 
head(get_tech_failure(object))
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R 
#>          FALSE          FALSE          FALSE          FALSE          FALSE 
#> 06520103C0017R 
#>          FALSE 
head(get_low_depth(object))
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R 
#>          FALSE          FALSE           TRUE          FALSE          FALSE 
#> 06520103C0017R 
#>          FALSE 
head(get_low_expression(object))
#>     GeneID:11185     GeneID:10677 GeneID:101928428 GeneID:100422835 
#>            FALSE             TRUE             TRUE             TRUE 
#> GeneID:102466731     GeneID:64881 
#>             TRUE             TRUE