The function add_quality_flags() adds quality flag information to an AnyHermesData object:
- low_expression_flag: for each gene, counts how many samples do not pass a minimum expression threshold in Counts per Million (CPM). If too many samples fail, the gene is flagged as a "low expression" gene.
- tech_failure_flag: first calculates the Pearson correlation matrix of the sample-wise CPM values, which measures the correlation between samples. It then compares the average correlation per sample with a threshold; if it is too low, the sample is flagged as a "technical failure" sample.
- low_depth_flag: computes the library size (total number of counts) per sample. If this number is too low, the sample is flagged as a "low depth" sample.
Separate helper functions are used internally to create the flags, and separate getter functions allow easy access to the quality control flags in an object.
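The three flag rules described above can be sketched in base R on a toy count matrix. This is a minimal illustration and not the package's implementation: the data values and the threshold names (cpm_threshold, max_failing, min_mean_corr, min_lib_size) are hypothetical, and the per-sample average correlation is assumed to include the self-correlation of 1.

```r
# Toy count matrix (made-up values): 4 genes in rows, 3 samples in columns.
counts <- matrix(
  c(
     10,   0, 5,
    200, 150, 3,
      0,   1, 0,
    790, 849, 2
  ),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Hypothetical thresholds for this sketch only; not the package defaults.
cpm_threshold <- 5000   # minimum CPM a sample must reach for a gene
max_failing   <- 1      # flag a gene if more than this many samples fail
min_mean_corr <- 0.5    # minimum average correlation per sample
min_lib_size  <- 100    # minimum library size per sample

# Library size: total number of counts per sample.
lib_size <- colSums(counts)

# CPM: divide each column by its library size and scale to one million.
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Low expression: count samples below the CPM threshold for each gene.
low_expression <- rowSums(cpm < cpm_threshold) > max_failing

# Technical failure: Pearson correlation between samples, then the average
# per sample (here including the self-correlation of 1, an assumption).
corr <- cor(cpm, method = "pearson")
tech_failure <- colMeans(corr) < min_mean_corr

# Low depth: library size below the minimum.
low_depth <- lib_size < min_lib_size
```

In this toy data only gene3 is flagged for low expression, and sample3 is flagged both as a low depth sample and as a technical failure.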
Usage
add_quality_flags(object, control = control_quality(), overwrite = FALSE)
h_low_expression_flag(object, control = control_quality())
h_low_depth_flag(object, control = control_quality())
h_tech_failure_flag(object, control = control_quality())
get_tech_failure(object)
get_low_depth(object)
get_low_expression(object)
Arguments
- object (AnyHermesData): input.
- control (list): list of settings (thresholds etc.) used to compute the quality control flags, produced by control_quality().
- overwrite (flag): whether previously added flags may be overwritten.
Details
While object already has the variables mentioned above as part of the rowData and colData (as this is enforced by the validation method for AnyHermesData), they are usually still NA after the initial object creation.
Functions
- h_low_expression_flag(): creates the low expression flag for genes given control settings.
- h_low_depth_flag(): creates the low depth (library size) flag for samples given control settings.
- h_tech_failure_flag(): creates the technical failure flag for samples given control settings.
- get_tech_failure(): gets the technical failure flags for all samples.
- get_low_depth(): gets the low depth flags for all samples.
- get_low_expression(): gets the low expression flags for all genes.
See also
control_quality() for the detailed settings specifications; set_tech_failure() to manually flag samples as technical failures.
Examples
# Adding default quality flags to `AnyHermesData` object.
object <- hermes_data
result <- add_quality_flags(object)
which(get_tech_failure(result) != get_tech_failure(object))
#> 06520046C0018R
#> 10
head(get_low_expression(result))
#> GeneID:11185 GeneID:10677 GeneID:101928428 GeneID:100422835
#> FALSE FALSE TRUE TRUE
#> GeneID:102466731 GeneID:64881
#> TRUE TRUE
head(get_tech_failure(result))
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R
#> FALSE FALSE FALSE FALSE FALSE
#> 06520103C0017R
#> FALSE
head(get_low_depth(result))
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R
#> FALSE FALSE FALSE FALSE FALSE
#> 06520103C0017R
#> FALSE
# It is possible to overwrite flags if needed, which will trigger a message.
result2 <- add_quality_flags(result, control_quality(min_cpm = 1000), overwrite = TRUE)
#> previously have added quality flags, but overwriting now
# Separate calculation of low expression flag.
low_expr_flag <- h_low_expression_flag(
  object,
  control_quality(min_cpm = 500, min_cpm_prop = 0.9)
)
length(low_expr_flag) == nrow(object)
#> [1] TRUE
head(low_expr_flag)
#> GeneID:11185 GeneID:10677 GeneID:101928428 GeneID:100422835
#> TRUE TRUE TRUE TRUE
#> GeneID:102466731 GeneID:64881
#> TRUE TRUE
# Separate calculation of low depth flag.
low_depth_flag <- h_low_depth_flag(object, control_quality(min_depth = 5))
length(low_depth_flag) == ncol(object)
#> [1] TRUE
head(low_depth_flag)
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R
#> FALSE FALSE FALSE FALSE FALSE
#> 06520103C0017R
#> FALSE
# Separate calculation of technical failure flag.
tech_failure_flag <- h_tech_failure_flag(object, control_quality(min_corr = 0.35))
length(tech_failure_flag) == ncol(object)
#> [1] TRUE
head(tech_failure_flag)
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R
#> FALSE FALSE FALSE FALSE FALSE
#> 06520103C0017R
#> FALSE
head(get_tech_failure(object))
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R
#> FALSE FALSE FALSE FALSE FALSE
#> 06520103C0017R
#> FALSE
head(get_low_depth(object))
#> 06520011B0023R 06520067C0018R 06520063C0043R 06520105C0017R 06520092C0017R
#> FALSE FALSE TRUE FALSE FALSE
#> 06520103C0017R
#> FALSE
head(get_low_expression(object))
#> GeneID:11185 GeneID:10677 GeneID:101928428 GeneID:100422835
#> FALSE TRUE TRUE TRUE
#> GeneID:102466731 GeneID:64881
#> TRUE TRUE