The normalize() method normalizes the input AnyHermesData according to one or more
specified normalization methods. The results are saved as additional assays
in the object.
Possible normalization methods (which are implemented with separate helper functions):
cpm
: Counts per Million (CPM). For each sample separately, the original gene counts
are divided by that sample's library size and multiplied by one million. This is the
appropriate normalization for between-sample comparisons.
rpkm
: Reads per Kilobase of transcript per Million reads mapped (RPKM). Each gene count is
divided by the gene size (in kilobases) and then by the library size of the
sample (in millions). This allows for within-sample comparisons, as it takes
the gene sizes into account: longer genes tend to have more counts than shorter genes.
tpm
: Transcripts per Million (TPM). This addresses the problem of RPKM being inconsistent
across samples (the sum of all RPKM values varies from sample to sample).
Therefore the RPKM values are divided by the sum of all RPKM values for each sample
and multiplied by one million.
voom
: VOOM normalization. This is essentially a slight variation of CPM where
a prior_count of 0.5 is combined with lib_sizes increased by 1 for each sample.
Note that this is not required for the corresponding differential expression analysis;
it is provided here only as a complementary, experimental normalization approach.
vst
: Variance stabilizing transformation. This transforms the normalized
count data for all genes into approximately homoskedastic values (having constant variance).
rlog
: Regularized log transformation. This transforms the count data to the log2 scale, giving approximately homoskedastic values.
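The arithmetic behind the cpm, rpkm, tpm and voom descriptions above can be sketched in base R on a toy count matrix. This is a minimal illustration, not the hermes implementation: the matrix, the gene widths and all variable names are made up, the library size is assumed to be the column sum of the counts, the first three results are kept on the original (non-log) scale, and the voom line assumes the result is reported on the log2 scale.

```r
# Toy data (hypothetical): 3 genes x 2 samples.
counts <- matrix(
  c(10, 0, 90,
    20, 30, 50),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)
gene_width <- c(geneA = 1000, geneB = 2000, geneC = 500) # in base pairs

# Assumption: library size = total counts per sample.
lib_size <- colSums(counts)

# CPM: counts divided by the sample's library size, times one million.
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# RPKM: counts divided by gene size in kilobases,
# then by library size in millions.
rpkm <- sweep(counts, 1, gene_width / 1e3, "/")
rpkm <- sweep(rpkm, 2, lib_size / 1e6, "/")

# TPM: RPKM divided by the per-sample sum of RPKM, times one million.
tpm <- sweep(rpkm, 2, colSums(rpkm), "/") * 1e6

# VOOM-style variant of CPM: prior count 0.5, library size plus 1
# (log2 scale assumed for this sketch).
voom <- log2(sweep(counts + 0.5, 2, lib_size + 1, "/") * 1e6)

colSums(rpkm) # differs between samples
colSums(tpm)  # each column sums to one million
```

In this toy example the RPKM column sums differ between the two samples, while every TPM column sums to one million, which is the inconsistency that TPM is described as addressing.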
# S4 method for AnyHermesData
normalize(
  object,
  methods = c("cpm", "rpkm", "tpm", "voom", "vst"),
  control = control_normalize(),
  ...
)
h_cpm(object, control = control_normalize())
h_rpkm(object, control = control_normalize())
h_tpm(object, control = control_normalize())
h_voom(object, control = control_normalize())
h_vst(object, control = control_normalize())
h_rlog(object, control = control_normalize())
object
: (AnyHermesData) object to normalize.
methods
: (character) which normalization methods to use, see details.
control
: (named list) settings produced by control_normalize().
...
: not used.
The AnyHermesData object with additional assays containing the normalized counts.
The control is saved in the metadata of the object for future reference.
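As a hedged sketch of what the return value looks like, the following requests only two of the documented methods and then lists the assay names and the names of the metadata entries. It assumes that the hermes package with its example data hermes_data is installed, and it is skipped otherwise. The name of the metadata entry holding the control settings is deliberately not assumed here.

```r
# Run only when hermes is available (assumption: it ships `hermes_data`).
has_hermes <- requireNamespace("hermes", quietly = TRUE)

if (has_hermes) {
  library(hermes)

  # Request two of the documented methods instead of the default set.
  result <- normalize(hermes_data, methods = c("cpm", "tpm"))

  # The new assays sit next to the original counts.
  print(SummarizedExperiment::assayNames(result))

  # The control settings are stored as one of the metadata entries.
  print(names(S4Vectors::metadata(result)))
}
```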
h_cpm
: calculates the Counts per Million (CPM) normalized counts.
h_rpkm
: calculates the Reads per Kilobase per Million (RPKM) normalized counts.
h_tpm
: calculates the Transcripts per Million (TPM) normalized counts.
h_voom
: calculates the VOOM normalized counts.
h_vst
: applies the variance stabilizing transformation (vst) from the DESeq2 package.
h_rlog
: applies the regularized log transformation (rlog) from the DESeq2 package.
control_normalize() to define the normalization method settings.
a <- hermes_data
# By default, log values are used with a prior count of 1 added to original counts.
result <- normalize(a)
assayNames(result)
#> [1] "counts" "cpm" "rpkm" "tpm" "voom" "vst"
tpm <- assay(result, "tpm")
tpm[1:3, 1:3]
#>                  06520011B0023R 06520067C0018R 06520063C0043R
#> GeneID:11185          0.8626554       3.993877       3.591742
#> GeneID:10677          7.0670966       4.054655       3.289554
#> GeneID:101928428      0.0000000       0.000000       0.000000
# We can also work on original scale.
result_orig <- normalize(a, control = control_normalize(log = FALSE))
tpm_orig <- assay(result_orig, "tpm")
tpm_orig[1:3, 1:3]
#>                  06520011B0023R 06520067C0018R 06520063C0043R
#> GeneID:11185          0.8183821       14.93224       11.05653
#> GeneID:10677        133.0936048       15.61778        8.77810
#> GeneID:101928428      0.0000000        0.00000        0.00000
# Separate calculation of the CPM normalized counts.
counts_cpm <- h_cpm(a)
str(counts_cpm)
#> num [1:5085, 1:20] -0.453 8.252 -2.453 -2.453 -2.453 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the RPKM normalized counts.
counts_rpkm <- h_rpkm(a)
str(counts_rpkm)
#> num [1:5085, 1:20] -2.904 4.027 0.404 1.12 1.323 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the TPM normalized counts.
counts_tpm <- h_tpm(a)
str(counts_tpm)
#> num [1:5085, 1:20] 0.863 7.067 0 0 0 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the VOOM normalized counts.
counts_voom <- h_voom(a)
str(counts_voom)
#> num [1:5085, 1:20] -0.645 8.251 -3.453 -3.453 -3.453 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the vst transformation.
counts_vst <- h_vst(a)
str(counts_vst)
#> num [1:5085, 1:20] 3.03 10.6 1.78 1.78 1.78 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the rlog transformation.
counts_rlog <- h_rlog(a)
str(counts_rlog)
#> num [1:5085, 1:20] 2.97 10.13 0 0 0 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
#> - attr(*, "betaPriorVar")= num 7.01
#> - attr(*, "intercept")= num [1:5085, 1] 5.26 8.49 -Inf -Inf -Inf ...
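The two TPM printouts in the examples can be used to check how the default log scale relates to the original scale. The sketch below copies the printed values by hand and compares log2(x + 1) of the original-scale values with the log-scale values. It only covers these printed TPM entries, so it is a consistency check under the stated prior count of 1 rather than a description of how every method handles the log transformation.

```r
# Values copied from the printed `tpm_orig[1:3, 1:3]` (original scale).
tpm_orig <- c(
  0.8183821, 14.93224, 11.05653,
  133.0936048, 15.61778, 8.77810,
  0, 0, 0
)

# Values copied from the printed `tpm[1:3, 1:3]` (default log scale).
tpm_log <- c(
  0.8626554, 3.993877, 3.591742,
  7.0670966, 4.054655, 3.289554,
  0, 0, 0
)

# Add the prior count of 1 and take log2, then compare.
recomputed <- log2(tpm_orig + 1)
max_abs_diff <- max(abs(recomputed - tpm_log))

# The difference is within the rounding of the printed digits.
max_abs_diff < 1e-4
```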