The normalize() method normalizes the input AnyHermesData according to one or more specified normalization methods. The results are saved as additional assays in the object.
Possible normalization methods (which are implemented with separate helper functions):
- cpm: Counts per Million (CPM). Separately by sample, the original counts of the genes are divided by the library size of this sample, and multiplied by one million. This is the appropriate normalization for between-sample comparisons.
- rpkm: Reads per Kilobase of transcript per Million reads mapped (RPKM). Each gene count is divided by the gene size (in kilobases) and then again divided by the library size of each sample (in millions). This allows for within-sample comparisons, as it takes into account the gene sizes: longer genes will always have more counts than shorter genes.
- tpm: Transcripts per Million (TPM). This addresses the problem of RPKM being inconsistent across samples (which can be seen from the fact that the sum of all RPKM values varies from sample to sample). Therefore here we divide the RPKM by the sum of all RPKM values for each sample, and multiply by one million.
- voom: VOOM normalization. This is essentially just a slight variation of CPM where a prior_count of 0.5 is combined with lib_sizes increased by 1 for each sample. Note that this is not required for the corresponding differential expression analysis, but just provided as a complementary experimental normalization approach here.
- vst: Variance stabilizing transformation. This transforms the normalized count data for all genes into approximately homoskedastic values (having constant variance).
- rlog: Regularized log transformation. This transforms the count data to the log2 scale, giving approximately homoskedastic values.
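The count-based definitions above can be sketched in base R on a toy matrix. This is only an illustration of the formulas as described, not the package's implementation: the matrix, the gene sizes in `gene_kb`, and the assumption that the library size is the column sum of the counts are all hypothetical, and the `voom_like` line simply takes log2 of the modified CPM described for voom.

```r
# Toy inputs (hypothetical values, not taken from hermes_data).
counts <- matrix(
  c(10, 20, 30,   # sampleA
    40, 50, 60),  # sampleB
  nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"), c("sampleA", "sampleB"))
)
gene_kb <- c(gene1 = 1, gene2 = 2, gene3 = 0.5)  # assumed gene sizes in kilobases
lib_sizes <- colSums(counts)                     # assumed library size per sample

# CPM: counts divided by the library size of the sample, times one million.
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6

# RPKM: additionally divide by the gene size in kilobases.
rpkm <- sweep(cpm, 1, gene_kb, "/")

# TPM: divide RPKM by the per-sample sum of RPKM, times one million.
tpm <- sweep(rpkm, 2, colSums(rpkm), "/") * 1e6

# voom-like variant of CPM: prior count of 0.5, library sizes increased by 1,
# shown here on the log2 scale.
voom_like <- log2(sweep(counts + 0.5, 2, lib_sizes + 1, "/") * 1e6)

colSums(rpkm)  # differs between samples
colSums(tpm)   # one million for every sample
```

The column sums make the motivation for TPM visible: the RPKM totals differ between the two toy samples, while the TPM totals are identical by construction.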
Usage
# S4 method for AnyHermesData
normalize(
  object,
  methods = c("cpm", "rpkm", "tpm", "voom", "vst"),
  control = control_normalize(),
  ...
)
h_cpm(object, control = control_normalize())
h_rpkm(object, control = control_normalize())
h_tpm(object, control = control_normalize())
h_voom(object, control = control_normalize())
h_vst(object, control = control_normalize())
h_rlog(object, control = control_normalize())
Arguments
- object
(AnyHermesData) object to normalize.
- methods
(character) which normalization methods to use, see details.
- control
(named list) settings produced by control_normalize().
- ...
not used.
Value
The AnyHermesData object with additional assays containing the normalized counts.
The control is saved in the metadata of the object for future reference.
Functions
- h_cpm(): calculates the Counts per Million (CPM) normalized counts.
- h_rpkm(): calculates the Reads per Kilobase per Million (RPKM) normalized counts.
- h_tpm(): calculates the Transcripts per Million (TPM) normalized counts.
- h_vst(): variance stabilizing transformation (vst) from the DESeq2 package.
- h_rlog(): regularized log transformation (rlog) from the DESeq2 package.
See also
control_normalize() to define the normalization method settings.
Examples
a <- hermes_data
# By default, log values are used with a prior count of 1 added to original counts.
result <- normalize(a)
assayNames(result)
#> [1] "counts" "cpm" "rpkm" "tpm" "voom" "vst"
tpm <- assay(result, "tpm")
tpm[1:3, 1:3]
#>                  06520011B0023R 06520067C0018R 06520063C0043R
#> GeneID:11185          0.8626554       3.993877       3.591742
#> GeneID:10677          7.0670966       4.054655       3.289554
#> GeneID:101928428      0.0000000       0.000000       0.000000
# We can also work on original scale.
result_orig <- normalize(a, control = control_normalize(log = FALSE))
tpm_orig <- assay(result_orig, "tpm")
tpm_orig[1:3, 1:3]
#>                  06520011B0023R 06520067C0018R 06520063C0043R
#> GeneID:11185          0.8183821       14.93224       11.05653
#> GeneID:10677        133.0936048       15.61778        8.77810
#> GeneID:101928428      0.0000000        0.00000        0.00000
# Separate calculation of the CPM normalized counts.
counts_cpm <- h_cpm(a)
str(counts_cpm)
#> num [1:5085, 1:20] -0.453 8.252 -2.453 -2.453 -2.453 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the RPKM normalized counts.
counts_rpkm <- h_rpkm(a)
str(counts_rpkm)
#> num [1:5085, 1:20] -2.904 4.027 0.404 1.12 1.323 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the TPM normalized counts.
counts_tpm <- h_tpm(a)
str(counts_tpm)
#> num [1:5085, 1:20] 0.863 7.067 0 0 0 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the VOOM normalized counts.
counts_voom <- h_voom(a)
str(counts_voom)
#> num [1:5085, 1:20] -0.645 8.251 -3.453 -3.453 -3.453 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the vst transformation.
counts_vst <- h_vst(a)
str(counts_vst)
#> num [1:5085, 1:20] 3.03 10.6 1.78 1.78 1.78 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
# Separate calculation of the rlog transformation.
counts_rlog <- h_rlog(a)
str(counts_rlog)
#> num [1:5085, 1:20] 2.97 10.13 0 0 0 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#> ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
#> - attr(*, "betaPriorVar")= num 7.01
#> - attr(*, "intercept")= num [1:5085, 1] 5.26 8.49 -Inf -Inf -Inf ...
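The relation between the log-scale and original-scale TPM values printed earlier in these examples can be checked by hand. The sketch below copies a few of those printed values into plain vectors (the names `tpm_log_printed` and `tpm_orig_printed` are made up for this illustration) and assumes that, for TPM, the log option amounts to log2(x + prior_count) with a prior count of 1. Other methods may apply the prior count differently, so this should not be read as a general rule.

```r
# Values copied from the printed TPM output above (first two genes, first two samples).
tpm_orig_printed <- c(0.8183821, 133.0936048, 14.93224, 15.61778)
tpm_log_printed  <- c(0.8626554, 7.0670966, 3.993877, 4.054655)

# Assumed relation for TPM: log2 of the original-scale value plus a prior count of 1.
prior_count <- 1
tpm_log_recomputed <- log2(tpm_orig_printed + prior_count)

# Differences are at the level of the rounding in the printed output.
max(abs(tpm_log_recomputed - tpm_log_printed))
```

A zero on the original scale maps to log2(0 + 1) = 0, which matches the rows of zeros in both printed tables.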