
[Stable]

The normalize() method normalizes the input AnyHermesData object according to one or more specified normalization methods. The results are saved as additional assays in the object.

Possible normalization methods (each implemented in a separate helper function):

  • cpm: Counts per Million (CPM). For each sample, the original gene counts are divided by that sample's library size and multiplied by one million. This is the appropriate normalization for between-sample comparisons.

  • rpkm: Reads per Kilobase of transcript per Million reads mapped (RPKM). Each gene count is divided by the gene length (in kilobases) and then by the library size of its sample (in millions). This allows for within-sample comparisons because it accounts for gene length: longer genes tend to accumulate more counts than shorter genes expressed at the same level.

  • tpm: Transcripts per Million (TPM). This addresses the inconsistency of RPKM across samples, which shows up as the sum of all RPKM values varying from sample to sample. TPM therefore divides each RPKM value by the sum of all RPKM values in the same sample and multiplies by one million, so that, on the original scale, the values of every sample sum to one million.

  • voom: VOOM normalization. This is a slight variation of CPM in which a prior_count of 0.5 is added to the counts and each sample's library size is increased by 1. Note that this normalization is not required for the corresponding differential expression analysis; it is provided here as a complementary, experimental normalization approach.

  • vst: Variance stabilizing transformation. This transforms the normalized count data for all genes into approximately homoskedastic values (having constant variance).

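  • rlog: Regularized log transformation. This transforms the count data to the log2 scale, yielding approximately homoskedastic values. It is not among the default methods and has to be requested explicitly.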
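The arithmetic behind cpm, rpkm, tpm and voom can be illustrated in base R on a made-up count matrix. This is a minimal sketch on the original (non-log) scale, not the hermes implementation: the hermes helpers operate on AnyHermesData objects, take their settings from control_normalize(), and by default return log2 values with a prior count, so their exact output will differ. The matrix, gene lengths, and variable names below are hypothetical.

```r
# Hypothetical toy data: 3 genes (rows) x 2 samples (columns)
counts <- matrix(
  c(10, 20, 70,
    5, 45, 50),
  nrow = 3,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), c("s1", "s2"))
)
# Hypothetical gene lengths in base pairs
gene_length <- c(gene_a = 1000, gene_b = 2000, gene_c = 500)

# Library size = total counts per sample (100 for both samples here)
lib_size <- colSums(counts)

# CPM: divide each column by its library size, then multiply by one million
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# RPKM: divide by gene length in kilobases, then by library size in millions
rpkm <- sweep(counts, 1, gene_length / 1e3, "/")
rpkm <- sweep(rpkm, 2, lib_size / 1e6, "/")

# TPM: rescale RPKM so that the values of every sample sum to one million
tpm <- sweep(rpkm, 2, colSums(rpkm), "/") * 1e6

# voom-style log-CPM: prior count of 0.5, library size increased by 1
voom_like <- log2(sweep(counts + 0.5, 2, lib_size + 1, "/") * 1e6)

colSums(rpkm)  # 1600000 1275000
colSums(tpm)   # 1e+06 1e+06
```

With these toy numbers both samples have the same library size, yet their RPKM column sums differ (1.6e6 versus 1.275e6); this is the inconsistency that TPM removes.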

Usage

# S4 method for AnyHermesData
normalize(
  object,
  methods = c("cpm", "rpkm", "tpm", "voom", "vst"),
  control = control_normalize(),
  ...
)

h_cpm(object, control = control_normalize())

h_rpkm(object, control = control_normalize())

h_tpm(object, control = control_normalize())

h_voom(object, control = control_normalize())

h_vst(object, control = control_normalize())

h_rlog(object, control = control_normalize())

Arguments

object

(AnyHermesData)
object to normalize.

methods

(character)
which normalization methods to use (see the list of methods in the description above).

control

(named list)
settings produced by control_normalize().

...

not used.

Value

The AnyHermesData object with additional assays containing the normalized counts. The control is saved in the metadata of the object for future reference.
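The return value follows a simple pattern: each requested method produces a matrix with the same dimensions as the counts, stored under the method's name next to the original counts, and the settings are kept alongside. The following is a hypothetical base-R sketch of that pattern, using a plain named list in place of the assays and an attribute in place of the object metadata; normalize_sketch(), helpers, and the log/prior_count fields of the control list are illustrative names, not the hermes API.

```r
# Hypothetical stand-in for the h_*() helpers: each entry takes a count
# matrix and a control list and returns a matrix of the same dimensions
helpers <- list(
  cpm = function(counts, control) {
    x <- sweep(counts, 2, colSums(counts), "/") * 1e6
    # Simplified log transformation, not the exact hermes prior-count handling
    if (control$log) log2(x + control$prior_count) else x
  }
)

# Hypothetical sketch of the normalize() pattern: one new "assay" per method
normalize_sketch <- function(assays, methods,
                             control = list(log = TRUE, prior_count = 1)) {
  for (m in methods) {
    assays[[m]] <- helpers[[m]](assays[["counts"]], control)
  }
  attr(assays, "control") <- control  # stands in for saving it as metadata
  assays
}

assays <- list(counts = matrix(c(1, 3, 2, 2), nrow = 2))
result <- normalize_sketch(assays, methods = "cpm")
names(result)  # "counts" "cpm"
```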

Functions

  • h_cpm(): calculates the Counts per Million (CPM) normalized counts.

  • h_rpkm(): calculates the Reads per Kilobase per Million (RPKM) normalized counts.

  • h_tpm(): calculates the Transcripts per Million (TPM) normalized counts.

  • h_voom(): calculates the VOOM normalized counts. [Experimental]

  • h_vst(): variance stabilizing transformation (vst) from the DESeq2 package.

  • h_rlog(): regularized log transformation (rlog) from the DESeq2 package.

See also

control_normalize() to define the normalization method settings.

Examples

library(hermes)
a <- hermes_data

# By default, log values are used with a prior count of 1 added to original counts.
result <- normalize(a)
assayNames(result)
#> [1] "counts" "cpm"    "rpkm"   "tpm"    "voom"   "vst"   
tpm <- assay(result, "tpm")
tpm[1:3, 1:3]
#>                  06520011B0023R 06520067C0018R 06520063C0043R
#> GeneID:11185          0.8626554       3.993877       3.591742
#> GeneID:10677          7.0670966       4.054655       3.289554
#> GeneID:101928428      0.0000000       0.000000       0.000000

# We can also work on the original scale.
result_orig <- normalize(a, control = control_normalize(log = FALSE))
tpm_orig <- assay(result_orig, "tpm")
tpm_orig[1:3, 1:3]
#>                  06520011B0023R 06520067C0018R 06520063C0043R
#> GeneID:11185          0.8183821       14.93224       11.05653
#> GeneID:10677        133.0936048       15.61778        8.77810
#> GeneID:101928428      0.0000000        0.00000        0.00000

# Separate calculation of the CPM normalized counts.
counts_cpm <- h_cpm(a)
str(counts_cpm)
#>  num [1:5085, 1:20] -0.453 8.252 -2.453 -2.453 -2.453 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#>   ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...

# Separate calculation of the RPKM normalized counts.
counts_rpkm <- h_rpkm(a)
str(counts_rpkm)
#>  num [1:5085, 1:20] -2.904 4.027 0.404 1.12 1.323 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#>   ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...

# Separate calculation of the TPM normalized counts.
counts_tpm <- h_tpm(a)
str(counts_tpm)
#>  num [1:5085, 1:20] 0.863 7.067 0 0 0 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#>   ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...

# Separate calculation of the VOOM normalized counts.
counts_voom <- h_voom(a)
str(counts_voom)
#>  num [1:5085, 1:20] -0.645 8.251 -3.453 -3.453 -3.453 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#>   ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...

# Separate calculation of the vst transformation.
counts_vst <- h_vst(a)
str(counts_vst)
#>  num [1:5085, 1:20] 3.03 10.6 1.78 1.78 1.78 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#>   ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...

# Separate calculation of the rlog transformation.
counts_rlog <- h_rlog(a)
str(counts_rlog)
#>  num [1:5085, 1:20] 2.97 10.13 0 0 0 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:5085] "GeneID:11185" "GeneID:10677" "GeneID:101928428" "GeneID:100422835" ...
#>   ..$ : chr [1:20] "06520011B0023R" "06520067C0018R" "06520063C0043R" "06520105C0017R" ...
#>  - attr(*, "betaPriorVar")= num 7.01
#>  - attr(*, "intercept")= num [1:5085, 1] 5.26 8.49 -Inf -Inf -Inf ...
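The two TPM outputs printed in the examples appear to be related by log2(x + 1), which matches the comment that a prior count of 1 is added to the original counts by default. A quick base-R check using the printed values (copied by hand from the output; this is an inference from the numbers shown, not a description of the hermes internals):

```r
# Nonzero TPM values printed above for the first two genes, column by column
# (samples 06520011B0023R, 06520067C0018R, 06520063C0043R)
tpm_orig_printed <- c(0.8183821, 133.0936048, 14.93224, 15.61778, 11.05653, 8.77810)
tpm_log_printed  <- c(0.8626554, 7.0670966, 3.993877, 4.054655, 3.591742, 3.289554)

# If the log-scale values are log2(original + 1), any difference should be
# within the rounding of the printed output
all.equal(log2(tpm_orig_printed + 1), tpm_log_printed, tolerance = 1e-5)  # TRUE
```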