Showcases the use of SEtools to merge objects of the SummarizedExperiment class.
SEtools 1.16.0
The SEtools package is a set of convenience functions for the Bioconductor class SummarizedExperiment. It facilitates merging, melting, and plotting SummarizedExperiment objects.
NOTE that the heatmap-related and melting functions have been moved to a standalone package, sechm. The old sehm function of SEtools should be considered deprecated, and most SEtools functions are conserved for legacy/reproducibility reasons (or until they find a better home).
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SEtools")
Or, to install the latest development version:
BiocManager::install("plger/SEtools")
To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq of mice after different stressors:
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")
SE
## class: SummarizedExperiment
## dim: 100 20
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region Condition
The data are taken from Floriou-Servou et al., Biol Psychiatry 2018.
se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3
## class: SummarizedExperiment
## dim: 100 20
## metadata(3): se1 se2 anno_colors
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(2): meanCPM meanTPM
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ... se2.HC.Swim.4
## se2.HC.Swim.5
## colData names(3): Dataset Region Condition
All assays were merged, along with rowData and colData slots.
By default, row z-scores are calculated for each object when merging. This can be prevented with:
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)
If more than one assay is present, one can specify a different scaling behavior for each assay:
se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))
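To make the idea of row z-scores concrete, here is a minimal base R sketch on a hypothetical toy matrix `m`. It only illustrates the concept of scaling each feature across samples; it is not taken from the mergeSEs source, whose internals may differ.

```r
# Hypothetical toy matrix: 3 features (rows) x 4 samples (columns)
m <- matrix(c( 1,  2,  3,  4,
              10, 20, 30, 40,
               5,  5,  6,  6),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:4)))

# scale() centers and scales columns, so transpose, scale, transpose back
# to obtain z-scores computed along each row (feature)
z <- t(scale(t(m)))
```

After this step every row of `z` has mean 0 and standard deviation 1, which puts features with very different expression ranges on a comparable scale.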
Differences from the cbind method include prefixes added to column names, optional scaling, and handling of metadata (e.g. for sechm).
It is also possible to merge by rowData columns, which are specified through the mergeBy argument.
In this case, one can have one-to-many and many-to-many mappings, in which case two behaviors are possible: without aggFun, the features are merged without prior aggregation; if a function is passed through aggFun, the features of each object will be aggregated by mergeBy using this function before merging.
rowData(se1)$metafeature <- sample(LETTERS,nrow(se1),replace = TRUE)
rowData(se2)$metafeature <- sample(LETTERS,nrow(se2),replace = TRUE)
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE, mergeBy="metafeature", aggFun=median)
## Aggregating the objects by metafeature
## Merging...
sechm::sechm(se3, features=row.names(se3))
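The aggregation step can be pictured on a plain matrix. The following base R sketch collapses rows that share a grouping label using the median; the matrix `m` and the labels `grp` are hypothetical, and the code is an illustration of the concept rather than the implementation used by mergeSEs.

```r
# Hypothetical toy data: 4 features x 3 samples, two features per group
m <- matrix(1:12, nrow = 4,
            dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))
grp <- c("A", "A", "B", "B")

# For each group, take the per-sample median over the rows in that group
agg <- t(sapply(split(seq_len(nrow(m)), grp),
                function(i) apply(m[i, , drop = FALSE], 2, median)))
```

The result has one row per distinct label, so two objects aggregated this way can be matched one-to-one on those labels.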
A single SE can also be aggregated by using the aggSE function:
se1b <- aggSE(se1, by = "metafeature")
## Aggregation methods for each assay:
## counts: sum; logcpm: expsum
se1b
## class: SummarizedExperiment
## dim: 26 10
## metadata(0):
## assays(2): counts logcpm
## rownames(26): A B ... Y Z
## rowData names(0):
## colnames(10): HC.Homecage.1 HC.Homecage.2 ... HC.Handling.4
## HC.Handling.5
## colData names(2): Region Condition
If the aggregation function(s) are not specified, aggSE will try to guess decent aggregation functions from the assay names.
This is similar to scuttle::sumCountsAcrossFeatures, but preserves other SE slots.
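For a count-like assay, summing rows that share a label is the natural aggregation, as the message above reports for counts. A minimal base R sketch of that operation on a bare matrix uses `rowsum()`; the matrix `counts` and the labels `grp` are hypothetical, and unlike aggSE this keeps no rowData, colData, or metadata.

```r
# Hypothetical count matrix: 4 features x 2 samples
counts <- matrix(c(1, 2, 3, 4,
                   10, 20, 30, 40),
                 nrow = 4,
                 dimnames = list(paste0("f", 1:4), c("s1", "s2")))
grp <- c("B", "A", "B", "A")

# rowsum() adds up the rows within each group; output rows are sorted by label
summed <- rowsum(counts, group = grp)
```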
Calculate an assay of log-foldchanges to the controls:
SE <- log2FC(SE, fromAssay="logcpm", controls=SE$Condition=="Homecage")
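As a rough illustration of what a log-foldchange relative to controls means, the sketch below subtracts the mean of the control samples from every sample, feature by feature. It assumes the input is already on a log scale and that controls are summarized by their mean; the matrix `lcpm` and the vector `is_ctl` are hypothetical, and the actual log2FC function may handle more cases.

```r
# Hypothetical log-scale expression: 2 features x 4 samples
lcpm <- matrix(c(1, 1, 3, 3,
                 2, 4, 5, 7),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("g1", "g2"),
                               c("ctl1", "ctl2", "trt1", "trt2")))
is_ctl <- c(TRUE, TRUE, FALSE, FALSE)

# Subtract each feature's control mean; on a log scale this is a log-ratio.
# The vector has one value per row, so it is recycled down each column.
lfc <- lcpm - rowMeans(lcpm[, is_ctl, drop = FALSE])
```

By construction the control samples average to zero for every feature, so the remaining values read directly as changes relative to the controls.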
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] SEtools_1.16.0 sechm_1.10.0
## [3] ComplexHeatmap_2.18.0 SummarizedExperiment_1.32.0
## [5] Biobase_2.62.0 GenomicRanges_1.54.0
## [7] GenomeInfoDb_1.38.0 IRanges_2.36.0
## [9] S4Vectors_0.40.0 BiocGenerics_0.48.0
## [11] MatrixGenerics_1.14.0 matrixStats_1.0.0
## [13] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.1.3 bitops_1.0-7 rlang_1.1.1
## [4] magrittr_2.0.3 clue_0.3-65 GetoptLong_1.0.5
## [7] RSQLite_2.3.1 compiler_4.3.1 mgcv_1.9-0
## [10] png_0.1-8 vctrs_0.6.4 sva_3.50.0
## [13] stringr_1.5.0 pkgconfig_2.0.3 shape_1.4.6
## [16] crayon_1.5.2 fastmap_1.1.1 magick_2.8.1
## [19] XVector_0.42.0 ca_0.71.1 utf8_1.2.4
## [22] rmarkdown_2.25 bit_4.0.5 xfun_0.40
## [25] zlibbioc_1.48.0 cachem_1.0.8 jsonlite_1.8.7
## [28] blob_1.2.4 DelayedArray_0.28.0 BiocParallel_1.36.0
## [31] parallel_4.3.1 cluster_2.1.4 R6_2.5.1
## [34] bslib_0.5.1 stringi_1.7.12 RColorBrewer_1.1-3
## [37] limma_3.58.0 genefilter_1.84.0 jquerylib_0.1.4
## [40] Rcpp_1.0.11 bookdown_0.36 iterators_1.0.14
## [43] knitr_1.44 splines_4.3.1 Matrix_1.6-1.1
## [46] tidyselect_1.2.0 abind_1.4-5 yaml_2.3.7
## [49] TSP_1.2-4 doParallel_1.0.17 codetools_0.2-19
## [52] curl_5.1.0 lattice_0.22-5 tibble_3.2.1
## [55] KEGGREST_1.42.0 evaluate_0.22 Rtsne_0.16
## [58] survival_3.5-7 zip_2.3.0 Biostrings_2.70.0
## [61] circlize_0.4.15 pillar_1.9.0 BiocManager_1.30.22
## [64] foreach_1.5.2 generics_0.1.3 RCurl_1.98-1.12
## [67] ggplot2_3.4.4 munsell_0.5.0 scales_1.2.1
## [70] xtable_1.8-4 glue_1.6.2 pheatmap_1.0.12
## [73] tools_4.3.1 data.table_1.14.8 annotate_1.80.0
## [76] openxlsx_4.2.5.2 locfit_1.5-9.8 registry_0.5-1
## [79] XML_3.99-0.14 Cairo_1.6-1 seriation_1.5.1
## [82] AnnotationDbi_1.64.0 edgeR_4.0.0 colorspace_2.1-0
## [85] nlme_3.1-163 GenomeInfoDbData_1.2.11 randomcoloR_1.1.0.1
## [88] cli_3.6.1 fansi_1.0.5 S4Arrays_1.2.0
## [91] dplyr_1.1.3 V8_4.4.0 gtable_0.3.4
## [94] DESeq2_1.42.0 sass_0.4.7 digest_0.6.33
## [97] SparseArray_1.2.0 rjson_0.2.21 memoise_2.0.1
## [100] htmltools_0.5.6.1 lifecycle_1.0.3 httr_1.4.7
## [103] GlobalOptions_0.1.2 statmod_1.5.0 bit64_4.0.5