MouseAgingData 0.99.8
Install the package using Bioconductor. Start R and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MouseAgingData")
Now, load the package and dependencies used in the vignette.
library(scran)
library(scater)
library(ggplot2)
library(bluster)
library(SingleCellExperiment)
library(ExperimentHub)
library(MouseAgingData)
Single-cell sequencing technology can reveal intricate details about individual cells, allowing researchers to interrogate the genetic makeup of cells within a heterogeneous sample. It can provide insights into various aspects of cellular biology, such as characterization of cell populations, identification of rare cell types, and quantification of expression levels in cell types across experimental treatments. Given this wide utility, single-cell sequencing has expanded scientific knowledge in various fields, including cancer research, immunology, developmental biology, neurobiology, and microbiology.
There are several methods for generating single-cell sequencing data which can extract information (DNA or RNA) from a cell. These include, but are not limited to:
Droplet-based platforms: such as 10x Genomics Chromium system, inDrop, Drop-seq, and Seq-Well, which use microfluidic devices to isolate individual cells into tiny droplets along with unique barcoded beads.
Plate or microwell-based methods: such as the Smart-seq2 protocol or the C1 system by Fluidigm, respectively. These platforms employ microfluidic chips or multi-well arrays to capture and process individual cells. Unlike in droplet-based platforms, cells are sorted manually or automatically into individual wells of the plate.
The MouseAgingData package provides analysis-ready data from an aging mouse brain parabiosis single-cell study by Ximerakis & Holton et al. (2023). The contents of the package can be accessed by querying ExperimentHub with the package name.
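As a minimal sketch of such a query, the chunk below searches ExperimentHub for records matching the package name. It assumes an internet connection (or an existing local hub cache); the object names `eh` and `records` are ours and not part of the package.

```r
library(ExperimentHub)

# Connect to ExperimentHub; this downloads or refreshes the local metadata cache
eh <- ExperimentHub()

# Search the hub metadata for records associated with the package name
records <- query(eh, "MouseAgingData")
records

# Titles of the matching records
records$title
```

Individual records can then be retrieved with `records[["EH..."]]` using the identifiers printed by the query, although the accessor function used later in this vignette is the more convenient route.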
Ximerakis & Holton et al. investigated how heterochronic parabiosis (joining of the circulatory systems) affects the mouse brain in terms of aging and rejuvenation. They identified gene signatures attributed to aging in specific cell types, focusing especially on brain endothelial cells, which showed dynamic transcriptional changes that affect vascular structure and function.
The parabiosis single-cell RNA-seq dataset (Ximerakis & Holton et al., Nature Aging, 2023) includes 105,329 cells of 31 cell types from 8 OX, 8 YX, 7 YY, 9 YO, 7 OO, and 11 OY animals, and 20,905 features.
This vignette performs a simple analysis of the parabiosis 10X Genomics single-cell data set, following the Quick Start Workflow of Single-Cell Analysis in the OSCA Bioconductor Book.
Briefly, it walks through the process of quality control, normalization, various forms of dimensionality reduction, clustering, detection of marker genes, and annotation of cell types. PCA, UMAP, and tSNE coordinates used in the study were provided by the authors for visualization.
sce <- parabiosis10X()
#> see ?MouseAgingData and browseVignettes('MouseAgingData') for documentation
#> loading from cache
# View the data
sce
#> class: SingleCellExperiment
#> dim: 20905 105329
#> metadata(1): cell_colors
#> assays(1): counts
#> rownames(20905): Xkr4 Gm37381 ... DHRSX CAAA01147332.1
#> rowData names(2): geneID HVG
#> colnames: NULL
#> colData names(10): barcode nCount_RNA ... cell_type subpopulation
#> reducedDimNames(3): PCA UMAP TSNE
#> mainExpName: NULL
#> altExpNames(0):
Check that the data loaded correctly and match what we expect.
# Sample metadata
head(colData(sce))
#> DataFrame with 6 rows and 10 columns
#> barcode nCount_RNA nFeature_RNA animal batch animal_type
#> <character> <numeric> <integer> <factor> <factor> <factor>
#> 1 AAACCTGGTCAGTGGA 2100.06 815 OO1L Batch1 OO
#> 2 AAACCTGGTGTCAATC 4356.88 3120 OO1L Batch1 OO
#> 3 AAACCTGTCAAACCAC 2679.97 1208 OO1L Batch1 OO
#> 4 AAACCTGTCGTTACAG 3647.74 2137 OO1L Batch1 OO
#> 5 AAACGGGCACGAGAGT 1904.85 703 OO1L Batch1 OO
#> 6 AAAGATGAGCGTAGTG 3732.96 2247 OO1L Batch1 OO
#> percent_mito percent_ribo cell_type subpopulation
#> <numeric> <numeric> <factor> <factor>
#> 1 1.253203 5.81833 OPC qOPC
#> 2 0.510883 3.48925 NendC NendC_3
#> 3 0.789625 3.67955 OPC qOPC
#> 4 0.607773 3.99532 GABA GABA_3
#> 5 1.746996 8.52778 EC EC_1
#> 6 0.652196 3.85105 GABA GABA_13
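One way to check the object against the dataset description is to tabulate the sample metadata. This is only a sketch: it relies on the `animal`, `animal_type`, and `cell_type` columns shown above, and the helper name `cd` is ours.

```r
library(SingleCellExperiment)
library(MouseAgingData)

sce <- parabiosis10X()

# Dimensions of the object: features x cells
dim(sce)

# Number of cells contributed by each animal type
table(sce$animal_type)

# Number of distinct animals within each animal type
cd <- colData(sce)
tapply(as.character(cd$animal), as.character(cd$animal_type),
       function(x) length(unique(x)))

# Number of annotated cell types that are actually present
nlevels(droplevels(sce$cell_type))
```

The results can be compared with the numbers of cells, animals, and cell types quoted in the dataset description.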
# Includes cell colors from the original paper
metadata(sce)
#> $cell_colors
#> OPC OLG OEG NSC
#> "olivedrab4" "olivedrab3" "olivedrab1" "royalblue4"
#> ARP ASC EPC HypEPC
#> "steelblue4" "steelblue1" "lightgoldenrod4" "lightgoldenrod3"
#> TNC CPC NRP ImmN
#> "lightgoldenrod2" "gold" "darkmagenta" "purple3"
#> GABA DOPA GLUT CHOL
#> "mediumorchid3" "violetred3" "palevioletred" "violet"
#> NendC EC PC VSMC
#> "lightpink" "sienna4" "sienna3" "sienna1"
#> Hb_VC VLMC ABC MG
#> "peru" "peachpuff4" "peachpuff3" "red4"
#> MAC MNC DC NEUT
#> "red3" "tomato3" "red1" "tomato1"
#> T_cell NK B_cell
#> "salmon3" "indianred2" "coral"
In this step, we could explore and visualize quality metrics such as mitochondrial content and read counts. However, the authors have already removed low-quality cells and animals, so this vignette skips that step. For details on their workflow, refer to the original article, Ximerakis & Holton et al. (2023). The OSCA Bioconductor book also provides several examples of quality control steps.
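For readers who still want to look at the quality metrics, here is a hedged sketch of a MAD-based outlier check using `isOutlier()` from scuttle on the `percent_mito` and `nCount_RNA` columns supplied with the data. The choice of 3 MADs is an assumption borrowed from common practice, not the authors' filtering criterion, and because the data were already filtered, flagged cells are not necessarily of low quality.

```r
library(scuttle)
library(MouseAgingData)

sce <- parabiosis10X()

# Cells whose mitochondrial percentage lies more than 3 MADs above the median
high_mito <- isOutlier(sce$percent_mito, nmads = 3, type = "higher")

# Cells whose total count lies more than 3 MADs below the median (log scale)
low_counts <- isOutlier(sce$nCount_RNA, nmads = 3, type = "lower", log = TRUE)

# How many cells each rule would flag
sum(high_mito)
sum(low_counts)

# Thresholds that were applied
attr(high_mito, "thresholds")
attr(low_counts, "thresholds")
```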
Normalize the expression counts. For the purposes of demonstration, we’ll subset this SingleCellExperiment object down to the first 1000 cells.
sce_subset <- sce[, 1:1000]
set.seed(101000110)
clusters <- quickCluster(sce_subset)
sce_subset <- computeSumFactors(sce_subset, clusters=clusters)
sce_subset <- logNormCounts(sce_subset)
logcounts(sce_subset)[1:10, 1:10]
#> 10 x 10 sparse Matrix of class "dgCMatrix"
#>
#> Xkr4 . . . . . . . . . .
#> Gm37381 . . . . . . . . . .
#> Rp1 . . . . . . . . . .
#> Sox17 . . . . . . . . 1.5643042 .
#> Mrpl15 . 0.5233882 . . . . . . 0.5746703 .
#> Lypla1 . . . . . . . . 0.5746703 .
#> Gm37988 . . . . . . . . . .
#> Tcea1 1.837044 . . . . . . 1.337954 . .
#> Rgs20 . . . . . 0.7554552 . . . .
#> Gm16041 . . . . . . . . . .
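To make the transformation concrete, the sketch below reproduces the default `logNormCounts()` calculation by hand: each count is divided by the cell's size factor, a pseudo-count of 1 is added, and the result is log2-transformed. To keep it self-contained it uses simulated counts from `mockSCE()` rather than the parabiosis data, and library-size factors rather than the deconvolution factors computed above; `toy`, `sf`, and `manual` are our own names.

```r
library(scuttle)
library(SingleCellExperiment)

set.seed(100)
toy <- mockSCE(ncells = 50, ngenes = 200)  # simulated counts, not real data
toy <- logNormCounts(toy)                  # falls back to library-size factors

# Size factors stored in the object after normalization
sf <- sizeFactors(toy)

# Manual calculation for the first gene: log2(count / size factor + 1)
manual <- log2(counts(toy)[1, ] / sf + 1)

# Compare with the values computed by logNormCounts()
all.equal(unname(manual), unname(logcounts(toy)[1, ]))
```

The same comparison can be made on `sce_subset` using `sizeFactors(sce_subset)`, which holds the factors from `computeSumFactors()`.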
At this point in a typical workflow, we could select an appropriate set of highly variable genes (HVGs), say the top 10% of genes with the highest variability in expression. Below is an example of how to do this with our subsetted SingleCellExperiment example.
dec <- modelGeneVar(sce_subset)
hvg <- getTopHVGs(dec, prop=0.1)
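`getTopHVGs()` can also select a fixed number of genes through its `n` argument instead of a proportion. The following sketch contrasts the two on simulated data from `mockSCE()` so that it runs on its own; the objects `toy`, `dec_toy`, `hvg_prop`, and `hvg_n` are illustrative and unrelated to the parabiosis data.

```r
library(scran)
library(scuttle)

set.seed(42)
toy <- logNormCounts(mockSCE())  # simulated data: 2000 genes x 200 cells

# Decompose each gene's variance into technical and biological components
dec_toy <- modelGeneVar(toy)

# Select by proportion (top 10% of genes) or by a fixed number of genes
hvg_prop <- getTopHVGs(dec_toy, prop = 0.1)
hvg_n    <- getTopHVGs(dec_toy, n = 500)

length(hvg_prop)
length(hvg_n)
```

By default only genes with a positive biological component are returned, so either vector may be shorter than the requested size.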
However, an indicator marking the 2000 HVGs used in the original study is also available in the HVG column of rowData() in the original SingleCellExperiment object. Note that this column is stored as a factor rather than as a logical vector.
head(rowData(sce))
#> DataFrame with 6 rows and 2 columns
#> geneID HVG
#> <character> <factor>
#> Xkr4 Xkr4 FALSE
#> Gm37381 Gm37381 FALSE
#> Rp1 Rp1 FALSE
#> Sox17 Sox17 FALSE
#> Mrpl15 Mrpl15 FALSE
#> Lypla1 Lypla1 FALSE
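To use the authors' selection in downstream steps, the HVG column can be converted into a vector of gene names. The sketch below assumes the factor levels are the strings "TRUE" and "FALSE", which is consistent with the output above but not otherwise documented here; `is_hvg` and `author_hvg` are our own names.

```r
library(SingleCellExperiment)
library(MouseAgingData)

sce <- parabiosis10X()

# HVG is stored as a factor, so convert via character to obtain a logical
# vector (assumes the levels are "TRUE" and "FALSE")
is_hvg <- as.logical(as.character(rowData(sce)$HVG))
table(is_hvg, useNA = "ifany")

# Names of the genes flagged as highly variable by the authors
author_hvg <- rownames(sce)[which(is_hvg)]
head(author_hvg)
```

A vector such as `author_hvg` could be passed to `subset_row` in `runPCA()` in place of `hvg`.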
Below is a method for running a principal component analysis (PCA) using our previously defined HVGs. Since this step can take a significant amount of time to compute, we will again apply it only to our subset of 1000 cells as a demonstration.
# Since we already have PCA coords from our authors, we will add these computed
# PCA coords under a different name "osca_PCA"
set.seed(1234)
sce_subset <- runPCA(sce_subset, ncomponents=25, subset_row=hvg,
name = "osca_PCA")
# Show the names of the elements in the ReducedDims slot
reducedDims(sce_subset)
#> List of length 4
#> names(4): PCA UMAP TSNE osca_PCA
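A PCA result computed by `runPCA()` carries the percentage of variance explained by each component as an attribute, which helps when deciding how many PCs to keep. The sketch below shows this on simulated data from `mockSCE()` so that it can be run independently; `toy` and `pct_var` are illustrative names.

```r
library(scater)

set.seed(1234)
toy <- logNormCounts(mockSCE(ncells = 100, ngenes = 500))  # simulated data
toy <- runPCA(toy, ncomponents = 10, name = "osca_PCA")

# Percentage of variance explained by each of the 10 components
pct_var <- attr(reducedDim(toy, "osca_PCA"), "percentVar")
round(pct_var, 1)

# Simple scree plot
plot(pct_var, type = "b", xlab = "PC", ylab = "Variance explained (%)")
```

With the vignette's objects, the same information is available from `attr(reducedDim(sce_subset, "osca_PCA"), "percentVar")`. The PCA coordinates supplied by the authors may not carry this attribute.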
As mentioned, the authors have also provided us with the first 50 PCs used in their study within the full SingleCellExperiment object. Let’s take a look.
reducedDim(sce_subset, "PCA")[1:5, 1:5]
#> PC1 PC2 PC3 PC4 PC5
#> [1,] 3.7896377 -2.0354706 -1.3315832 0.3672115 1.0848995
#> [2,] 1.5902497 -5.5788552 -10.3563694 3.4867199 -0.3234046
#> [3,] 3.8162534 -3.3919824 -1.8970444 0.7165197 1.2934682
#> [4,] 1.7208498 -4.2866718 -9.3561901 2.3150841 -0.4629331
#> [5,] -0.7078829 -0.6319163 0.4068132 -18.8230666 2.9753249
At this point, we could take the PCs that were previously computed and cluster the cells based on their expression profiles. More details are provided in the OSCA book. Let’s do some clustering with our subsetted object as an example.
colLabels(sce_subset) <- clusterCells(sce_subset, use.dimred='osca_PCA',
BLUSPARAM=NNGraphParam(cluster.fun="louvain"))
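A quick way to inspect the result is to cross-tabulate the new cluster labels against the cell types annotated by the authors. This sketch continues from the chunk above and assumes `sce_subset` already carries the cluster labels in `colLabels()`; `tab` is our own name, and agreement is summarized with the adjusted Rand index from bluster.

```r
library(SingleCellExperiment)
library(bluster)

# Cross-tabulate Louvain clusters against the authors' cell type annotation
# (assumes `sce_subset` from the previous chunk)
tab <- table(cluster = colLabels(sce_subset),
             cell_type = droplevels(sce_subset$cell_type))
tab

# Number of cells in each cluster
rowSums(tab)

# Adjusted Rand index between the two groupings (1 means identical partitions)
pairwiseRand(colLabels(sce_subset), sce_subset$cell_type, mode = "index")
```

With only 1000 cells, and clustering at a different resolution than the published annotation, only partial agreement should be expected.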
For this dataset, the authors have already provided us with their exact UMAP and tSNE coordinates, as well as their color scheme representing the cell types from their paper. This can be accessed in the metadata slot of the SingleCellExperiment object with the metadata() function. To consistently recreate their figures, let’s plot using their provided coordinates.
# Generate color map matching cell type to colors in publication
cell.color <- metadata(sce)$cell_colors
gg <- plotUMAP(sce, color_by = "cell_type", text_by = "cell_type")
gg + theme(legend.title=element_blank()) +
scale_color_manual(values=c(cell.color))
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
This plot is a recreation of Fig. 2C from Ximerakis & Holton et al. 2023.
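The "Scale for colour is already present" message appears because plotUMAP() adds its own colour scale before we replace it with the manual one. As an alternative sketch, the same coordinates can be plotted directly with ggplot2, which sets the colour scale only once. The data frame `df` and its column names `UMAP1` and `UMAP2` are our own choices, and the code assumes the names of `cell_colors` match the levels of `cell_type`.

```r
library(ggplot2)
library(SingleCellExperiment)
library(MouseAgingData)

sce <- parabiosis10X()

# Collect the authors' UMAP coordinates and cell type labels in a data frame
umap <- reducedDim(sce, "UMAP")
df <- data.frame(UMAP1 = umap[, 1],
                 UMAP2 = umap[, 2],
                 cell_type = sce$cell_type)

# Plot with the published colour scheme stored in the object metadata
p <- ggplot(df, aes(x = UMAP1, y = UMAP2, colour = cell_type)) +
  geom_point(size = 0.1) +
  scale_colour_manual(values = metadata(sce)$cell_colors) +
  theme_classic() +
  theme(legend.title = element_blank())
p
```

This version omits the cell type text labels that `text_by` adds in the scater plot.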
We can also plot a tSNE with their provided coordinates.
gg <- plotTSNE(sce, color_by = "cell_type", text_by = "cell_type")
gg + theme(legend.title=element_blank()) +
scale_color_manual(values=c(cell.color))
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
If you would like to create your own UMAP and tSNE plots, please refer back to the OSCA Bioconductor book for more details.
Ximerakis & Holton et al. (2023) Heterochronic parabiosis reprograms the mouse brain transcriptome by shifting aging signatures in multiple cell types. Nature Aging 3, 327–345. DOI: https://doi.org/10.1038/s43587-023-00373-6.
sessionInfo()
#> R Under development (unstable) (2024-01-16 r85808)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] MouseAgingData_0.99.8 ExperimentHub_2.11.1
#> [3] AnnotationHub_3.11.1 BiocFileCache_2.11.1
#> [5] dbplyr_2.4.0 bluster_1.13.0
#> [7] scater_1.31.2 ggplot2_3.4.4
#> [9] scran_1.31.2 scuttle_1.13.0
#> [11] SingleCellExperiment_1.25.0 SummarizedExperiment_1.33.3
#> [13] Biobase_2.63.0 GenomicRanges_1.55.3
#> [15] GenomeInfoDb_1.39.6 IRanges_2.37.1
#> [17] S4Vectors_0.41.3 BiocGenerics_0.49.1
#> [19] MatrixGenerics_1.15.0 matrixStats_1.2.0
#> [21] BiocStyle_2.31.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.1 bitops_1.0-7
#> [3] gridExtra_2.3 rlang_1.1.3
#> [5] magrittr_2.0.3 compiler_4.4.0
#> [7] RSQLite_2.3.5 DelayedMatrixStats_1.25.1
#> [9] png_0.1-8 vctrs_0.6.5
#> [11] pkgconfig_2.0.3 crayon_1.5.2
#> [13] fastmap_1.1.1 XVector_0.43.1
#> [15] labeling_0.4.3 utf8_1.2.4
#> [17] rmarkdown_2.25 ggbeeswarm_0.7.2
#> [19] purrr_1.0.2 bit_4.0.5
#> [21] xfun_0.42 zlibbioc_1.49.0
#> [23] cachem_1.0.8 beachmat_2.19.1
#> [25] jsonlite_1.8.8 blob_1.2.4
#> [27] highr_0.10 DelayedArray_0.29.1
#> [29] BiocParallel_1.37.0 irlba_2.3.5.1
#> [31] parallel_4.4.0 cluster_2.1.6
#> [33] R6_2.5.1 bslib_0.6.1
#> [35] limma_3.59.3 jquerylib_0.1.4
#> [37] Rcpp_1.0.12 bookdown_0.37
#> [39] knitr_1.45 Matrix_1.6-5
#> [41] igraph_2.0.1.1 tidyselect_1.2.0
#> [43] abind_1.4-5 yaml_2.3.8
#> [45] viridis_0.6.5 codetools_0.2-19
#> [47] curl_5.2.0 lattice_0.22-5
#> [49] tibble_3.2.1 KEGGREST_1.43.0
#> [51] withr_3.0.0 evaluate_0.23
#> [53] Biostrings_2.71.2 filelock_1.0.3
#> [55] pillar_1.9.0 BiocManager_1.30.22
#> [57] generics_0.1.3 RCurl_1.98-1.14
#> [59] BiocVersion_3.19.1 sparseMatrixStats_1.15.0
#> [61] munsell_0.5.0 scales_1.3.0
#> [63] glue_1.7.0 metapod_1.11.1
#> [65] tools_4.4.0 BiocNeighbors_1.21.2
#> [67] ScaledMatrix_1.11.0 locfit_1.5-9.8
#> [69] cowplot_1.1.3 grid_4.4.0
#> [71] AnnotationDbi_1.65.2 edgeR_4.1.16
#> [73] colorspace_2.1-0 GenomeInfoDbData_1.2.11
#> [75] beeswarm_0.4.0 BiocSingular_1.19.0
#> [77] vipor_0.4.7 cli_3.6.2
#> [79] rsvd_1.0.5 rappdirs_0.3.3
#> [81] fansi_1.0.6 S4Arrays_1.3.3
#> [83] viridisLite_0.4.2 dplyr_1.1.4
#> [85] gtable_0.3.4 sass_0.4.8
#> [87] digest_0.6.34 SparseArray_1.3.4
#> [89] ggrepel_0.9.5 dqrng_0.3.2
#> [91] farver_2.1.1 memoise_2.0.1
#> [93] htmltools_0.5.7 lifecycle_1.0.4
#> [95] httr_1.4.7 mime_0.12
#> [97] statmod_1.5.0 bit64_4.0.5