# Chimeric mouse embryo (10X Genomics)

## Introduction

This performs an analysis of the @pijuansala2019single dataset on mouse gastrulation.
Here, we examine chimeric embryos at the E8.5 stage of development where td-Tomato-positive embryonic stem cells (ESCs) were injected into a wild-type blastocyst.

## Data loading

```r
library(MouseGastrulationData)
sce.chimera <- WTChimeraData(samples=5:10)
sce.chimera
```

```
## class: SingleCellExperiment 
## dim: 29453 20935 
## metadata(0):
## assays(1): counts
## rownames(29453): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095742 tomato-td
## rowData names(2): ENSEMBL SYMBOL
## colnames(20935): cell_9769 cell_9770 ... cell_30702 cell_30703
## colData names(11): cell barcode ... doub.density sizeFactor
## reducedDimNames(2): pca.corrected.E7.5 pca.corrected.E8.5
## mainExpName: NULL
## altExpNames(0):
```

```r
library(scater)
rownames(sce.chimera) <- uniquifyFeatureNames(
    rowData(sce.chimera)$ENSEMBL, rowData(sce.chimera)$SYMBOL)
```

## Quality control

Quality control on the cells has already been performed by the authors, so we will not repeat it here.
We additionally remove cells that are labelled as stripped nuclei or doublets.

```r
drop <- sce.chimera$celltype.mapped %in% c("stripped", "Doublet")
sce.chimera <- sce.chimera[,!drop]
```

## Normalization

We use the pre-computed size factors in `sce.chimera`.

```r
sce.chimera <- logNormCounts(sce.chimera)
```

## Variance modelling

We retain all genes with any positive biological component, to preserve as much signal as possible across a very heterogeneous dataset.

```r
library(scran)
dec.chimera <- modelGeneVar(sce.chimera, block=sce.chimera$sample)
chosen.hvgs <- dec.chimera$bio > 0
```

```r
par(mfrow=c(1,2))
blocked.stats <- dec.chimera$per.block
for (i in colnames(blocked.stats)) {
    current <- blocked.stats[[i]]
    plot(current$mean, current$total, main=i, pch=16, cex=0.5,
        xlab="Mean of log-expression", ylab="Variance of log-expression")
    curfit <- metadata(current)
    curve(curfit$trend(x), col='dodgerblue', add=TRUE, lwd=2)
}
```

(\#fig:unref-pijuan-var-1)Per-gene variance as a function of the mean for the log-expression values in the Pijuan-Sala chimeric mouse embryo dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the variances.

(\#fig:unref-pijuan-var-2)Per-gene variance as a function of the mean for the log-expression values in the Pijuan-Sala chimeric mouse embryo dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the variances.

(\#fig:unref-pijuan-var-3)Per-gene variance as a function of the mean for the log-expression values in the Pijuan-Sala chimeric mouse embryo dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the variances.

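The selection rule used in the variance modelling section (keep every gene whose biological component is positive) can be illustrated on simulated numbers. The sketch below is a toy under stated assumptions: the means, the shape of the technical trend, the 50 "variable" genes and the use of `loess()` are all hypothetical stand-ins, and it does not reproduce the trend-fitting procedure that `modelGeneVar()` applies within each sample.

```r
# Toy illustration only: simulated statistics, not the chimera data.
set.seed(100)
n.genes <- 500
mean.expr <- runif(n.genes, 0, 5)

# Hypothetical technical trend that rises and then falls with the mean.
tech.trend <- function(m) m * exp(-m / 2) + 0.05

# Total variance is a noisy technical component, plus extra biological
# variance for the first 50 genes (an arbitrary choice for this sketch).
total.var <- tech.trend(mean.expr) * rchisq(n.genes, df=20) / 20
is.variable <- seq_len(n.genes) <= 50
total.var[is.variable] <- total.var[is.variable] + 2

# Fit a smooth mean-variance trend to all genes.
# loess() is a stand-in here, not scran's own fitting method.
trend.fit <- loess(total.var ~ mean.expr)
tech.var <- predict(trend.fit)

# Biological component = total variance minus the fitted trend.
bio.var <- total.var - tech.var
chosen <- bio.var > 0

# Cross-tabulate the simulated truth against the selection.
table(Simulated=is.variable, Chosen=chosen)
```

Because the threshold is zero, a sizeable fraction of genes with no simulated biological variance is also retained. This is the trade-off the text accepts: more noise in exchange for not discarding weak signal in a heterogeneous dataset.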
## Merging

We use a hierarchical merge to first merge together replicates with the same genotype, and then merge samples across different genotypes.

```r
library(batchelor)
set.seed(01001001)
merged <- correctExperiments(sce.chimera,
    batch=sce.chimera$sample,
    subset.row=chosen.hvgs,
    PARAM=FastMnnParam(
        merge.order=list(
            list(1,3,5), # WT (3 replicates)
            list(2,4,6)  # td-Tomato (3 replicates)
        )
    )
)
```

We use the proportion of variance lost in each batch at each merge step as a diagnostic:

```r
metadata(merged)$merge.info$lost.var
```

```
##              5         6         7         8        9       10
## [1,] 0.000e+00 0.0204433 0.000e+00 0.0169567 0.000000 0.000000
## [2,] 0.000e+00 0.0007389 0.000e+00 0.0004409 0.000000 0.015474
## [3,] 3.090e-02 0.0000000 2.012e-02 0.0000000 0.000000 0.000000
## [4,] 9.024e-05 0.0000000 8.272e-05 0.0000000 0.018047 0.000000
## [5,] 4.321e-03 0.0072518 4.124e-03 0.0078280 0.003831 0.007786
```

## Clustering

```r
g <- buildSNNGraph(merged, use.dimred="corrected")
clusters <- igraph::cluster_louvain(g)
colLabels(merged) <- factor(clusters$membership)
```

We examine the distribution of cells across clusters and samples.

```r
table(Cluster=colLabels(merged), Sample=merged$sample)
```

```
##        Sample
## Cluster   5   6   7   8   9  10
##      1   77  19  56  50 131  60
##      2  148  37 133 110 231 216
##      3   98  16 165 128 369 273
##      4  185 115 328 593 460 547
##      5  135  72 322 594 296 778
##      6  212  53 344 203 536 612
##      7  149  73  86  86 163 383
##      8  133  97 110  66 162 313
##      9   84  21  79  35 170 213
##      10 174  45 219 182 211 381
##      11  97  19  36  18  50  35
##      12 111  41  45  35  40 147
##      13 123  64  62  51  63 140
##      14 157  78 130 104 164 436
##      15 110  69  72  96 127 253
##      16  43  35  82  80  85 354
##      17  77  43 191 118 329 487
##      18  47  22  82  51  87 130
##      19  39  41  50  48 128 125
##      20   1   5   0  84   0  66
##      21  18   7  13  17  20  37
##      22  58  29  90  79  81 188
##      23   9   7  18  13  30  27
##      24  11  15  20   9  47  57
##      25   2   1   7   3  77 138
##      26   0   2   0  51   0   5
```

## Dimensionality reduction

We use an external algorithm to compute nearest neighbors for greater speed.

```r
merged <- runTSNE(merged, dimred="corrected", external_neighbors=TRUE)
merged <- runUMAP(merged, dimred="corrected", external_neighbors=TRUE)
```

```r
gridExtra::grid.arrange(
    plotTSNE(merged, colour_by="label", text_by="label", text_col="red"),
    plotTSNE(merged, colour_by="batch")
)
```

(\#fig:unref-pijuan-tsne)Obligatory $t$-SNE plots of the Pijuan-Sala chimeric mouse embryo dataset, where each point represents a cell and is colored according to the assigned cluster (top) or sample of origin (bottom).
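
The cluster-by-sample table from the clustering section can be summarized numerically to complement the plots. The sketch below copies four rows of that table by hand and computes, for each cluster, the fraction of cells coming from each merge group. The group labels `group1` (samples 5, 7, 9) and `group2` (samples 6, 8, 10) are names invented for this illustration; they follow the positions passed to `merge.order` and are not columns of `merged`.

```r
# Counts copied from four rows of the cluster-by-sample table.
tab <- rbind(
    "4"  = c(185, 115, 328, 593, 460, 547),
    "11" = c( 97,  19,  36,  18,  50,  35),
    "20" = c(  1,   5,   0,  84,   0,  66),
    "26" = c(  0,   2,   0,  51,   0,   5)
)
colnames(tab) <- c("5", "6", "7", "8", "9", "10")

# Hypothetical group labels, one per sample column, following merge.order.
merge.group <- c("group1", "group2", "group1", "group2", "group1", "group2")

# Sum the counts of the samples within each group, per cluster.
grouped <- t(rowsum(t(tab), group=merge.group))

# Fraction of each cluster's cells contributed by each group.
prop <- prop.table(grouped, margin=1)
round(prop, 3)
```

In the table, clusters 20 and 26 are drawn almost entirely from samples 6, 8 and 10. Raw fractions like these are confounded by the different number of cells per sample, so they are only a prompt for closer inspection and not evidence of a genotype effect or of incomplete correction on their own.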

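Returning to the merge diagnostic reported in the merging section, one simple way to read the `lost.var` matrix is to take the largest loss per sample across merge steps and compare it with a tolerance. The sketch below re-enters the printed values by hand; the 10% cut-off is an assumed rule of thumb for this illustration, not a value taken from this analysis.

```r
# Values copied from the printed lost.var output (rows = merge steps).
lost.var <- rbind(
    c(0.000e+00, 0.0204433, 0.000e+00, 0.0169567, 0.000000, 0.000000),
    c(0.000e+00, 0.0007389, 0.000e+00, 0.0004409, 0.000000, 0.015474),
    c(3.090e-02, 0.0000000, 2.012e-02, 0.0000000, 0.000000, 0.000000),
    c(9.024e-05, 0.0000000, 8.272e-05, 0.0000000, 0.018047, 0.000000),
    c(4.321e-03, 0.0072518, 4.124e-03, 0.0078280, 0.003831, 0.007786)
)
colnames(lost.var) <- c("5", "6", "7", "8", "9", "10")

# Largest proportion of variance lost by each sample at any merge step.
max.loss <- apply(lost.var, 2, max)

# Assumed tolerance: treat losses above 10% as worth investigating.
threshold <- 0.10
all(max.loss < threshold)
```

Every sample loses at most about 3% of its variance at any single step, so under this assumed tolerance the correction does not appear to be removing large amounts of within-sample heterogeneity.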
## Session Info {-}
```
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB              LC_COLLATE=C
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] batchelor_1.12.0             scran_1.24.0
 [3] scater_1.24.0                ggplot2_3.3.6
 [5] scuttle_1.6.2                MouseGastrulationData_1.10.0
 [7] SpatialExperiment_1.6.0      SingleCellExperiment_1.18.0
 [9] SummarizedExperiment_1.26.1  Biobase_2.56.0
[11] GenomicRanges_1.48.0         GenomeInfoDb_1.32.2
[13] IRanges_2.30.0               S4Vectors_0.34.0
[15] BiocGenerics_0.42.0          MatrixGenerics_1.8.0
[17] matrixStats_0.62.0           BiocStyle_2.24.0
[19] rebook_1.6.0

loaded via a namespace (and not attached):
  [1] AnnotationHub_3.4.0           BiocFileCache_2.4.0
  [3] igraph_1.3.1                  BiocParallel_1.30.2
  [5] digest_0.6.29                 BumpyMatrix_1.4.0
  [7] htmltools_0.5.2               viridis_0.6.2
  [9] magick_2.7.3                  fansi_1.0.3
 [11] magrittr_2.0.3                memoise_2.0.1
 [13] ScaledMatrix_1.4.0            cluster_2.1.3
 [15] limma_3.52.1                  Biostrings_2.64.0
 [17] R.utils_2.11.0                colorspace_2.0-3
 [19] blob_1.2.3                    rappdirs_0.3.3
 [21] ggrepel_0.9.1                 xfun_0.31
 [23] dplyr_1.0.9                   crayon_1.5.1
 [25] RCurl_1.98-1.6                jsonlite_1.8.0
 [27] graph_1.74.0                  glue_1.6.2
 [29] gtable_0.3.0                  zlibbioc_1.42.0
 [31] XVector_0.36.0                DelayedArray_0.22.0
 [33] BiocSingular_1.12.0           DropletUtils_1.16.0
 [35] Rhdf5lib_1.18.2               HDF5Array_1.24.0
 [37] scales_1.2.0                  DBI_1.1.2
 [39] edgeR_3.38.1                  Rcpp_1.0.8.3
 [41] viridisLite_0.4.0             xtable_1.8-4
 [43] dqrng_0.3.0                   bit_4.0.4
 [45] rsvd_1.0.5                    ResidualMatrix_1.6.0
 [47] metapod_1.4.0                 httr_1.4.3
 [49] dir.expiry_1.4.0              ellipsis_0.3.2
 [51] farver_2.1.0                  pkgconfig_2.0.3
 [53] XML_3.99-0.9                  R.methodsS3_1.8.1
 [55] uwot_0.1.11                   CodeDepends_0.6.5
 [57] sass_0.4.1                    dbplyr_2.1.1
 [59] locfit_1.5-9.5                utf8_1.2.2
 [61] labeling_0.4.2                tidyselect_1.1.2
 [63] rlang_1.0.2                   later_1.3.0
 [65] AnnotationDbi_1.58.0          munsell_0.5.0
 [67] BiocVersion_3.15.2            tools_4.2.0
 [69] cachem_1.0.6                  cli_3.3.0
 [71] generics_0.1.2                RSQLite_2.2.14
 [73] ExperimentHub_2.4.0           evaluate_0.15
 [75] stringr_1.4.0                 fastmap_1.1.0
 [77] yaml_2.3.5                    knitr_1.39
 [79] bit64_4.0.5                   purrr_0.3.4
 [81] KEGGREST_1.36.0               sparseMatrixStats_1.8.0
 [83] mime_0.12                     R.oo_1.24.0
 [85] compiler_4.2.0                beeswarm_0.4.0
 [87] filelock_1.0.2                curl_4.3.2
 [89] png_0.1-7                     interactiveDisplayBase_1.34.0
 [91] statmod_1.4.36                tibble_3.1.7
 [93] bslib_0.3.1                   stringi_1.7.6
 [95] highr_0.9                     lattice_0.20-45
 [97] bluster_1.6.0                 Matrix_1.4-1
 [99] vctrs_0.4.1                   pillar_1.7.0
[101] lifecycle_1.0.1               rhdf5filters_1.8.0
[103] BiocManager_1.30.18           jquerylib_0.1.4
[105] BiocNeighbors_1.14.0          cowplot_1.1.1
[107] bitops_1.0-7                  irlba_2.3.5
[109] httpuv_1.6.5                  R6_2.5.1
[111] bookdown_0.26                 promises_1.2.0.1
[113] gridExtra_2.3                 vipor_0.4.5
[115] codetools_0.2-18              assertthat_0.2.1
[117] rhdf5_2.40.0                  rjson_0.2.21
[119] withr_2.5.0                   GenomeInfoDbData_1.2.8
[121] parallel_4.2.0                grid_4.2.0
[123] beachmat_2.12.0               rmarkdown_2.14
[125] DelayedMatrixStats_1.18.0     Rtsne_0.16
[127] shiny_1.7.1                   ggbeeswarm_0.6.0
```