Chapter 12 Bach mouse mammary gland (10X Genomics)
12.1 Introduction
This workflow analyzes the Bach et al. (2017) 10X Genomics dataset, from which we consider a single sample of epithelial cells from the mouse mammary gland during gestation.
12.3 Quality control
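# Keep an unfiltered copy so that discarded cells can be shown in the diagnostic plots.
unfiltered <- sce.mam
is.mito <- rowData(sce.mam)$SEQNAME == "MT"
stats <- perCellQCMetrics(sce.mam, subsets=list(Mito=which(is.mito)))
qc <- quickPerCellQC(stats, percent_subsets="subsets_Mito_percent")
sce.mam <- sce.mam[,!qc$discard]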
colData(unfiltered) <- cbind(colData(unfiltered), stats)
unfiltered$discard <- qc$discard
gridExtra::grid.arrange(
    plotColData(unfiltered, y="sum", colour_by="discard") +
        scale_y_log10() + ggtitle("Total count"),
    plotColData(unfiltered, y="detected", colour_by="discard") +
        scale_y_log10() + ggtitle("Detected features"),
    plotColData(unfiltered, y="subsets_Mito_percent",
        colour_by="discard") + ggtitle("Mito percent"),
    ncol=2
)

Figure 12.1: Distribution of each QC metric across cells in the Bach mammary gland dataset. Each point represents a cell and is colored according to whether that cell was discarded.

Figure 12.2: Percentage of mitochondrial reads in each cell in the Bach mammary gland dataset compared to its total count. Each point represents a cell and is colored according to whether that cell was discarded.
colSums(as.matrix(qc))
## low_lib_size low_n_features high_subsets_Mito_percent
## 0 0 143
## discard
## 143
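The discard calls summarized above rest on an outlier rule: quickPerCellQC() flags cells whose QC metrics lie several median absolute deviations (MADs) from the median. A minimal base-R sketch of that rule for the mitochondrial percentage follows, assuming a threshold of 3 MADs above the median. The simulated vector mito.pct is hypothetical and is not drawn from the Bach dataset.

```r
# Hypothetical mitochondrial percentages: 200 typical cells plus 3 damaged ones.
set.seed(1)
mito.pct <- c(rnorm(200, mean=3, sd=0.5), 12, 15, 20)

# Upper threshold: median plus 3 MADs (mad() applies the usual 1.4826 scaling).
threshold <- median(mito.pct) + 3 * mad(mito.pct)

# Cells above the threshold are marked for removal.
discard <- mito.pct > threshold
table(discard)
```

In the analysis itself the same kind of comparison is made for each metric; as far as the defaults go, the total count and the number of detected features are tested on the log scale for unusually low values, while the mitochondrial percentage is tested for unusually high values.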
12.4 Normalization
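library(scran)
# Pre-cluster the cells so that size factors are deconvolved within each cluster.
clusters <- quickCluster(sce.mam)
sce.mam <- computeSumFactors(sce.mam, clusters=clusters)
sce.mam <- logNormCounts(sce.mam)
summary(sizeFactors(sce.mam))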
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.264 0.520 0.752 1.000 1.207 10.790
plot(librarySizeFactors(sce.mam), sizeFactors(sce.mam), pch=16,
xlab="Library size factors", ylab="Deconvolution factors", log="xy")

Figure 12.3: Relationship between the library size factors and the deconvolution size factors in the Bach mammary gland dataset.
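For reference, the library size factors on the x-axis of Figure 12.3 are simple to compute by hand: each cell's total count is divided by the mean total count across cells, so the factors average to 1. The sketch below uses a small hypothetical count matrix (toy.counts) rather than sce.mam. It also shows a log2 transformation with a pseudo-count of 1, which is assumed here to match the logNormCounts() defaults.

```r
# Hypothetical counts: 4 genes (rows) by 3 cells (columns), filled column by column.
toy.counts <- matrix(c(10, 0, 5, 5,
                       20, 2, 8, 10,
                       60, 4, 26, 30),
                     nrow=4,
                     dimnames=list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Library size factors: total count per cell, scaled to a mean of 1.
lib.sizes <- colSums(toy.counts)
lib.sf <- lib.sizes / mean(lib.sizes)

# Divide each cell's counts by its size factor, then log2-transform with a pseudo-count of 1.
toy.logcounts <- log2(t(t(toy.counts) / lib.sf) + 1)

lib.sf
```

Deconvolution size factors are on the same scale, which is why the two sets of factors can be compared directly on the log-log axes of the figure.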
12.5 Variance modelling
We model the technical noise with a Poisson-based trend rather than a trend fitted to the endogenous genes, so that more of the genuine biological variation is retained in the biological component.
dec.mam <- modelGeneVarByPoisson(sce.mam)
top.mam <- getTopHVGs(dec.mam, prop=0.1)
plot(dec.mam$mean, dec.mam$total, pch=16, cex=0.5,
xlab="Mean of log-expression", ylab="Variance of log-expression")
curfit <- metadata(dec.mam)
curve(curfit$trend(x), col='dodgerblue', add=TRUE, lwd=2)

Figure 12.4: Per-gene variance as a function of the mean for the log-expression values in the Bach mammary gland dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to simulated Poisson counts.
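The idea behind the blue trend in Figure 12.4 can be illustrated without scran: simulate pure Poisson counts across a range of means, log-transform them, and record the variance of the log-values at each mean. The sketch below assumes a size factor of 1 for every cell and an arbitrary grid of means (lambdas). It is a simplified illustration of the concept, not the procedure that modelGeneVarByPoisson() actually implements.

```r
set.seed(100)
ncells <- 1000
lambdas <- 2^seq(-5, 5, length.out=20)   # hypothetical grid of true mean counts

# For each mean, simulate Poisson counts and summarize the log2(count + 1) values.
sim <- t(vapply(lambdas, function(l) {
    y <- log2(rpois(ncells, lambda=l) + 1)
    c(mean=mean(y), var=var(y))
}, FUN.VALUE=c(mean=0, var=0)))

# One row per simulated mean: the "mean" column is the average log-expression,
# the "var" column is the variance expected from Poisson noise alone.
round(sim, 3)
```

The simulated variance rises from near zero, peaks at intermediate abundance, and falls again at high counts. Genes whose observed variance sits well above such a curve are the ones assigned a large biological component.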
12.6 Dimensionality reduction
sce.mam <- denoisePCA(sce.mam, technical=dec.mam, subset.row=top.mam)
sce.mam <- runTSNE(sce.mam, dimred="PCA")
# Number of principal components retained by denoisePCA().
ncol(reducedDim(sce.mam, "PCA"))
## [1] 15
12.7 Clustering
We use a higher k to obtain coarser clusters (for use in doubletCluster() later).
snn.gr <- buildSNNGraph(sce.mam, use.dimred="PCA", k=25)
colLabels(sce.mam) <- factor(igraph::cluster_walktrap(snn.gr)$membership)
table(colLabels(sce.mam))
## 1 2 3 4 5 6 7 8 9 10
## 550 847 639 477 54 88 39 22 32 24

Figure 12.5: Obligatory \(t\)-SNE plot of the Bach mammary gland dataset, where each point represents a cell and is colored according to the assigned cluster.
