Single Cells with edgeR

This vignette demonstrates usage of Glimma on a single cell dataset. The data here comes from brain cells from the Zeisel A et al. (2015) study on mouse brain single cells. We will use the MDS plot to perform unsupervised clustering of the cells. A pseudo-bulk single cell aggregation approach with edgeR will be used to test for differential expression, and two styles of MA plots will be used to investigate the results.

This is a simplified workflow not intended to represent best practice, but to produce reasonable looking plots in the minimal amount of code. Please refer to a resource such Orchestrating Single-Cell Analysis with Bioconductor (OSCA) for appropriate workflows for analysis.

We start by loading in the data using the scRNAseq package.

library(scRNAseq)
library(scater)
library(scran)
library(Glimma)
library(edgeR)

sce <- ZeiselBrainData(ensembl=TRUE)
#> snapshotDate(): 2021-10-18
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> snapshotDate(): 2021-10-18
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> snapshotDate(): 2021-10-20
#> loading from cache
#> require("ensembldb")
#> Warning: Unable to map 1565 of 20006 requested IDs.

Once the data is loaded in we follow the OSCA procedure for identifying highly variable genes for creating a multi-dimensional scaling (MDS) plot. We use the functions provided by scran to identify the most highly variable genes rather than the algorithm within glimmaMDS, as scran is tailored towards single cells.

sce <- logNormCounts(sce)

var_mod <- modelGeneVar(sce)
hvg_genes <- getTopHVGs(var_mod, n=500)
hvg_sce <- sce[hvg_genes, ]

hvg_sce <- logNormCounts(hvg_sce)

Choosing to colour the MDS plot using level1class reveals separation between cell types.

glimmaMDS(
    exprs(hvg_sce),
    groups = colData(hvg_sce)
)

To demonstrate the MA plot we will perform a differential expression analysis using the pseudo-bulk approach. This involves creating pseudo-bulk samples by aggregating single cells as an analogue of biological replicates. Here the pseudo-bulk samples will be generated from combinations of level1class and level2class, the cells belonging to unique combinations of the two factors will be aggregated into samples.

colData(sce)$pb_group <-
    paste0(colData(sce)$level1class,
           "_",
           colData(sce)$level2class)

sce_counts <- counts(sce)
pb_counts <- t(rowsum(t(sce_counts), colData(sce)$pb_group))

pb_samples <- colnames(pb_counts)
pb_samples <- gsub("astrocytes_ependymal", "astrocytes-ependymal", pb_samples)
pb_split <- do.call(rbind, strsplit(pb_samples, "_"))
pb_sample_anno <- data.frame(
    sample = pb_samples,
    cell_type = pb_split[, 1],
    sample_group = pb_split[, 2]
)

With the pseudo-bulk annotations and counts we can construct a DGEList object.

pb_dge <- DGEList(
    counts = pb_counts,
    samples = pb_sample_anno,
    group = pb_sample_anno$cell_type
)

pb_dge <- calcNormFactors(pb_dge)

With this we perform differential expression analysis between “pyramidal SS” and “pyramidal CA1” samples using edgeR’s generalised linear models.

design <- model.matrix(~0 + cell_type, data = pb_dge$samples)
colnames(design) <- make.names(gsub("cell_type", "", colnames(design)))

pb_dge <- estimateDisp(pb_dge, design)

contr <- makeContrasts("pyramidal.SS - pyramidal.CA1", levels = design)

pb_fit <- glmFit(pb_dge, design)
pb_lrt <- glmLRT(pb_fit, contrast = contr)

The results of this analysis can be visualised using glimmaMA() as it would be for bulk RNA-seq.

glimmaMA(pb_lrt, dge = pb_dge)

An alternative view of the data can be constructed using the single cells in the expression plot rather than the pseudo-bulk samples. Since the MA plot is related to the expressions by only the genes in the rows, another expression matrix containing the same genes can be substituted in as below.

We construct a new DGE list from the raw single cell counts, then filter it down to just the cells used in our comparison and further down-sampled to 100 cells. This is done because Glimma does not handle a large number of cells well, the limit being a few hundred for most computers. Sampling still provides an approximate representation of the data without computation strain.

The code is not evaluated here to keep the vignette compact.

sc_dge <- DGEList(
    counts = sce_counts,
    group = colData(sce)$level1class
)

sc_dge <- sc_dge[, colData(sce)$level1class %in% c("pyramidal CA1", "pyramidal SS")]

glimmaMA(
    pb_lrt,
    dge = sc_dge[, sample(1:ncol(sc_dge), 100)]
)

Session Info

sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ensembldb_2.18.0            AnnotationFilter_1.18.0    
#>  [3] GenomicFeatures_1.46.0      AnnotationDbi_1.56.0       
#>  [5] AnnotationHub_3.2.0         BiocFileCache_2.2.0        
#>  [7] dbplyr_2.1.1                scran_1.22.0               
#>  [9] scater_1.22.0               ggplot2_3.3.5              
#> [11] scuttle_1.4.0               scRNAseq_2.7.2             
#> [13] SingleCellExperiment_1.16.0 DESeq2_1.34.0              
#> [15] SummarizedExperiment_1.24.0 Biobase_2.54.0             
#> [17] MatrixGenerics_1.6.0        matrixStats_0.61.0         
#> [19] GenomicRanges_1.46.0        GenomeInfoDb_1.30.0        
#> [21] IRanges_2.28.0              S4Vectors_0.32.0           
#> [23] BiocGenerics_0.40.0         edgeR_3.36.0               
#> [25] limma_3.50.0                Glimma_2.4.0               
#> 
#> loaded via a namespace (and not attached):
#>   [1] igraph_1.2.7                  lazyeval_0.2.2               
#>   [3] splines_4.1.1                 BiocParallel_1.28.0          
#>   [5] digest_0.6.28                 htmltools_0.5.2              
#>   [7] viridis_0.6.2                 fansi_0.5.0                  
#>   [9] magrittr_2.0.1                memoise_2.0.0                
#>  [11] ScaledMatrix_1.2.0            cluster_2.1.2                
#>  [13] Biostrings_2.62.0             annotate_1.72.0              
#>  [15] prettyunits_1.1.1             colorspace_2.0-2             
#>  [17] blob_1.2.2                    rappdirs_0.3.3               
#>  [19] ggrepel_0.9.1                 xfun_0.27                    
#>  [21] dplyr_1.0.7                   crayon_1.4.1                 
#>  [23] RCurl_1.98-1.5                jsonlite_1.7.2               
#>  [25] genefilter_1.76.0             survival_3.2-13              
#>  [27] glue_1.4.2                    gtable_0.3.0                 
#>  [29] zlibbioc_1.40.0               XVector_0.34.0               
#>  [31] DelayedArray_0.20.0           BiocSingular_1.10.0          
#>  [33] scales_1.1.1                  DBI_1.1.1                    
#>  [35] Rcpp_1.0.7                    viridisLite_0.4.0            
#>  [37] xtable_1.8-4                  progress_1.2.2               
#>  [39] dqrng_0.3.0                   bit_4.0.4                    
#>  [41] rsvd_1.0.5                    metapod_1.2.0                
#>  [43] htmlwidgets_1.5.4             httr_1.4.2                   
#>  [45] RColorBrewer_1.1-2            ellipsis_0.3.2               
#>  [47] pkgconfig_2.0.3               XML_3.99-0.8                 
#>  [49] sass_0.4.0                    locfit_1.5-9.4               
#>  [51] utf8_1.2.2                    tidyselect_1.1.1             
#>  [53] rlang_0.4.12                  later_1.3.0                  
#>  [55] munsell_0.5.0                 BiocVersion_3.14.0           
#>  [57] tools_4.1.1                   cachem_1.0.6                 
#>  [59] generics_0.1.1                RSQLite_2.2.8                
#>  [61] ExperimentHub_2.2.0           evaluate_0.14                
#>  [63] stringr_1.4.0                 fastmap_1.1.0                
#>  [65] yaml_2.2.1                    knitr_1.36                   
#>  [67] bit64_4.0.5                   purrr_0.3.4                  
#>  [69] KEGGREST_1.34.0               sparseMatrixStats_1.6.0      
#>  [71] mime_0.12                     xml2_1.3.2                   
#>  [73] biomaRt_2.50.0                compiler_4.1.1               
#>  [75] beeswarm_0.4.0                filelock_1.0.2               
#>  [77] curl_4.3.2                    png_0.1-7                    
#>  [79] interactiveDisplayBase_1.32.0 statmod_1.4.36               
#>  [81] tibble_3.1.5                  geneplotter_1.72.0           
#>  [83] bslib_0.3.1                   stringi_1.7.5                
#>  [85] bluster_1.4.0                 lattice_0.20-45              
#>  [87] ProtGenerics_1.26.0           Matrix_1.3-4                 
#>  [89] vctrs_0.3.8                   pillar_1.6.4                 
#>  [91] lifecycle_1.0.1               BiocManager_1.30.16          
#>  [93] jquerylib_0.1.4               BiocNeighbors_1.12.0         
#>  [95] bitops_1.0-7                  irlba_2.3.3                  
#>  [97] httpuv_1.6.3                  rtracklayer_1.54.0           
#>  [99] R6_2.5.1                      BiocIO_1.4.0                 
#> [101] promises_1.2.0.1              gridExtra_2.3                
#> [103] vipor_0.4.5                   assertthat_0.2.1             
#> [105] rjson_0.2.20                  withr_2.4.2                  
#> [107] GenomicAlignments_1.30.0      Rsamtools_2.10.0             
#> [109] GenomeInfoDbData_1.2.7        parallel_4.1.1               
#> [111] hms_1.1.1                     grid_4.1.1                   
#> [113] beachmat_2.10.0               rmarkdown_2.11               
#> [115] DelayedMatrixStats_1.16.0     shiny_1.7.1                  
#> [117] ggbeeswarm_0.6.0              restfulr_0.0.13