scater 1.8.4
This document provides some examples of the data visualisation functions available in scater.
plotExpression: plot cell expression levels for one or more genes;
plotReducedDim: plot (and/or calculate) reduced dimension coordinates.
We will demonstrate on some example data generated below:
library(scater)
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
example_sce
## class: SingleCellExperiment
## dim: 2000 40
## metadata(1): log.exprs.offset
## assays(2): counts logcounts
## rownames(2000): Gene_0001 Gene_0002 ... Gene_1999 Gene_2000
## rowData names(0):
## colnames(40): Cell_001 Cell_002 ... Cell_039 Cell_040
## colData names(4): Cell Mutation_Status Cell_Cycle Treatment
## reducedDimNames(0):
## spikeNames(0):
The plotExpression function makes it easy to plot expression values for a subset of genes or features.
This can be particularly useful for further examination of features identified from differential expression testing, pseudotime analysis or other analyses.
By default, it uses expression values in the "logcounts" assay, but this can be changed through the exprs_values argument.
plotExpression(example_sce, rownames(example_sce)[1:6],
    x = "Mutation_Status", exprs_values = "logcounts")
Setting x will determine the covariate to be shown on the x-axis.
This can be a field in the column metadata or the name of a feature (to obtain the expression profile across cells).
Categorical covariates will yield grouped violins as shown above, with one panel per feature.
By comparison, continuous covariates will generate a scatter plot in each panel, as shown below.
plotExpression(example_sce, rownames(example_sce)[1:6],
    x = "Gene_0001")
The points can also be coloured, shaped or resized by the column metadata or expression values.
plotExpression(example_sce, rownames(example_sce)[1:6],
    colour_by = "Cell_Cycle", shape_by = "Mutation_Status",
    size_by = "Gene_0002")
For categorical x, we can also show the median expression level per group on the plot to summarise the distribution of expression values:
plotExpression(example_sce, rownames(example_sce)[7:12],
    x = "Mutation_Status", exprs_values = "counts",
    colour_by = "Cell_Cycle", show_median = TRUE,
    xlab = "Mutation Status", log2_values = TRUE)
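To make concrete what a per-group median summarises, here is a minimal base R sketch that computes it by hand. The expression values and the group labels are hypothetical and chosen only for illustration; this is not the internal code that plotExpression uses.

```r
# Hypothetical expression values for eight cells (illustration only).
expr <- c(0.0, 1.2, 3.4, 2.2, 5.1, 4.8, 0.0, 6.3)

# Hypothetical categorical covariate, analogous to a column-metadata field.
status <- factor(rep(c("negative", "positive"), each = 4))

# One median per group: the kind of summary a median marker represents.
group_medians <- tapply(expr, status, median)
print(group_medians)
```

The median is less sensitive than the mean to the zero counts that are common in single-cell data, which is one reason it is a convenient summary for each violin.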
Directly plotting the gene expression without any x or other visual parameters will generate a set of grouped violin plots, coloured in an aesthetically pleasing manner.
plotExpression(example_sce, rownames(example_sce)[1:6])
reducedDims slot
The SingleCellExperiment object has a reducedDims slot, where coordinates for reduced dimension representations of the cells can be stored.
These can be accessed using the reducedDim and reducedDims functions, which are described in more detail in the SingleCellExperiment documentation.
In the code below, we perform a principal components analysis (PCA) and store the results in the "PCA" slot.
example_sce <- runPCA(example_sce)
reducedDimNames(example_sce)
## [1] "PCA"
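As a brief illustration of the accessors mentioned above, the sketch below extracts the stored PCA coordinates as a plain matrix. It rebuilds the example object so that it can be run on its own, and it assumes the scater 1.8.x API used throughout this document; the variable name pca_coords is only illustrative.

```r
library(scater)

# Rebuild the example object used throughout this document.
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
example_sce <- runPCA(example_sce)

# reducedDim() returns one stored result: a matrix with one row per cell.
pca_coords <- reducedDim(example_sce, "PCA")
dim(pca_coords)
head(pca_coords)

# reducedDims() returns all stored results as a list-like object.
names(reducedDims(example_sce))
```

Working with the matrix directly can be useful when the coordinates are needed outside of the plotting functions, for example as input to clustering.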
Any reduced dimension results can be plotted using the plotReducedDim function:
plotReducedDim(example_sce, use_dimred = "PCA",
    colour_by = "Treatment", shape_by = "Mutation_Status")
We can also colour and size points by the expression of particular features:
plotReducedDim(example_sce, use_dimred = "PCA",
    colour_by = "Gene_1000", size_by = "Gene_0500")
The plotPCA function makes it easy to produce a PCA plot directly from a SingleCellExperiment object, which is useful for visualising the relationships between cells.
The default plot shows the first two principal components, if "PCA" is already in the reducedDims slot.
plotPCA(example_sce)
If pre-existing "PCA" results are not present, the function will automatically call runPCA to generate the results prior to plotting.
However, it may be preferable for users to call runPCA manually if multiple plots are to be generated from the same results.
This avoids re-calculation of the reduced dimension results, which can be time-consuming for very large data sets.
By default, runPCA performs PCA on the log-counts using the 500 features with the most variable expression across all cells.
The number of most-variable features used can be changed with the ntop argument.
Alternatively, a specific set of features to use for PCA can be defined with the feature_set argument.
This is demonstrated with the feature controls below, to identify technical factors of variation:
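# Assumes rowData(example_sce)$is_feature_control was defined beforehand, e.g. by
# calculateQCMetrics() with feature_controls; if absent, feature_set is NULL.
example_sce2 <- runPCA(example_sce,
    feature_set = rowData(example_sce)$is_feature_control)
plotPCA(example_sce2)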
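The default feature selection described above can be approximated in base R. The following is a rough sketch on simulated data, not the scater implementation: the matrix logexprs, its dimensions and the choice to scale features to unit variance are all assumptions made for illustration.

```r
set.seed(100)

# Simulated log-expression matrix: 2000 genes (rows) by 40 cells (columns).
# Each gene gets its own standard deviation so that the variances differ.
n_genes <- 2000
n_cells <- 40
gene_sd <- runif(n_genes, min = 0.5, max = 2)
logexprs <- matrix(
    rnorm(n_genes * n_cells, mean = 5, sd = rep(gene_sd, times = n_cells)),
    nrow = n_genes, ncol = n_cells
)

# Keep the ntop genes with the largest variance across cells.
ntop <- 500
gene_vars <- apply(logexprs, 1, var)
top_genes <- order(gene_vars, decreasing = TRUE)[seq_len(min(ntop, n_genes))]

# PCA expects cells in rows, so the subsetted matrix is transposed.
pca <- prcomp(t(logexprs[top_genes, ]), center = TRUE, scale. = TRUE)

# Coordinates of each cell on the first two principal components.
pca_coords <- pca$x[, 1:2]

# Percentage of variance explained by each component.
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
```

Restricting the PCA to highly variable features reduces the contribution of genes that carry little signal, and makes the computation cheaper.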
Multiple components can be plotted in a series of pairwise plots. When more than two components are plotted, the diagonal boxes in the scatter plot matrix show the density for each component.
example_sce <- runPCA(example_sce, ncomponents=20)
plotPCA(example_sce, ncomponents = 4, colour_by = "Treatment",
    shape_by = "Mutation_Status")
As shown above, various metadata variables can be used to define the colour, shape and size of points in the scatter plot. We can also use the colour and size of points in the plot to reflect feature expression values.
plotPCA(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")
t-distributed stochastic neighbour embedding (t-SNE) is widely used for visualizing complex single-cell data sets.
The same procedure described for PCA plots can be applied to generate t-SNE plots using plotTSNE, with coordinates obtained using runTSNE via the Rtsne package.
We strongly recommend generating plots with different random seeds and perplexity values, to ensure that any conclusions are robust to different visualizations.
# A perplexity of 10 is chosen arbitrarily here.
example_sce <- runTSNE(example_sce, perplexity=10, rand_seed=1000)
plotTSNE(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")
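One way to follow the recommendation about seeds and perplexities is to loop over a small grid of settings and compare the resulting plots side by side. The sketch below rebuilds the example object so that it can be run on its own; the particular seed and perplexity values are arbitrary assumptions, and it relies on the scater 1.8.x arguments shown in this document.

```r
library(scater)

# Rebuild the example object used throughout this document.
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)

# Arbitrary grid of settings. Perplexities are kept small because Rtsne
# rejects values that are too large relative to the number of cells (40 here).
settings <- expand.grid(perplexity = c(5, 10), seed = c(100, 1000))

# One t-SNE run and one plot per combination of settings.
tsne_plots <- lapply(seq_len(nrow(settings)), function(i) {
    sce <- runTSNE(example_sce,
        perplexity = settings$perplexity[i],
        rand_seed = settings$seed[i])
    plotTSNE(sce, colour_by = "Treatment") +
        ggplot2::ggtitle(sprintf("perplexity = %g, seed = %g",
            settings$perplexity[i], settings$seed[i]))
})

# Draw the plots one after another for visual comparison.
for (p in tsne_plots) print(p)
```

Structure that appears consistently across the panels is more likely to reflect the data than the particular random initialisation.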
It is also possible to use the pre-existing PCA results as input into the t-SNE algorithm. This is useful as it improves speed by using a low-rank approximation of the expression matrix, and reduces random noise by focusing on the major factors of variation. The code below uses the first 10 principal components to perform the t-SNE.
example_sce <- runTSNE(example_sce, perplexity=10, rand_seed=1000,
    use_dimred="PCA", n_dimred = 10)
plotTSNE(example_sce, colour_by="Treatment")
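The idea of feeding a low-rank representation into t-SNE can also be sketched outside of scater, using prcomp and the Rtsne package directly. This is an illustration on simulated data under stated assumptions (the matrix logexprs and its dimensions are made up), not a description of what runTSNE does internally.

```r
library(Rtsne)

set.seed(1000)

# Simulated log-expression matrix: 2000 genes (rows) by 40 cells (columns).
logexprs <- matrix(rnorm(2000 * 40), nrow = 2000, ncol = 40)

# PCA with cells in rows; keep only the first 10 principal components.
pca <- prcomp(t(logexprs))
pcs <- pca$x[, 1:10]

# Run t-SNE on the 40 x 10 matrix of components rather than on all genes.
# pca = FALSE stops Rtsne from running its own PCA step on the input.
tsne <- Rtsne(pcs, dims = 2, perplexity = 10, pca = FALSE)

# One row of two-dimensional coordinates per cell.
tsne_coords <- tsne$Y
dim(tsne_coords)
```

Because the distance calculations now involve 10 columns instead of 2000, each run is cheaper, which also makes it easier to try several seeds and perplexities.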
Users can force plotTSNE to call runTSNE by setting rerun=TRUE, even when "TSNE" already exists in the input SingleCellExperiment object.
Users can also pass parameters for runTSNE directly to plotTSNE via the run_args argument.
The same applies for the other plot* and run* functions.
Again, the same can be done for diffusion maps using plotDiffusionMap, with coordinates obtained using runDiffusionMap via the destiny package.
example_sce <- runDiffusionMap(example_sce)
plotDiffusionMap(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")
sessionInfo()
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] scater_1.8.4 SingleCellExperiment_1.2.0
## [3] SummarizedExperiment_1.10.1 DelayedArray_0.6.4
## [5] BiocParallel_1.14.2 matrixStats_0.54.0
## [7] GenomicRanges_1.32.6 GenomeInfoDb_1.16.0
## [9] IRanges_2.14.10 S4Vectors_0.18.3
## [11] Biobase_2.40.0 BiocGenerics_0.26.0
## [13] ggplot2_3.0.0 knitr_1.20
## [15] BiocStyle_2.8.2
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.13 ggbeeswarm_0.6.0
## [3] colorspace_1.3-2 RcppEigen_0.3.3.4.0
## [5] rjson_0.2.20 class_7.3-14
## [7] rio_0.5.10 rprojroot_1.3-2
## [9] XVector_0.20.0 proxy_0.4-22
## [11] tximport_1.8.0 robustbase_0.93-2
## [13] shinydashboard_0.7.0 shiny_1.1.0
## [15] compiler_3.5.1 backports_1.1.2
## [17] assertthat_0.2.0 Matrix_1.2-14
## [19] lazyeval_0.2.1 limma_3.36.2
## [21] later_0.7.3 htmltools_0.3.6
## [23] tools_3.5.1 bindrcpp_0.2.2
## [25] igraph_1.2.2 gtable_0.2.0
## [27] glue_1.3.0 GenomeInfoDbData_1.1.0
## [29] reshape2_1.4.3 dplyr_0.7.6
## [31] ggthemes_4.0.0 Rcpp_0.12.18
## [33] carData_3.0-1 cellranger_1.1.0
## [35] DelayedMatrixStats_1.2.0 lmtest_0.9-36
## [37] xfun_0.3 laeken_0.4.6
## [39] stringr_1.3.1 openxlsx_4.1.0
## [41] mime_0.5 edgeR_3.22.3
## [43] DEoptimR_1.0-8 zlibbioc_1.26.0
## [45] MASS_7.3-50 zoo_1.8-3
## [47] scales_1.0.0 VIM_4.7.0
## [49] hms_0.4.2 promises_1.0.1
## [51] rhdf5_2.24.0 yaml_2.2.0
## [53] curl_3.2 gridExtra_2.3
## [55] stringi_1.2.4 e1071_1.7-0
## [57] destiny_2.10.2 TTR_0.23-3
## [59] boot_1.3-20 zip_1.0.0
## [61] rlang_0.2.1 pkgconfig_2.0.1
## [63] bitops_1.0-6 evaluate_0.11
## [65] lattice_0.20-35 purrr_0.2.5
## [67] Rhdf5lib_1.2.1 bindr_0.1.1
## [69] labeling_0.3 cowplot_0.9.3
## [71] tidyselect_0.2.4 plyr_1.8.4
## [73] magrittr_1.5 bookdown_0.7
## [75] R6_2.2.2 pillar_1.3.0
## [77] haven_1.1.2 foreign_0.8-71
## [79] withr_2.1.2 xts_0.11-0
## [81] scatterplot3d_0.3-41 abind_1.4-5
## [83] RCurl_1.95-4.11 sp_1.3-1
## [85] nnet_7.3-12 tibble_1.4.2
## [87] crayon_1.3.4 car_3.0-0
## [89] rmarkdown_1.10 viridis_0.5.1
## [91] locfit_1.5-9.1 grid_3.5.1
## [93] readxl_1.1.0 data.table_1.11.4
## [95] forcats_0.3.0 vcd_1.4-4
## [97] digest_0.6.15 xtable_1.8-2
## [99] httpuv_1.4.5 munsell_0.5.0
## [101] beeswarm_0.2.3 viridisLite_0.3.0
## [103] smoother_1.1 vipor_0.4.5