velociraptor 1.4.0
This package provides a lightweight interface between the Bioconductor SingleCellExperiment
data structure
and the scvelo Python package for RNA velocity calculations.
The interface is comparable to that of many other SingleCellExperiment
-compatible functions,
allowing users to plug in RNA velocity calculations into the existing Bioconductor analysis framework.
To demonstrate, we will use a data set from Hermann et al. (2018), provided via the scRNAseq package.
This data set contains gene-wise estimates of spliced and unspliced UMI counts for 2,325 mouse spermatogenic cells.
library(scRNAseq)
sce <- HermannSpermatogenesisData()
sce
## class: SingleCellExperiment
## dim: 54448 2325
## metadata(0):
## assays(2): spliced unspliced
## rownames(54448): ENSMUSG00000102693.1 ENSMUSG00000064842.1 ...
## ENSMUSG00000064369.1 ENSMUSG00000064372.1
## rowData names(0):
## colnames(2325): CCCATACTCCGAAGAG AATCCAGTCATCTGCC ... ATCCACCCACCACCAG
## ATTGGTGGTTACCGAT
## colData names(1): celltype
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
The full data set requires up to 12 GB of memory for the example usage presented in this vignette. For demonstration purposes, we downsample the data set to the first 500 cells. Feel free to skip this downsampling step if you have access to sufficient memory.
sce <- sce[, 1:500]
We assume that feature selection has already been performed by the user using any method (see here for some suggestions). In this case, we will use the variance of log-expressions from scran to select the top 2000 genes.
library(scuttle)
sce <- logNormCounts(sce, assay.type=1)
library(scran)
dec <- modelGeneVar(sce)
top.hvgs <- getTopHVGs(dec, n=2000)
We can plug these choices into the scvelo()
function with our SingleCellExperiment
object.
By default, scvelo()
uses the steady-state approach to estimate velocities,
though the stochastic and dynamical models implemented in scvelo can also be used by modifying the mode
argument.
library(velociraptor)
velo.out <- scvelo(sce, subset.row=top.hvgs, assay.X="spliced")
velo.out
## class: SingleCellExperiment
## dim: 2000 500
## metadata(4): neighbors velocity_params velocity_graph
## velocity_graph_neg
## assays(6): X spliced ... Mu velocity
## rownames(2000): ENSMUSG00000117819.1 ENSMUSG00000081984.3 ...
## ENSMUSG00000022965.8 ENSMUSG00000094660.2
## rowData names(3): velocity_gamma velocity_r2 velocity_genes
## colnames(500): CCCATACTCCGAAGAG AATCCAGTCATCTGCC ... CACCTTGTCGTAGGAG
## TTCCCAGAGACTAAGT
## colData names(7): velocity_self_transition root_cells ...
## velocity_confidence velocity_confidence_transition
## reducedDimNames(1): X_pca
## mainExpName: NULL
## altExpNames(0):
In the above call, we use the "spliced"
count matrix as a proxy for the typical exonic count matrix.
Technically, the latter is not required for the velocity estimation, but scvelo needs to perform a PCA and nearest neighbors search,
and we want to ensure that the neighbors detected inside the function are consistent with the rest of the analysis workflow (performed on the exonic counts).
There are some subtle differences between the spliced count matrix and the typical exonic count matrix - see ?scvelo
for some commentary about this -
but the spliced counts are generally a satisfactory replacement if the latter is not available.
The scvelo()
function produces a SingleCellExperiment
containing all of the outputs of the calculation in Python.
Of particular interest is the velocity_pseudotime
vector that captures the relative progression of each cell
along the biological process driving the velocity vectors.
We can visualize this effect below in a \(t\)-SNE plot generated by scater on the top HVGs.
library(scater)
set.seed(100)
sce <- runPCA(sce, subset_row=top.hvgs)
sce <- runTSNE(sce, dimred="PCA")
sce$velocity_pseudotime <- velo.out$velocity_pseudotime
plotTSNE(sce, colour_by="velocity_pseudotime")
It is also straightforward to embed the velocity vectors into our desired low-dimensional space, as shown below for the \(t\)-SNE coordinates. This uses a grid-based approach to summarize the per-cell vectors into local representatives for effective visualization.
embedded <- embedVelocity(reducedDim(sce, "TSNE"), velo.out)
grid.df <- gridVectors(reducedDim(sce, "TSNE"), embedded)
library(ggplot2)
plotTSNE(sce, colour_by="velocity_pseudotime") +
geom_segment(data=grid.df, mapping=aes(x=start.1, y=start.2,
xend=end.1, yend=end.2), arrow=arrow(length=unit(0.05, "inches")))
And that’s it, really.
scvelo()
interally performs a PCA step that we can bypass by supplying our own PC coordinates.
Indeed, it is often the case that we have already performed PCA in the earlier analysis steps,
so we can just re-use those results to (i) save time and (ii) improve consistency with the other steps.
Here, we computed the PCA coordinates in runPCA()
above, so let’s just recycle that:
# Only setting assay.X= for the initial AnnData creation,
# it is not actually used in any further steps.
velo.out2 <- scvelo(sce, assay.X=1, subset.row=top.hvgs, use.dimred="PCA")
velo.out2
## class: SingleCellExperiment
## dim: 2000 500
## metadata(4): neighbors velocity_params velocity_graph
## velocity_graph_neg
## assays(6): X spliced ... Mu velocity
## rownames(2000): ENSMUSG00000117819.1 ENSMUSG00000081984.3 ...
## ENSMUSG00000022965.8 ENSMUSG00000094660.2
## rowData names(3): velocity_gamma velocity_r2 velocity_genes
## colnames(500): CCCATACTCCGAAGAG AATCCAGTCATCTGCC ... CACCTTGTCGTAGGAG
## TTCCCAGAGACTAAGT
## colData names(7): velocity_self_transition root_cells ...
## velocity_confidence velocity_confidence_transition
## reducedDimNames(1): X_pca
## mainExpName: NULL
## altExpNames(0):
We also provide an option to use the scvelo pipeline without modification,
i.e., relying on their normalization and feature selection.
This sacrifices consistency with other Bioconductor workflows but enables perfect mimicry of a pure Python-based analysis.
In this case, arguments like subset.row=
are simply ignored.
velo.out3 <- scvelo(sce, assay.X=1, use.theirs=TRUE)
velo.out3
## class: SingleCellExperiment
## dim: 54448 500
## metadata(5): pca neighbors velocity_params velocity_graph
## velocity_graph_neg
## assays(6): X spliced ... Mu velocity
## rownames(54448): ENSMUSG00000102693.1 ENSMUSG00000064842.1 ...
## ENSMUSG00000064369.1 ENSMUSG00000064372.1
## rowData names(4): velocity_gamma velocity_r2 velocity_genes varm
## colnames(500): CCCATACTCCGAAGAG AATCCAGTCATCTGCC ... CACCTTGTCGTAGGAG
## TTCCCAGAGACTAAGT
## colData names(11): initial_size_spliced initial_size_unspliced ...
## velocity_confidence velocity_confidence_transition
## reducedDimNames(1): X_pca
## mainExpName: NULL
## altExpNames(0):
Advanced users can tinker with the settings of individual scvelo steps
by setting named lists of arguments in the scvelo.params=
argument.
For example, to tinker with the behavior of the recover_dynamics
step, we could do:
velo.out4 <- scvelo(sce, assay.X=1, subset.row=top.hvgs,
scvelo.params=list(recover_dynamics=list(max_iter=20)))
velo.out4
## class: SingleCellExperiment
## dim: 2000 500
## metadata(4): neighbors velocity_params velocity_graph
## velocity_graph_neg
## assays(6): X spliced ... Mu velocity
## rownames(2000): ENSMUSG00000117819.1 ENSMUSG00000081984.3 ...
## ENSMUSG00000022965.8 ENSMUSG00000094660.2
## rowData names(3): velocity_gamma velocity_r2 velocity_genes
## colnames(500): CCCATACTCCGAAGAG AATCCAGTCATCTGCC ... CACCTTGTCGTAGGAG
## TTCCCAGAGACTAAGT
## colData names(7): velocity_self_transition root_cells ...
## velocity_confidence velocity_confidence_transition
## reducedDimNames(1): X_pca
## mainExpName: NULL
## altExpNames(0):
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.22.0 ggplot2_3.3.5
## [3] velociraptor_1.4.0 scran_1.22.0
## [5] scuttle_1.4.0 scRNAseq_2.7.2
## [7] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [9] Biobase_2.54.0 GenomicRanges_1.46.0
## [11] GenomeInfoDb_1.30.0 IRanges_2.28.0
## [13] S4Vectors_0.32.0 BiocGenerics_0.40.0
## [15] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [17] knitr_1.36 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] AnnotationHub_3.2.0 BiocFileCache_2.2.0
## [3] igraph_1.2.7 lazyeval_0.2.2
## [5] BiocParallel_1.28.0 digest_0.6.28
## [7] ensembldb_2.18.0 htmltools_0.5.2
## [9] magick_2.7.3 viridis_0.6.2
## [11] fansi_0.5.0 magrittr_2.0.1
## [13] memoise_2.0.0 ScaledMatrix_1.2.0
## [15] cluster_2.1.2 limma_3.50.0
## [17] Biostrings_2.62.0 prettyunits_1.1.1
## [19] colorspace_2.0-2 ggrepel_0.9.1
## [21] blob_1.2.2 rappdirs_0.3.3
## [23] xfun_0.27 dplyr_1.0.7
## [25] crayon_1.4.1 RCurl_1.98-1.5
## [27] jsonlite_1.7.2 glue_1.4.2
## [29] gtable_0.3.0 zlibbioc_1.40.0
## [31] XVector_0.34.0 DelayedArray_0.20.0
## [33] BiocSingular_1.10.0 scales_1.1.1
## [35] DBI_1.1.1 edgeR_3.36.0
## [37] Rcpp_1.0.7 viridisLite_0.4.0
## [39] xtable_1.8-4 progress_1.2.2
## [41] reticulate_1.22 dqrng_0.3.0
## [43] bit_4.0.4 rsvd_1.0.5
## [45] metapod_1.2.0 httr_1.4.2
## [47] dir.expiry_1.2.0 ellipsis_0.3.2
## [49] farver_2.1.0 pkgconfig_2.0.3
## [51] XML_3.99-0.8 sass_0.4.0
## [53] dbplyr_2.1.1 locfit_1.5-9.4
## [55] utf8_1.2.2 here_1.0.1
## [57] labeling_0.4.2 tidyselect_1.1.1
## [59] rlang_0.4.12 later_1.3.0
## [61] AnnotationDbi_1.56.0 munsell_0.5.0
## [63] BiocVersion_3.14.0 tools_4.1.1
## [65] cachem_1.0.6 cli_3.0.1
## [67] generics_0.1.1 RSQLite_2.2.8
## [69] ExperimentHub_2.2.0 evaluate_0.14
## [71] stringr_1.4.0 fastmap_1.1.0
## [73] yaml_2.2.1 bit64_4.0.5
## [75] purrr_0.3.4 KEGGREST_1.34.0
## [77] AnnotationFilter_1.18.0 sparseMatrixStats_1.6.0
## [79] mime_0.12 xml2_1.3.2
## [81] biomaRt_2.50.0 compiler_4.1.1
## [83] beeswarm_0.4.0 filelock_1.0.2
## [85] curl_4.3.2 png_0.1-7
## [87] interactiveDisplayBase_1.32.0 tibble_3.1.5
## [89] statmod_1.4.36 bslib_0.3.1
## [91] stringi_1.7.5 highr_0.9
## [93] basilisk.utils_1.6.0 GenomicFeatures_1.46.0
## [95] lattice_0.20-45 bluster_1.4.0
## [97] ProtGenerics_1.26.0 Matrix_1.3-4
## [99] vctrs_0.3.8 pillar_1.6.4
## [101] lifecycle_1.0.1 BiocManager_1.30.16
## [103] jquerylib_0.1.4 BiocNeighbors_1.12.0
## [105] cowplot_1.1.1 bitops_1.0-7
## [107] irlba_2.3.3 httpuv_1.6.3
## [109] rtracklayer_1.54.0 R6_2.5.1
## [111] BiocIO_1.4.0 bookdown_0.24
## [113] promises_1.2.0.1 gridExtra_2.3
## [115] vipor_0.4.5 zellkonverter_1.4.0
## [117] assertthat_0.2.1 rprojroot_2.0.2
## [119] rjson_0.2.20 withr_2.4.2
## [121] GenomicAlignments_1.30.0 Rsamtools_2.10.0
## [123] GenomeInfoDbData_1.2.7 parallel_4.1.1
## [125] hms_1.1.1 grid_4.1.1
## [127] beachmat_2.10.0 basilisk_1.6.0
## [129] rmarkdown_2.11 DelayedMatrixStats_1.16.0
## [131] Rtsne_0.15 shiny_1.7.1
## [133] ggbeeswarm_0.6.0 restfulr_0.0.13
Hermann, Brian P, Keren Cheng, Anukriti Singh, Lorena Roa-De La Cruz, Kazadi N Mutoji, I-Chung Chen, Heidi Gildersleeve, et al. 2018. “The Mammalian Spermatogenesis Single-Cell Transcriptome, from Spermatogonial Stem Cells to Spermatids.” Cell Rep. 25 (6): 1650–1667.e8.