stabilize
regress_out
run
model_search
spatial_patterns
model_search
and spatial_patterns
sessionInfo
SpatialDE by Svensson et al., 2018, is a method to identify spatially variable genes (SVGs) in spatially resolved transcriptomics data.
You can install spatialDE from Bioconductor with the following code:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("spatialDE")
Reproducing the example analysis from the original SpatialDE Python package.
library(spatialDE)
library(ggplot2)
Files originally retrieved from SpatialDE GitHub repository from the following links: https://github.com/Teichlab/SpatialDE/blob/master/Analysis/MouseOB/data/Rep11_MOB_0.csv https://github.com/Teichlab/SpatialDE/blob/master/Analysis/MouseOB/MOB_sample_info.csv
# Expression file used in python SpatialDE.
data("Rep11_MOB_0")
# Sample Info file used in python SpatialDE
data("MOB_sample_info")
The Rep11_MOB_0
object contains spatial expression data for
16218 genes on 262 spots, with genes as rows
and spots as columns.
Rep11_MOB_0[1:5, 1:5]
#> 16.92x9.015 16.945x11.075 16.97x10.118 16.939x12.132 16.949x13.055
#> Nrf1 1 0 0 1 0
#> Zbtb5 1 0 1 0 0
#> Ccnl1 1 3 1 1 0
#> Lrrfip1 2 2 0 0 3
#> Bbs1 1 2 0 4 0
dim(Rep11_MOB_0)
#> [1] 16218 262
The MOB_sample_info
object contains a data.frame
with coordinates for each
spot.
head(MOB_sample_info)
x <dbl> | y <dbl> | total_counts <int> | ||
---|---|---|---|---|
16.92x9.015 | 16.920 | 9.015 | 18790 | |
16.945x11.075 | 16.945 | 11.075 | 36990 | |
16.97x10.118 | 16.970 | 10.118 | 12471 | |
16.939x12.132 | 16.939 | 12.132 | 22703 | |
16.949x13.055 | 16.949 | 13.055 | 18641 | |
16.942x15.088 | 16.942 | 15.088 | 23408 |
Rep11_MOB_0 <- Rep11_MOB_0[rowSums(Rep11_MOB_0) >= 3, ]
Rep11_MOB_0 <- Rep11_MOB_0[, row.names(MOB_sample_info)]
MOB_sample_info$total_counts <- colSums(Rep11_MOB_0)
head(MOB_sample_info)
x <dbl> | y <dbl> | total_counts <dbl> | ||
---|---|---|---|---|
16.92x9.015 | 16.920 | 9.015 | 18790 | |
16.945x11.075 | 16.945 | 11.075 | 36990 | |
16.97x10.118 | 16.970 | 10.118 | 12471 | |
16.939x12.132 | 16.939 | 12.132 | 22703 | |
16.949x13.055 | 16.949 | 13.055 | 18641 | |
16.942x15.088 | 16.942 | 15.088 | 23408 |
MOB_sample_info
X <- MOB_sample_info[, c("x", "y")]
head(X)
x <dbl> | y <dbl> | |||
---|---|---|---|---|
16.92x9.015 | 16.920 | 9.015 | ||
16.945x11.075 | 16.945 | 11.075 | ||
16.97x10.118 | 16.970 | 10.118 | ||
16.939x12.132 | 16.939 | 12.132 | ||
16.949x13.055 | 16.949 | 13.055 | ||
16.942x15.088 | 16.942 | 15.088 |
stabilize
The SpatialDE method assumes normally distributed data, so we stabilize the variance of the negative binomial distributed counts data using Anscombe’s approximation.
The stabilize()
function takes as input a data.frame
of expression values with samples in columns and genes in rows. Thus, in this case, we have to transpose the data.
norm_expr <- stabilize(Rep11_MOB_0)
#> + /home/biocbuild/.cache/R/basilisk/1.19.0/0/bin/conda create --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.0/spatialDE/1.13.0/env 'python=3.10.14' --quiet -c conda-forge --override-channels
#> + /home/biocbuild/.cache/R/basilisk/1.19.0/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.0/spatialDE/1.13.0/env 'python=3.10.14' -c conda-forge --override-channels
#> + /home/biocbuild/.cache/R/basilisk/1.19.0/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.0/spatialDE/1.13.0/env -c conda-forge 'python=3.10.14' 'numpy=1.23.5' 'scipy=1.9.3' 'patsy=0.5.3' 'pandas=1.5.2' --override-channels
norm_expr[1:5, 1:5]
#> 16.92x9.015 16.945x11.075 16.97x10.118 16.939x12.132 16.949x13.055
#> Nrf1 1.227749 0.8810934 0.8810934 1.2277491 0.8810934
#> Zbtb5 1.227749 0.8810934 1.2277491 0.8810934 0.8810934
#> Ccnl1 1.227749 1.6889027 1.2277491 1.2277491 0.8810934
#> Lrrfip1 1.484676 1.4846765 0.8810934 0.8810934 1.6889027
#> Bbs1 1.227749 1.4846765 0.8810934 1.8584110 0.8810934
regress_out
Next, we account for differences in library size between the samples by regressing out the effect of the total counts for each gene using linear regression.
resid_expr <- regress_out(norm_expr, sample_info = MOB_sample_info)
resid_expr[1:5, 1:5]
#> 16.92x9.015 16.945x11.075 16.97x10.118 16.939x12.132 16.949x13.055
#> Nrf1 -0.75226761 -1.2352000 -1.0164479 -0.7903289 -1.0973214
#> Zbtb5 0.09242373 -0.3323719 0.1397144 -0.2760560 -0.2533134
#> Ccnl1 -2.77597164 -2.5903783 -2.6092013 -2.8529340 -3.1193883
#> Lrrfip1 -1.92331333 -2.1578718 -2.3849405 -2.5924072 -1.7163300
#> Bbs1 -1.12186064 -1.0266476 -1.3706460 -0.5363646 -1.4666155
run
To reduce running time, the SpatialDE test is run on a subset of 1000 genes. Running it on the complete data set takes about 10 minutes.
# For this example, run spatialDE on the first 1000 genes
sample_resid_expr <- head(resid_expr, 1000)
results <- spatialDE::run(sample_resid_expr, coordinates = X)
head(results[order(results$qval), ])
FSV <dbl> | M <dbl> | g <chr> | l <dbl> | max_delta <dbl> | max_ll <dbl> | max_mu_hat <dbl> | max_s2_t_hat <dbl> | model <chr> | ||
---|---|---|---|---|---|---|---|---|---|---|
752 | 0.5869020 | 4 | Fabp7 | 1.135190 | 0.6872704 | -154.7691 | 5.036316 | 3.6039400 | SE | |
671 | 0.5276876 | 4 | Apc | 1.135190 | 0.8739619 | -146.9456 | -3.922771 | 2.1552117 | SE | |
884 | 0.3651030 | 4 | Kcnh3 | 1.907609 | 1.6225593 | -177.5658 | -8.066532 | 3.7081686 | SE | |
668 | 0.3837007 | 4 | Pcp4 | 1.135190 | 1.5683362 | -127.1354 | -11.866414 | 16.9252387 | SE | |
722 | 0.3345161 | 4 | Penk | 1.135190 | 1.9424986 | -163.0520 | -9.491798 | 10.3640716 | SE | |
725 | 0.3931224 | 4 | Frzb | 1.135190 | 1.5073475 | -217.1284 | 1.180284 | 0.3109513 | SE |
model_search
Finally, we can classify the DE genes to interpetable DE classes using the model_search
function.
We apply the model search on filtered DE results, using a threshold of 0.05 for the Q-values.
de_results <- results[results$qval < 0.05, ]
ms_results <- model_search(
sample_resid_expr,
coordinates = X,
de_results = de_results
)
# To show ms_results sorted on qvalue, uncomment the following line
# head(ms_results[order(ms_results$qval), ])
head(ms_results)
BIC <dbl> | FSV <dbl> | LLR <dbl> | M <dbl> | g <chr> | l <dbl> | max_delta <dbl> | max_ll <dbl> | ||
---|---|---|---|---|---|---|---|---|---|
0 | 273.4206 | 0.06585344 | NaN | 4 | X2900097C17Rik | 5.386796 | 14.481719 | -125.5890 | |
1 | 267.1570 | 0.10520888 | NaN | 4 | Pcp4 | 9.052138 | 8.896112 | -122.4571 | |
2 | 333.8619 | 0.10540242 | NaN | 4 | Rbfox3 | 9.052138 | 8.877856 | -155.8096 | |
3 | 370.2613 | 0.09234754 | NaN | 4 | Sez6 | 9.052138 | 10.280760 | -174.0093 | |
4 | 378.8001 | 0.08317351 | NaN | 4 | Gng13 | 9.052138 | 11.530100 | -178.2787 | |
5 | 394.8754 | 0.10227380 | NaN | 4 | Gng4 | 9.052138 | 9.181433 | -186.3163 |
spatial_patterns
Furthermore, we can group spatially variable genes (SVGs) into spatial patterns using automatic expression histology (AEH).
sp <- spatial_patterns(
sample_resid_expr,
coordinates = X,
de_results = de_results,
n_patterns = 4L, length = 1.5
)
sp$pattern_results
g <chr> | pattern <dbl> | membership <dbl> | ||
---|---|---|---|---|
595 | X2900097C17Rik | 0 | 1 | |
660 | Eef1a1 | 0 | 1 | |
667 | Pmepa1 | 2 | 1 | |
668 | Pcp4 | 1 | 1 | |
671 | Apc | 3 | 1 | |
696 | Rbfox3 | 1 | 1 | |
722 | Penk | 1 | 1 | |
725 | Frzb | 2 | 1 | |
752 | Fabp7 | 2 | 1 | |
791 | Gm13889 | 1 | 1 |
Visualizing one of the most significant genes.
gene <- "Pcp4"
ggplot(data = MOB_sample_info, aes(x = x, y = y, color = norm_expr[gene, ])) +
geom_point(size = 7) +
ggtitle(gene) +
scale_color_viridis_c() +
labs(color = gene)
As an alternative to the previous figure, we can plot multiple genes using the normalized expression values.
ordered_de_results <- de_results[order(de_results$qval), ]
multiGenePlots(norm_expr,
coordinates = X,
ordered_de_results[1:6, ]$g,
point_size = 4,
viridis_option = "D",
dark_theme = FALSE
)
FSV_sig(results, ms_results)
#> Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps
The SpatialDE workflow can also be executed with a SpatialExperiment object as input.
library(SpatialExperiment)
# For SpatialExperiment object, we neeed to transpose the counts matrix in order
# to have genes on rows and spot on columns.
# For this example, run spatialDE on the first 1000 genes
partial_counts <- head(Rep11_MOB_0, 1000)
spe <- SpatialExperiment(
assays = list(counts = partial_counts),
spatialData = DataFrame(MOB_sample_info[, c("x", "y")]),
spatialCoordsNames = c("x", "y")
)
out <- spatialDE(spe, assay_type = "counts", verbose = FALSE)
head(out[order(out$qval), ])
FSV <dbl> | M <dbl> | g <chr> | l <dbl> | max_delta <dbl> | max_ll <dbl> | max_mu_hat <dbl> | max_s2_t_hat <dbl> | model <chr> | ||
---|---|---|---|---|---|---|---|---|---|---|
759 | 0.5853287 | 4 | Fabp7 | 1.135190 | 0.6917423 | -157.8641 | 4.805770 | 3.2961235 | SE | |
684 | 0.5215592 | 4 | Apc | 1.135190 | 0.8957046 | -152.4253 | -2.852122 | 1.1971964 | SE | |
886 | 0.3425473 | 4 | Kcnh3 | 1.907609 | 1.7908397 | -201.1959 | -7.162902 | 2.9230555 | SE | |
681 | 0.3671680 | 4 | Pcp4 | 1.135190 | 1.6829213 | -124.8416 | -9.209852 | 10.0688118 | SE | |
735 | 0.3820350 | 4 | Frzb | 1.135190 | 1.5794319 | -239.8355 | 1.156500 | 0.3240445 | SE | |
832 | 0.2378540 | 4 | Gng13 | 1.907609 | 2.9897888 | -204.1019 | 3.648425 | 0.7506023 | SE |
We can plot spatial patterns of multiples genes using the spe
object.
spe_results <- out[out$qval < 0.05, ]
ordered_spe_results <- spe_results[order(spe_results$qval), ]
multiGenePlots(spe,
assay_type = "counts",
ordered_spe_results[1:6, ]$g,
point_size = 4,
viridis_option = "D",
dark_theme = FALSE
)
model_search
and spatial_patterns
msearch <- modelSearch(spe,
de_results = out, qval_thresh = 0.05,
verbose = FALSE
)
head(msearch)
BIC <dbl> | FSV <dbl> | LLR <dbl> | M <dbl> | g <chr> | l <dbl> | max_delta <dbl> | max_ll <dbl> | ||
---|---|---|---|---|---|---|---|---|---|
0 | 287.6679 | 0.06406448 | NaN | 4 | X2900097C17Rik | 5.386796 | 14.914618 | -132.7126 | |
1 | 263.5043 | 0.09590705 | NaN | 4 | Pcp4 | 9.052138 | 9.860377 | -120.6308 | |
2 | 370.8671 | 0.09681127 | NaN | 4 | Rbfox3 | 9.052138 | 9.758510 | -174.3122 | |
3 | 428.9359 | 0.08063297 | NaN | 4 | Gng13 | 9.052138 | 11.926341 | -203.3466 | |
4 | 401.7737 | 0.14867725 | NaN | 4 | Kcnh3 | 9.052138 | 5.989364 | -189.7655 | |
5 | 479.4804 | 0.36943862 | 14.01272 | 4 | Pmepa1 | 1.135190 | 1.666576 | -228.6188 |
spatterns <- spatialPatterns(spe,
de_results = de_results, qval_thresh = 0.05,
n_patterns = 4L, length = 1.5, verbose = FALSE
)
spatterns$pattern_results
g <chr> | pattern <dbl> | membership <dbl> | ||
---|---|---|---|---|
595 | X2900097C17Rik | 2 | 1.0000000 | |
660 | Eef1a1 | 3 | 1.0000000 | |
667 | Pmepa1 | 1 | 1.0000000 | |
668 | Pcp4 | 0 | 1.0000000 | |
671 | Apc | 3 | 1.0000000 | |
696 | Rbfox3 | 0 | 1.0000000 | |
722 | Penk | 0 | 1.0000000 | |
725 | Frzb | 1 | 1.0000000 | |
752 | Fabp7 | 1 | 1.0000000 | |
791 | Gm13889 | 3 | 0.9999996 |
sessionInfo
#> [1] "2024-10-29 19:41:45 EDT"
#> R Under development (unstable) (2024-10-21 r87258)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] SpatialExperiment_1.17.0 SingleCellExperiment_1.29.0
#> [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
#> [5] GenomicRanges_1.59.0 GenomeInfoDb_1.43.0
#> [7] IRanges_2.41.0 S4Vectors_0.45.0
#> [9] BiocGenerics_0.53.0 MatrixGenerics_1.19.0
#> [11] matrixStats_1.4.1 ggplot2_3.5.1
#> [13] spatialDE_1.13.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 dir.expiry_1.15.0 rjson_0.2.23
#> [4] xfun_0.48 bslib_0.8.0 ggrepel_0.9.6
#> [7] lattice_0.22-6 vctrs_0.6.5 tools_4.5.0
#> [10] generics_0.1.3 parallel_4.5.0 tibble_3.2.1
#> [13] fansi_1.0.6 highr_0.11 pkgconfig_2.0.3
#> [16] Matrix_1.7-1 checkmate_2.3.2 lifecycle_1.0.4
#> [19] GenomeInfoDbData_1.2.13 farver_2.1.2 compiler_4.5.0
#> [22] tinytex_0.53 munsell_0.5.1 htmltools_0.5.8.1
#> [25] sass_0.4.9 yaml_2.3.10 pillar_1.9.0
#> [28] crayon_1.5.3 jquerylib_0.1.4 cachem_1.1.0
#> [31] DelayedArray_0.33.0 magick_2.8.5 abind_1.4-8
#> [34] basilisk_1.19.0 tidyselect_1.2.1 digest_0.6.37
#> [37] dplyr_1.1.4 bookdown_0.41 labeling_0.4.3
#> [40] fastmap_1.2.0 grid_4.5.0 colorspace_2.1-1
#> [43] cli_3.6.3 SparseArray_1.7.0 magrittr_2.0.3
#> [46] S4Arrays_1.7.0 utf8_1.2.4 withr_3.0.2
#> [49] backports_1.5.0 filelock_1.0.3 scales_1.3.0
#> [52] UCSC.utils_1.3.0 rmarkdown_2.28 XVector_0.47.0
#> [55] httr_1.4.7 gridExtra_2.3 reticulate_1.39.0
#> [58] png_0.1-8 evaluate_1.0.1 knitr_1.48
#> [61] basilisk.utils_1.19.0 viridisLite_0.4.2 rlang_1.1.4
#> [64] Rcpp_1.0.13 glue_1.8.0 BiocManager_1.30.25
#> [67] jsonlite_1.8.9 R6_2.5.1 zlibbioc_1.53.0