• 1 Introduction
  • 2 Installation
  • 3 Example: Mouse Olfactory Bulb
    • 3.1 Load data
    • 3.2 stabilize
    • 3.3 regress_out
    • 3.4 run
    • 3.5 model_search
    • 3.6 spatial_patterns
    • 3.7 Plots
    • 3.8 Plot Fraction Spatial Variance vs Q-value
  • 4 SpatialExperiment integration
    • 4.1 Plot Spatial Patterns of Multiple Genes (using SpatialExperiment object)
    • 4.2 Classify spatially variable genes with model_search and spatial_patterns
  • sessionInfo

1 Introduction

SpatialDE by Svensson et al., 2018, is a method to identify spatially variable genes (SVGs) in spatially resolved transcriptomics data.

2 Installation

You can install spatialDE from Bioconductor with the following code:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("spatialDE")

3 Example: Mouse Olfactory Bulb

Reproducing the example analysis from the original SpatialDE Python package.

library(spatialDE)
library(ggplot2)

3.1 Load data

Files originally retrieved from SpatialDE GitHub repository from the following links: https://github.com/Teichlab/SpatialDE/blob/master/Analysis/MouseOB/data/Rep11_MOB_0.csv https://github.com/Teichlab/SpatialDE/blob/master/Analysis/MouseOB/MOB_sample_info.csv

# Expression file used in python SpatialDE. 
data("Rep11_MOB_0")

# Sample Info file used in python SpatialDE
data("MOB_sample_info")

The Rep11_MOB_0 object contains spatial expression data for 16218 genes on 262 spots, with genes as rows and spots as columns.

Rep11_MOB_0[1:5, 1:5]
#>         16.92x9.015 16.945x11.075 16.97x10.118 16.939x12.132 16.949x13.055
#> Nrf1              1             0            0             1             0
#> Zbtb5             1             0            1             0             0
#> Ccnl1             1             3            1             1             0
#> Lrrfip1           2             2            0             0             3
#> Bbs1              1             2            0             4             0
dim(Rep11_MOB_0)
#> [1] 16218   262

The MOB_sample_info object contains a data.frame with coordinates for each spot.

head(MOB_sample_info)
ABCDEFGHIJ0123456789
 
 
x
<dbl>
y
<dbl>
total_counts
<int>
16.92x9.01516.9209.01518790
16.945x11.07516.94511.07536990
16.97x10.11816.97010.11812471
16.939x12.13216.93912.13222703
16.949x13.05516.94913.05518641
16.942x15.08816.94215.08823408

3.1.1 Filter out pratically unobserved genes

Rep11_MOB_0 <- Rep11_MOB_0[rowSums(Rep11_MOB_0) >= 3, ]

3.1.2 Get total_counts for every spot

Rep11_MOB_0 <- Rep11_MOB_0[, row.names(MOB_sample_info)]
MOB_sample_info$total_counts <- colSums(Rep11_MOB_0)
head(MOB_sample_info)
ABCDEFGHIJ0123456789
 
 
x
<dbl>
y
<dbl>
total_counts
<dbl>
16.92x9.01516.9209.01518790
16.945x11.07516.94511.07536990
16.97x10.11816.97010.11812471
16.939x12.13216.93912.13222703
16.949x13.05516.94913.05518641
16.942x15.08816.94215.08823408

3.1.3 Get coordinates from MOB_sample_info

X <- MOB_sample_info[, c("x", "y")]
head(X)
ABCDEFGHIJ0123456789
 
 
x
<dbl>
y
<dbl>
16.92x9.01516.9209.015
16.945x11.07516.94511.075
16.97x10.11816.97010.118
16.939x12.13216.93912.132
16.949x13.05516.94913.055
16.942x15.08816.94215.088

3.2 stabilize

The SpatialDE method assumes normally distributed data, so we stabilize the variance of the negative binomial distributed counts data using Anscombe’s approximation. The stabilize() function takes as input a data.frame of expression values with samples in columns and genes in rows. Thus, in this case, we have to transpose the data.

norm_expr <- stabilize(Rep11_MOB_0)
#> + /home/biocbuild/.cache/R/basilisk/1.19.0/0/bin/conda create --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.0/spatialDE/1.13.0/env 'python=3.10.14' --quiet -c conda-forge --override-channels
#> + /home/biocbuild/.cache/R/basilisk/1.19.0/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.0/spatialDE/1.13.0/env 'python=3.10.14' -c conda-forge --override-channels
#> + /home/biocbuild/.cache/R/basilisk/1.19.0/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.0/spatialDE/1.13.0/env -c conda-forge 'python=3.10.14' 'numpy=1.23.5' 'scipy=1.9.3' 'patsy=0.5.3' 'pandas=1.5.2' --override-channels
norm_expr[1:5, 1:5]
#>         16.92x9.015 16.945x11.075 16.97x10.118 16.939x12.132 16.949x13.055
#> Nrf1       1.227749     0.8810934    0.8810934     1.2277491     0.8810934
#> Zbtb5      1.227749     0.8810934    1.2277491     0.8810934     0.8810934
#> Ccnl1      1.227749     1.6889027    1.2277491     1.2277491     0.8810934
#> Lrrfip1    1.484676     1.4846765    0.8810934     0.8810934     1.6889027
#> Bbs1       1.227749     1.4846765    0.8810934     1.8584110     0.8810934

3.3 regress_out

Next, we account for differences in library size between the samples by regressing out the effect of the total counts for each gene using linear regression.

resid_expr <- regress_out(norm_expr, sample_info = MOB_sample_info)
resid_expr[1:5, 1:5]
#>         16.92x9.015 16.945x11.075 16.97x10.118 16.939x12.132 16.949x13.055
#> Nrf1    -0.75226761    -1.2352000   -1.0164479    -0.7903289    -1.0973214
#> Zbtb5    0.09242373    -0.3323719    0.1397144    -0.2760560    -0.2533134
#> Ccnl1   -2.77597164    -2.5903783   -2.6092013    -2.8529340    -3.1193883
#> Lrrfip1 -1.92331333    -2.1578718   -2.3849405    -2.5924072    -1.7163300
#> Bbs1    -1.12186064    -1.0266476   -1.3706460    -0.5363646    -1.4666155

3.4 run

To reduce running time, the SpatialDE test is run on a subset of 1000 genes. Running it on the complete data set takes about 10 minutes.

# For this example, run spatialDE on the first 1000 genes
sample_resid_expr <- head(resid_expr, 1000)

results <- spatialDE::run(sample_resid_expr, coordinates = X)
head(results[order(results$qval), ])
ABCDEFGHIJ0123456789
 
 
FSV
<dbl>
M
<dbl>
g
<chr>
l
<dbl>
max_delta
<dbl>
max_ll
<dbl>
max_mu_hat
<dbl>
max_s2_t_hat
<dbl>
model
<chr>
7520.58690204Fabp71.1351900.6872704-154.76915.0363163.6039400SE
6710.52768764Apc1.1351900.8739619-146.9456-3.9227712.1552117SE
8840.36510304Kcnh31.9076091.6225593-177.5658-8.0665323.7081686SE
6680.38370074Pcp41.1351901.5683362-127.1354-11.86641416.9252387SE
7220.33451614Penk1.1351901.9424986-163.0520-9.49179810.3640716SE
7250.39312244Frzb1.1351901.5073475-217.12841.1802840.3109513SE

3.6 spatial_patterns

Furthermore, we can group spatially variable genes (SVGs) into spatial patterns using automatic expression histology (AEH).

sp <- spatial_patterns(
    sample_resid_expr,
    coordinates = X,
    de_results = de_results,
    n_patterns = 4L, length = 1.5
)
sp$pattern_results
ABCDEFGHIJ0123456789
 
 
g
<chr>
pattern
<dbl>
membership
<dbl>
595X2900097C17Rik01
660Eef1a101
667Pmepa121
668Pcp411
671Apc31
696Rbfox311
722Penk11
725Frzb21
752Fabp721
791Gm1388911

3.7 Plots

Visualizing one of the most significant genes.

gene <- "Pcp4"

ggplot(data = MOB_sample_info, aes(x = x, y = y, color = norm_expr[gene, ])) +
    geom_point(size = 7) +
    ggtitle(gene) +
    scale_color_viridis_c() +
    labs(color = gene)

3.7.1 Plot Spatial Patterns of Multiple Genes

As an alternative to the previous figure, we can plot multiple genes using the normalized expression values.

ordered_de_results <- de_results[order(de_results$qval), ]

multiGenePlots(norm_expr,
    coordinates = X,
    ordered_de_results[1:6, ]$g,
    point_size = 4,
    viridis_option = "D",
    dark_theme = FALSE
)

3.8 Plot Fraction Spatial Variance vs Q-value

FSV_sig(results, ms_results)
#> Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

4 SpatialExperiment integration

The SpatialDE workflow can also be executed with a SpatialExperiment object as input.

library(SpatialExperiment)

# For SpatialExperiment object, we neeed to transpose the counts matrix in order
# to have genes on rows and spot on columns. 
# For this example, run spatialDE on the first 1000 genes

partial_counts <- head(Rep11_MOB_0, 1000)

spe <- SpatialExperiment(
  assays = list(counts = partial_counts),
  spatialData = DataFrame(MOB_sample_info[, c("x", "y")]),
  spatialCoordsNames = c("x", "y")
)

out <- spatialDE(spe, assay_type = "counts", verbose = FALSE)
head(out[order(out$qval), ])
ABCDEFGHIJ0123456789
 
 
FSV
<dbl>
M
<dbl>
g
<chr>
l
<dbl>
max_delta
<dbl>
max_ll
<dbl>
max_mu_hat
<dbl>
max_s2_t_hat
<dbl>
model
<chr>
7590.58532874Fabp71.1351900.6917423-157.86414.8057703.2961235SE
6840.52155924Apc1.1351900.8957046-152.4253-2.8521221.1971964SE
8860.34254734Kcnh31.9076091.7908397-201.1959-7.1629022.9230555SE
6810.36716804Pcp41.1351901.6829213-124.8416-9.20985210.0688118SE
7350.38203504Frzb1.1351901.5794319-239.83551.1565000.3240445SE
8320.23785404Gng131.9076092.9897888-204.10193.6484250.7506023SE

4.1 Plot Spatial Patterns of Multiple Genes (using SpatialExperiment object)

We can plot spatial patterns of multiples genes using the spe object.

spe_results <- out[out$qval < 0.05, ]

ordered_spe_results <- spe_results[order(spe_results$qval), ]

multiGenePlots(spe,
    assay_type = "counts",
    ordered_spe_results[1:6, ]$g,
    point_size = 4,
    viridis_option = "D",
    dark_theme = FALSE
)

4.2 Classify spatially variable genes with model_search and spatial_patterns

msearch <- modelSearch(spe,
    de_results = out, qval_thresh = 0.05,
    verbose = FALSE
)

head(msearch)
ABCDEFGHIJ0123456789
 
 
BIC
<dbl>
FSV
<dbl>
LLR
<dbl>
M
<dbl>
g
<chr>
l
<dbl>
max_delta
<dbl>
max_ll
<dbl>
0287.66790.06406448NaN4X2900097C17Rik5.38679614.914618-132.7126
1263.50430.09590705NaN4Pcp49.0521389.860377-120.6308
2370.86710.09681127NaN4Rbfox39.0521389.758510-174.3122
3428.93590.08063297NaN4Gng139.05213811.926341-203.3466
4401.77370.14867725NaN4Kcnh39.0521385.989364-189.7655
5479.48040.3694386214.012724Pmepa11.1351901.666576-228.6188
spatterns <- spatialPatterns(spe,
    de_results = de_results, qval_thresh = 0.05,
    n_patterns = 4L, length = 1.5, verbose = FALSE
)

spatterns$pattern_results
ABCDEFGHIJ0123456789
 
 
g
<chr>
pattern
<dbl>
membership
<dbl>
595X2900097C17Rik21.0000000
660Eef1a131.0000000
667Pmepa111.0000000
668Pcp401.0000000
671Apc31.0000000
696Rbfox301.0000000
722Penk01.0000000
725Frzb11.0000000
752Fabp711.0000000
791Gm1388930.9999996

sessionInfo

Session info
#> [1] "2024-10-29 19:41:45 EDT"
#> R Under development (unstable) (2024-10-21 r87258)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] SpatialExperiment_1.17.0    SingleCellExperiment_1.29.0
#>  [3] SummarizedExperiment_1.37.0 Biobase_2.67.0             
#>  [5] GenomicRanges_1.59.0        GenomeInfoDb_1.43.0        
#>  [7] IRanges_2.41.0              S4Vectors_0.45.0           
#>  [9] BiocGenerics_0.53.0         MatrixGenerics_1.19.0      
#> [11] matrixStats_1.4.1           ggplot2_3.5.1              
#> [13] spatialDE_1.13.0            BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6            dir.expiry_1.15.0       rjson_0.2.23           
#>  [4] xfun_0.48               bslib_0.8.0             ggrepel_0.9.6          
#>  [7] lattice_0.22-6          vctrs_0.6.5             tools_4.5.0            
#> [10] generics_0.1.3          parallel_4.5.0          tibble_3.2.1           
#> [13] fansi_1.0.6             highr_0.11              pkgconfig_2.0.3        
#> [16] Matrix_1.7-1            checkmate_2.3.2         lifecycle_1.0.4        
#> [19] GenomeInfoDbData_1.2.13 farver_2.1.2            compiler_4.5.0         
#> [22] tinytex_0.53            munsell_0.5.1           htmltools_0.5.8.1      
#> [25] sass_0.4.9              yaml_2.3.10             pillar_1.9.0           
#> [28] crayon_1.5.3            jquerylib_0.1.4         cachem_1.1.0           
#> [31] DelayedArray_0.33.0     magick_2.8.5            abind_1.4-8            
#> [34] basilisk_1.19.0         tidyselect_1.2.1        digest_0.6.37          
#> [37] dplyr_1.1.4             bookdown_0.41           labeling_0.4.3         
#> [40] fastmap_1.2.0           grid_4.5.0              colorspace_2.1-1       
#> [43] cli_3.6.3               SparseArray_1.7.0       magrittr_2.0.3         
#> [46] S4Arrays_1.7.0          utf8_1.2.4              withr_3.0.2            
#> [49] backports_1.5.0         filelock_1.0.3          scales_1.3.0           
#> [52] UCSC.utils_1.3.0        rmarkdown_2.28          XVector_0.47.0         
#> [55] httr_1.4.7              gridExtra_2.3           reticulate_1.39.0      
#> [58] png_0.1-8               evaluate_1.0.1          knitr_1.48             
#> [61] basilisk.utils_1.19.0   viridisLite_0.4.2       rlang_1.1.4            
#> [64] Rcpp_1.0.13             glue_1.8.0              BiocManager_1.30.25    
#> [67] jsonlite_1.8.9          R6_2.5.1                zlibbioc_1.53.0