1 Introduction

2 Installation

3 Example: Mouse Olfactory Bulb

3.1 Load data
3.2 stabilize
3.3 regress_out
3.4 run
3.5 model_search
3.6 spatial_patterns
3.7 Plots
3.8 Plot Fraction Spatial Variance vs Q-value

4 SpatialExperiment integration

4.1 Plot Spatial Patterns of Multiple Genes (using SpatialExperiment object)
4.2 Classify spatially variable genes with model_search and spatial_patterns

sessionInfo

1 Introduction

SpatialDE by Svensson et al., 2018, is a method to identify spatially variable genes (SVGs) in spatially resolved transcriptomics data.

2 Installation

You can install spatialDE from Bioconductor with the following code:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("spatialDE")

3 Example: Mouse Olfactory Bulb

Reproducing the example analysis from the original SpatialDE Python package.

library(spatialDE)
library(ggplot2)

3.1 Load data

Files originally retrieved from SpatialDE GitHub repository from the following links: https://github.com/Teichlab/SpatialDE/blob/master/Analysis/MouseOB/data/Rep11_MOB_0.csv https://github.com/Teichlab/SpatialDE/blob/master/Analysis/MouseOB/MOB_sample_info.csv

# Expression file used in python SpatialDE. 
data("Rep11_MOB_0")

# Sample Info file used in python SpatialDE
data("MOB_sample_info")

The Rep11_MOB_0 object contains spatial expression data for 16218 genes on 262 spots, with genes as rows and spots as columns.

Rep11_MOB_0[1:5, 1:5]
#>         16.92x9.015 16.945x11.075 16.97x10.118 16.939x12.132 16.949x13.055
#> Nrf1              1             0            0             1             0
#> Zbtb5             1             0            1             0             0
#> Ccnl1             1             3            1             1             0
#> Lrrfip1           2             2            0             0             3
#> Bbs1              1             2            0             4             0
dim(Rep11_MOB_0)
#> [1] 16218   262

The MOB_sample_info object contains a data.frame with coordinates for each spot.

head(MOB_sample_info)

ABCDEFGHIJ0123456789

	x <dbl>	y <dbl>	total_counts <int>
16.92x9.015	16.920	9.015	18790
16.945x11.075	16.945	11.075	36990
16.97x10.118	16.970	10.118	12471
16.939x12.132	16.939	12.132	22703
16.949x13.055	16.949	13.055	18641
16.942x15.088	16.942	15.088	23408

3.1.1 Filter out pratically unobserved genes

Rep11_MOB_0 <- Rep11_MOB_0[rowSums(Rep11_MOB_0) >= 3, ]

3.1.2 Get total_counts for every spot

Rep11_MOB_0 <- Rep11_MOB_0[, row.names(MOB_sample_info)]
MOB_sample_info$total_counts <- colSums(Rep11_MOB_0)
head(MOB_sample_info)

ABCDEFGHIJ0123456789

	x <dbl>	y <dbl>	total_counts <dbl>
16.92x9.015	16.920	9.015	18790
16.945x11.075	16.945	11.075	36990
16.97x10.118	16.970	10.118	12471
16.939x12.132	16.939	12.132	22703
16.949x13.055	16.949	13.055	18641
16.942x15.088	16.942	15.088	23408

3.1.3 Get coordinates from `MOB_sample_info`

X <- MOB_sample_info[, c("x", "y")]
head(X)

ABCDEFGHIJ0123456789

	x <dbl>	y <dbl>
16.92x9.015	16.920	9.015
16.945x11.075	16.945	11.075
16.97x10.118	16.970	10.118
16.939x12.132	16.939	12.132
16.949x13.055	16.949	13.055
16.942x15.088	16.942	15.088

3.2 `stabilize`

The SpatialDE method assumes normally distributed data, so we stabilize the variance of the negative binomial distributed counts data using Anscombe’s approximation. The stabilize() function takes as input a data.frame of expression values with samples in columns and genes in rows. Thus, in this case, we have to transpose the data.

norm_expr <- stabilize(Rep11_MOB_0)
#> + /home/biocbuild/.cache/R/basilisk/1.19.0/0/bin/conda create --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.0/spatialDE/1.13.0/env 'python=3.10.14' --quiet -c conda-forge --override-channels
#> + /home/biocbuild/.cache/R/basilisk/1.19.0/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.0/spatialDE/1.13.0/env 'python=3.10.14' -c conda-forge --override-channels
#> + /home/biocbuild/.cache/R/basilisk/1.19.0/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.0/spatialDE/1.13.0/env -c conda-forge 'python=3.10.14' 'numpy=1.23.5' 'scipy=1.9.3' 'patsy=0.5.3' 'pandas=1.5.2' --override-channels
norm_expr[1:5, 1:5]
#>         16.92x9.015 16.945x11.075 16.97x10.118 16.939x12.132 16.949x13.055
#> Nrf1       1.227749     0.8810934    0.8810934     1.2277491     0.8810934
#> Zbtb5      1.227749     0.8810934    1.2277491     0.8810934     0.8810934
#> Ccnl1      1.227749     1.6889027    1.2277491     1.2277491     0.8810934
#> Lrrfip1    1.484676     1.4846765    0.8810934     0.8810934     1.6889027
#> Bbs1       1.227749     1.4846765    0.8810934     1.8584110     0.8810934

3.3 `regress_out`

Next, we account for differences in library size between the samples by regressing out the effect of the total counts for each gene using linear regression.

resid_expr <- regress_out(norm_expr, sample_info = MOB_sample_info)
resid_expr[1:5, 1:5]
#>         16.92x9.015 16.945x11.075 16.97x10.118 16.939x12.132 16.949x13.055
#> Nrf1    -0.75226761    -1.2352000   -1.0164479    -0.7903289    -1.0973214
#> Zbtb5    0.09242373    -0.3323719    0.1397144    -0.2760560    -0.2533134
#> Ccnl1   -2.77597164    -2.5903783   -2.6092013    -2.8529340    -3.1193883
#> Lrrfip1 -1.92331333    -2.1578718   -2.3849405    -2.5924072    -1.7163300
#> Bbs1    -1.12186064    -1.0266476   -1.3706460    -0.5363646    -1.4666155

3.4 `run`

To reduce running time, the SpatialDE test is run on a subset of 1000 genes. Running it on the complete data set takes about 10 minutes.

# For this example, run spatialDE on the first 1000 genes
sample_resid_expr <- head(resid_expr, 1000)

results <- spatialDE::run(sample_resid_expr, coordinates = X)
head(results[order(results$qval), ])

ABCDEFGHIJ0123456789

	FSV <dbl>	M <dbl>	g <chr>	l <dbl>	max_delta <dbl>	max_ll <dbl>	max_mu_hat <dbl>	max_s2_t_hat <dbl>	model <chr>
752	0.5869020	4	Fabp7	1.135190	0.6872704	-154.7691	5.036316	3.6039400	SE
671	0.5276876	4	Apc	1.135190	0.8739619	-146.9456	-3.922771	2.1552117	SE
884	0.3651030	4	Kcnh3	1.907609	1.6225593	-177.5658	-8.066532	3.7081686	SE
668	0.3837007	4	Pcp4	1.135190	1.5683362	-127.1354	-11.866414	16.9252387	SE
722	0.3345161	4	Penk	1.135190	1.9424986	-163.0520	-9.491798	10.3640716	SE
725	0.3931224	4	Frzb	1.135190	1.5073475	-217.1284	1.180284	0.3109513	SE

3.5 `model_search`

Finally, we can classify the DE genes to interpetable DE classes using the model_search function. We apply the model search on filtered DE results, using a threshold of 0.05 for the Q-values.

de_results <- results[results$qval < 0.05, ]

ms_results <- model_search(
    sample_resid_expr,
    coordinates = X,
    de_results = de_results
)

# To show ms_results sorted on qvalue, uncomment the following line
# head(ms_results[order(ms_results$qval), ])

head(ms_results)

ABCDEFGHIJ0123456789

	BIC <dbl>	FSV <dbl>	LLR <dbl>	M <dbl>	g <chr>	l <dbl>	max_delta <dbl>	max_ll <dbl>
0	273.4206	0.06585344	NaN	4	X2900097C17Rik	5.386796	14.481719	-125.5890
1	267.1570	0.10520888	NaN	4	Pcp4	9.052138	8.896112	-122.4571
2	333.8619	0.10540242	NaN	4	Rbfox3	9.052138	8.877856	-155.8096
3	370.2613	0.09234754	NaN	4	Sez6	9.052138	10.280760	-174.0093
4	378.8001	0.08317351	NaN	4	Gng13	9.052138	11.530100	-178.2787
5	394.8754	0.10227380	NaN	4	Gng4	9.052138	9.181433	-186.3163

3.6 `spatial_patterns`

Furthermore, we can group spatially variable genes (SVGs) into spatial patterns using automatic expression histology (AEH).

sp <- spatial_patterns(
    sample_resid_expr,
    coordinates = X,
    de_results = de_results,
    n_patterns = 4L, length = 1.5
)
sp$pattern_results

ABCDEFGHIJ0123456789

	g <chr>	pattern <dbl>	membership <dbl>
595	X2900097C17Rik	0	1
660	Eef1a1	0	1
667	Pmepa1	2	1
668	Pcp4	1	1
671	Apc	3	1
696	Rbfox3	1	1
722	Penk	1	1
725	Frzb	2	1
752	Fabp7	2	1
791	Gm13889	1	1

3.7 Plots

Visualizing one of the most significant genes.

gene <- "Pcp4"

ggplot(data = MOB_sample_info, aes(x = x, y = y, color = norm_expr[gene, ])) +
    geom_point(size = 7) +
    ggtitle(gene) +
    scale_color_viridis_c() +
    labs(color = gene)

3.7.1 Plot Spatial Patterns of Multiple Genes

As an alternative to the previous figure, we can plot multiple genes using the normalized expression values.

ordered_de_results <- de_results[order(de_results$qval), ]

multiGenePlots(norm_expr,
    coordinates = X,
    ordered_de_results[1:6, ]$g,
    point_size = 4,
    viridis_option = "D",
    dark_theme = FALSE
)

3.8 Plot Fraction Spatial Variance vs Q-value

FSV_sig(results, ms_results)
#> Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

4 SpatialExperiment integration

The SpatialDE workflow can also be executed with a SpatialExperiment object as input.

library(SpatialExperiment)

# For SpatialExperiment object, we neeed to transpose the counts matrix in order
# to have genes on rows and spot on columns. 
# For this example, run spatialDE on the first 1000 genes

partial_counts <- head(Rep11_MOB_0, 1000)

spe <- SpatialExperiment(
  assays = list(counts = partial_counts),
  spatialData = DataFrame(MOB_sample_info[, c("x", "y")]),
  spatialCoordsNames = c("x", "y")
)

out <- spatialDE(spe, assay_type = "counts", verbose = FALSE)
head(out[order(out$qval), ])

ABCDEFGHIJ0123456789

	FSV <dbl>	M <dbl>	g <chr>	l <dbl>	max_delta <dbl>	max_ll <dbl>	max_mu_hat <dbl>	max_s2_t_hat <dbl>	model <chr>
759	0.5853287	4	Fabp7	1.135190	0.6917423	-157.8641	4.805770	3.2961235	SE
684	0.5215592	4	Apc	1.135190	0.8957046	-152.4253	-2.852122	1.1971964	SE
886	0.3425473	4	Kcnh3	1.907609	1.7908397	-201.1959	-7.162902	2.9230555	SE
681	0.3671680	4	Pcp4	1.135190	1.6829213	-124.8416	-9.209852	10.0688118	SE
735	0.3820350	4	Frzb	1.135190	1.5794319	-239.8355	1.156500	0.3240445	SE
832	0.2378540	4	Gng13	1.907609	2.9897888	-204.1019	3.648425	0.7506023	SE

4.1 Plot Spatial Patterns of Multiple Genes (using SpatialExperiment object)

We can plot spatial patterns of multiples genes using the spe object.

spe_results <- out[out$qval < 0.05, ]

ordered_spe_results <- spe_results[order(spe_results$qval), ]

multiGenePlots(spe,
    assay_type = "counts",
    ordered_spe_results[1:6, ]$g,
    point_size = 4,
    viridis_option = "D",
    dark_theme = FALSE
)

4.2 Classify spatially variable genes with `model_search` and `spatial_patterns`

msearch <- modelSearch(spe,
    de_results = out, qval_thresh = 0.05,
    verbose = FALSE
)

head(msearch)

ABCDEFGHIJ0123456789

	BIC <dbl>	FSV <dbl>	LLR <dbl>	M <dbl>	g <chr>	l <dbl>	max_delta <dbl>	max_ll <dbl>
0	287.6679	0.06406448	NaN	4	X2900097C17Rik	5.386796	14.914618	-132.7126
1	263.5043	0.09590705	NaN	4	Pcp4	9.052138	9.860377	-120.6308
2	370.8671	0.09681127	NaN	4	Rbfox3	9.052138	9.758510	-174.3122
3	428.9359	0.08063297	NaN	4	Gng13	9.052138	11.926341	-203.3466
4	401.7737	0.14867725	NaN	4	Kcnh3	9.052138	5.989364	-189.7655
5	479.4804	0.36943862	14.01272	4	Pmepa1	1.135190	1.666576	-228.6188

spatterns <- spatialPatterns(spe,
    de_results = de_results, qval_thresh = 0.05,
    n_patterns = 4L, length = 1.5, verbose = FALSE
)

spatterns$pattern_results

ABCDEFGHIJ0123456789

	g <chr>	pattern <dbl>	membership <dbl>
595	X2900097C17Rik	2	1.0000000
660	Eef1a1	3	1.0000000
667	Pmepa1	1	1.0000000
668	Pcp4	0	1.0000000
671	Apc	3	1.0000000
696	Rbfox3	0	1.0000000
722	Penk	0	1.0000000
725	Frzb	1	1.0000000
752	Fabp7	1	1.0000000
791	Gm13889	3	0.9999996

`sessionInfo`

Session info

#> [1] "2024-10-29 19:41:45 EDT"
#> R Under development (unstable) (2024-10-21 r87258)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] SpatialExperiment_1.17.0    SingleCellExperiment_1.29.0
#>  [3] SummarizedExperiment_1.37.0 Biobase_2.67.0             
#>  [5] GenomicRanges_1.59.0        GenomeInfoDb_1.43.0        
#>  [7] IRanges_2.41.0              S4Vectors_0.45.0           
#>  [9] BiocGenerics_0.53.0         MatrixGenerics_1.19.0      
#> [11] matrixStats_1.4.1           ggplot2_3.5.1              
#> [13] spatialDE_1.13.0            BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6            dir.expiry_1.15.0       rjson_0.2.23           
#>  [4] xfun_0.48               bslib_0.8.0             ggrepel_0.9.6          
#>  [7] lattice_0.22-6          vctrs_0.6.5             tools_4.5.0            
#> [10] generics_0.1.3          parallel_4.5.0          tibble_3.2.1           
#> [13] fansi_1.0.6             highr_0.11              pkgconfig_2.0.3        
#> [16] Matrix_1.7-1            checkmate_2.3.2         lifecycle_1.0.4        
#> [19] GenomeInfoDbData_1.2.13 farver_2.1.2            compiler_4.5.0         
#> [22] tinytex_0.53            munsell_0.5.1           htmltools_0.5.8.1      
#> [25] sass_0.4.9              yaml_2.3.10             pillar_1.9.0           
#> [28] crayon_1.5.3            jquerylib_0.1.4         cachem_1.1.0           
#> [31] DelayedArray_0.33.0     magick_2.8.5            abind_1.4-8            
#> [34] basilisk_1.19.0         tidyselect_1.2.1        digest_0.6.37          
#> [37] dplyr_1.1.4             bookdown_0.41           labeling_0.4.3         
#> [40] fastmap_1.2.0           grid_4.5.0              colorspace_2.1-1       
#> [43] cli_3.6.3               SparseArray_1.7.0       magrittr_2.0.3         
#> [46] S4Arrays_1.7.0          utf8_1.2.4              withr_3.0.2            
#> [49] backports_1.5.0         filelock_1.0.3          scales_1.3.0           
#> [52] UCSC.utils_1.3.0        rmarkdown_2.28          XVector_0.47.0         
#> [55] httr_1.4.7              gridExtra_2.3           reticulate_1.39.0      
#> [58] png_0.1-8               evaluate_1.0.1          knitr_1.48             
#> [61] basilisk.utils_1.19.0   viridisLite_0.4.2       rlang_1.1.4            
#> [64] Rcpp_1.0.13             glue_1.8.0              BiocManager_1.30.25    
#> [67] jsonlite_1.8.9          R6_2.5.1                zlibbioc_1.53.0

Introduction to spatialDE

29 October 2024

1 Introduction

2 Installation

3 Example: Mouse Olfactory Bulb

3.1 Load data

3.1.1 Filter out pratically unobserved genes

3.1.2 Get total_counts for every spot

3.1.3 Get coordinates from `MOB_sample_info`

3.2 `stabilize`

3.3 `regress_out`

3.4 `run`

3.5 `model_search`

3.6 `spatial_patterns`

3.7 Plots

3.7.1 Plot Spatial Patterns of Multiple Genes

3.8 Plot Fraction Spatial Variance vs Q-value

4 SpatialExperiment integration

4.1 Plot Spatial Patterns of Multiple Genes (using SpatialExperiment object)

4.2 Classify spatially variable genes with `model_search` and `spatial_patterns`

`sessionInfo`

Introduction to spatialDE

29 October 2024

1 Introduction

2 Installation

3 Example: Mouse Olfactory Bulb

3.1 Load data

3.1.1 Filter out pratically unobserved genes

3.1.2 Get total_counts for every spot

3.1.3 Get coordinates from MOB_sample_info

3.2 stabilize

3.3 regress_out

3.4 run

3.5 model_search

3.6 spatial_patterns

3.7 Plots

3.7.1 Plot Spatial Patterns of Multiple Genes

3.8 Plot Fraction Spatial Variance vs Q-value

4 SpatialExperiment integration

4.1 Plot Spatial Patterns of Multiple Genes (using SpatialExperiment object)

4.2 Classify spatially variable genes with model_search and spatial_patterns

sessionInfo

3.1.3 Get coordinates from `MOB_sample_info`

3.2 `stabilize`

3.3 `regress_out`

3.4 `run`

3.5 `model_search`

3.6 `spatial_patterns`

4.2 Classify spatially variable genes with `model_search` and `spatial_patterns`

`sessionInfo`