Contents

1 Overview

SpaceMarkers leverages latent feature analysis of the spatial components of transcriptomic data to identify biologically relevant molecular interactions between cell groups.This tutorial will use the latent features from CoGAPS to look at pattern interactions in a Visium 10x breast ductal carcinoma spatial transcriptomics dataset.

2 Installation

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SpaceMarkers")
library(SpaceMarkers)

3 Setup

3.2 Extracting Counts Matrix

3.2.1 load10xExpr

Here the counts matrix will be obtained from the h5 object on the Visium site and genes with less than 3 counts are removed from the dataset.This can be achieved with the load10XExpr function.

download.file(counts_url,basename(counts_url))
counts_matrix <- load10XExpr(visiumDir = ".", h5filename = counts_file)
## Warning in asMethod(object): sparse->dense coercion: allocating vector of size
## 1.3 GiB
good_gene_threshold <- 3
goodGenes <- rownames(counts_matrix)[
    apply(counts_matrix,1,function(x) sum(x>0)>=good_gene_threshold)]
counts_matrix <- counts_matrix[goodGenes,]

3.3 Obtaining CoGAPS Patterns

In this example the latent features from CoGAPS will be used to identify overlapping genes with SpaceMarkers. Here the featureLoadings (cells) and samplePatterns (genes) for both the expression matrix and CoGAPS matrix need to match.

data("cogaps_result")
features <- intersect(rownames(counts_matrix),rownames(
    slot(cogaps_result,"featureLoadings")))
barcodes <- intersect(colnames(counts_matrix),rownames(
    slot(cogaps_result,"sampleFactors")))
counts_matrix <- counts_matrix[features,barcodes]
cogaps_matrix<-slot(cogaps_result,"featureLoadings")[features,]%*%
    t(slot(cogaps_result,"sampleFactors")[barcodes,])

3.4 Obtaining Spatial Coordinates

3.4.1 load10XCoords

The spatial coordinates will also be pulled from Visium for this dataset. These are combined with the latent features to demonstrate how cells for each pattern interact in 2D space. The data can be extracted with the load10XCoords() function

download.file(sp_url, basename(sp_url))
untar(basename(sp_url))
spCoords <- load10XCoords(visiumDir = ".")
rownames(spCoords) <- spCoords$barcode
spCoords <- spCoords[barcodes,]
spPatterns <- cbind(spCoords,slot(cogaps_result,"sampleFactors")[barcodes,])
head(spPatterns)
##                               barcode         y        x    Pattern_1
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1  67.28568 207.4858 0.4676255882
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 238.79054 375.0650 0.2690758109
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1  77.82161 260.3531 0.1105933860
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 268.53653 212.2053 0.0002508377
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 115.92419 360.0982 0.2849308848
## AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 213.86511 192.9231 0.1583736390
##                       Pattern_2    Pattern_3    Pattern_4    Pattern_5
## AAACAACGAATAGTTC-1 1.049391e-01 2.576064e-01 0.6848062277 4.747092e-02
## AAACAAGTATCTCCCA-1 4.394425e-01 2.056469e-01 0.2921337187 1.167576e-02
## AAACAATCTACTAGCA-1 1.148523e-02 2.309153e-01 0.4111314714 9.508318e-02
## AAACACCAATAACTGC-1 1.685795e-01 1.223603e-01 0.0001562788 8.041928e-01
## AAACAGAGCGACTCCT-1 1.102506e-01 9.053156e-08 0.2429406196 3.430807e-08
## AAACAGCTTTCAGAAG-1 9.741083e-06 1.723470e-01 0.3059957027 7.167605e-01

For demonstration purposes we will look at two patterns; Pattern_1 (immune cell) and Pattern_5 (invasive carcinoma lesion). Furthermore we will only look at the relationship between a pre-curated list of genes for efficiency.

data("curated_genes")
spPatterns<-spPatterns[c("barcode","y","x","Pattern_1","Pattern_5")]
counts_matrix <- counts_matrix[curated_genes,]
cogaps_matrix <- cogaps_matrix[curated_genes, ]

4 Executing SpaceMarkers

4.1 SpaceMarker Modes

SpaceMarkers can operate in ‘residual’or ’DE’ (DifferentialExpression) mode. In an ideal world the overlapping patterns identified by SpaceMarkers would be a homogeneous population of cells and the relationship between them would be linear. However, due to confounding effects of variations in cell density and common cell types in any given region, this is not always true.

To account for these confounding effects, the ‘residual’ mode compares the feature interactions between the expression matrix and the reconstructed latent space matrix. The features with the highest residual error are reported. The genes are then classified according to regions of overlapping vs exclusive influence. The default mode is ‘residual’ mode.

Suppose the feature (gene) information is not readily available and only the sample (cells) latent feature patterns with P-values are available? This is the advantage of ‘DE’ mode. Where residual mode assesses the non-linear effects that may arise from confounding variables, ‘DE’ mode assesses simple linear interactions between patterns directly from expression. DE mode also compares genes from regions of overlapping vs exclusive influence but does not consider residuals from the expression matrix as there is no matrix reconstruction with the latent feature matrix.

4.1.1 Residual Mode

SpaceMarkers identifies regions of influence using a gaussian kernel outlier based model. The reference pattern (Pattern_1 in this case) is used as the prior for this model. SpaceMarkers then identifies where the regions of influence are interacting from each of the other patterns as well as where they are mutually exclusive.

getSpatialParameters: This function identifies the optimal width of the gaussian distribution (sigmaOpt) as well as the outlier threshold around the set of spots (thresOpt) for each pattern.These parameters minimize the spatial autocorrelation residuals of the spots in the regions of influence.

getSpatialParameters took approximately 5 minutes on a MacBook Pro Quad-Intel Core i5 processor with 16 GB of memory. Therefore we will load this data from a previous run. The spPatterns object is the sole parameter for this function if you would like to run this yourself

data("optParams")
optParams
##           Pattern_1 Pattern_5
## sigmaOpt        7.0       6.4
## threshOpt       2.1       1.2

getInteractingGenes: This function identifies the regions of influence and interaction as well as the genes associated with these regions. A non-parametric Kruskal-Wallis test is used to identify statistically significant genes in any one region of influence without discerning which region is more significant. A post hoc Dunn’s Test is used for analysis of genes between regions and can distinguish which of two regions is more significant. If ‘residual’ mode is selected the user must provide a reconstructed matrix from the latent feature matrix. The matrix is passed to the ‘reconstruction’ argument and can be left as NULL for ‘DE’ mode. The ‘data’ parameter is the original expression matrix. The ‘spPatterns’ argument takes a dataframe with the spatial coordinates of each cell as well as the patterns. The spatial coordinate columns must contain the labels ‘x’ and ‘y’ to be recognized by the function.

SpaceMarkers <- getInteractingGenes(data = counts_matrix,
                                    reconstruction = cogaps_matrix,
                                    optParams = optParams,
                                    spPatterns = spPatterns,
                                    refPattern = "Pattern_1",
                                    mode ="residual",analysis="overlap")
## Using user provided optParams.
## Warning in matrixTests::row_kruskalwallis(x = as.matrix(testMat), g = region):
## 2262 columns dropped due to missing group information

NB: When running getInteractingGenes some warnings may be generated. The warnings are due to the nature of the ‘sparse’ data being used. Comparing two cells from the two patterns with identical information is redundant as SpaceMarkers is identifying statistically different expression for interactions exclusive to either of the two patterns and a region that is due to interaction between the given two patterns. Also, if there are too many zeros in the genes (rows) of those regions, the columns are dropped as there is nothing to compare in the Kruskal Wallis test.

print(head(SpaceMarkers$interacting_genes[[1]]))
##        Gene Pattern_1 x Pattern_5 KW.obs.tot KW.obs.groups KW.df KW.statistic
## IGHE   IGHE                vsBoth       2636             3     2     307.9678
## APOC1 APOC1                vsBoth       2636             3     2     373.3717
## TGM2   TGM2                vsBoth       2636             3     2     233.6329
## IFI30 IFI30                vsBoth       2636             3     2     160.1853
## PHGR1 PHGR1                vsBoth       2636             3     2     106.0202
## CLU     CLU                vsBoth       2636             3     2     107.2119
##          KW.pvalue     KW.p.adj Dunn.zP1_Int Dunn.zP2_Int Dunn.zP2_P1
## IGHE  1.335467e-67 6.619269e-67   -16.134086    -9.082053  12.0123341
## APOC1 8.382338e-82 4.550412e-81   -15.180146   -19.303620  -6.0326744
## TGM2  1.850362e-51 7.812641e-51   -13.895344    -7.575519  10.7424067
## IFI30 1.645121e-35 5.515995e-35   -10.298034   -12.656405  -3.3852418
## PHGR1 9.506202e-24 2.643188e-23   -10.270047    -7.912475   4.2122172
## CLU   5.238797e-24 1.493057e-23    -9.910068    -9.800521   0.5755801
##       Dunn.pval_1_Int Dunn.pval_2_Int Dunn.pval_2_1 Dunn.pval_1_Int.adj
## IGHE     1.469666e-58    1.065463e-19  3.060820e-33        1.726857e-57
## APOC1    4.787159e-52    5.007523e-83  1.612681e-09        5.399916e-51
## TGM2     6.759774e-44    3.576944e-14  6.434476e-27        7.331755e-43
## IFI30    7.191575e-25    1.031142e-36  7.111558e-04        4.716335e-24
## PHGR1    9.615778e-25    2.523207e-15  2.528762e-05        6.162840e-24
## CLU      3.763892e-23    1.120063e-22  5.648990e-01        2.166158e-22
##       Dunn.pval_2_Int.adj Dunn.pval_2_1.adj SpaceMarkersMetric
## IGHE         5.365365e-19      2.538680e-32           132.2682
## APOC1        7.432218e-82      5.109844e-09           130.3603
## TGM2         1.461881e-13      4.652621e-26           127.7907
## IFI30        9.692735e-36      1.856907e-03           120.5961
## PHGR1        1.078098e-14      7.060503e-05           120.5401
## CLU          6.074190e-22      8.253965e-01           119.8201
print(head(SpaceMarkers$hotspots))
##      Pattern_1   Pattern_5  
## [1,] "Pattern_1" NA         
## [2,] "Pattern_1" NA         
## [3,] NA          NA         
## [4,] NA          "Pattern_5"
## [5,] "Pattern_1" NA         
## [6,] NA          "Pattern_5"

The output is a list of data frames with information about the interacting genes of the refPattern and each pattern from the CoGAPS matrix (interacting_genes object). There is also a data frame with all of the regions of influence for any two of patterns (the hotspotRegions object).

For the ‘interacting_genes’ data frames, the first column is the list of genes and the second column says whether the statistical test were done vsPattern_1, vsPattern_2 or vsBoth. The remaining columns are statistics for the Kruskal-Wallis test and the post hoc Dunn’s test.The SpaceMarkersMetric column is a product of sums of the Dunn’s statistics and is used to rank the genes.

4.1.2 DE Mode

As described previously ‘DE’ mode only requires the counts matrix and spatial patterns and not the reconstructed CoGAPS matrix. It identifies simpler molecular interactions between regions.

SpaceMarkers_DE <- getInteractingGenes(
    data=counts_matrix,reconstruction=NULL,
    optParams = optParams,
    spPatterns = spPatterns,
    refPattern = "Pattern_1",
    mode="DE",analysis="overlap")
## Using user provided optParams.
## Warning in matrixTests::row_kruskalwallis(x = as.matrix(testMat), g = region):
## 2261 columns dropped due to missing group information

4.1.3 Differences between Residual Mode and DE Mode

One of the first things to notice is the difference in the number of genes identified between the two modes. Here we will use the genes

residual_p1_p5<-SpaceMarkers$interacting_genes[[1]]
DE_p1_p5<-SpaceMarkers_DE$interacting_genes[[1]]
paste(
    "Residual mode identified",dim(residual_p1_p5)[1],
        "interacting genes,while DE mode identified",dim(DE_p1_p5)[1],
        "interacting genes",collapse = NULL)
## [1] "Residual mode identified 35 interacting genes,while DE mode identified 77 interacting genes"

DE mode produces more genes than residual mode because the matrix of residuals highlights less significant differences for confounding genes across the spots.The next analysis will show where the top genes rank in each mode’s list if they are identified at all. A function was created that will take the top 20 genes of a reference list of genes and compare it to the entire list of a second list of genes. The return object is a data frame of the gene, the name of each list and the ranking of each gene as compared to the reference list. If there is no gene identified in the second list compared to the reference it is classified as NA.

compare_genes <- function(ref_list, list2,ref_name = "mode1",
                            list2_name = "mode2", sub_slice = NULL){
    ref_rank <- seq(1,length(ref_list),1)
    list2_ref_rank <- which(list2 %in% ref_list)
    list2_ref_genes <- list2[which(list2 %in% ref_list)]
    ref_genes_only <- ref_list[ !ref_list  %in% list2_ref_genes ]
    mode1 <- data.frame("Gene" = ref_list,"Rank" = ref_rank,"mode"= ref_name)
    mode2 <- data.frame("Gene" = c(list2_ref_genes, ref_genes_only),"Rank" = c(
        list2_ref_rank,rep(NA,length(ref_genes_only))),"mode"= list2_name)
    mode1_mode2 <- merge(mode1, mode2, by = "Gene", all = TRUE) 
    mode1_mode2 <- mode1_mode2[order(mode1_mode2$Rank.x),]
    mode1_mode2 <- subset(mode1_mode2,select = c("Gene","Rank.x","Rank.y"))
    colnames(mode1_mode2) <- c("Gene",paste0(ref_name,"_Rank"),
                                paste0(list2_name,"_Rank"))
    return(mode1_mode2)
}
res_to_DE <- compare_genes(head(residual_p1_p5$Gene, n = 20),DE_p1_p5$Gene,
                            ref_name="residual",list2_name="DE")
DE_to_res <- compare_genes(head(DE_p1_p5$Gene, n = 20),residual_p1_p5$Gene,
                            ref_name = "DE",list2_name = "residual")

4.1.4 Comparing residual mode to DE mode

res_to_DE
##        Gene residual_Rank DE_Rank
## 15     IGHE             1       1
## 1     APOC1             2      15
## 19     TGM2             3       2
## 14    IFI30             4      18
## 18    PHGR1             5       4
## 6       CLU             6       3
## 10     CST1             7       7
## 5      CAPG             8      28
## 9      COMP             9       9
## 2      APOE            10      14
## 16   LAPTM5            11      33
## 11      FAH            12      19
## 17   NDUFB2            13      13
## 4      C1QC            14      41
## 13 HLA-DRB1            15      NA
## 3      C1QB            16      59
## 8    COL4A1            17      22
## 7   COL18A1            18      25
## 12    GCHFR            19       6
## 20    TREM2            20      21

Here we identify the top 20 genes in ‘residual’ mode and their corresponding ranking in DE mode.HLA-DRB1 is the only gene identified in residual mode and not in DE mode. The other genes are ranked relatively high in both residual and DE mode.

4.1.5 Comparing DE mode to residual mode

DE_to_res
##      Gene DE_Rank residual_Rank
## 14   IGHE       1             1
## 19   TGM2       2             3
## 4     CLU       3             6
## 16  PHGR1       4             5
## 8    CTSD       5            NA
## 12  GCHFR       6            19
## 6    CST1       7             7
## 11    FTL       8            31
## 5    COMP       9             9
## 20    UBC      10            NA
## 1    ACTB      11            NA
## 10   FTH1      12            NA
## 15 NDUFB2      13            13
## 3    APOE      14            10
## 2   APOC1      15             2
## 18    SDS      16            NA
## 7    CTSB      17            23
## 13  IFI30      18             4
## 9     FAH      19            12
## 17   PSAP      20            22

Recall that DE mode looks at the information encoded in the latent feature space and does not filter out genes based on any confounders between the counts matrix and latent feature matrix as is done in ‘residual’ mode. Therefore there are more genes in DE mode not identified at all in residual mode.

There is some agreement with interacting genes between the two methods but there are also quite a few differences. Therefore, the selected mode can significantly impact the downstream results and should be taken into consideration based on the specific biological question being answered and the data available.

4.2 Types of Analyses

Another feature of the SpaceMarkers package is the type of analysis that can be carried out, whether ‘overlap’ or ‘enrichment’ mode. The major difference between the two is that enrichment mode includes genes even if they did not pass the post-hoc Dunn’s test. These additional genes were included to enable a more statistically powerful pathway enrichment analysis and understand to a better extent the impact of genes involved each pathway. Changing analysis = ‘enrichment’ in the getInteractingGenes function will enable this.

SpaceMarkers_enrich <- getInteractingGenes(data = counts_matrix,
                                    reconstruction = cogaps_matrix,
                                    optParams = optParams,
                                    spPatterns = spPatterns,
                                    refPattern = "Pattern_1",
                                    mode ="residual",analysis="enrichment")
## Using user provided optParams.
## Warning in matrixTests::row_kruskalwallis(x = as.matrix(testMat), g = region):
## 2262 columns dropped due to missing group information
SpaceMarkers_DE_enrich <- getInteractingGenes(
    data=counts_matrix,reconstruction=NULL,
    optParams = optParams,
    spPatterns = spPatterns,
    refPattern = "Pattern_1",
    mode="DE",analysis="enrichment")
## Using user provided optParams.
## Warning in matrixTests::row_kruskalwallis(x = as.matrix(testMat), g = region):
## 2265 columns dropped due to missing group information
residual_p1_p5_enrichment<-SpaceMarkers_enrich$interacting_genes[[1]]$Gene
DE_p1_p5_enrichment<-SpaceMarkers_DE_enrich$interacting_genes[[1]]$Gene

4.2.1 Residual Mode and DE Mode - Enrichment

The data frames for the Pattern_1 x Pattern_5 will be used to compare the results of the enrichment analyses

enrich_res_to_de<-compare_genes(
    head(DE_p1_p5_enrichment, 20),
    residual_p1_p5_enrichment,
    ref_name="DE_Enrich",list2_name = "res_Enrich")
enrich_res_to_de
##      Gene DE_Enrich_Rank res_Enrich_Rank
## 14   IGHE              1               1
## 18   TGM2              2               3
## 4     CLU              3               6
## 16  PHGR1              4               5
## 8    CTSD              5              79
## 12  GCHFR              6              19
## 6    CST1              7               7
## 11    FTL              8              34
## 5    COMP              9               9
## 20    UBC             10              58
## 1    ACTB             11              99
## 10   FTH1             12              93
## 15 NDUFB2             13              13
## 2   APOC1             14               2
## 3    APOE             15              10
## 17    SDS             16              47
## 7    CTSB             17              23
## 13  IFI30             18               4
## 9     FAH             19              12
## 19  TREM2             20              20

The ranks differ alot more here because now genes that were not previously ranked are assigned a score.

overlap_enrich_de<-compare_genes(
    head(DE_p1_p5_enrichment,20),
    DE_p1_p5$Gene,
    ref_name="DE_Enrich",
    list2_name="DE_Overlap")
overlap_enrich_de
##      Gene DE_Enrich_Rank DE_Overlap_Rank
## 14   IGHE              1               1
## 18   TGM2              2               2
## 4     CLU              3               3
## 16  PHGR1              4               4
## 8    CTSD              5               5
## 12  GCHFR              6               6
## 6    CST1              7               7
## 11    FTL              8               8
## 5    COMP              9               9
## 20    UBC             10              10
## 1    ACTB             11              11
## 10   FTH1             12              12
## 15 NDUFB2             13              13
## 2   APOC1             14              15
## 3    APOE             15              14
## 17    SDS             16              16
## 7    CTSB             17              17
## 13  IFI30             18              18
## 9     FAH             19              19
## 19  TREM2             20              21

The enrichment and overlap analysis are in great agreement for DE mode. Typically, you may see slight changes among genes lower in the ranking. This is especially important where genes that do not pass the Dunn’s test for interactions between any of the other two patterns in the overlap analysis are now ranked in enrichment analysis. The Pattern_1 x Pattern_5 column for these genes is labelled as FALSE.

Here is an example of the statistics for such genes.

tail(SpaceMarkers_DE_enrich$interacting_genes[[1]])
##                  Gene Pattern_1 x Pattern_5 KW.obs.tot KW.obs.groups KW.df
## HLA-DRB1     HLA-DRB1                 FALSE       2633             3     2
## GPD1             GPD1                 FALSE       2633             3     2
## AL356599.1 AL356599.1                 FALSE       2633             3     2
## AC002451.1 AC002451.1                 FALSE       2633             3     2
## DUSP18         DUSP18                 FALSE       2633             3     2
## ADAM23         ADAM23                 FALSE       2633             3     2
##            KW.statistic     KW.pvalue      KW.p.adj Dunn.zP1_Int Dunn.zP2_Int
## HLA-DRB1   1136.5137908 1.618699e-247 1.845317e-245    1.1212254  -18.6565165
## GPD1         17.0294779  2.004914e-04  3.265146e-04    1.2756067   -1.2409944
## AL356599.1   13.1106722  1.422505e-03  2.162207e-03    1.3192830   -0.8844838
## AC002451.1    0.7892067  6.739473e-01  6.859821e-01    0.8251428    0.8628801
## DUSP18        2.1839938  3.355458e-01  3.477474e-01    1.4517799    1.0213535
## ADAM23        4.0322789  1.331686e-01  1.438291e-01    1.5115479    0.4626909
##            Dunn.zP2_P1 Dunn.pval_1_Int Dunn.pval_2_Int Dunn.pval_2_1
## HLA-DRB1   -32.0505723               1    1.117852e-77 2.155401e-225
## GPD1        -4.1242541               1    2.146078e-01  3.719380e-05
## AL356599.1  -3.6197884               1    3.764351e-01  2.948440e-04
## AC002451.1   0.0275594               1    1.000000e+00  9.780136e-01
## DUSP18      -0.7554868               1    1.000000e+00  4.499569e-01
## ADAM23      -1.7586803               1    1.000000e+00  7.863183e-02
##            Dunn.pval_1_Int.adj Dunn.pval_2_Int.adj Dunn.pval_2_1.adj
## HLA-DRB1                     1        2.190990e-76     6.336879e-223
## GPD1                         1        2.336841e-01      7.147044e-05
## AL356599.1                   1        3.952568e-01      5.384108e-04
## AC002451.1                   1        1.000000e+00      1.000000e+00
## DUSP18                       1        1.000000e+00      4.916462e-01
## ADAM23                       1        1.000000e+00      1.042329e-01
##            SpaceMarkersMetric
## HLA-DRB1            -2.242451
## GPD1                -2.551213
## AL356599.1          -2.638566
## AC002451.1        -101.650286
## DUSP18            -102.903560
## ADAM23            -103.023096

A similar trend can be observed when comparing the overlap and enrichment analysis in residual mode.

overlap_enrich_res<-compare_genes(
    head(residual_p1_p5$Gene, 20),
    residual_p1_p5_enrichment,
    ref_name ="res_overlap",list2_name="res_enrich")
overlap_enrich_res
##        Gene res_overlap_Rank res_enrich_Rank
## 15     IGHE                1               1
## 1     APOC1                2               2
## 19     TGM2                3               3
## 14    IFI30                4               4
## 18    PHGR1                5               5
## 6       CLU                6               6
## 10     CST1                7               7
## 5      CAPG                8               8
## 9      COMP                9               9
## 2      APOE               10              10
## 16   LAPTM5               11              11
## 11      FAH               12              12
## 17   NDUFB2               13              13
## 4      C1QC               14              14
## 13 HLA-DRB1               15              16
## 3      C1QB               16              15
## 8    COL4A1               17              17
## 7   COL18A1               18              18
## 12    GCHFR               19              19
## 20    TREM2               20              20

5 Visualizing SpaceMarkers

The differences between the gene interactions of Pattern_1 and Pattern_5 can be visualized in various ways to view both the magnitude and location of expression in space. In this analysis the top 2-3 genes in residual mode from Pattern_5 only vs Pattern_1, Pattern_1 only vs Pattern_5 and the interacting region vs both Pattern_1 and Pattern_5 will be compared.

5.1 Loading Packages

The following libraries are required to make the plots:

library(Matrix)
library(rjson)
library(cowplot)
library(RColorBrewer)
library(grid)
library(readbitmap)
library(dplyr)
library(data.table)
library(viridis)
library(hrbrthemes)
library(ggplot2)

5.2 Code Setup

This first function below can visualize the locations of these patterns on a spatial grid. The code has been adopted from 10xgenomics: (support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/rkit)

geom_spatial <-  function(mapping = NULL,
    data = NULL,
    stat = "identity",
    position = "identity",
    na.rm = FALSE,
    show.legend = NA,
    inherit.aes = FALSE,...) {
    GeomCustom <- ggproto(
        "GeomCustom",
        Geom,
        setup_data = function(self, data, params) {
            data <- ggproto_parent(Geom, self)$setup_data(data, params)
            data
        },
        
        draw_group = function(data, panel_scales, coord) {
            vp <- grid::viewport(x=data$x, y=data$y)
            g <- grid::editGrob(data$grob[[1]], vp=vp)
            ggplot2:::ggname("geom_spatial", g)
        },
        
        required_aes = c("grob","x","y")
        
    )
    
    layer(geom = GeomCustom,mapping = mapping,data = data,stat = stat,position=
                position,show.legend = show.legend,inherit.aes =
                inherit.aes,params = list(na.rm = na.rm, ...)
    )
}

Some spatial information for the breast cancer dataset is required.

sample_names <- c("BreastCancer")
image_paths <- c("spatial/tissue_lowres_image.png")
scalefactor_paths <- c("spatial/scalefactors_json.json")
tissue_paths <- c("spatial/tissue_positions_list.csv")

images_cl <- list()

for (i in 1:length(sample_names)) {
    images_cl[[i]] <- read.bitmap(image_paths[i])
}

height <- list()

for (i in 1:length(sample_names)) {
    height[[i]] <-  data.frame(height = nrow(images_cl[[i]]))
}

height <- bind_rows(height)

width <- list()

for (i in 1:length(sample_names)) {
    width[[i]] <- data.frame(width = ncol(images_cl[[i]]))
}

width <- bind_rows(width)

grobs <- list()
for (i in 1:length(sample_names)) {
    grobs[[i]]<-rasterGrob(images_cl[[i]],width=unit(1,"npc"),
                            height=unit(1,"npc"))
}

images_tibble <- tibble(sample=factor(sample_names), grob=grobs)
images_tibble$height <- height$height
images_tibble$width <- width$width


scales <- list()

for (i in 1:length(sample_names)) {
    scales[[i]] <- rjson::fromJSON(file = scalefactor_paths[i])
}

It is also helpful to adjust spot position by scale factor and format some of the tissue information.

bcs <- list()
for (i in 1:length(sample_names)) {
    bcs[[i]] <- read.csv(tissue_paths[i],col.names=c(
        "barcode","tissue","row","col","imagerow","imagecol"), header = FALSE)
    bcs[[i]]$imagerow <- bcs[[i]]$imagerow * scales[[i]]$tissue_lowres_scalef
    # scale tissue coordinates for lowres image
    bcs[[i]]$imagecol <- bcs[[i]]$imagecol * scales[[i]]$tissue_lowres_scalef
    bcs[[i]]$tissue <- as.factor(bcs[[i]]$tissue)
    bcs[[i]]$height <- height$height[i]
    bcs[[i]]$width <- width$width[i]
}
names(bcs) <- sample_names

Adding umi per spot, total genes per spot and merging the data

matrix <- list()
for (i in 1:length(sample_names)) {
    matrix[[i]] <- as.data.frame(t(as.matrix(counts_matrix)))
}
umi_sum <- list()
for (i in 1:length(sample_names)) {
    umi_sum[[i]] <- data.frame(barcode =  row.names(matrix[[i]]),
                                sum_umi = Matrix::rowSums(matrix[[i]]))
}
names(umi_sum) <- sample_names

umi_sum <- bind_rows(umi_sum, .id = "sample")
gene_sum <- list()

for (i in 1:length(sample_names)) {
    gene_sum[[i]] <- data.frame(barcode=row.names(
        matrix[[i]]),sum_gene=Matrix::rowSums(matrix[[i]] != 0))

}
names(gene_sum) <- sample_names
gene_sum <- bind_rows(gene_sum, .id = "sample")
bcs_merge <- bind_rows(bcs, .id = "sample")
bcs_merge <- merge(bcs_merge,umi_sum, by = c("barcode", "sample"))
bcs_merge <- merge(bcs_merge,gene_sum, by = c("barcode", "sample"))

Specifying a continuous scale and colors before plotting

myPalette <- function(numLevels) {
    return(colorRampPalette(c("blue","yellow"))(numLevels))}

Extracting top 3 genes …

gene_list <- c()
sp_genes <- SpaceMarkers$interacting_genes[[1]]
interactions <- unique(sp_genes$`Pattern_1 x Pattern_5`)
n_genes <- 3
for (g in 1:length(interactions)){
    
    df <- sp_genes %>% dplyr::filter(
        sp_genes$`Pattern_1 x Pattern_5` == interactions[g] & abs(
        sp_genes$Dunn.zP2_P1) > 1 )
    df <- df[!is.na(df$Gene),]
    valid_genes <- min(nrow(df),n_genes)
    print(paste0("Top ",valid_genes," genes for ",interactions[g]))
    print(df$Gene[1:valid_genes])
    gene_list <- c(gene_list,df$Gene[1:valid_genes])
    
}
## [1] "Top 3 genes for vsBoth"
## [1] "IGHE"  "APOC1" "TGM2" 
## [1] "Top 3 genes for vsPattern_1"
## [1] "DDX56"  "ZNF593" "ATXN2L"
## [1] "Top 3 genes for vsPattern_5"
## [1] "SUCLG2"  "NDUFA10" "FDFT1"

Visualize expression spatially

plots <- list()
# default size = 1.75, stroke = 0.5
for (g in gene_list){
    for (i in 1:length(sample_names)) {
        plots[[length(plots)+1]] <- bcs_merge %>%dplyr::filter(
            sample ==sample_names[i]) %>% bind_cols(as.data.table(
            matrix[i])[,g, with=FALSE]) %>% ggplot(aes_string(
                x='imagecol', y='imagerow', fill=g, alpha = g)) +geom_spatial(
                data=images_tibble[i,], aes(grob=grob), x=0.5, y=0.5)+
            geom_point(shape = 21, colour = "black", size = 1.1, stroke = 0.2)+
            coord_cartesian(expand=FALSE)+scale_fill_gradientn(
                colours = myPalette(100))+xlim(0,max(bcs_merge %>%dplyr::filter(
                    sample ==sample_names[i]) %>% select(width)))+ylim(max(
                        bcs_merge %>%dplyr::filter(sample ==sample_names[i])%>%
                            select(height)),0)+xlab("") +ylab("") + ggtitle(
                                sample_names[i])+
            theme_set(theme_bw(base_size = 10))+
            theme(
                panel.grid.major=element_blank(),
                panel.grid.minor = element_blank(),
                panel.background = element_blank(),
                axis.line = element_line(colour="black"),
                axis.text=element_blank(),axis.ticks = element_blank())
    }
}

This next block of code can visualize the gene expression in each region using box plots.

region <- SpaceMarkers$hotspots[,1]
region <- ifelse(!is.na(region)&!is.na(SpaceMarkers$hotspots[,2]),
                    "Interacting",ifelse(!is.na(region),region,
                                            SpaceMarkers$hotspots[,2]))
region <- factor(region, levels = c("Pattern_1","Interacting","Pattern_5"))
plist <- list()
mplot2 <- t(as.matrix(counts_matrix[,!is.na(region)]))
mplot2 <- as.data.frame(as.matrix(mplot2))
mplot2 <- cbind(mplot2,region = region[!is.na(region)])
for (ii in 1:length(gene_list)){
    plist[[ii]]<- mplot2 %>% ggplot( aes_string(x='region',y=gene_list[ii],
                                                fill='region'))+geom_boxplot()+
        scale_fill_viridis(discrete = TRUE,alpha=0.6)+
        geom_jitter(color="black",size=0.4,alpha=0.9)+theme_ipsum()+
        theme(legend.position="none",plot.title = element_text(size=11),
                axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
        ggtitle(paste0(gene_list[ii]," Expression (Log)")) + xlab("") 
}

5.3 SpaceMarkers Visualiztions

5.3.1 Interacting region (vsBoth)

This category compares the interacting region to both Pattern_1 and Pattern_5 exclusively

Below there are box plots and spatial heatmaps to help visualize the expression of individual genes across different patterns. The two main statistics used to help interpret the expression of genes across the patterns are the KW statistics/pvalue and the Dunn’s test. In this context the null hypothesis of the KW test is that the expression of a given gene across all of the spots is equal. The post hoc Dunn’s test identifies how statistically significant the difference in expression of the given gene is between two patterns. The Dunn’s test considers the differences between specific patterns and the KW test considers differences across all of the spots without considering the specific patterns.

5.3.1.1 Table of statistics

head(residual_p1_p5 %>% dplyr::filter(
    sp_genes$`Pattern_1 x Pattern_5` == "vsBoth"),n_genes)
##        Gene Pattern_1 x Pattern_5 KW.obs.tot KW.obs.groups KW.df KW.statistic
## IGHE   IGHE                vsBoth       2636             3     2     307.9678
## APOC1 APOC1                vsBoth       2636             3     2     373.3717
## TGM2   TGM2                vsBoth       2636             3     2     233.6329
##          KW.pvalue     KW.p.adj Dunn.zP1_Int Dunn.zP2_Int Dunn.zP2_P1
## IGHE  1.335467e-67 6.619269e-67    -16.13409    -9.082053   12.012334
## APOC1 8.382338e-82 4.550412e-81    -15.18015   -19.303620   -6.032674
## TGM2  1.850362e-51 7.812641e-51    -13.89534    -7.575519   10.742407
##       Dunn.pval_1_Int Dunn.pval_2_Int Dunn.pval_2_1 Dunn.pval_1_Int.adj
## IGHE     1.469666e-58    1.065463e-19  3.060820e-33        1.726857e-57
## APOC1    4.787159e-52    5.007523e-83  1.612681e-09        5.399916e-51
## TGM2     6.759774e-44    3.576944e-14  6.434476e-27        7.331755e-43
##       Dunn.pval_2_Int.adj Dunn.pval_2_1.adj SpaceMarkersMetric
## IGHE         5.365365e-19      2.538680e-32           132.2682
## APOC1        7.432218e-82      5.109844e-09           130.3603
## TGM2         1.461881e-13      4.652621e-26           127.7907

5.3.1.2 Visualizations

plot_grid(plotlist = list(plist[[1]],plots[[1]]))

plot_grid(plotlist = list(plist[[2]],plots[[2]]))

On the spatial heatmap, Pattern_1 takes up most of the top half of the spatial heatmap, followed by the interacting region along the diagonal and finally Pattern_5 in the bottom left corner. APOC1 is expressed across all patterns but is especially strong in the interacting pattern. IGHE is highly specific to the interacting pattern. The Dunn.pval_1_Int is lower for IGHE compared to APOC1 indicating more significance in the interacting region vs the other two regions. However, the overall SpaceMarkersMetric for APOC1 is higher and so it is ranked higher than IGHE.

plot_grid(plotlist = list(plist[[3]],plots[[3]]))

The spatial heatmap for TGM2 shows fairly high expression interacting region relative to Pattern_1 or Pattern_5 by itself similar but less so than IGHE. as well with high expression in the interacting pattern vs the other two patterns.

5.3.2 Pattern_5 Only vs Pattern_1 (including Pattern_1 in interacting region)

5.3.2.1 Table of statistics

head(sp_genes %>% dplyr::filter(
    sp_genes$`Pattern_1 x Pattern_5` == "vsPattern_1"),n_genes - 1 )
##          Gene Pattern_1 x Pattern_5 KW.obs.tot KW.obs.groups KW.df KW.statistic
## DDX56   DDX56           vsPattern_1       2636             3     2     7.846153
## ZNF593 ZNF593           vsPattern_1       2636             3     2     7.391805
##         KW.pvalue   KW.p.adj Dunn.zP1_Int Dunn.zP2_Int Dunn.zP2_P1
## DDX56  0.01978015 0.02625147    -2.746561    -1.910598    1.457552
## ZNF593 0.02482504 0.03179837    -2.690405    -1.957235    1.289664
##        Dunn.pval_1_Int Dunn.pval_2_Int Dunn.pval_2_1 Dunn.pval_1_Int.adj
## DDX56      0.006022365      0.05605628     0.1449639          0.01392055
## ZNF593     0.007136540      0.05031982     0.1971673          0.01610003
##        Dunn.pval_2_Int.adj Dunn.pval_2_1.adj SpaceMarkersMetric
## DDX56           0.09879920         0.2349416           105.4931
## ZNF593          0.09096275         0.3123662           105.3808

5.3.2.2 Visualizations

plot_grid(plotlist = list(plist[[4]],plots[[4]]))

plot_grid(plotlist = list(plist[[5]],plots[[5]]))

Unlike the previous three genes the KW pvalues and Dunn’s pvalues are relatively high so the distinction will not appear as pronounced on the expression map.

5.3.3 Pattern_1 Only vs Pattern_5 (including Pattern_5 in interacting region)

5.3.3.1 Table of statistics

head(sp_genes %>% dplyr::filter(
    sp_genes$`Pattern_1 x Pattern_5`=="vsPattern_5"),n_genes - 1)
##            Gene Pattern_1 x Pattern_5 KW.obs.tot KW.obs.groups KW.df
## SUCLG2   SUCLG2           vsPattern_5       2636             3     2
## NDUFA10 NDUFA10           vsPattern_5       2636             3     2
##         KW.statistic  KW.pvalue   KW.p.adj Dunn.zP1_Int Dunn.zP2_Int
## SUCLG2      7.327597 0.02563495 0.03247094    -1.250733    -2.415918
## NDUFA10     6.856006 0.03245169 0.03977949    -1.049682    -2.251478
##         Dunn.zP2_P1 Dunn.pval_1_Int Dunn.pval_2_Int Dunn.pval_2_1
## SUCLG2    -1.827044       0.2110318      0.01569560    0.06769319
## NDUFA10   -1.894130       0.2938643      0.02435527    0.05820773
##         Dunn.pval_1_Int.adj Dunn.pval_2_Int.adj Dunn.pval_2_1.adj
## SUCLG2            0.3287899          0.03207361         0.1163993
## NDUFA10           0.4455363          0.04802927         0.1013246
##         SpaceMarkersMetric
## SUCLG2            102.5015
## NDUFA10           102.0994

5.3.3.2 Visualizations

plot_grid(plotlist = list(plist[[6]],plots[[6]]))

plot_grid(plotlist = list(plist[[7]],plots[[7]]))

The expression of these genes across interaction is harder to distinguish on either plot. Although, they pass the initial KW-test, the post-hoc Dunn’s test p-values are high. This lack of distinction highlights the fact that SpaceMarkers performs best for identifying interactions for two latent individual latent Patterns interacting with each other. In other words, the vsBoth genes are ranked highest in the list, while the vsPattern_1 and vsPattern_5 genes, which look at an individual Pattern interaction with another already interacting region, are ranked lower in the SpaceMarkers list.

6 Removing Directories

unlink(basename(sp_url))
unlink("spatial", recursive = TRUE)
files <- list.files(".")[grepl(basename(counts_url),list.files("."))]
unlink(files)

7 References

Appendix

Deshpande, Atul, et al. “Uncovering the spatial landscape of molecular interactions within the tumor microenvironment through latent spaces.” Cell Systems 14.4 (2023): 285-301.

“Space Ranger.” Secondary Analysis in R -Software -Spatial Gene Expression - Official 10x Genomics Support, support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/rkit. Accessed 22 Dec. 2023.

1 load10XExpr() Arguments

Argument Description
visiumDir A string path to the h5 file with expression information
h5filename A string of the name of the h5 file in the directory

2 load10XCoords() Arguments

Argument Description
visiumDir A path to the location of the the spatial coordinates folder.
resolution String values to look for in the .json object;lowres or highres.

3 getSpatialParameters() Arguments

Argument Description
spPatterns A data frame of spatial coordinates and patterns.

4 getInteractingGenes() Arguments

Argument Description
data An expression matrix of genes and columns being the samples.
reconstruction Latent feature matrix. NULL if ‘DE’ mode is specified
optParams A matrix of sigmaOpts (width) and the thresOpt (outlierthreshold)
spPatterns A data frame that contains of spatial coordinates and patterns.
refPattern A string of the reference pattern for comparison to other patterns
mode A string specifying either ‘residual’ or ‘DE’ mode.
minOverlap A value that specifies the minimum pattern overlap. 50 is the default
hotspotRegions A vector of patterns to compare to the ‘refPattern’
analysis A string specifying the type of analysis
sessionInfo()
## R Under development (unstable) (2024-01-16 r85808)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggplot2_3.4.4       hrbrthemes_0.8.0    viridis_0.6.5      
##  [4] viridisLite_0.4.2   data.table_1.15.0   dplyr_1.1.4        
##  [7] readbitmap_0.1.5    RColorBrewer_1.1-3  cowplot_1.1.3      
## [10] rjson_0.2.21        Matrix_1.6-5        SpaceMarkers_0.99.6
## [13] BiocStyle_2.31.0   
## 
## loaded via a namespace (and not attached):
##  [1] deldir_2.0-2            gridExtra_2.3           rlang_1.1.3            
##  [4] magrittr_2.0.3          matrixStats_1.2.0       compiler_4.4.0         
##  [7] spatstat.geom_3.2-8     systemfonts_1.0.5       png_0.1-8              
## [10] vctrs_0.6.5             reshape2_1.4.4          hdf5r_1.3.9            
## [13] stringr_1.5.1           httpcode_0.3.0          pkgconfig_2.0.3        
## [16] crayon_1.5.2            fastmap_1.1.1           magick_2.8.2           
## [19] ellipsis_0.3.2          backports_1.4.1         labeling_0.4.3         
## [22] utf8_1.2.4              promises_1.2.1          rmarkdown_2.25         
## [25] purrr_1.0.2             bit_4.0.5               xfun_0.42              
## [28] cachem_1.0.8            jsonlite_1.8.8          goftest_1.2-3          
## [31] highr_0.10              later_1.3.2             matrixTests_0.2.3      
## [34] spatstat.utils_3.0-4    jpeg_0.1-10             tiff_0.1-12            
## [37] broom_1.0.5             parallel_4.4.0          R6_2.5.1               
## [40] bslib_0.6.1             stringi_1.8.3           spatstat.data_3.0-4    
## [43] car_3.1-2               extrafontdb_1.0         jquerylib_0.1.4        
## [46] Rcpp_1.0.12             bookdown_0.37           knitr_1.45             
## [49] tensor_1.5              extrafont_0.19          httpuv_1.6.14          
## [52] splines_4.4.0           tidyselect_1.2.0        qvalue_2.35.0          
## [55] abind_1.4-5             yaml_2.3.8              spatstat.random_3.2-2  
## [58] spatstat.explore_3.2-6  curl_5.2.0              lattice_0.22-5         
## [61] tibble_3.2.1            plyr_1.8.9              withr_3.0.0            
## [64] shiny_1.8.0             evaluate_0.23           polyclip_1.10-6        
## [67] pillar_1.9.0            BiocManager_1.30.22     carData_3.0-5          
## [70] generics_0.1.3          munsell_0.5.0           scales_1.3.0           
## [73] xtable_1.8-4            glue_1.7.0              gdtools_0.3.5          
## [76] tools_4.4.0             gfonts_0.2.0            bmp_0.3                
## [79] tidyr_1.3.1             ape_5.7-1               Rttf2pt1_1.3.12        
## [82] colorspace_2.1-0        nlme_3.1-164            cli_3.6.2              
## [85] spatstat.sparse_3.0-3   fontBitstreamVera_0.1.1 fansi_1.0.6            
## [88] gtable_0.3.4            rstatix_0.7.2           sass_0.4.8             
## [91] digest_0.6.34           fontquiver_0.2.1        crul_1.4.0             
## [94] farver_2.1.1            htmltools_0.5.7         lifecycle_1.0.4        
## [97] mime_0.12               fontLiberation_0.1.0    bit64_4.0.5