1 Introduction
2 Exploring available datasets
- 2.1 Searching by year
3 Getting datasets
- 3.1 Example: Returning all datasets with cell-type labels
- 3.2 Example: Returning all datasets with cell-type labels and cell-type gene signatures
4 Saving Data
5 Session Information

library(TMExplorer)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr

1 Introduction

TMExplorer (Tumour Microenvironment Explorer) is a curated collection of scRNAseq datasets sequenced from tumours. It aims to provide a single point of entry for users looking to study the tumour microenvironment at the single-cell level.

Users can quickly search available datasets using the metadata table, and then download the datasets they are interested in for analysis. Optionally, users can save the datasets for use in applications other than R.

This package makes it easier to study the tumour microenvironment with single-cell sequencing. Developers may use it to obtain data for validating new algorithms, and researchers interested in the tumour microenvironment may use it to study specific cancers more closely.

2 Exploring available datasets

Start by exploring the available datasets through metadata.

res = queryTME(metadata_only = TRUE)

Reference	accession	author	journal	year
Patel_Science_2014	GSE57872	Patel	Science	2014
Tirosh_Science_2016a	GSE72056	Tirosh	Science	2016
Tirosh_Nature_ 2016b	GSE70630	Tirosh	Nature	2016
Venteicher_Science_2017	GSE89567	Venteicher	Science	2017
Li_Nature_Genetics_2017	GSE81861	Li	Nature Genetics	2017
Chung_Nature_Commun_2017	GSE75688	Chung	Nature Comm	2017

This will return a list containing a single dataframe of metadata for all available datasets. View the metadata with View(res[[1]]) and then check ?queryTME for a description of searchable fields.

Note: in order to keep the function’s interface consistent, queryTME always returns a list of objects, even if there is only one object. You may prefer running res = queryTME(metadata_only = TRUE)[[1]] in order to save the dataframe directly.
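
Once the dataframe has been extracted, it can be explored with ordinary data-frame operations. The sketch below is illustrative only: it uses a small hand-copied subset of the metadata table shown above rather than a live call to queryTME, and it assumes the column names match those in that table.

```r
# Hand-copied subset of the metadata table shown above (illustrative only).
meta <- data.frame(
  Reference = c("Patel_Science_2014", "Tirosh_Science_2016a",
                "Venteicher_Science_2017", "Li_Nature_Genetics_2017"),
  accession = c("GSE57872", "GSE72056", "GSE89567", "GSE81861"),
  author    = c("Patel", "Tirosh", "Venteicher", "Li"),
  journal   = c("Science", "Science", "Science", "Nature Genetics"),
  year      = c(2014, 2016, 2017, 2017),
  stringsAsFactors = FALSE
)

# Count datasets per publication year.
per_year <- table(meta$year)

# Keep only datasets published in 2017 and list their accessions.
recent <- meta[meta$year == 2017, ]
recent_accessions <- recent$accession

print(per_year)
print(recent_accessions)
```

In a real session, meta would instead be the dataframe returned by queryTME(metadata_only = TRUE)[[1]].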

The metadata_only argument can be combined with any other argument to examine only the datasets that have certain qualities. You can, for instance, view only breast cancer datasets by using

res = queryTME(tumour_type = 'Breast cancer', metadata_only = TRUE)[[1]]

Reference	accession	author	journal	year
Chung_Nature_Commun_2017	GSE75688	Chung	Nature Comm	2017
Jordan_Nature_2016	GSE75367	Jordan	Nature	2016
Azizi_Cell_2018	GSE114727	Azizi	Cell	2018
Yeo_Elife_2020	GSE123366	Yeo	Elife	2020

Table 1: Search parameters for `queryTME` alongside example values.
Search Parameter	Description	Examples
geo_accession	Search by GEO accession number	GSE72056, GSE57872
score_type	Search by type of score shown in $expression	TPM, RPKM, FPKM
has_signatures	Filter by presence of cell-type gene signatures	TRUE, FALSE
has_truth	Filter by presence of cell-type labels	TRUE, FALSE
tumour_type	Search by tumour type	Breast cancer, Melanoma
author	Search by first author	Patel, Tirosh, Chung
journal	Search by publication journal	Science, Nature, Cell
year	Search by year of publication	<2015, >2015, 2013-2015
pmid	Search by publication ID	24925914, 27124452
sequence_tech	Search by sequencing technology	SMART-seq, Fluidigm C1
organism	Search by source organism	Human, Mice
sparse	Return expression in sparse matrices	TRUE, FALSE

2.1 Searching by year

To search by a single year or a range of years, the package looks for specific patterns. ‘2013-2015’ will search for datasets published between 2013 and 2015, inclusive. ‘<2015’ or ‘2015>’ will search for datasets published in or before 2015. ‘>2015’ or ‘2015<’ will search for datasets published in or after 2015.
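
The pattern rules above can be sketched in base R as follows. This is not TMExplorer's own implementation: parse_year_query and year_matches are hypothetical helpers written only to illustrate how each pattern maps to an inclusive range of years.

```r
# Hypothetical helper (not part of TMExplorer): turn a year query string
# into an inclusive numeric range c(from, to), following the rules above.
parse_year_query <- function(query) {
  query <- gsub("\\s+", "", query)
  if (grepl("^[0-9]{4}-[0-9]{4}$", query)) {
    # Range such as "2013-2015": both ends inclusive.
    bounds <- as.numeric(strsplit(query, "-", fixed = TRUE)[[1]])
    return(c(from = bounds[1], to = bounds[2]))
  }
  if (grepl("^<[0-9]{4}$|^[0-9]{4}>$", query)) {
    # "<2015" or "2015>": published in or before the given year.
    year <- as.numeric(gsub("[<>]", "", query))
    return(c(from = -Inf, to = year))
  }
  if (grepl("^>[0-9]{4}$|^[0-9]{4}<$", query)) {
    # ">2015" or "2015<": published in or after the given year.
    year <- as.numeric(gsub("[<>]", "", query))
    return(c(from = year, to = Inf))
  }
  if (grepl("^[0-9]{4}$", query)) {
    # A single year matches only itself.
    year <- as.numeric(query)
    return(c(from = year, to = year))
  }
  stop("Unrecognised year query: ", query)
}

# Hypothetical helper: which of the given years satisfy the query?
year_matches <- function(years, query) {
  rng <- parse_year_query(query)
  years >= rng[["from"]] & years <= rng[["to"]]
}

years <- c(2014, 2016, 2017)
print(year_matches(years, "2013-2015"))
print(year_matches(years, ">2016"))
```

Because every bound is inclusive, a query such as ‘>2016’ also matches datasets published in 2016 itself.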

3 Getting datasets

Once you’ve found a field to search on, you can get your data. For this example, we’re pulling a specific dataset by its GEO ID.

res = queryTME(geo_accession = "GSE81861")

This will return a list containing dataset GSE81861. The dataset is stored as a SingleCellExperiment object, which has the following metadata list:

Table 2: Metadata attributes in the `SingleCellExperiment` object.
Attribute	Description
signatures	A `data.frame` containing the cell types and a list of genes that represent that cell type
cells	A list of cells included in the study
genes	A list of genes included in the study
pmid	The PubMed ID of the study
technology	The sequencing technology used
score_type	The type of score shown in `tme_data$expression`
organism	The type of organism from which cells were sequenced
author	The first author of the paper presenting the data
tumour_type	The type of tumour sequenced
patients	The number of patients included in the study
tumours	The number of tumours sampled by the study
geo_accession	The GEO accession ID for the dataset

To access the expression data for a result, use

View(counts(res[[1]]))

	RHC3546__Tcell__.C6E879	RHC3552__Epithelial__.2749FE
chrX:99883666-99894988_TSPAN6_ENSG00000000003.10	3	0
chrX:99839798-99854882_TNMD_ENSG00000000005.5	0	0
chr20:49505584-49575092_DPM1_ENSG00000000419.8	0	0
chr1:169631244-169863408_SCYL3_ENSG00000000457.9	0	0
chr1:169631244-169863408_C1orf112_ENSG00000000460.12	0	0
chr1:27938574-27961788_FGR_ENSG00000000938.8	0	0

For datasets that include cell-type labels, the labels are stored under colData(res[[1]]).

Metadata is stored in a named list accessible by metadata(res[[1]]). Specific entries can be accessed by attribute name.

metadata(res[[1]])$pmid
#> # A tibble: 1 × 1
#>       PMID
#>      <dbl>
#> 1 28319088

3.1 Example: Returning all datasets with cell-type labels

Say you want to measure the performance of cell-type classification methods. To do this, you need datasets that have the true cell-types available.

res = queryTME(has_truth = TRUE)

This will return a list of all datasets that have true cell-types available. You can see the cell types for the first dataset using the following command:

View(colData(res[[1]]))

	label
RHC3546__Tcell__.C6E879	Tcell
RHC3552__Epithelial__.2749FE	Epithelial
RHC3553__Epithelial__.2749FE	Epithelial
RHC3555__Bcell__.7DEA7B	Bcell
RHC3556__Epithelial__.2749FE	Epithelial
RHC3557__Bcell__.7DEA7B	Bcell

The first column of this dataframe contains the cell barcode, and the second contains the cell type.
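
As a rough illustration of how such labels could be used to score a classifier, the sketch below compares true labels against predictions by cell barcode. The truth vector is hand-copied from the table above; the predicted vector is invented for the example and does not come from any real method.

```r
# True labels, hand-copied from the table above and named by cell barcode.
truth <- c(
  "RHC3546__Tcell__.C6E879"      = "Tcell",
  "RHC3552__Epithelial__.2749FE" = "Epithelial",
  "RHC3553__Epithelial__.2749FE" = "Epithelial",
  "RHC3555__Bcell__.7DEA7B"      = "Bcell",
  "RHC3556__Epithelial__.2749FE" = "Epithelial",
  "RHC3557__Bcell__.7DEA7B"      = "Bcell"
)

# Made-up predictions from a hypothetical classifier, in a different order.
predicted <- c(
  "RHC3557__Bcell__.7DEA7B"      = "Bcell",
  "RHC3546__Tcell__.C6E879"      = "Tcell",
  "RHC3552__Epithelial__.2749FE" = "Epithelial",
  "RHC3553__Epithelial__.2749FE" = "Tcell",
  "RHC3555__Bcell__.7DEA7B"      = "Bcell",
  "RHC3556__Epithelial__.2749FE" = "Epithelial"
)

# Align predictions to the truth by barcode before comparing.
predicted <- predicted[names(truth)]

# Confusion matrix and overall accuracy.
confusion <- table(truth = truth, predicted = predicted)
accuracy <- mean(truth == predicted)

print(confusion)
print(accuracy)
```

Aligning by barcode first matters because a classifier may return cells in a different order from the dataset.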

3.2 Example: Returning all datasets with cell-type labels and cell-type gene signatures

Some cell-type classification methods require a list of gene signatures. To return only datasets that have cell-type gene signatures available, use:

res = queryTME(has_truth = TRUE, has_signatures = TRUE)
View(metadata(res[[1]])$signatures)

MYELOID	FIBROBLAST	TCELL
ITGAX_ENSG00000140678.12	SPARC_ENSG00000113140.6	TRBC2_ENSG00000211772.4
CD68_ENSG00000129226.9	COL14A1_ENSG00000187955.7	TRBC2_ENSG00000211772.4
CD14_ENSG00000170458.9	COL13A1_ENSG00000197467.9	CD3E_ENSG00000198851.5
CCL3_ENSG00000006075.11	DCN_ENSG00000011465.12	CD3G_ENSG00000160654.5
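
Many tools expect signatures as a named list of gene symbols rather than a table. The sketch below shows one possible conversion, using the first three rows of the table above copied by hand. It assumes each column is a cell type and each entry has the form SYMBOL_ENSEMBLID, as in that table.

```r
# Hand-copied from the signature table above: one column per cell type.
signatures <- data.frame(
  MYELOID    = c("ITGAX_ENSG00000140678.12", "CD68_ENSG00000129226.9",
                 "CD14_ENSG00000170458.9"),
  FIBROBLAST = c("SPARC_ENSG00000113140.6", "COL14A1_ENSG00000187955.7",
                 "COL13A1_ENSG00000197467.9"),
  TCELL      = c("TRBC2_ENSG00000211772.4", "TRBC2_ENSG00000211772.4",
                 "CD3E_ENSG00000198851.5"),
  stringsAsFactors = FALSE
)

# Convert to a named list of unique gene symbols, dropping the
# "_ENSG..." suffix and any empty or missing entries.
signature_list <- lapply(signatures, function(genes) {
  genes <- genes[!is.na(genes) & nzchar(genes)]
  unique(sub("_ENSG[0-9.]+$", "", genes))
})

str(signature_list)
```

In a real session, signatures would be metadata(res[[1]])$signatures instead of the hand-built dataframe.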

4 Saving Data

To facilitate the use of any or all datasets outside of R, you can use saveTME(). saveTME takes two parameters: a tme_data object to be saved and the directory in which the data should be saved. Note that the output directory should not already exist.

To save a dataset to disk, for example GSE72056, use the following commands.

res = queryTME(geo_accession = "GSE72056")[[1]]
saveTME(res, '~/Downloads/GSE72056')

The result is three CSV files that can be used in other programs. In the future we will support saving in other formats.
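
Because the output directory should not already exist, it can help to check the path before saving. The sketch below is one way to do that in base R. fresh_output_dir is a hypothetical helper, not part of TMExplorer, and the saveTME call is shown only as a comment.

```r
# Hypothetical helper (not part of TMExplorer): build a per-dataset output
# path and refuse to reuse a directory that already exists.
fresh_output_dir <- function(base_dir, accession) {
  out_dir <- file.path(base_dir, accession)
  if (dir.exists(out_dir)) {
    stop("Output directory already exists: ", out_dir)
  }
  out_dir
}

# Make sure the parent directory exists; the dataset directory itself
# is left for the save step to create.
base_dir <- file.path(tempdir(), "tme_export")
dir.create(base_dir, recursive = TRUE, showWarnings = FALSE)

out_dir <- fresh_output_dir(base_dir, "GSE72056")
print(out_dir)

# With TMExplorer loaded, the path could then be passed on, for example:
# saveTME(queryTME(geo_accession = "GSE72056")[[1]], out_dir)
```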

5 Session Information

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] TMExplorer_1.8.0            BiocFileCache_2.6.0        
#>  [3] dbplyr_2.2.1                SingleCellExperiment_1.20.0
#>  [5] SummarizedExperiment_1.28.0 Biobase_2.58.0             
#>  [7] GenomicRanges_1.50.0        GenomeInfoDb_1.34.0        
#>  [9] IRanges_2.32.0              S4Vectors_0.36.0           
#> [11] BiocGenerics_0.44.0         MatrixGenerics_1.10.0      
#> [13] matrixStats_0.62.0          BiocStyle_2.26.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.9             lattice_0.20-45        assertthat_0.2.1      
#>  [4] digest_0.6.30          utf8_1.2.2             R6_2.5.1              
#>  [7] RSQLite_2.2.18         evaluate_0.17          httr_1.4.4            
#> [10] highr_0.9              pillar_1.8.1           zlibbioc_1.44.0       
#> [13] rlang_1.0.6            curl_4.3.3             jquerylib_0.1.4       
#> [16] blob_1.2.3             Matrix_1.5-1           rmarkdown_2.17        
#> [19] stringr_1.4.1          RCurl_1.98-1.9         bit_4.0.4             
#> [22] DelayedArray_0.24.0    compiler_4.2.1         xfun_0.34             
#> [25] pkgconfig_2.0.3        htmltools_0.5.3        tidyselect_1.2.0      
#> [28] tibble_3.1.8           GenomeInfoDbData_1.2.9 bookdown_0.29         
#> [31] fansi_1.0.3            dplyr_1.0.10           bitops_1.0-7          
#> [34] rappdirs_0.3.3         grid_4.2.1             jsonlite_1.8.3        
#> [37] lifecycle_1.0.3        DBI_1.1.3              magrittr_2.0.3        
#> [40] cli_3.4.1              stringi_1.7.8          cachem_1.0.6          
#> [43] XVector_0.38.0         bslib_0.4.0            filelock_1.0.2        
#> [46] generics_0.1.3         vctrs_0.5.0            tools_4.2.1           
#> [49] bit64_4.0.5            glue_1.6.2             purrr_0.3.5           
#> [52] fastmap_1.1.0          yaml_2.3.6             BiocManager_1.30.19   
#> [55] memoise_2.0.1          knitr_1.40             sass_0.4.2

TMExplorer

3 November 2022
