---
title: "Accessing IMC datasets"
date: "Created: Nov 02, 2020; Compiled: `r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('imcdatasets')`"
author:
- name: Nicolas Damond
  affiliation: Department for Quantitative Biomedicine, University of Zurich
  email: nicolas.damond@dqbm.uzh.ch
- name: Nils Eling
  affiliation: Department for Quantitative Biomedicine, University of Zurich
  email: nils.eling@dqbm.uzh.ch
output:
    BiocStyle::html_document:
        toc_float: yes
bibliography: "`r system.file('scripts', 'ref.bib', package='imcdatasets')`"
vignette: >
    %\VignetteIndexEntry{"Accessing IMC datasets"}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

```{r style, echo=FALSE}
knitr::opts_chunk$set(error=FALSE, warning=FALSE, message=FALSE,
    fig.retina = 0.75)
library(BiocStyle)
```

```{r library, echo=FALSE}
library(cytomapper)
library(SingleCellExperiment)
library(imcdatasets)
```

# Introduction

The `r Biocpkg("imcdatasets")` package provides access to publicly available
datasets generated using imaging mass cytometry (IMC) [@giesen2014imc].

IMC is a technology that enables measurement of up to 50 markers from tissue
sections at a resolution of 1 $\mu m$ [@giesen2014imc]. In classical processing
pipelines, such as the
[ImcSegmentationPipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline),
the multichannel images are segmented to generate cell masks. These masks are
then used to extract single-cell features from the multichannel images.

Each dataset in `imcdatasets` is composed of three elements that can be
retrieved separately:

1. Single-cell data in the form of a `SingleCellExperiment` class object (named
`XYZ_sce.rds`).
2. Multichannel images in the form of a `CytoImageList` class object (named
`XYZ_images.rds`).
3. Cell segmentation masks in the form of a `CytoImageList` class object (named
`XYZ_masks.rds`).

Here, `XYZ` refers to the name of the dataset.

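
The naming scheme above can be sketched in base R. The helper below, `dataset_file_names()`, is hypothetical and not part of `imcdatasets`; it only illustrates how the three file names relate to a dataset name. Using `DamondPancreas2019` as the value of `XYZ` is an assumption made for illustration, based on the name of the loading function.

```r
# Hypothetical helper (not part of imcdatasets): build the three file names
# that follow the XYZ_sce.rds / XYZ_images.rds / XYZ_masks.rds pattern.
dataset_file_names <- function(dataset_name) {
    stopifnot(is.character(dataset_name), length(dataset_name) == 1L)
    data_types <- c("sce", "images", "masks")
    file_names <- paste0(dataset_name, "_", data_types, ".rds")
    # Name each entry by its data type so it can be looked up directly
    stats::setNames(file_names, data_types)
}

# "DamondPancreas2019" is assumed here as the dataset name (XYZ)
dataset_file_names("DamondPancreas2019")
```

The returned vector is named by data type, so a single element can be selected with, for example, `dataset_file_names("DamondPancreas2019")[["masks"]]`.
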
# Available datasets

The `listDatasets()` function returns all available datasets in `imcdatasets`,
along with associated information. The `FunctionCall` column gives the name of
the R function used to load each dataset.

```{r list-datasets}
datasets <- listDatasets()
datasets <- as.data.frame(datasets)
datasets$FunctionCall <- sprintf("`%s`", datasets$FunctionCall)
knitr::kable(datasets)
```

# Retrieving data

Users can import a dataset by calling a single function and specifying the
type of data to retrieve. The following examples access the dataset provided by
_Damond, N. et al., A Map of Human Type 1 Diabetes Progression by Imaging Mass
Cytometry_ [@damond2019pancreas].

__Importing single-cell expression data and metadata__

```{r import-dataset}
sce <- DamondPancreas2019Data(data_type = "sce")
sce
```

__Importing multichannel images__

```{r import-images}
images <- DamondPancreas2019Data(data_type = "images")
images
```

__Importing cell segmentation masks__

```{r import-masks}
masks <- DamondPancreas2019Data(data_type = "masks")
masks
```

__On-disk storage__

Objects containing multichannel images and segmentation masks can also be
stored on disk rather than in memory. They still need to be loaded into memory
once before being written to disk. This takes longer than keeping them in
memory, but it reduces memory requirements during downstream analysis.

To write images or masks to disk, set `on_disk = TRUE` and use `h5FilesPath` to
specify the directory where the images or masks will be stored as .h5 files:

```{r on_disk}
# Create temporary location
cur_path <- tempdir()

masks <- DamondPancreas2019Data(data_type = "masks", on_disk = TRUE,
    h5FilesPath = cur_path)
masks
```

# Dataset info and metadata

Additional information about each dataset is available in its help page:

```{r function-help}
?DamondPancreas2019Data
```

The metadata associated with a specific data object can be displayed as follows:

```{r access-metadata, eval = FALSE}
DamondPancreas2019_sce(metadata = TRUE)
DamondPancreas2019_images(metadata = TRUE)
DamondPancreas2019_masks(metadata = TRUE)
```

Note: the onLoad functions will be deprecated. In future releases
(Bioconductor >= 3.16), please use the wrapper functions instead:

```{r access-metadata-future-releases, eval = FALSE}
DamondPancreas2019Data(data_type = "sce", metadata = TRUE)
DamondPancreas2019Data(data_type = "images", metadata = TRUE)
DamondPancreas2019Data(data_type = "masks", metadata = TRUE)
```

# Usage

The `SingleCellExperiment` class objects can be used for data analysis. For more
information, please refer to the `r Biocpkg("SingleCellExperiment")` package
and to the [Orchestrating Single-Cell Analysis with Bioconductor](http://bioconductor.org/books/release/OSCA/)
workflow.

The `CytoImageList` class objects can be used for plotting cell and pixel
information. Some typical use cases are given below. For more information,
please see the `r Biocpkg("cytomapper")` package and the
[associated vignette](https://www.bioconductor.org/packages/devel/bioc/vignettes/cytomapper/inst/doc/cytomapper.html).

__Subsetting the images and masks__

```{r usage-subset}
cur_images <- images[1:5]
cur_masks <- masks[1:5]
```

__Plotting pixel information__

The `images` objects can be used to display pixel-level data.

```{r usage-pixel}
plotPixels(
    cur_images,
    colour_by = c("CDH", "CD99", "H3"),
    bcg = list(
        CD99 = c(0,2,1),
        CDH = c(0,8,1),
        H3 = c(0,5,1)
    )
)
```

__Plotting cell information__

The `masks` and `sce` objects can be combined to display cell-level data.

```{r usage-cell}
plotCells(
    cur_masks, object = sce,
    img_id = "ImageNumber", cell_id = "CellNumber",
    colour_by = c("CD8a", "PIN"),
    exprs_values = "exprs"
)
```

__Outlining cells on images__

Cell information can be displayed on top of images by combining the `images`,
`masks` and `sce` objects.

```{r usage-outline}
plotPixels(
    cur_images, mask = cur_masks, object = sce,
    img_id = "ImageNumber", cell_id = "CellNumber",
    outline_by = "CellType",
    colour_by = c("H3", "CD99", "CDH"),
    bcg = list(
        CD99 = c(0,2,1),
        CDH = c(0,8,1),
        H3 = c(0,5,1)
    )
)
```

# Session info {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```

# References