---
title: "VectraPolarisData"
author:
- name: Julia Wrobel and Tusharkanti Ghosh
  affiliation: Department of Biostatistics and Informatics, Colorado School of Public Health
output:
  BiocStyle::html_document:
    toc_float: true
package: VectraPolarisData
abstract: |
  The VectraPolarisData ExperimentHub package provides two large multiplex immunofluorescence datasets collected by Akoya Biosciences Vectra 3 and Vectra Polaris platforms. Image preprocessing (cell segmentation and phenotyping) was performed using Inform software. Data are formatted into objects of class SpatialExperiment.
vignette: |
  %\VignetteIndexEntry{VectraPolarisData}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Loading the data

To retrieve a dataset, call the function named after it, `<id>()`, where `<id>` is a valid dataset identifier (see `?VectraPolarisData`). Below, both the lung and ovarian cancer datasets are loaded this way.

```{r, message = FALSE, warning = FALSE}
library(VectraPolarisData)
spe_lung <- HumanLungCancerV3()
spe_ovarian <- HumanOvarianCancerVP()
```

Alternatively, the data can be loaded directly from Bioconductor's `r Biocpkg("ExperimentHub")` as follows. First, we initialize a hub instance and store the complete list of records in a variable `eh`. Using `query()`, we then identify any records made available by the `VectraPolarisData` package, as well as their accession IDs (EH7311 for the lung cancer data). Finally, we can load the data into R via `eh[[id]]`, where `id` is the identifier of the data entry we'd like to load.
E.g.:

```{r, eval = FALSE}
library(ExperimentHub)
eh <- ExperimentHub()                # initialize hub instance
q <- query(eh, "VectraPolarisData")  # retrieve 'VectraPolarisData' records
id <- q$ah_id[1]                     # specify dataset ID to load
spe <- eh[[id]]                      # load specified dataset
```

# Data Representation

Both the `HumanLungCancerV3()` and `HumanOvarianCancerVP()` datasets are stored as `SpatialExperiment` objects. This allows users of our data to apply methods built for the `SingleCellExperiment`, `SummarizedExperiment`, and `SpatialExperiment` classes in Bioconductor. See [this ebook](https://lmweber.org/OSTA-book/index.html#welcome) for more details on `SpatialExperiment`. To get cell-level tabular data that can be stored in this format, raw multiplex `.tiff` images were preprocessed, segmented, and cell phenotyped using [`Inform`](https://www.akoyabio.com/phenoimager/software/inform-tissue-finder/) software from Akoya Biosciences.

The `SpatialExperiment` class was originally built for spatial transcriptomics data and follows the structure depicted in the schematic below (Righelli et al. 2021):

```{r, echo=FALSE}
# image from Righelli et al. 2021
url <- "https://lmweber.org/OSTA-book/images/SPE.png"
```
To adapt this class structure for multiplex imaging data we use slots in the following way:

* `assays` slot: `intensities`, `nucleus_intensities`, `membrane_intensities`
* `sample_id` slot: contains the image identifier. For `HumanOvarianCancerVP` this also identifies the subject because there is one image per subject
* `colData` slot: other cell-level characteristics, including marker intensity summaries, cell phenotypes, and cell shape characteristics
* `spatialCoordsNames` slot: the `x` and `y` coordinates of the center point of each cell in the image
* `metadata` slot: a data frame of subject-level clinical characteristics

# Transforming to other data formats

The following code shows how to transform the `SpatialExperiment` class object to a `data.frame` class object, if that is preferred for analysis. The example below uses the `HumanOvarianCancerVP` dataset.

```{r, message = FALSE}
library(SpatialExperiment)  # provides assays(), colData(), spatialCoords()
library(dplyr)

## Assays slots: each assay is a markers x cells matrix
assays_slot <- assays(spe_ovarian)
intensities_df <- assays_slot$intensities
rownames(intensities_df) <- paste0("total_", rownames(intensities_df))
nucleus_intensities_df <- assays_slot$nucleus_intensities
rownames(nucleus_intensities_df) <- paste0("nucleus_", rownames(nucleus_intensities_df))
membrane_intensities_df <- assays_slot$membrane_intensities
rownames(membrane_intensities_df) <- paste0("membrane_", rownames(membrane_intensities_df))

# colData and spatialCoords: one row per cell
colData_df <- colData(spe_ovarian)
spatialCoords_df <- spatialCoords(spe_ovarian)

# clinical data: one row per subject
patient_level_df <- metadata(spe_ovarian)$clinical_data

# transpose the assays so that cells are rows, then bind column-wise
cell_level_df <- as.data.frame(cbind(colData_df,
                                     spatialCoords_df,
                                     t(intensities_df),
                                     t(nucleus_intensities_df),
                                     t(membrane_intensities_df))
                               )

# attach the subject-level clinical data to every cell of that subject
ovarian_df <- full_join(patient_level_df, cell_level_df, by = "sample_id")
```
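The reshaping logic above can be checked on a toy example that needs no download. The sketch below is an illustration under stated assumptions, not part of the package: all object names and values are made up, and base R `merge()` with `all = TRUE` stands in for `dplyr::full_join()`. It shows why the assay matrices must be transposed (markers are rows, cells are columns) before they are bound to the cell-level table.

```r
# Toy stand-ins for the pieces extracted above (all values are made up)
set.seed(1)
n_cells <- 4
markers <- c("cd3", "cd8")

# Assay-like matrix: markers in rows, cells in columns
intensities <- matrix(runif(length(markers) * n_cells),
                      nrow = length(markers),
                      dimnames = list(markers, paste0("cell", seq_len(n_cells))))
rownames(intensities) <- paste0("total_", rownames(intensities))

# Cell-level table: one row per cell
cells <- data.frame(sample_id = c("img1", "img1", "img2", "img2"),
                    cell_id = seq_len(n_cells))

# Transpose so that cells become rows, then bind column-wise
cell_level <- cbind(cells, t(intensities))

# Subject-level table: one row per image/subject
patients <- data.frame(sample_id = c("img1", "img2"),
                       age_at_diagnosis = c(61, 54))

# Base R equivalent of a full join on sample_id
toy_df <- merge(patients, cell_level, by = "sample_id", all = TRUE)
dim(toy_df)
```

The result has one row per cell, with the subject-level column repeated for every cell from the same image.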
If you do include them in your peer-reviewed work, we ask that you cite our package and the original studies.

To cite the `VectraPolarisData` package, use:

```{}
@Manual{VectraPolarisData,
    title = {VectraPolarisData: Vectra Polaris and Vectra 3 multiplex single-cell imaging data},
    author = {Wrobel, J and Ghosh, T},
    year = {2022},
    note = {Bioconductor R package version 1.0},
}
```

To cite the `HumanLungCancerV3()` data in `bibtex` format, use:

```{}
@article{johnson2021cancer,
  title={Cancer cell-specific MHCII expression as a determinant of the immune infiltrate organization and function in the non-small cell lung cancer tumor microenvironment.},
  author={Johnson, AM and Boland, JM and Wrobel, J and Klezcko, EK and Weiser-Evans, M and Hopp, K and Heasley, L and Clambey, ET and Jordan, K and Nemenoff, RA and others},
  journal={Journal of Thoracic Oncology: Official Publication of the International Association for the Study of Lung Cancer},
  year={2021}
}
```

To cite the `HumanOvarianCancerVP()` data, use:

```{}
@article{steinhart2021spatial,
  title={The spatial context of tumor-infiltrating immune cells associates with improved ovarian cancer survival},
  author={Steinhart, Benjamin and Jordan, Kimberly R and Bapat, Jaidev and Post, Miriam D and Brubaker, Lindsay W and Bitler, Benjamin G and Wrobel, Julia},
  journal={Molecular Cancer Research},
  volume={19},
  number={12},
  pages={1973--1979},
  year={2021},
  publisher={AACR}
}
```

# Data Dictionaries

Detailed tables describing what is provided in each dataset are listed here.

## HumanLungCancerV3

In the table below note the following shorthand:

* `[marker]` represents one of: `cd3`, `cd8`, `cd14`, `cd19`, `cd68`, `ck`, `dapi`, `hladr`
* `[cell region]` represents one of: entire_cell, membrane, nucleus

**Table 1: data dictionary for HumanLungCancerV3**


| Variable | Slot | Description | Variable coding |
|---|---|---|---|
| [marker] | assays: intensities | mean total cell intensity for [marker] | |
| [marker] | assays: nucleus_intensities | mean nucleus intensity for [marker] | |
| [marker] | assays: membrane_intensities | mean membrane intensity for [marker] | |
| sample_id | | image identifier, also subject id for the ovarian data | |
| cell_id | colData | cell identifier | |
| slide_id | colData | slide identifier, also the patient id for the lung data | |
| tissue category | colData | type of tissue (indicates a region of the image) | Stroma or Tumor |
| [cell region]_[marker]_min | colData | min [cell region] intensity for [marker] | |
| [cell region]_[marker]_max | colData | max [cell region] intensity for [marker] | |
| [cell region]_[marker]_std_dev | colData | [cell region] std dev of intensity for [marker] | |
| [cell region]_[marker]_total | colData | total [cell region] intensity for [marker] | |
| [cell region]_area_square_microns | colData | [cell region] area in square microns | |
| [cell region]_compactness | colData | [cell region] compactness | |
| [cell region]_minor_axis | colData | [cell region] length of minor axis | |
| [cell region]_major_axis | colData | [cell region] length of major axis | |
| [cell region]_axis_ratio | colData | [cell region] ratio of major and minor axis | |
| phenotype_[marker] | colData | cell phenotype label as determined by Inform software | |
| cell_x_position | spatialCoordsNames | cell x coordinate | |
| cell_y_position | spatialCoordsNames | cell y coordinate | |
| gender | metadata | gender | "M", "F" |
| mhcII_status | metadata | MHCII status, from Johnson et al. 2021 | "low", "high" |
| age_at_diagnosis | metadata | age at diagnosis | |
| stage_at_diagnosis | metadata | stage of the cancer when image was collected | |
| stage_numeric | metadata | numeric version of stage variable | |
| pack_years | metadata | pack-years of cigarette smoking | |
| survival_days | metadata | time in days from date of diagnosis to date of death or censoring event | |
| survival_status | metadata | did the participant pass away? | 0 = no, 1 = yes |
| cause_of_death | metadata | cause of death | |
| recurrence_or_lung_ca_death | metadata | did the participant have a recurrence or death event? | 0 = no, 1 = yes |
| time_to_recurrence_days | metadata | time in days from date of diagnosis to first recurrent event | |
| adjuvant_therapy | metadata | whether or not the participant received adjuvant therapy | "No", "Yes" |

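The `[cell region]_[marker]_...` shorthand in Table 1 expands to one column per region and marker combination. As a rough sketch, assuming the column names follow the pattern in the dictionary literally (lowercase, underscore-separated), the expected names can be generated in base R and then compared against the real object. The variable names below are illustrative only.

```r
# Marker and cell-region values copied from the shorthand for HumanLungCancerV3
markers <- c("cd3", "cd8", "cd14", "cd19", "cd68", "ck", "dapi", "hladr")
regions <- c("entire_cell", "membrane", "nucleus")

# All region x marker combinations (the first argument varies fastest)
grid <- expand.grid(region = regions, marker = markers,
                    stringsAsFactors = FALSE)

# Expected names for the "min" summaries; swap the suffix for
# "max", "std_dev", or "total" to get the other families
min_cols <- paste(grid$region, grid$marker, "min", sep = "_")
length(min_cols)
head(min_cols, 3)

# Assumption: actual names may differ in case or spelling, so verify with
# intersect(min_cols, colnames(colData(spe_lung)))
```

Checking the generated names against `colnames(colData(spe_lung))` before subsetting avoids silent mismatches if the stored names deviate from the dictionary pattern.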
## HumanOvarianCancerVP

In the table below note the following shorthand:

* `[marker]` represents one of: `cd3`, `cd8`, `cd19`, `cd68`, `ck`, `dapi`, `ier3`, `ki67`, `pstat3`
* `[cell region]` represents one of: cytoplasm, membrane, nucleus

**Table 2: data dictionary for HumanOvarianCancerVP**


| Variable | Slot | Description | Variable coding |
|---|---|---|---|
| [marker] | assays: intensities | mean total cell intensity for [marker] | |
| [marker] | assays: nucleus_intensities | mean nucleus intensity for [marker] | |
| [marker] | assays: membrane_intensities | mean membrane intensity for [marker] | |
| sample_id | | image identifier, also subject id for the ovarian data | |
| cell_id | colData | cell identifier | |
| slide_id | colData | slide identifier | |
| tissue category | colData | type of tissue (indicates a region of the image) | Stroma or Tumor |
| [cell region]_[marker]_min | colData | min [cell region] intensity for [marker] | |
| [cell region]_[marker]_max | colData | max [cell region] intensity for [marker] | |
| [cell region]_[marker]_std_dev | colData | [cell region] std dev of intensity for [marker] | |
| [cell region]_[marker]_total | colData | total [cell region] intensity for [marker] | |
| [cell region]_area_square_microns | colData | [cell region] area in square microns | |
| [cell region]_compactness | colData | [cell region] compactness | |
| [cell region]_minor_axis | colData | [cell region] length of minor axis | |
| [cell region]_major_axis | colData | [cell region] length of major axis | |
| [cell region]_axis_ratio | colData | [cell region] ratio of major and minor axis | |
| cell_x_position | spatialCoordsNames | cell x coordinate | |
| cell_y_position | spatialCoordsNames | cell y coordinate | |
| diagnosis | metadata | | |
| primary | metadata | primary tumor from initial diagnosis? | 0 = no, 1 = yes |
| recurrent | metadata | tumor from a recurrent event (not initial diagnosis tumor)? | 0 = no, 1 = yes |
| treatment_effect | metadata | was tumor treated with chemo prior to imaging? | 0 = no, 1 = yes |
| stage | metadata | stage of the cancer when image was collected | I, II, III, IV |
| grade | metadata | grade of cancer severity (nearly all 3) | |
| survival_time | metadata | time in months from date of diagnosis to date of death or censoring event | |
| death | metadata | did the participant pass away? | 0 = no, 1 = yes |
| BRCA_mutation | metadata | does the participant have a BRCA mutation? | 0 = no, 1 = yes |
| age_at_diagnosis | metadata | age at diagnosis | |
| time_to_recurrence | metadata | time in months from date of diagnosis to first recurrent event | |
| parpi_inhibitor | metadata | whether or not the participant received PARPi inhibitor | N = no, Y = yes |
| debulking | metadata | subjective rating of how the tumor removal process went | optimal, suboptimal, interval |

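Tables 1 and 2 code survival differently: the lung data record `survival_days` and `survival_status`, while the ovarian data record `survival_time` in months and `death`. If both cohorts are ever analysed side by side, the variables need a common scale. The sketch below is one possible harmonisation on made-up rows; the data frames, the `time_months` and `event` column names, and the average month length of 365.25 / 12 days are all assumptions for illustration, not part of the package.

```r
# Made-up rows that mimic the clinical coding in the two data dictionaries
lung <- data.frame(survival_days = c(365.25, 730.5),
                   survival_status = c(1, 0))
ovarian <- data.frame(survival_time = c(10, 48),  # already in months
                      death = c(0, 1))

# Assumed average month length used to convert days to months
days_per_month <- 365.25 / 12

# Stack both cohorts with a shared time scale and event indicator
combined <- rbind(
  data.frame(cohort = "lung",
             time_months = lung$survival_days / days_per_month,
             event = lung$survival_status),
  data.frame(cohort = "ovarian",
             time_months = ovarian$survival_time,
             event = ovarian$death)
)
combined
```

In both dictionaries the event indicator is coded 0 = no, 1 = yes, so only the time scale needs converting.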
**Note**: the `debulking` variable is coded as `optimal` if the surgeon believes the tumor area was reduced to 1 cm or below; `suboptimal` if the surgeon was unable to remove a significant amount of tumor for various reasons; and `interval` if tumor removal came after three cycles of chemo.

# Session Info

```{r}
sessionInfo()
```