--- title: | | Multi-sample multi-group | scRNA-seq data author: "Helena L. Crowell" date: "`r doc_date()`" vignette: > %\VignetteIndexEntry{Multi-sample multi-group scRNA-seq data} %\VignetteEngine{knitr::rmarkdown} output: BiocStyle::html_document bibliography: refs.bib --- # Overview The `muscData` package contains a set of publicly available single-cell RNA sequencing (scRNA-seq) datasets with complex experimental designs, i.e., datasets that contain multiple samples (e.g., individuals) measured across multiple experimental conditions (e.g., treatments), formatted into `SingleCellExperiment` (SCE) Bioconductor objects. Data objects are hosted through Bioconductor's ExperimentHub web resource. # Available datasets The table below gives an overview of currently available datasets, including a unique identifier (ID) that can be used to load the data (see next section), a brief description, the original data source, and a reference. Dataset descriptions may also be viewed from within R via `?ID` (e.g., `?Kang18_8vs8`). ID | Description | Availability | Reference ---|-------------|--------------|---------- `Kang18_8vs8` | 10x droplet-based scRNA-seq PBMC data from 8 Lupus patients before and after 6h-treatment with INF-beta (16 samples in total) | Gene Expression Ombnibus (GEO) accession [GSE96583](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96583) | @Kang2018 `Crowell19_4vs4` | Single-nuclei RNA-seq data of 8 CD-1 male mice, split into 2 groups with 4 animals each: vehicle and peripherally lipopolysaccharaide (LPS) treated mice | Figshare [DOI:10.6084/m9.figshare.8976473.v1](https://doi.org/10.6084/m9.figshare.8976473.v1) | @Crowell2019-muscat # Loading the data All datasets available within `muscData` may be loaded either via named functions that directly reffer to the object names, or by using the `ExperimentHub` interface. Both methods are demonstrated below. ## Via accessor functions The datasets listed above may be loaded into R by their ID. All provided SCEs contain unfiltered raw counts in their `assay` slot, and any available gene and cell metadata in the `rowData` and `colData` slots, respectively. ```{r, message = FALSE} library(muscData) Kang18_8vs8() ``` ## Via `ExperimentHub` Besides using an accession function as demonstrated above, we can browse ExperimentHub records (using `query`) or package specific records (using `listResources`), and then load the data of interest. The key differences between these approaches is that `query` will search all of ExperimentHub, while `listResources` facilitate data discovery within the specified package (here, `muscData`). ### Using `query` We first initialize a Hub instance to search for and load available data with the `ExperimentHub` function, and store the complete list of >2000 records in a variable `eh`. Using `query`, we then identify any records made available by `muscData`, as well as their accession IDs (EH1234). Finally, we can load the data into R via `eh[[id]]`. ```{r message = FALSE} # create Hub instance library(ExperimentHub) eh <- ExperimentHub() (q <- query(eh, "muscData")) ``` ```{r message = FALSE, results = 'hide'} # load data via accession ID eh[["EH2259"]] ``` ### Using `list/loadResources` Alternatively, available records may be viewed via `listResources`. To then load a specific dataset or subset thereof using `loadResources`, we require a character vector of metadata search terms to filter by. Available metadata can accessed from the ExperimentHub records found by `query` via `mcols()`, or viewed using the accessors shown above with option `metadata = TRUE`. In the example below, we use `"PMBC"` and `"INF-beta"` to select the `Kang18_8vs8` dataset. However, note that any metadata keyword(s) that uniquely identify the data of interest could be used (e.g., `"Lupus"` or `"GSE96583"`). ```{r message = FALSE} listResources(eh, "muscData") ``` ```{r message = FALSE, results = 'hide'} # view metadata mcols(q) Kang18_8vs8(metadata = TRUE) # load data using metadata search terms loadResources(eh, "muscData", c("PBMC", "INF-beta")) ``` # Exploring the data The `r Biocpkg("scater")` [@McCarthy2017] package provides an easy-to-use set of visualization tools for scRNA-seq data. For interactive visualization, we recommend the `r Biocpkg("iSEE")` (*interactive SummerizedExperiment Explorer*) package [@Albrecht2018], which provides a Shiny-based graphical user interface for exploration of single-cell data in `SummarizedExperiment` format (installation instructions and user guides are available [here](https://bioconductor.org/packages/release/bioc/html/iSEE.html)). When available, a great tool for interactive exploration and comparison of dimension-reduced embeddings is `r CRANpkg("sleepwalk")` [@Ovchinnikova2018]. # Session info ```{r} sessionInfo() ``` # References