---
title: |
  | Multi-sample multi-group
  | scRNA-seq data
author: "Helena L. Crowell"
date: "`r doc_date()`"
vignette: >
  %\VignetteIndexEntry{Multi-sample multi-group scRNA-seq data}
  %\VignetteEngine{knitr::rmarkdown}
output: 
  BiocStyle::html_document
bibliography: refs.bib
---

# Overview

The `muscData` package contains a set of publicly available single-cell RNA sequencing (scRNA-seq) datasets with complex experimental designs, i.e., datasets that contain multiple samples (e.g., individuals) measured across multiple experimental conditions (e.g., treatments), formatted into `SingleCellExperiment` (SCE) Bioconductor objects. Data objects are hosted through Bioconductor's ExperimentHub web resource.

# Available datasets

The table below gives an overview of currently available datasets, including a unique identifier (ID) that can be used to load the data (see next section), a brief description, the original data source, and a reference. Dataset descriptions may also be viewed from within R via `?ID` (e.g., `?Kang18_8vs8`).

ID | Description | Availability | Reference
---|-------------|--------------|----------
`Kang18_8vs8` | 10x droplet-based scRNA-seq PBMC data from 8 Lupus patients before and after 6h-treatment with INF-beta (16 samples in total) | Gene Expression Ombnibus (GEO) accession [GSE96583](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96583) | @Kang2018
`Crowell19_4vs4` | Single-nuclei RNA-seq data of 8 CD-1 male mice, split into 2 groups with 4 animals each: vehicle and peripherally lipopolysaccharaide (LPS) treated mice | Figshare [DOI:10.6084/m9.figshare.8976473.v1](https://doi.org/10.6084/m9.figshare.8976473.v1) | @Crowell2019-muscat

# Loading the data

All datasets available within `muscData` may be loaded either via named functions that directly reffer to the object names, or by using the `ExperimentHub` interface. Both methods are demonstrated below.

## Via accessor functions

The datasets listed above may be loaded into R by their ID. All provided SCEs contain unfiltered raw counts in their `assay` slot, and any available gene and cell metadata in the `rowData` and `colData` slots, respectively. 

```{r, message = FALSE}
library(muscData)
Kang18_8vs8()
```

## Via `ExperimentHub`

Besides using an accession function as demonstrated above, we can browse ExperimentHub records (using `query`) or package specific records (using `listResources`), and then load the data of interest. The key differences between these approaches is that `query` will search all of ExperimentHub, while `listResources` facilitate data discovery within the specified package (here, `muscData`).

### Using `query`

We first initialize a Hub instance to search for and load available data with the `ExperimentHub` function, and store the complete list of >2000 records in a variable `eh`. Using `query`, we then identify any records made available by `muscData`, as well as their accession IDs (EH1234). Finally, we can load the data into R via `eh[[id]]`.

```{r message = FALSE}
# create Hub instance
library(ExperimentHub)
eh <- ExperimentHub()
(q <- query(eh, "muscData"))
```

```{r message = FALSE, results = 'hide'}
# load data via accession ID
eh[["EH2259"]]
```

### Using `list/loadResources`

Alternatively, available records may be viewed via `listResources`. To then load a specific dataset or subset thereof using `loadResources`, we require a character vector of metadata search terms to filter by.

Available metadata can accessed from the ExperimentHub records found by `query` via `mcols()`, or viewed using the accessors shown above with option `metadata = TRUE`. In the example below, we use `"PMBC"` and `"INF-beta"` to select the `Kang18_8vs8` dataset. However, note that any metadata keyword(s) that uniquely identify the data of interest could be used (e.g., `"Lupus"` or `"GSE96583"`).

```{r message = FALSE}
listResources(eh, "muscData")
```

```{r message = FALSE, results = 'hide'}
# view metadata
mcols(q)
Kang18_8vs8(metadata = TRUE)

# load data using metadata search terms
loadResources(eh, "muscData", c("PBMC", "INF-beta"))
```

# Exploring the data

The `r Biocpkg("scater")` [@McCarthy2017] package provides an easy-to-use set of visualization tools for scRNA-seq data.

For interactive visualization, we recommend the `r Biocpkg("iSEE")` (*interactive SummerizedExperiment Explorer*) package [@Albrecht2018], which provides a Shiny-based graphical user interface for exploration of single-cell data in `SummarizedExperiment` format (installation instructions and user guides are available [here](https://bioconductor.org/packages/release/bioc/html/iSEE.html)).

When available, a great tool for interactive exploration and comparison of dimension-reduced embeddings is `r CRANpkg("sleepwalk")` [@Ovchinnikova2018].

# Session info
```{r}
sessionInfo()
```

# References