---
title: >
  scaeData User Guide
author:
- name: Ahmad Al Ajami
  affiliation: 
  - Goethe University, University Hospital Frankfurt, Neurological Institute (Edinger Institute), Frankfurt/Main
  - University Cancer Center, Frankfurt/Main
  - Frankfurt Cancer Institute, Frankfurt/Main
  email: alajami@med.uni-frankfurt.de
- name: Jonas Schuck
  affiliation: 
  - Goethe University, University Hospital Frankfurt, Neurological Institute (Edinger Institute), Frankfurt/Main
  - University Cancer Center, Frankfurt/Main
  - Frankfurt Cancer Institute, Frankfurt/Main
  email: schuck@med.uni-frankfurt.de
- name: Federico Marini
  affiliation: 
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz
  - Research Center for Immunotherapy (FZI), Mainz
  email: marinif@uni-mainz.de
- name: Katharina Imkeller
  affiliation: 
  - Goethe University, University Hospital Frankfurt, Neurological Institute (Edinger Institute), Frankfurt/Main
  - University Cancer Center, Frankfurt/Main
  - Frankfurt Cancer Institute, Frankfurt/Main
  email: imkeller@med.uni-frankfurt.de
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('scaeData')`"
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    number_sections: true
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{scaeData User Guide}
  %\VignetteEncoding{UTF-8}
  %\VignettePackage{scaeData}
  %\VignetteKeywords{ExperimentHub, ExperimentData, Homo_sapiens_Data, SingleCellData}
---

# scaeData

`scaeData` is a data package that complements the Bioconductor package `SingleCellAlleleExperiment`. It provides three datasets for testing and demonstrating the functions in `SingleCellAlleleExperiment`:

- 5k PBMCs of a healthy donor, 3' v3 chemistry
- 10k PBMCs of a healthy donor, 3' v3 chemistry
- 20k PBMCs of a healthy donor, 3' v3 chemistry

The raw FASTQ files for all three datasets were obtained from the public datasets provided by [10x Genomics](https://www.10xgenomics.com/datasets).

The raw data were then processed with the [scIGD](https://github.com/AGImkeller/scIGD) Snakemake workflow, which performs HLA allele typing and generates allele-specific quantification from scRNA-seq data using donor-specific references.

# Quick Start

## Installation

From Bioconductor:

```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("scaeData")
```

Alternatively, a development version is available on GitHub and can be installed via:

```{r, eval=FALSE}
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("AGImkeller/scaeData", build_vignettes = TRUE)
```

# Usage

The datasets within `scaeData` are accessible using the `scaeDataGet()` function:

```{r libraries, include=TRUE}
library("scaeData")
```

```{r, eval = FALSE}
pbmc_5k <- scaeDataGet("pbmc_5k")
pbmc_10k <- scaeDataGet("pbmc_10k")
```

For example, we can view `pbmc_20k`:

```{r}
pbmc_20k <- scaeDataGet("pbmc_20k")

pbmc_20k
```

The returned object holds the dataset directory together with the names of the barcode, feature, and matrix files, so the count matrix can also be read in manually:

```{r}
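# Build the full paths to the barcode, feature, and matrix files
cells.dir <- file.path(pbmc_20k$dir, pbmc_20k$barcodes)
features.dir <- file.path(pbmc_20k$dir, pbmc_20k$features)
mat.dir <- file.path(pbmc_20k$dir, pbmc_20k$matrix)

# Read the cell barcodes, feature identifiers, and sparse count matrix
cells <- utils::read.csv(cells.dir, sep = "", header = FALSE)
features <- utils::read.delim(features.dir, header = FALSE)
mat <- Matrix::readMM(mat.dir)

# Rows of the matrix are cells and columns are features
rownames(mat) <- cells$V1
colnames(mat) <- features$V1
head(mat)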
```
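
To make the file layout easier to follow without downloading any data, the following minimal sketch writes a toy barcode file, feature file, and Matrix Market file to a temporary directory and reads them back with cells in rows and features in columns, as in the chunk above. All file names, barcodes, feature names, and counts here are hypothetical and are not part of `scaeData`.

```{r}
# Toy stand-in for the three files (hypothetical names and values)
toy_dir <- tempfile("toy_counts_")
dir.create(toy_dir)

toy_cells <- c("AAAC", "AAAG", "AAAT")
toy_features <- c("geneA", "geneB")

# 3 cells x 2 features sparse count matrix
toy_counts <- Matrix::sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 1),
                                   x = c(5, 1, 2), dims = c(3, 2))

writeLines(toy_cells, file.path(toy_dir, "cells.txt"))
writeLines(toy_features, file.path(toy_dir, "features.txt"))
invisible(Matrix::writeMM(toy_counts, file.path(toy_dir, "matrix.mtx")))

# Read the matrix back and check that its dimensions match the
# number of barcodes and features before assigning names
toy_mat <- Matrix::readMM(file.path(toy_dir, "matrix.mtx"))
stopifnot(nrow(toy_mat) == length(toy_cells),
          ncol(toy_mat) == length(toy_features))
dimnames(toy_mat) <- list(toy_cells, toy_features)

toy_mat
```

Checking the dimensions before assigning names is a cheap way to catch a transposed matrix or a mismatched barcode file early.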

A `SingleCellAlleleExperiment` object, `scae` for short, can be generated using the `read_allele_counts()` function from the `SingleCellAlleleExperiment` package.

Each dataset comes with a lookup table, which is used to create the relevant additional data layers during object generation. The lookup tables are stored in the package's `extdata` directory:

```{r}
lookup <- read.csv(system.file("extdata", "pbmc_20k_lookup_table.csv", package="scaeData"))

library("SingleCellAlleleExperiment")
scae_20k <- read_allele_counts(pbmc_20k$dir,
                               sample_names = "example_data",
                               filter_mode = "no",
                               lookup_file = lookup,
                               barcode_file = pbmc_20k$barcodes,
                               gene_file = pbmc_20k$features,
                               matrix_file = pbmc_20k$matrix,
                               verbose = TRUE)

scae_20k
```
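
Since each dataset has its own lookup table, it can help to check which tables are installed before choosing one. The sketch below only lists the contents of `extdata`; the `"lookup"` search pattern is an assumption based on the file name used above, and the variable names are illustrative.

```{r}
# Locate the extdata directory of the installed package
# (system.file() returns "" if the package is not installed)
extdata_dir <- system.file("extdata", package = "scaeData")

# List files whose name contains "lookup" (pattern assumed from the
# pbmc_20k file name used above)
lookup_files <- list.files(extdata_dir, pattern = "lookup")
lookup_files
```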

Please refer to the vignette and documentation of `SingleCellAlleleExperiment` for details on working with this data container.

# Session info {-}

```{r sessionInfo}
sessionInfo()
```