---
title: "FlowSorted.Blood.EPIC"
author: "Lucas A Salas"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{FlowSorted.Blood.EPIC}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

 Illumina Human Methylation data from EPIC on immunomagnetic sorted adult 
 blood cell populations. The FlowSorted.Blood.EPIC package contains Illumina
 HumanMethylationEPIC (EPIC)) DNA methylation microarray data
 from the immunomethylomics group (manuscript submitted), consisting of 37 
 magnetic sorted blood cell references and 12 samples, formatted as an
 RGChannelSet object for  integration and normalization using
 most of the existing Bioconductor packages.

 This package contains data similar to the FlowSorted.Blood.450k
 package consisting of data from peripheral blood samples generated from
 adult men and women. However, when using the newer EPIC microarray minfi 
 estimates of cell type composition using the FlowSorted.Blood.450k package
 are less precise compared to actual cell counts. Hence, this package
 consists of appropriate data for deconvolution of adult blood samples
 used in for example EWAS relying in the newer EPIC technology.

 Researchers may find this package useful as these samples represent
 different cellular populations ( T lymphocytes (CD4+ and CD8+), B cells
 (CD19+), monocytes (CD14+), NK cells (CD56+) and Neutrophils of cell
 sorted blood generated with high purity estimates. As a test of accuracy
 12 experimental mixtures were reconstructed using fixed amounts of DNA from
 purified cells. 
 
**Objects included:**  
 1. *FlowSorted.Blood.EPIC* is the RGChannelSet object containing the reference 
 library
 
```{r eval=TRUE}
library(FlowSorted.Blood.EPIC)
if (memory.limit()>8000){
FlowSorted.Blood.EPIC <- libraryDataGet('FlowSorted.Blood.EPIC')
FlowSorted.Blood.EPIC
}
```
 The raw dataset is hosted in both ExperimentHub (EH1136) and GEO 
 [GSE110554](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE110554)  

 2. *IDOLOptimizedCpGs* the IDOL L-DMR library for EPIC arrays  
```{r eval=FALSE}
data("IDOLOptimizedCpGs") 
head(IDOLOptimizedCpGs)  
```  
 3. *IDOLOptimizedCpGs450klegacy* the IDOL L-DMR library for legacy 450k 
 arrays  
```{r eval=FALSE}
data("IDOLOptimizedCpGs450klegacy") 
head(IDOLOptimizedCpGs450klegacy)  
```  
 *See the object help files for additional information*  
 
**estimateCellCounts2 function for cell type deconvolution:**  
 
 We offer the function estimateCellCounts2 a modification of  the popular 
 estimatesCellCounts function in minfi. Our function corrected small glitches 
 when dealing with combining the DataFrame objects with the reference libraries.
 We allow the use of MethylSet objects such as those from GEO. However, we offer
 only Quantile normalization for those datasets (assuming that they have not 
 been previously normalized). The estimates are calculated using the CP/QP 
 method described in Houseman et al. 2012.  and adapted in minfi.
 CIBERSORT and RPC are allowed using external packages.  
 *see ?estimateCellCounts2 for details*  
```{r eval=TRUE}
# Step 1: Load the reference library to extract the artificial mixtures
if (memory.limit()>8000){
FlowSorted.Blood.EPIC <- libraryDataGet('FlowSorted.Blood.EPIC')
FlowSorted.Blood.EPIC
}
FlowSorted.Blood.EPIC
library(minfi)
# Note: If your machine does not allow access to internet you can download
# and save the file. Then load it manually into the R environment


# Step 2 separate the reference from the testing dataset if you want to run 
# examples for estimations for this function example

RGsetTargets <- FlowSorted.Blood.EPIC[,
             FlowSorted.Blood.EPIC$CellType == "MIX"]
             
sampleNames(RGsetTargets) <- paste(RGsetTargets$CellType,
                            seq_len(dim(RGsetTargets)[2]), sep = "_")
RGsetTargets

# Step 3: use your favorite package for deconvolution.
# Deconvolute a target data set consisting of EPIC DNA methylation 
# data profiled in blood, using your prefered method.

# You can use our IDOL optimized DMR library for the EPIC array.  This object
# contains a vector of length 450 consisting of the IDs of the IDOL optimized
# CpG probes.  These CpGs are used as the backbone for deconvolution and were
# selected because their methylation signature differs across the six normal 
# leukocyte subtypes. Use the option "IDOL"

head (IDOLOptimizedCpGs)
# If you need to deconvolute a 450k legacy dataset use 
# IDOLOptimizedCpGs450klegacy instead

# We recommend using Noob processMethod = "preprocessNoob" in minfi for the 
# target and reference datasets. 
# Cell types included are "CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu"

# To use the IDOL optimized list of CpGs (IDOLOptimizedCpGs) use 
# estimateCellCounts2 an adaptation of the popular estimateCellCounts in 
# minfi. This function also allows including customized reference arrays. 

# Do not run with limited RAM the normalization step requires a big amount 
# of memory resources

if (memory.limit()>8000){
 propEPIC<-estimateCellCounts2(RGsetTargets, compositeCellType = "Blood", 
                                processMethod = "preprocessNoob",
                                probeSelect = "IDOL", 
                                cellTypes = c("CD8T", "CD4T", "NK", "Bcell", 
                                "Mono", "Neu"))
                                
print(head(propEPIC$prop))
percEPIC<-round(propEPIC$prop*100,1)
}
```

# Advanced user deconvolution CP/QP, CIBERSORT and/or RPC deconvolution
```{r eval=TRUE}

noobset<- preprocessNoob(RGsetTargets)
#or from estimateCellCounts2 returnAll=TRUE

if (memory.limit()>8000){
 propEPIC<-projectCellType_CP (
 getBeta(noobset)[IDOLOptimizedCpGs,],
 IDOLOptimizedCpGs.compTable, contrastWBC=NULL, nonnegative=TRUE,
 lessThanOne=FALSE)

print(head(propEPIC))
percEPIC<-round(propEPIC*100,1)
}

# If you prefer CIBERSORT or RPC deconvolution use EpiDISH or similar

# Example not to run

# library(EpiDISH)
# RPC <- epidish(getBeta(noobset)[IDOLOptimizedCpGs,],
# IDOLOptimizedCpGs.compTable, method = "RPC")
# RPC$estF#RPC proportion estimates
# percEPICRPC<-round(RPC$estF*100,1)#percentages
# 
# CBS <- epidish(getBeta(noobset)[IDOLOptimizedCpGs,],
# IDOLOptimizedCpGs.compTable, method = "CBS")
# CBS$estF#CBS proportion estimates
# percEPICCBS<-round(CBS$estF*100,1)#percentages
```
 
Umbilical Cord Blood
```{r Umbilical Cord Blood, eval=FALSE}
# UMBILICAL CORD BLOOD DECONVOLUTION

library (FlowSorted.CordBloodCombined.450k)
# Step 1: Load the reference library to extract the umbilical cord samples
FlowSorted.CordBloodCombined.450k <- 
    libraryDataGet('FlowSorted.CordBloodCombined.450k') 

FlowSorted.CordBloodCombined.450k

# Step 2 separate the reference from the testing dataset if you want to run 
# examples for estimations for this function example

RGsetTargets <- FlowSorted.CordBloodCombined.450k[,
FlowSorted.CordBloodCombined.450k$CellType == "WBC"]
sampleNames(RGsetTargets) <- paste(RGsetTargets$CellType,
                              seq_len(dim(RGsetTargets)[2]), sep = "_")
RGsetTargets

# Step 3: use your favorite package for deconvolution.
# Deconvolute a target data set consisting of 450K DNA methylation 
# data profiled in blood, using your prefered method.
# You can use our IDOL optimized DMR library for the Cord Blood,  This object
# contains a vector of length 517 consisting of the IDs of the IDOL optimized
# CpG probes.  These CpGs are used as the backbone for deconvolution and were
# selected because their methylation signature differs across the six normal 
# leukocyte subtypes plus the nucleated red blood cells.

# We recommend using Noob processMethod = "preprocessNoob" in minfi for the 
# target and reference datasets. 
# Cell types included are "CD8T", "CD4T", "NK", "Bcell", "Mono", "Gran", 
# "nRBC"
# To use the IDOL optimized list of CpGs (IDOLOptimizedCpGsCordBlood) use 
# estimateCellCounts2 from FlowSorted.Blood.EPIC. 
# Do not run with limited RAM the normalization step requires a big amount 
# of memory resources. Use the parameters as specified below for 
# reproducibility.
# 
if (memory.limit()>8000){
    propUCB<-estimateCellCounts2(RGsetTargets,
                                    compositeCellType =
                                               "CordBloodCombined",
                                    processMethod = "preprocessNoob",
                                    probeSelect = "IDOL",
                                    cellTypes = c("CD8T", "CD4T", "NK",
                                    "Bcell", "Mono", "Gran", "nRBC"))

    head(propUCB$prop)
    percUCB<-round(propUCB$prop*100,1)
}
```

Using cell counts instead of proportions. 
Note: These are random numbers, not the actual cell counts of the experiment
```{r cell counts, eval=TRUE}
library(FlowSorted.Blood.450k)
RGsetTargets2 <- FlowSorted.Blood.450k[,
                             FlowSorted.Blood.450k$CellType == "WBC"]
sampleNames(RGsetTargets2) <- paste(RGsetTargets2$CellType,
                             seq_len(dim(RGsetTargets2)[2]), sep = "_")
RGsetTargets2
propEPIC2<-estimateCellCounts2(RGsetTargets2, compositeCellType = "Blood",
                             processMethod = "preprocessNoob",
                             probeSelect = "IDOL",
                             cellTypes = c("CD8T", "CD4T", "NK", "Bcell",
                             "Mono", "Neu"), cellcounts = rep(10000,6))
head(propEPIC2$prop)
head(propEPIC2$counts)
percEPIC2<-round(propEPIC2$prop*100,1)
```


Blood Extended deconvolution

```{r Blood extended deconvolution, eval=FALSE}
# Blood Extended deconvolution or any external reference
#please contact <Technology.Transfer@dartmouth.edu>

# Do not run
library (FlowSorted.BloodExtended.EPIC)
# 
# Step 1: Extract the mix samples

FlowSorted.Blood.EPIC <- libraryDataGet('FlowSorted.Blood.EPIC')

# Step 2 separate the reference from the testing dataset if you want to run 
# examples for estimations for this function example

RGsetTargets <- FlowSorted.Blood.EPIC[,
FlowSorted.Blood.EPIC$CellType == "MIX"]
sampleNames(RGsetTargets) <- paste(RGsetTargets$CellType,
                              seq_len(dim(RGsetTargets)[2]), sep = "_")
RGsetTargets

# Step 3: use your favorite package for deconvolution.
# Deconvolute the target data set 450K or EPIC blood DNA methylation. 
# We recommend ONLY the IDOL method, the automatic method can lead to severe
# biases.

# We recommend using Noob processMethod = "preprocessNoob" in minfi for the 
# target and reference datasets. 
# Cell types included are "Bas", "Bmem", "Bnv", "CD4mem", "CD4nv", 
# "CD8mem", "CD8nv", "Eos", "Mono", "Neu", "NK", and "Treg"
# Use estimateCellCounts2 from FlowSorted.Blood.EPIC. 
# Do not run with limited RAM the normalization step requires a big amount 
# of memory resources. Use the parameters as specified below for 
# reproducibility.
# 
if (memory.limit()>8000){
    prop_ext<-estimateCellCounts2(RGsetTargets,
                                    compositeCellType =
                                               "BloodExtended",
                                    processMethod = "preprocessNoob",
                                    probeSelect = "IDOL",
                                    cellTypes = c("Bas", "Bmem", "Bnv",
                                               "CD4mem", "CD4nv",
                                              "CD8mem", "CD8nv", "Eos",
                                              "Mono", "Neu", "NK", "Treg"),
    CustomCpGs =if(RGsetTargets@annotation[1]=="IlluminaHumanMethylationEPIC"){
    IDOLOptimizedCpGsBloodExtended}else{IDOLOptimizedCpGsBloodExtended450k})

   perc_ext<-round(prop_ext$prop*100,1)
   head(perc_ext)
}
```

```{r}
sessionInfo()
```
  
 **References**  

 LA Salas et al. (2018). An optimized library for reference-based deconvolution 
 of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC
 BeadArray. Genome Biology 19, 64. doi:
 [10.1186/s13059-018-1448-7](https://dx.doi.org/10.1186/s13059-018-1448-7).  
 
 LA Salas et al. (2022). \emph{Enhanced cell deconvolution of 
 peripheral blood using DNA methylation for high-resolution immune 
 profiling}. Nat Comm (In Press). doi (biorxiv):
 [10.1101/2021.04.11.439377](https://dx.doi.org/10.1101/2021.04.11.439377).
 
 DC Koestler et al. (2016). Improving cell mixture deconvolution by
 identifying optimal DNA methylation libraries (IDOL). BMC bioinformatics.
 17, 120. doi:
 [10.1186/s12859-016-0943-7](https://dx.doi.org/10.1186/s12859-016-0943-7).
 
 K Gervin, LA Salas et al. (2019) \emph{Systematic evaluation and 
 validation of references and library selection methods for deconvolution of 
 cord blood DNA methylation data}. Clin Epigenetics 11,125. doi:
 [10.1186/s13148-019-0717-y](https://dx.doi.org/10.1186/s13148-019-0717-y).
 
 EA Houseman et al. (2012) DNA methylation arrays as surrogate
 measures of cell mixture distribution. BMC Bioinformatics 13, 86.
 doi:
 [10.1186/1471-2105-13-86](https://dx.doi.org/10.1186/1471-2105-13-86).  
 
 [minfi](http://bioconductor.org/packages/release/bioc/html/minfi.html)
 Tools to analyze & visualize Illumina Infinium methylation arrays.