%\VignetteIndexEntry{DMRcaller}
%\VignetteEngine{knitr::knitr}
%\VignetteKeywords{differentially methylated regions, methylation, epigenetics, bisulfite sequencing}
%\VignettePackage{DMRcaller}
%\VignetteDepends{DMRcaller}
\documentclass[10pt]{article}

\usepackage[margin=2cm,nohead]{geometry}
\PassOptionsToPackage{hyphens}{url}\usepackage{hyperref}
\usepackage{natbib}

\newcommand{\R}{\texttt{R} }
\newcommand{\Rfun}[1]{{\texttt{#1}}}
\newcommand{\Robj}[1]{{\texttt{#1}}}
\newcommand{\RObj}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}}
\newcommand{\link}[1]{\href{#1}{\normalfont\texttt{#1}}}


\author{Nicolae Radu Zabet\footnote{
e-mail: \email{nzabet@essex.ac.uk},
School of Biological Sciences, University of Essex, UK}
}


\begin{document}

\title{Overview of the \Rpackage{DMRcaller} package}

\maketitle

\tableofcontents

\section{Introduction}\label{sec:intro}


DNA methylation is an epigenetic modification of the DNA where a methyl group is added to the cytosine nucleotides. This modification is heritable, able to control the gene regulation and, in general, is associated with transcriptional gene silencing. While in mammals the DNA is predominantly methylated in CG context, in plants non-CG methylation (CHG and CHH, where H can be any of the A, C or T nucleotides) is also present and is important for the epigenetic regulation of transcription. Sequencing of bisulfite converted DNA has become the method of choice to determine genome wide methylation distribution. The \Rpackage{DMRcaller} package computes the set of Differentially Methylated Regions (DMRs) between two samples. \Rpackage{DMRcaller} will compute the differentially methylated regions from Whole Genome Bisulfite Sequencing (WGBS) or Reduced Representation Bisulfite Sequencing (RRBS) data. There are several tools able to call DMRs, but most work has been done in mammalian systems and, thus, they were designed to primarily call CG methylation.


\section{Methods}
The package computes the DMRs using the CX report files generated by Bismark \citep{krueger_2011}, which contain the number of methylated and unmmethylated reads for each cytosine in the genome. The coverage at each position on the genome is not homogeneous and this makes it difficult to compute the differentially methylated cytosines. Here, we implemented three methods:
\begin{itemize}
\item \textbf{noise\_filter} where we use a kernel  \citep{hebestreit_2013} to smooth the number of methylated reads and the total number of reads (the \Rpackage{DMRcaller} package provides four kernels: \Rfun{"uniform"}, \Rfun{"triangular"}, \Rfun{"gaussian"} and \Rfun{"epanechnicov"})
\item \textbf{neighbourhood} where individual cytosines in a specific context are considered in the analysis without any smoothing
\item \textbf{bins} where the genome is split into equal bins where all the reads are pooled together
\end{itemize}

The DMRs are then computed by performing a statistical test between the number of methylated reads and the total number of reads in the two conditions for each position, cytosine or bin. In particular, we implemented two statistical tests: $(i)$ Fisher's exact test and $(ii)$ the Score test. The former (Fisher's exact test) uses the \Rfun{fisher.test} in the \Rpackage{stats} package.

The Score test is a statistical test of a simple null hypothesis that a parameter of interest is equal to some particular value. In our case, we are interested if the methylation levels in the two samples are equal or different. Given that $m_1$ is the number of methylated reads in condition 1, $m_2$ is the number of methylated reads in condition 2, $n_1$ is the total number of reads in condition 1 and $n_2$ is the total number of reads in condition 2, the Z-score of the Score test is
\begin{equation}
Z = \frac{\left(p_1 - p_2\right)\nu}{\sqrt{p(1-p)}}
\end{equation}
where $p_1=m_1/n_1$, $p_2 = m_2/n_2$,
\begin{equation}
p =  \frac{m_1 + m_2}{n_1 + n_2}\quad \textrm{and}\quad \nu = \sqrt{\frac{n_1n_2}{n_1 + n_2}}
\end{equation}

We then convert the Z-score to the p-value assuming a normal distribution and a two sided test.
<<pvalue, echo=TRUE, message=FALSE, cache=FALSE, eval=FALSE>>=
pValue <- 2*pnorm(-abs(zScore))
@

Finally, for both statistical tests (Fisher's exact test and Score test), we adjust the p-values for multiple testing using Benjamini and Hochberg's method \citep{benjamini_1995} to control the false discovery
<<pvalue_adjust, echo=TRUE, message=FALSE, cache=FALSE, eval=FALSE>>=
pValue <- p.adjust(pValue, method="fdr")
@

The algorithm performs the statistical test for each position, cytosine or bin and then marks as DMRs all positions/cytosines/bins that satisfy the following three conditions:
\begin{itemize}
\item the difference in methylation levels between the two conditions is statistically significant according to the statistical test;
\item the difference in methylation proportion between the two conditions is higher than a threshold value;
\item the mean number of reads per cytosine is higher than a threshold.
\end{itemize}
To group adjacent DMRs, we run an iterative process, where neighbouring DMRs (within a certain distance of each other) are joined only if these three conditions are still met after joining the DMRs.

Finally, we filter the DMRs as follow
\begin{itemize}
\item Remove DMRs whose lengths are less than a minimum size.
\item Remove DMRs with fewer cytosines than a threshold value.
\end{itemize}

For a set of potential DMRs (e.g. genes, transposable elements or CpG islands) the user can call the function \Rfun{filterDMRs} where all reads in a set of provided regions are pooled together and then the algorithm performs the statistical test for each region.


\section{Description}

\subsection{Data}
Bismark \citep{krueger_2011} is a popular tool for methylation call on WGBS or RRBS data. \Rpackage{DMRcaller} takes as inputs the CX report files generated by Bismark and stores this data in a \Robj{GRanges} object. In the package, we included two CX report files that contain the methylation calls of WT and \emph{met1-3} \emph{Arabidopsis thaliana} \citep{stroud_2013}. \emph{MET1} gene encodes for the main DNA methyltransferase in \emph{Arabidopsis thaliana} and the \emph{met1-3} mutation results in a genome-wide loss of DNA methylation (mainly in CG context). Due to running time, we restricted the data and analysis to the first $1\ Mb$ of the third chromosome of \emph{A. thaliana}.

<<load_presaved, message=FALSE,cache=FALSE>>=
library(DMRcaller)

#load presaved data
data(methylationDataList)
@

To load a different dataset, one can use \Rfun{readBismark} function, which takes as input the filename of the CX report file to be loaded.

<<load, echo=TRUE, message=FALSE, cache=FALSE, eval=FALSE>>=
# specify the Bismark CX report files
saveBismark(methylationDataList[["WT"]],
            "chr3test_a_thaliana_wt.CX_report")
saveBismark(methylationDataList[["met1-3"]],
            "chr3test_a_thaliana_met13.CX_report")

# load the data
methylationDataWT <- readBismark("chr3test_a_thaliana_wt.CX_report")
methylationDataMet13 <- readBismark("chr3test_a_thaliana_met13.CX_report")
methylationDataList <- GRangesList("WT" = methylationDataWT,
                                   "met1-3" = methylationDataMet13)
@

\RObj{methylationDataList} is a \Robj{GRangesList} object , where the \Robj{GRanges} elements contain four metadata columns
\begin{itemize}
\item \textbf{context} - the context of the Cytosine (CG, CHG or CHH)
\item \textbf{readsM} - the number of methylated reads
\item \textbf{readsN} - the total number of reads
\item \textbf{trinucleotide\_context} - the specific context of the cytosine (H is replaced by the actual nucleotide)
\end{itemize}

If the data consists of two or more replicates, these can be pooled together using the function \Rfun{poolMethylationDatasets} or \Rfun{poolTwoMethylationDatasets} (in the case of pooling only two datasets). The latter function (\Rfun{poolTwoMethylationDatasets}) is useful when the datasets are large and creating a \Robj{GRangesList} object is not possible (e.g. the \Robj{GRanges} objects are two large).

<<pool_data, echo=TRUE, message=FALSE, cache=FALSE, eval=FALSE>>=
# load the data
methylationDataAll <- poolMethylationDatasets(methylationDataList)

# In the case of 2 elements, this is equivalent to
methylationDataAll <- poolTwoMethylationDatasets(methylationDataList[[1]],
                                                 methylationDataList[[2]])
@

Alternatively, one can use \Rfun{readBismarkPool} to directly read a list of CX report files and pool them together.
<<_read_pool_data, echo=TRUE, message=FALSE, cache=FALSE, eval=FALSE>>=
# load the data
methylationDataAll <- readBismarkPool(c(file_wt, file_met13))
@



\subsection{Low resolution profiles}

The \Rpackage{DMRcaller} package  also offers the possibility to visualise context specific global changes in the methylation profile. To achieve this, the user can call \Rfun{plotMethylationProfileFromData} function, which computes the mean methylation proportion in tiling bins of fixed size; see Figure \ref{fig:lowResProfile}.

\begin{figure}[htp]
\begin{center}
<<profile, fig.width=12,fig.height=4,out.width='0.95\\textwidth', message=FALSE,cache=FALSE>>=
par(mar=c(4, 4, 3, 1)+0.1)
plotMethylationProfileFromData(methylationDataList[["WT"]],
                               methylationDataList[["met1-3"]],
                               conditionsNames = c("WT","met1-3"),
                               windowSize = 10000,
                               autoscale = FALSE,
                               context = c("CG"))
@
\end{center}
\caption{\emph{Low resolution profile in CG context for WT and met1-3}.}\label{fig:lowResProfile}
\end{figure}



Alternatively, for a finer control, the user can use \Rfun{computeMethylationProfile} function to compute the methylation profile at certain locations on the genome. This function returns a \Robj{GRanges} object with four metadata columns
\begin{itemize}
\item \textbf{sumReadsM} - the number of methylated reads
\item \textbf{sumReadsN} - the total number of reads
\item \textbf{Proportion} - the proportion of methylated reads
\item \textbf{context} - the context
\end{itemize}

One or more of these \Robj{GRanges} objects can be put in a \Robj{GRangesList} object which is then passed as a parameter to the \Rfun{plotMethylationProfile} function.
<<profile_fine, echo=TRUE, message=FALSE, cache=FALSE, fig.width=5,fig.height=5,out.width='0.95\\textwidth', eval=FALSE>>=
regions <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E6))

# compute low resolution profile in 10 Kb windows
profileCGWT <- computeMethylationProfile(methylationDataList[["WT"]],
                                         regions,
                                         windowSize = 10000,
                                         context = "CG")

profileCGMet13 <- computeMethylationProfile(methylationDataList[["met1-3"]],
                                            regions,
                                            windowSize = 10000,
                                            context = "CG")

profilesCG <- GRangesList("WT" = profileCGWT, "met1-3" = profileCGMet13)

#plot the low resolution profile
par(mar=c(4, 4, 3, 1)+0.1)
par(mfrow=c(1,1))
plotMethylationProfile(profilesCG,
                       autoscale = FALSE,
                       labels = NULL,
                       title="CG methylation on Chromosome 3",
                       col=c("#D55E00","#E69F00"),
                       pch = c(1,0),
                       lty = c(4,1))
@


\subsection{Coverage of the bisulfite sequencing data}

The number of reads from the bisulfite sequencing can differ significantly between different locations on the genome in the sense that cytosines in the same context (including neighbouring cytosines) can display large variability in the coverage. To plot the coverage of the bisulfite sequencing datasets, one can use \Rfun{plotMethylationDataCoverage} function which takes as input one or two datasets and the vector with the thresholds used to compute the proportion of cytosines with at least that many reads; see Figure \ref{fig:coverage}.

\begin{figure}[htp]
\begin{center}
<<coverage, fig.width=6,fig.height=5,out.width='0.6\\textwidth', message=FALSE,cache=FALSE>>=
# plot the coverage in the two contexts
par(mar=c(4, 4, 3, 1)+0.1)
plotMethylationDataCoverage(methylationDataList[["WT"]],
                            methylationDataList[["met1-3"]],
                            breaks = c(1,5,10,15),
                            regions = NULL,
                            conditionsNames=c("WT","met1-3"),
                            context = c("CHH"),
                            proportion = TRUE,
                            labels=LETTERS,
                            contextPerRow = FALSE)
@
\end{center}
\caption{\emph{Coverage}. For example, this figure shows that in WT only $30\%$ of the cytosines in CHH context have at least 10 reads.}\label{fig:coverage}
\end{figure}


Alternatively, the \Rpackage{DMRcaller} also provides the \Rfun{computeMethylationDataCoverage} function which returns a numeric vector with the number or proportion of cytosines in a specific context that have at least a certain number of reads specified by the input vector \Rfun{breaks}.
<<coverage_compute, message=FALSE,cache=FALSE, eval=FALSE>>=
# compute the coverage in the two contexts
coverageCGWT <- computeMethylationDataCoverage(methylationDataList[["WT"]],
                                               context="CG",
                                               breaks = c(1,5,10,15))
@


\subsection{Spatial correlation of methylation levels}

Methylation levels are often spatially correlated and some methods to detect DMRs assume this spatial correlation. Nevertheless, different tissues, samples and even methylation context will display different levels of correlation. \Rpackage{DMRcaller} implements \Rfun{plotMethylationDataSpatialCorrelation} function that plots the correlation of methylation levels as function of distance between cytosines. This function takes as input one or two datasets and the vector with the distances between cytosines; see Figure \ref{fig:coverage}.

\begin{figure}[htp]
\begin{center}
<<correlation_plot, fig.width=6,fig.height=5,out.width='0.6\\textwidth', message=FALSE,cache=FALSE>>=
# compute the spatial correlation of methylation levels
plotMethylationDataSpatialCorrelation(methylationDataList[["WT"]],
                            distances = c(1,100,10000), regions = NULL,
                            conditionsNames = c("WT"),
                            context = c("CG"),
                            labels = LETTERS, col = NULL,
                            pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5),
                            contextPerRow = FALSE,
                            log = "x")
@
\end{center}
\caption{\emph{Spatial correlation of methylation levels}. }\label{fig:correlation}
\end{figure}


Alternatively, the \Rpackage{DMRcaller} also provides the \Rfun{computeMethylationDataSpatialCorrelation} function which returns a numeric vector with the correlation of methylation levels of cytosines separated by certain distances in a specific context.
<<correlation_compute, message=FALSE,cache=FALSE, eval=FALSE>>=
# compute the coverage in the two contexts
correlation_CG_wt <- computeMethylationDataSpatialCorrelation(methylationDataList[["WT"]],
                                                              context="CG",
                                                              distances=c(1,10,100,1000,10000))
@


\subsection{Calling DMRs}
\Rpackage{DMRcaller} package provides \Rfun{computeDMRs} function to call DMRs. The output of this function is a \Robj{GRanges} with 11 metadata columns.
\begin{itemize}
\item \textbf{direction} - a numeric value indicating whether the methylation was lost in the second condition compared to the first one ($-1$) or gained ($+1$)
\item \textbf{context} - the context of the cytosine (CG, CHG or CHH)
\item \textbf{sumReadsM1} - the number of methylated reads in the DMR in condition 1
\item \textbf{sumReadsN1} - the total number of reads in the DMR in condition 1
\item \textbf{proportion1} - the proportion of methylated reads in the DMR in condition 1
\item \textbf{sumReadsM2} - the number of methylated reads in the DMR in condition 2
\item \textbf{sumReadsN2} - the total number of reads in the DMR in condition 2
\item \textbf{proportion2} - the proportion of methylated reads in the DMR in condition 2
\item \textbf{cytosinesCount} - the number of cytosines in the DMR
\item \textbf{pValue} - the adjusted p-value of the statistical test
\item \textbf{regionType} - a character string indicating whether the methylation was lost in the second condition compared to the first one (\Rfun{"loss"}) or gained (\Rfun{"gain"})
\end{itemize}
For predifined regions (e.g. genes, transposons or CpG islands) the user can call \Rfun{filterDMRs} function to extract the list of regions that are differentially methylated. The output of this function is again a GRanges with the same 11 metadata columns.


Below we present examples of calling both functions.
<<compute_DMRs_CG_noise_filter, message=FALSE,cache=FALSE>>=
chr_local <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(5E5,6E5))

# compute the DMRs in CG context with noise_filter method
DMRsNoiseFilterCG <- computeDMRs(methylationDataList[["WT"]],
                      methylationDataList[["met1-3"]],
                      regions = chr_local,
                      context = "CG",
                      method = "noise_filter",
                      windowSize = 100,
                      kernelFunction = "triangular",
                      test = "score",
                      pValueThreshold = 0.01,
                      minCytosinesCount = 4,
                      minProportionDifference = 0.4,
                      minGap = 0,
                      minSize = 50,
                      minReadsPerCytosine = 4,
                      cores = 1)

print(DMRsNoiseFilterCG)
@


<<compute_DMRs_CG_neighbourhood, message=FALSE,cache=FALSE>>=
# compute the DMRs in CG context with neighbourhood method
DMRsNeighbourhoodCG <- computeDMRs(methylationDataList[["WT"]],
                                   methylationDataList[["met1-3"]],
                                   regions = chr_local,
                                   context = "CG",
                                   method = "neighbourhood",
                                   test = "score",
                                   pValueThreshold = 0.01,
                                   minCytosinesCount = 4,
                                   minProportionDifference = 0.4,
                                   minGap = 200,
                                   minSize = 1,
                                   minReadsPerCytosine = 4,
                                   cores = 1)

print(DMRsNeighbourhoodCG)
@

<<compute_DMRs_CG_bins, message=FALSE,cache=FALSE>>=
# compute the DMRs in CG context with bins method
DMRsBinsCG <- computeDMRs(methylationDataList[["WT"]],
                          methylationDataList[["met1-3"]],
                          regions = chr_local,
                          context = "CG",
                          method = "bins",
                          binSize = 100,
                          test = "score",
                          pValueThreshold = 0.01,
                          minCytosinesCount = 4,
                          minProportionDifference = 0.4,
                          minGap = 200,
                          minSize = 50,
                          minReadsPerCytosine = 4,
                          cores = 1)

print(DMRsBinsCG)
@


<<compute_DMRs_CG_GE, message=FALSE,cache=FALSE>>=
# load the gene annotation data
data(GEs)

#select the genes
genes <- GEs[which(GEs$type == "gene")]

# compute the DMRs in CG context over genes
DMRsGenesCG <- filterDMRs(methylationDataList[["WT"]],
                          methylationDataList[["met1-3"]],
                          potentialDMRs = genes[overlapsAny(genes, chr_local)],
                          context = "CG",
                          test = "score",
                          pValueThreshold = 0.01,
                          minCytosinesCount = 4,
                          minProportionDifference = 0.4,
                          minReadsPerCytosine = 3,
                          cores = 1)
print(DMRsGenesCG)
@

\subsection{Merge DMRs}
Finally, for merging adjacent DMRs, \Rpackage{DMRcaller} provides the function \Rfun{mergeDMRsIteratively} which can be used as follows:
<<merge_DMRs_CG_noise_filter, message=FALSE,cache=FALSE>>=
DMRsNoiseFilterCGMerged <- mergeDMRsIteratively(DMRsNoiseFilterCG,
                                                minGap = 200,
                                                respectSigns = TRUE,
                                                methylationDataList[["WT"]],
                                                methylationDataList[["met1-3"]],
                                                context = "CG",
                                                minProportionDifference = 0.4,
                                                minReadsPerCytosine = 4,
                                                pValueThreshold = 0.01,
                                                test="score")
print(DMRsNoiseFilterCGMerged)
@
Note that two neighbouring DMRs will be merged if all the conditions below are met
\begin{itemize}
\item they are within a distance from each other smaller than minGap
\item the difference in methylation levels between the two conditions is statistically significant according to the statistical test when the two DMRs are joined
\item the difference in methylation proportion between the two conditions is higher than a threshold value  when the two DMRs are joined
\item the number of reads per cytosine is higher than a threshold when the two DMRs are joined
\end{itemize}

\subsection{Extract methylation data in regions}

\Rfun{analyseReadsInsideRegionsForCondition} function can extract additional information in a set of genomic regions (including DMRs) from any \RObj{methylationData} object. For example, to establish a link between the CG and CHH methylation, one might want to extract the number of methylated reads and the total number of reads in CHH context inside DMRs called in CG context.
<<analyse_reads_inside_regions_for_condition, message=FALSE,cache=FALSE>>=
#retrive the number of reads in CHH context in WT in CG DMRs
DMRsNoiseFilterCGreadsCHH <- analyseReadsInsideRegionsForCondition(
                              DMRsNoiseFilterCGMerged,
                              methylationDataList[["WT"]], context = "CHH",
                              label = "WT")
print(DMRsNoiseFilterCGreadsCHH)
@

\subsection{Plotting the distribution of DMRs}

Sometimes, it is useful to obtain the distribution of the DMRs over the chromosomes. The \Rpackage{DMRcaller} provides the \Rfun{computeOverlapProfile} function, which computes this distribution. The \Robj{GRanges} object generated by this function can then be added to a  \Robj{GRangesList} object, which can be plotted using \Rfun{plotOverlapProfile} function; see Figure \ref{fig:DMRdensity}. Additionally, the \Rfun{plotOverlapProfile} function allows the user to specify two \Robj{GRangesList}, thus, allowing the plotting of distributions of hypo or hyper methylated DMRs separately.
\begin{figure}[htp]
\begin{center}
<<DMR_distribution, fig.width=15,fig.height=2.5,out.width='0.95\\textwidth', message=FALSE,cache=FALSE>>=
# compute the distribution of DMRs
hotspots <- computeOverlapProfile(DMRsNoiseFilterCGMerged, chr_local,
                                  windowSize=5000, binary=TRUE)
# plot the distribution of DMRs
plotOverlapProfile(GRangesList("Chr3"=hotspots))
@
\end{center}
\caption{\emph{Distribution of DMRs}. Darker colour indicates higher density, while lighter colour lower density.}\label{fig:DMRdensity}
\end{figure}


\subsection{Plotting profiles with DMRs}

Finally, \Rpackage{DMRcaller} package also provides a function to plot methylation profiles at a specific location on the genome. To plot the methylation profile the user needs to call the \Rfun{plotLocalMethylationProfile} function; see Figure \ref{fig:localMethylationProfile}.


\begin{figure}[htp]
\begin{center}
<<local_profile, fig.width=15,fig.height=5,out.width='0.95\\textwidth', message=FALSE,cache=FALSE>>=
# select a 20 Kb location on the Chr3
chr3Reg <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(510000,530000))

# create a list with all DMRs
DMRsCGList <- list("noise filter" = DMRsNoiseFilterCGMerged,
                   "neighbourhood" = DMRsNeighbourhoodCG,
                   "bins" = DMRsBinsCG,
                   "genes" = DMRsGenesCG)
# plot the local profile
par(cex=0.9)
par(mar=c(4, 4, 3, 1)+0.1)
plotLocalMethylationProfile(methylationDataList[["WT"]],
                            methylationDataList[["met1-3"]],
                            chr3Reg,
                            DMRsCGList,
                            conditionsNames = c("WT", "met1-3"),
                            GEs,
                            windowSize = 300,
                            main="CG methylation")
@
\end{center}
\caption{\emph{Local methylation profile}. The points on the graph represent methylation proportion of individual cytosines, their colour (red or blue) which sample they belong to and the intensity of the the colour how many reads that particular cytosine had. This means that darker colours indicate stronger evidence that the corresponding cytosine has the corresponding methylation proportion, while lighter colours indicate a weaker evidence. The solid lines represent the smoothed profiles and the intensity of the colour the coverage at the corresponding position (darker colours indicate more reads while lighter ones less reads). The boxes on top represent the DMRs, where a filled box will represent a DMR which gained methylation while a box with a pattern represent a DMR that lost methylation. The DMRs need to have a metadata column \Rfun{regionType} which can be either \Rfun{"gain"} (where there is more methylation in condition 2 compared to condition 1) or \Rfun{"loss"} (where there is less methylation in condition 2 compared to condition 1). In case this metadata column is missing all DMRs are drawn using filled boxes. Finally, we also allow annotation of the DNA sequence. We represent by black boxes all the exons, which are joined by a horizontal black line, thus, marking the full body of the gene. With grey boxes we mark the transposable elements. Both for genes and transposable elements we plot them over a mid line if they are on the positive strand and under the mid line if they are on the negative strand. }\label{fig:localMethylationProfile}
\end{figure}


\section{Parallel computation}
Computing the DMRs can be computationally intensive. For example, in the case of \emph{A. thaliana} (with a genome of $\approx 130\ Mb$), it can take several hours to compute the DMRs depending on the method used and on the number of DMRs. To speed up computations, \Rpackage{DMRcaller} supports parallel computing of DMRs using the package \Rpackage{parallel}, but parallel computation is currently not supported on Windows.

The five functions used for computing and filtering the DMRs (\Rfun{computeDMRs}, \Rfun{filterDMRs}, \Rfun{mergeDMRsIteratively} and \Rfun{analyseReadsInsideRegionsForCondition}) accept the parameter \Rfun{cores}, which specifies the number of cores that can be used when performing the corresponding computations. When using 10 cores, it can take between 10 and 30 minutes to compute the DMRs in \emph{A. thaliana} depending on the selected parameters.

\section{Analysis of biological replicates}\label{sec:replicates}

The package also contains a set of functions for the analysis of multiple biological replicates.

The synthetic dataset is made by 300 different cytosines, extracted from those present in the \emph{A. thaliana} dataset. The value for \textbf{readsN} are created using the function \Rfun{rnorm}, while the values for \textbf{readsM} are generated using the function \Rfun{rbinom}. The probabilities used are 0.1 in the external region and 0.8 in the central region. In this way a DMR should be detected in the central region of the synthetic dataset.

The difference in proportion is plotted in figure \ref{fig:difference_methylation_replicates}

\begin{figure}[htp]
\begin{center}
<<difference_methylation_proportions, fig.width=12,fig.height=5,out.width='0.95\\textwidth', message=FALSE,cache=FALSE>>=
# loading synthetic data
data("syntheticDataReplicates")

# create vector with colours for plotting
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
                "#0072B2", "#D55E00", "#CC79A7")

# plotting the difference in proportions
plot(start(methylationData), methylationData$readsM1/methylationData$readsN1,
     ylim=c(0,1), col=cbbPalette[2], xlab="Position in Chr3 (bp)",
     ylab="Methylation proportion")
points(start(methylationData), methylationData$readsM2/methylationData$readsN2,
       col=cbbPalette[7], pch=4)
points(start(methylationData), methylationData$readsM3/methylationData$readsN3,
       col=cbbPalette[3], pch=2)
points(start(methylationData), methylationData$readsM4/methylationData$readsN4,
       col=cbbPalette[6], pch=3)
legend(x = "topleft", legend=c("Treatment 1", "Treatment 2", "Control 1",
                               "Control 2"), pch=c(1,4,2,3),
       col=cbbPalette[c(2,7,3,6)], bty="n", cex=1.0)
@
\end{center}
\caption{\emph{Methylation proportions in the synthetic dataset.}}\label{fig:difference_methylation_replicates}
\end{figure}

The DMRs are computed using the function \Rfun{computeDMRsReplicates}, which uses beta regression \citep{ferrari_2004} to detect differential methylation.
<<computing biological replicates, fig.width=15,fig.height=5,out.width='0.95\\textwidth', message=FALSE,cache=FALSE>>=

# loading betareg library to allow using computeDMRsReplicates function
library(betareg)

# creating condition vector
condition <- c("a", "a", "b", "b")

# computing DMRs using the neighbourhood method
DMRsReplicatesBins <- computeDMRsReplicates(methylationData = methylationData,
                                            condition = condition,
                                            regions = NULL,
                                            context = "CG",
                                            method = "bins",
                                            binSize = 100,
                                            test = "betareg",
                                            pseudocountM = 1,
                                            pseudocountN = 2,
                                            pValueThreshold = 0.01,
                                            minCytosinesCount = 4,
                                            minProportionDifference = 0.4,
                                            minGap = 0,
                                            minSize = 50,
                                            minReadsPerCytosine = 4,
                                            cores = 1)
print(DMRsReplicatesBins)
@

\section{Session information}

<<session_info,echo=TRUE>>=
sessionInfo()
@

\bibliographystyle{apalike}
\bibliography{DMRcaller}

\end{document}