%\VignetteIndexEntry{OUTRIDER: OUTlier in RNA-seq fInDER}
%\VignettePackage{OUTRIDER}
%\VignetteEngine{knitr::knitr}
%\VignetteEncoding{UTF-8}

\documentclass[11pt]{article}

<<style-knitr, eval=TRUE, echo=FALSE, cache=FALSE, results="asis">>=
BiocStyle::latex()
@

<<knitr, echo=FALSE, cache=FALSE, results="hide">>=
library(knitr)
opts_chunk$set(
    tidy=FALSE,
    dev="png",
    fig.width=7,
    fig.height=7,
    dpi=300,
    message=FALSE,
    warning=FALSE,
    cache=TRUE
)
@

\usepackage{amsmath}
\usepackage{verbatim}
\usepackage[nottoc]{tocbibind}

\newcommand{\outrider}{\Biocpkg{OUTRIDER}}
\newcommand{\rna}{RNA-seq}
\newcommand{\deseq}{\Biocpkg{DESeq2}}
\newcommand{\ods}{\Rclass{OutriderDataSet}}

\title{OUTRIDER - OUTlier in RNA-Seq fInDER}

\author{
    Felix Brechtmann$^{1}$, Christian Mertes$^{1}$, Agne Matuseviciute$^{1}$, 
        Vicente Yepez$^{1,2}$, Julien Gagneur$^{1,2}$ \\
    \small{$^{1}$ Technical University Munich, Department of 
        Informatics, Munich, Germany}\\
    \small{$^{2}$ Quantitative Biosciences Munich, Gene Center, 
        Ludwig-Maximilians Universit\"at M\"unchen, Munich, Germany}
}

\begin{document}

<<include=FALSE>>=
opts_chunk$set(concordance=TRUE)
@

\maketitle

\begin{abstract}

In the field of diagnostics of rare diseases, \rna{} is emerging as an 
important and complementary tool for whole exome and whole genome sequencing. 
\outrider{} is a framework that detects aberrant gene expression within a 
group of samples. It uses the negative binomial distribution which is fitted 
for each gene over all samples. We additionally provide an autoencoder, 
which automatically controls for co-variation before fitting. After fitting,
each sample can be tested for aberrantly expressed genes. Furthermore,
\outrider{} provides functionality to easily filter unexpressed genes, to 
analyse the data as well as to visualize the results.

\vspace{3em}

\begin{center}
\begin{tabular}{ | l | }
\hline
If you use \outrider{} in published research, please cite:  \\
\\
Brechtmann F*, Mertes C*, Matuseviciute A*, Yepez V, Avsec Z, Herzog M, \\
Bader D M, Prokisch H, Gagneur J; \textbf{OUTRIDER: A statistical method} \\
\textbf{for detecting aberrantly expressed genes in RNA sequencing data}; \\
\emph{AJHG}; 2018; DOI: \url{https://doi.org/10.1016/j.ajhg.2018.10.025}
\\
\hline
\end{tabular}
\end{center}

\end{abstract}

\newpage

\tableofcontents

\newpage


<<hiden_config, echo=FALSE>>=
suppressPackageStartupMessages({
    library(OUTRIDER)
    library(beeswarm)
})
if(!is(bpparam(), "SerialParam")){
    bp <- bpparam()
    bpworkers(bp) <- min(4, bpworkers(bp))
    register(bp)
}
@

\section{Introduction}
\outrider{} (OUTlier in RNA-seq fInDER) is a tool for finding aberrantly 
expressed
genes in RNA-seq samples. It does so by fitting a negative binomial model 
to RNA-seq read counts, correcting for variations in sequencing depth and 
apparent co-variations across samples. Read counts that significantly deviate 
from the distribution are detected as outliers. \outrider{} makes use of an
autoencoder to control automatically for confounders within the data. A scheme
of this approach is given in Figure \ref{autoencoder_sketch}.

\incfig{autoencoder_sketch}{1\textwidth}{Context-dependent outlier detection.}{
The algorithm identifies 
gene expression outliers whose read counts are significantly aberrant given the
co-variations typically observed across genes in an RNA sequencing data set.
This is illustrated by a read count (left panel, fifth column, second row from
the bottom) that is exceptionally high in the context of correlated samples
(left six samples) but not in absolute terms for this given gene. To capture
commonly seen biological and technical contexts, an autoencoder models
co-variations in an unsupervised fashion and predicts read count expectations.
By comparing the earlier mentioned read count with these context-dependent
expectations, it is revealed as exceptionally high (right panel). The lower
panels illustrate the distribution of read counts before and after applying
the correction for the relevant gene. The red dotted lines depict significance
cutoffs.}

Differential gene expression analysis from RNA-seq data is well-established.
The packages \deseq{}\cite{Love2014} or \Biocpkg{edgeR}\cite{Robinson2010} 
provide effective workflows and preprocessing steps to perform differential gene
expression analysis. However, these methods aim at detecting significant 
differences between groups of samples. In contrast, \outrider{} aims at 
detecting outliers within a given population. A scheme of this difference is 
given in figure \ref{fig:deVsOutlier}.

<<deVsOutlier, echo=FALSE, fig.height=5, fig.cap="Scheme of workflow differences. Differences between differential gene expression analysis and outlier detection.">>=
par.old <- par(no.readonly=TRUE)
par(mfrow=c(1,2), cex=1.2)
ylim <- c(80, 310)
a <- rnorm(10, 250, 10)
b <- rnorm(10, 120, 10)
c <- rnorm(100, 250, 20)
c[1] <- 105
beeswarm(list(A=a, B=b), 
        main="Differential\nexpression analysis\n(DESeq2/edgeR)",
        xlab="Condition", ylab="Expression", ylim=ylim, pch=20, 
        col=c("darkblue", "firebrick"))
beeswarm(c, main="Outlier\ndetection\n(OUTRIDER)", ylim=ylim, pch=20, 
        xlab="Population", ylab="Expression", col="firebrick")

par(par.old)
@

\section{Prerequisites}
To get started on the preprocessing step, we recommend to read the 
introductions of \deseq{}\cite{Love2014}, \Biocpkg{edgeR}\cite{Robinson2010}
or the RNA-seq workflow from
Bioconductor: \Biocpkg{rnaseqGene}. In brief, one usually starts with the raw
FASTQ files from
the RNA sequencing run. Those are then aligned to a given reference genome. 
At the time of writing
(October 2018), we recommend the STAR aligner\cite{Dobin2013}. After 
obtaining the aligned BAM files, one can map the reads to exons or genes of a 
GTF annotation file using HT-seq or by using \Rfunction{summarizedOverlaps} from
\Biocpkg{GenomicAlignments}. The resulting count table can then be loaded 
into the \outrider{} package as we will describe below.

\section{A quick tour}

Here we assume that we already have a count table and no additional 
preprocessing needs to be done. We can start and obtain results with 
3 commands. First, create an \ods{} from a count table or a Summarized 
Experiment object. Second, run the full 
pipeline using the command \Rfunction{OUTRIDER}. In the third and last step the 
results table is extracted from the \ods{} with the \Rfunction{results} 
function. Furthermore, analysis plots that are described in section 
\ref{sec:Results} can be directly created from the \ods{} object.

<<quick_guide, fig.height=5>>=
library(OUTRIDER)

# get data
ctsFile <- system.file('extdata', 'KremerNBaderSmall.tsv', 
        package='OUTRIDER')
ctsTable <- read.table(ctsFile, check.names=FALSE)
ods <- OutriderDataSet(countData=ctsTable)

# filter out non expressed genes
ods <- filterExpression(ods, minCounts=TRUE, filterGenes=TRUE)

# run full OUTRIDER pipeline (control, fit model, calculate P-values)
ods <- OUTRIDER(ods)

# results (only significant)
res <- results(ods)
head(res)

# example of a Q-Q plot for the most significant outlier
plotQQ(ods, res[1, geneID])
@

\section{An \outrider{} analysis in detail}

Apart from running the full pipeline using the single wrapper function 
\Rfunction{OUTRIDER}, 
the analysis can also be run step by step. The wrapper function does
not include any preprocessing functions. Discarding non expressed genes or 
samples failing quality measurements should be done manually before running the 
\Rfunction{OUTRIDER} function or starting the analysis pipeline.

In this section we will explain the analysis functions step by step.

For this tutorial we will use the rare disease data set from Kremer 
\textit{et al}.\cite{Kremer2017}. For testing purposes, this package 
contains a small subset of it.

\subsection{OutriderDataSet}
To use \outrider{} create an \ods, which derives from a 
RangedSummarizedExperiment
object. The \ods can be created by supplying a count matrix and optional sample 
annotation matrices. Alternatively, an existing Summarized experiment object
from other Biocnductor backages can be used.

<<GetDataSet>>=
# small testing data set
odsSmall <- makeExampleOutriderDataSet(dataset="Kremer")

# full data set from Kremer et al.
count_URL <- paste0("http://media.nature.com/original/nature-assets/", 
        "ncomms/2017/170612/ncomms15824/extref/ncomms15824-s1.txt")
anno_URL <- paste0("http://media.nature.com/original/nature-assets/", 
        "ncomms/2017/170612/ncomms15824/extref/ncomms15824-s8.txt")

ctsTable <- read.table(count_URL, sep="\t")
annoTable <- read.table(anno_URL, sep="\t", header=TRUE)
annoTable$sampleID <- annoTable$RNA_ID

# create OutriderDataSet object
ods <- OutriderDataSet(countData=ctsTable, colData=annoTable)
@

\subsection{Preprocessing}

It is recommended to preprocess the data before fitting. Our model 
requires that for every gene at least one sample has a non-zero count and 
that we observe at least one read for every 100 samples. Therefore, 
all genes that are not expressed must be discarded.

We provide the function \Rfunction{filterExpression} to remove genes that have 
low FPKM (Fragments Per Kilobase of transcript per Million mapped reads) 
expression values. The needed annotation to estimate FPKM values 
from the counts should be the same as for the counting. Here, we normalize 
by the total exon length of a gene.
To do so the joint lenght of all exons needs to be provided. When providing a 
gtf, gff or TxDb object to the \Rfunction{filterExpression}, we extract this 
information automatically. But therfore the geneID's of the count table and the 
gtf need to match.

By default the cutoff is set to an FPKM value of one and only the filtered
\ods{} object is returned. If required, the FPKM values can be stored in the
\ods{} object and the full object can be returned to visualize the 
distribution of reads before and after filtering. 


<<Preprocessing, fig.height=5>>=
# get annotation
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
map <- select(org.Hs.eg.db, keys=keys(txdb, keytype = "GENEID"), 
        keytype="ENTREZID", columns=c("SYMBOL"))
@

However, the \Robject{TxDb.Hsapiens.UCSC.hg19.knownGene} contains only 
well annotated 
genes. This annotation will miss a lot of genes captured by RNA-seq. To include 
all predicted annotations as well as non-coding RNAs please download the txdb 
object from our homepage\footnote{\url{https://cmm.in.tum.de/public/paper/mitoMultiOmics/ucsc.knownGenes.db}}
or create it yourself from the UCSC website\footnote{\url{https://genome.ucsc.edu/cgi-bin/hgTables}}$^,$\footnote{\url{http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format}}.

<<create txdb, eval=FALSE>>=
try({
    library(RMariaDB)
    library(AnnotationDbi)
    con <- dbConnect(MariaDB(), host='genome-mysql.cse.ucsc.edu', 
            dbname="hg19", user='genome')
    map <- dbGetQuery(con, 'select kgId AS TXNAME, geneSymbol from kgXref')
    
    txdbUrl <- paste0("https://cmm.in.tum.de/public/",
            "paper/mitoMultiOmics/ucsc.knownGenes.db")
    download.file(txdbUrl, "ucsc.knownGenes.db")
    txdb <- loadDb("ucsc.knownGenes.db")
    
})
@

<<filtering, fig.height=5>>=
# calculate FPKM values and label not expressed genes
ods <- filterExpression(ods, txdb, mapping=map, 
        filterGenes=FALSE, savefpkm=TRUE)

# display the FPKM distribution of counts.
plotFPKM(ods)

# display gene filter summary statistics 
plotExpressedGenes(ods)

# do the actual subsetting based on the filtering labels
ods <- ods[mcols(ods)$passedFilter,]
@

\subsection{Controlling for Confounders}

The next step in any analysis workflow is to visualize the correlations
between samples. In most RNA-seq experiments correlations between the samples 
can be observed. These are often due to technical confounders 
(e.g. sequencing batch) 
or biological confounders (e.g. sex, age). These confounders can adversely
affect the 
detection of aberrant features. Therefore, we provide options to control for 
them.

<<plotting_between_sample_correlations>>=

# Heatmap of the sample correlation
# it can also annotate the clusters resulting from the dendrogram
ods <- plotCountCorHeatmap(ods, colGroups=c("SEX", "RNA_HOX_GROUP"),
        normalized=FALSE, nRowCluster=4)

# Heatmap of the gene/sample expression
ods <- plotCountGeneSampleHeatmap(ods, colGroups=c("SEX", "RNA_HOX_GROUP"),
        normalized=FALSE, nRowCluster=4)

@

We have different ways to control for confounders present in the data.
The first and standard way is to calculate the \Rfunction{sizeFactors} 
as done in \deseq{}\cite{Love2014}.

Additionally, the \Rfunction{controlForConfounders} function calls an 
autoencoder that automatically controls for confounders present in the data.
Therefore an encoding dimension $q$ needs to be set or the default value 20 is 
used. The optimal value of $q$ can be determined using the
\Rfunction{findEncodingDim} function. 
After controlling for confounders, the heatmap should be plotted again.
If it worked, no batches should be present and the correlations between 
samples should be reduced and close to zero.

<<controlling_for_confounders>>=

# automatically control for confounders
# we use only 3 iterations to make the vignette faster. The default is 15. 
ods <- estimateSizeFactors(ods)
ods <- controlForConfounders(ods, q=21, iterations=3)

# Heatmap of the sample correlation after controlling
ods <- plotCountCorHeatmap(ods, normalized=TRUE, 
        colGroups=c("SEX", "RNA_HOX_GROUP"))

@

Alternatively, other methods can be used to control for confounders. In addition
to the \textit{autoencoder}, we implemented a PCA based approach. The PCA 
implementation can be utilized by setting \Robject{implementation="pca"}. 
Also PEER can be used together with the OUTRIDER framework. A detailed 
description on how to do this can be found in section \ref{sec:PEER}.
Furthermore, any other method can be used by providing the 
\Rfunction{normalizationFactor} matrix. 
This matrix must be computed beforehand using the appropriate method. 
Its purpose is to normalize for technical effects or control for additional
expression patterns. 


\subsection{Finding the right encoding dimension \textit{q}}

In the previous section, we fixed the encoding dimension $q=21$. But having 
the right encoding dimension is crucial in finding outliers in the data. 
On the one hand, if \textit{q} is too big the autoencoder will learn the 
identity matrix and will overfit the data. On the other hand, if \textit{q}
is too small the autoencoder cannot learn the necessary covariates existing
in the data. Therefore, it is recommended for any new dataset to estimate the
optimal encoding dimension to gain the best performance. With the function 
\Rfunction{findEncodingDim} one can find the optimal encoding dimension. 
To this end, we artificially introduce corrupted counts randomly into the 
dataset and monitor the performance calling those corrupted counts. The optimal
dimension \textit{q} is then selected as the dimension maximizing the area under
the precision-recall curve for identifying corrupted counts.

<<findEncodingDim, eval=FALSE>>=
# find the optimal encoding dimension q
ods <- findEncodingDim(ods)

# visualize the hyper parameter optimization
plotEncDimSearch(ods)

@

Since this function runs a full OUTRIDER fit for a range of encoding dimensions,
it is quite CPU intensive, but can increase the overall performance of the
autoencoder and is recommended for any data set. If \textit{q} is not provided
by the user, it will be estimated based on the number of samples.


\subsubsection{Excluding samples from the autoencoder fit}

Since OUTRIDER expects that each sample within the population is independent 
of all others, replicates could mask effects specific to this sample. This is 
also true if trios are present in the data, where the parents can be seen as
biological replicates. Here, we recommend to exclude the sample of interest or 
the replicates from the fitting. Later on, for all samples P-values 
are calculated. 

In this rare disease data set we know that two samples (MUC1344 and MUC1365)
have the same defect. To exclude one or both of them, we can use
the \Rfunction{sampleExclusionMask} function.

<<maskSamples>>=

# set exclusion mask
sampleExclusionMask(ods) <- FALSE
sampleExclusionMask(ods[,"MUC1365"]) <- TRUE

# check which samples are excluded from the autoencoder fit
sampleExclusionMask(ods)

@

\subsection{Fitting the negative binomial model}

The fit of the negative binomial model is done during the autoencoder fitting.
This step is only needed if alternative methods to control the data are used.
To fit the dispersion and the mean, the \Rfunction{fit} function is applied
to the \ods{}.

<<fitting, eval=FALSE>>=

# fit the model when alternative methods where used in the control step
ods <- fit(ods)
hist(theta(ods))

@

\subsection{P-value calculation}

After determining the fit parameters, two-sided P-values are computed using 
the following equation:

\begin{equation}
    p_{ij} = 2 \cdot min \left\{\frac{1}{2}, \sum_{0}^{k_{ij}} 
        NB(\mu_{ij} ,\theta_i), 1 - \sum_{0}^{k_{ij-1}} 
        NB(\mu_{ij} ,\theta_i) \right\},
\end{equation}

where the $\frac{1}{2}$ term handles the case of both terms exceeding 0.5, 
which can happen due to the discrete nature of counts. Here $\mu_{ij}$ are 
computed as the product of the fitted correction values from the autoencoder and
the fitted mean adjustements. If required a one-sided test can be performed 
using the argument \Robject{alternative} and specifying 'less' or 'greater' 
depending on the research question. Multiple testing correction is done across 
all genes in a per-sample fashion using Benjamini-Yekutieli's false discovery 
rate method\cite{Benjamini2001}. Alternatively, all adjustment methods supported
by \Rfunction{p.adjust} can be used via the \Robject{method} 
argument.

<<pValue_computation>>=
# compute P-values (nominal and adjusted)
ods <- computePvalues(ods, alternative="two.sided", method="BY")
@

\subsection{Z-score calculation}
The Z-scores on the log transformed counts can be used for visualization,
filtering, and ranking of samples. By running the \Rfunction{computeZscores}
function, the Z-scores are computed and stored in the \ods{} object.
The Z-scores are calculated using:

\begin{equation}
    z_{ij} = \frac{l_{ij} - \mu_j^l}{\sigma_j^l}
\end{equation}
\begin{equation*}
    l_{ij} = \log_2{(\frac{k_{ij} + 1}{c_{ij} + 1})},
\end{equation*}

where $\mu_j^l$ is the mean and $\sigma_j^l$ the 
standard deviation of gene $j$ and $l_{ij}$ is the log 
transformed count after correction for confounders.

<<zScore_computation>>=
# compute the Z-scores
ods <- computeZscores(ods)
@

\section{Results}
\label{sec:Results}
The \outrider{} package offers multiple ways to display the results. It 
creates a results table containing all the values computed during the analysis.
Furthermore, it offers various plot functions that guide the user 
through the analysis.

\subsection{Results table}

The \Rfunction{results} function gathers all the previously computed values
and combines them into one table.

<<results fun>>=
# get results (default only significant, padj < 0.05)
res <- results(ods)
head(res)
dim(res)

# setting a different significance level and filtering by Z-scores
res <- results(ods, padjCutoff=0.1, zScoreCutoff=2)
head(res)
dim(res)
@


\subsection{Number of aberrant genes per sample}

One quantity of interest is the number of aberrantly expressed genes 
per sample. This can be displayed using the plotting function 
\Rfunction{plotAberrantPerSample}. Alternatively, the function 
\Rfunction{aberrant} can be used to identify aberrant events, which can be 
summed by sample or gene using the paramter \Robject{by}. These numbers depend 
on the cutoffs, which can be specified in both functions (\Robject{padjCutoff} 
and \Robject{zScoreCutoff}).

<<aberrantperSample, fig.height=5>>=
# number of aberrant genes per sample
tail(sort(aberrant(ods, by="sample")))
tail(sort(aberrant(ods, by="gene", zScoreCutoff=1)))

# plot the aberrant events per sample
plotAberrantPerSample(ods, padjCutoff=0.05)
@

\subsection{Volcano plots}

To view the distribution of P-values on a sample level, volcano plots can 
be displayed. Most of the plots make use of the \CRANpkg{plotly} 
framework to create interactive plots. To only use basic R functionality 
from \CRANpkg{graphics} the \Rcode{basePlot} argument can be set to 
\Rcode{TRUE}.

<<volcano, fig.height=5>>=
# MUC1344 is a diagnosed sample from Kremer et al.
plotVolcano(ods, "MUC1344", basePlot=TRUE)
@


\subsection{Gene level plots}

Additionally, we include two plots at the gene level. 
\Rfunction{plotExpressionRank} plots the counts in ascending order. 
By default, the controlled counts are plotted. To plot raw counts, the argument 
\Robject{normalized} can be set to \Rcode{FALSE}. 

When using the \CRANpkg{plotly} framework for plotting, all computed values 
are displayed for each data point. The user can access this information 
by hovering over each data point with the mouse.

<<visualization2, fig.height=5>>=
# expression rank of a gene with outlier events
plotExpressionRank(ods, "TIMMDC1", basePlot=TRUE)
@

The quantile-quantile plot can be used to see whether the fit converged well.
In presence of an outlier, it can happen that most of the points end up below 
the confidence band. This is fine and indicates that we have conservative 
P-values for the other points. Here is an example with two outliers:

<<visualization3, fig.height=5>>=
## QQ-plot for a given gene
plotQQ(ods, "TIMMDC1")
@

Since we do test how fare the observed count is away from the exprected 
expression level, it is also helpful to visualize the predictions against the 
observed counts. 

<<visualization4, fig.height=5>>=
## Observed versus expected gene expression
plotExpectedVsObservedCounts(ods, "TIMMDC1", basePlot=TRUE)
@

\section{Additional features}

\subsection{Using PEER to control for confounders}
\label{sec:PEER}

PEER\cite{Stegle2012} is a well known tool to control for unknown effects in
RNA-seq data. PEER is only available through the \Githubpkg{PMBio/peer} 
GitHub repository. The R source code can be downloaded form here: 
\url{https://github.com/downloads/PMBio/peer/R_peer_source_1.3.tgz}. The 
installation of the package has to be done manually by the user. 
After the installation one can use the following function to control for 
confounders with PEER. 

<<peer_function, eval=FALSE>>=
#'
#' PEER implementation
#'
peer <- function(ods, maxFactors=NA, maxItr=1000){
    
    # check for PEER
    if(!require(peer)){
        stop("Please install the 'peer' package from GitHub to use this ",
                "functionality.")
    }
    
    # default and recommendation by PEER: min(0.25*n, 100)
    if(is.na(maxFactors)){
        maxFactors <- min(as.integer(0.25* ncol(ods)), 100)
    }
    
    # log counts
    logCts <- log2(t(t(counts(ods)+1)/sizeFactors(ods)))
    
    # prepare PEER model
    model <- PEER()
    PEER_setNmax_iterations(model, maxItr)
    PEER_setNk(model, maxFactors)
    PEER_setPhenoMean(model, logCts)
    PEER_setAdd_mean(model, TRUE)
    
    # run fullpeer pipeline
    PEER_update(model)
    
    # extract PEER data
    peerResiduals <- PEER_getResiduals(model)
    peerMean <- t(t(2^(logCts - peerResiduals)) * sizeFactors(ods))
    
    # save model in object
    normalizationFactors(ods) <- pmax(peerMean, 1E-8)
    metadata(ods)[["PEER_model"]] <- list(
            alpha     = PEER_getAlpha(model),
            residuals = PEER_getResiduals(model),
            W         = PEER_getW(model))
    
    return(ods)
}
@ 

With the function above we can run the full OUTRIDER pipeline as follows:

<<how to run peer, eval=FALSE>>=
# Control for confounders with PEER
ods <- estimateSizeFactors(ods)
ods <- peer(ods)
ods <- fit(ods)
ods <- computeZscores(ods, peerResiduals=TRUE)
ods <- computePvalues(ods)

# Heatmap of the sample correlation after controlling
ods <- plotCountCorHeatmap(ods, normalized=TRUE)
@


\subsection{Power anaylsis}

We provide the \Rfunction{plotPowerAnalysis} function to show, what kind of 
changes can be significant depending on the mean count.
<<visualizationSigLevels, fig.height=5>>=
## P-values versus Mean Count
plotPowerAnalysis(ods)
@
Here, we see that it is only for sufficiently high expressed genes possible, to 
obtain significant P-values, especially for the downregulation cases.

\bibliography{bibliography}

\section*{Session info}

Here is the output of \Rfunction{sessionInfo()} on the system on which 
this document was compiled:

<<sessionInfo, echo=FALSE>>=
sessionInfo()
@

\end{document}