%\VignetteIndexEntry{Description of ExiMiR}

\documentclass[11pt,a4paper]{article}
\usepackage[OT1]{fontenc}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{Sweave}
\usepackage{subfigure}
\usepackage{textcomp}
\usepackage{fancyvrb}
\DeclareGraphicsExtensions{.png}
\graphicspath{{images}}

\textwidth=6.2in
\textheight=8.5in
%\textheight=9.0in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\renewcommand{\thefootnote}{\alph{footnote}}

\begin{document}
\title{Description of \emph{ExiMiR}}
\author{Sylvain Gubian, Alain Sewer, PMP SA}

\maketitle
\tableofcontents

\section{Introduction}

The \emph{ExiMiR} package provides tools for normalizing miRNA expression data obtained
primarily from Exiqon miRCURY LNA\texttrademark\ arrays (though other formats are supported
as well). It gives the possibility of applying a
novel miRNA-specific normalization method using spike-in probes and is based on
controlled assumptions \cite{bibi}. These features allow to take into account the
differences between miRNA and gene (mRNA) expression data, as discussed in a recent
study \cite{sarkar}. \\

\emph{ExiMiR} is particularly suited for two-color microarray experiments using a
common reference. In such cases the spike-in probe-based normalization method allows to
treat the raw data as if they were coming from single-channel arrays, like
Affymetrix\textsuperscript{\textregistered} Genechip\textsuperscript{\textregistered}.
This is why the objects and functions in \emph{ExiMiR} have been chosen to
closely resemble those of the ''single-color`` \emph{affy} package, while remaining
compatible with those of the ''two-color`` \emph{limma} package, as shown on
Figure \ref{fig0}.\\

The latest versions of \emph{ExiMiR} explicitly integrate the raw data objects
from the \emph{limma} package. Thus the spike-in probe-based normalization method
can be applied to raw data formats other than Exiqon or Affymetrix, as long as
suitable ''spike-in probes`` are provided.\\

\begin{figure}[t]
\begin{center}
\fbox{\includegraphics{fig0}}
\caption{Overview of the functions of the ExiMiR package (in blue).}
\label{fig0}
\end{center}
\end{figure}

Further features of \emph{ExiMiR} include:
\begin{itemize}
\item
reading raw data in the ImaGene\textsuperscript{\textregistered} TXT format
used for the Exiqon miRCURY LNA\texttrademark\ arrays;
\item
allowing to update the array probe annotations to the latest
\href{http://www.mirbase.org}{miRBase} releases incorporated in the Exiqon GAL files;
\item
enabling a background correction adapted to the raw data format by wrapping the
\emph{limma} function \verb+backgroundCorrect+;
\item
entering a user-defined set of appropriate probe IDs that can be used by the normalization
method instead of the automatically selected spike-in probes;
\item
enabling to generate the same objects for summarized miRNA expression data after
having applied the alternative normalization methods contained in both \emph{affy} and
\emph{limma} packages.

\end{itemize}

This vignette shows how to process raw expression data obtained from the
Affymetrix\textsuperscript{\textregistered} Genechip\textsuperscript{\textregistered}
miRNA array (CEL and CDF files) in order to illustrate the similarities between
\emph{ExiMiR} and \emph{affy}. An example with other raw data formats supported by
\emph{limma} is also discussed.


\section{Raw and annotation data}
\label{sec:data}

This section describes how to find raw and annotation data on which \emph{ExiMiR} can
be applied.
\nolinebreak
\footnote{N.B.: \texttt{R} objects corresponding to the raw and annotation data
described in this section are provided by the \emph{ExiMiR} package, see Section
\ref{sec:example}}
It covers both Affymetrix (CEL and CDF) and Exiqon/ImaGene (TXT and
GAL) cases. If you have your own expression data in CEL or TXT formats, then you just
need to complete them with the annotation files in CDF or GAL formats, respectively,
as described below. Do no forget the appropriate \verb+samplesinfo.txt+ file for the
Exiqon case. In case your raw data are neither in Affymetrix CEL nor in Exiqon/ImaGene
TXT formats, then first read Subsection \ref{subsec:othernorm} to see whether
\emph{ExiMiR} can be applied.


\subsection{Affymetrix}
\label{subsec:affydata}
First create a directory \verb+Affymetrix+ in your file system. The GEO repository
\linebreak(\href{http://www.ncbi.nlm.nih.gov/geo/}{http://www.ncbi.nlm.nih.gov/geo/}) contains several datasets using
the Affymetrix \mbox{miRNA-1\_0} array. We choose the series
\href{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE19183}{GSE19183} for which
the raw data file \verb+GSE19183_RAW.tar+ can be downloaded from this
\href{http://www.ncbi.nlm.nih.gov/geosuppl/?acc=GSE19183}{URL}.
Extract the CEL files into the \verb+Affymetrix+ directory. As long as \verb+R+ can
connect to the Bioconductor server (\href{http://www.bioconductor.org/}{http://www.bioconductor.org/}), the annotation
will be generated automatically by the \emph{affy} package when reading the CEL files
(see Subsection \ref{subsec:affynorm}). If this is not the case, you need to get and
load the package
\href{http://bioconductor.org/packages/mirna10cdf/}
{\emph{mirna10cdf}}, which will create the appropriate annotation environment for
the dataset GSE19183 in your \verb+R+ session.


\subsection{Exiqon}
\label{subsec:exidata}

First create a directory \verb+Exiqon+ in your file system. The GEO series
\href{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20122}{GSE20122} contains a
suitable raw data file \verb+GSE20122_RAW.tar+ at the following
\href{http://www.ncbi.nlm.nih.gov/geosuppl/?acc=GSE20122}{URL}.
After downloading it, extract and expand (\verb+gunzip+) the enclosed raw data file in
ImaGene TXT format into the \verb+Exiqon+ directory. Then download the corresponding
annotation GAL file from the following
\href{http://shop.exiqon.com/annotations/download/gal_208200-A,208201-A,208202-A_lot31022-31022_hsa,mmu,rno-and-related-vira_from_mb160,miRPlus.gal}{URL}
into the \verb+Exiqon+ directory as well. Finally, copy and paste the content of
Appendix \ref{sec:app} of this vignette into a TAB-separated text file called
\verb+samplesinfo.txt+ and also located the \verb+Exiqon+ directory. This file is
required by \emph{ExiMiR} to match the raw data results from the Hy3 and Hy5 channels.
\footnote{In the Exiqon common reference experimental design, Hy3 and Hy5 are usually the samples
and the reference channels, respectively.}
It contains the names of all the TXT files in the experiment, organized in a table with
one row for each array and two columns corresponding to the two channels Hy3 and Hy5.
\footnote{For practical purposes only 12 of the 54 Hy3-Hy5 raw data filename
pairs from the GEO series GSE20122 are listed in Appendix \ref{sec:app}. Feel free to
complete \texttt{samplesinfo.txt} with the 42 remaining ones if you want to be
exhaustive!}
It is very similar to the file \verb+Targets.txt+ required by the \emph{limma} package
and is usually provided by Exiqon together with the raw data TXT files.


\section{Raw data normalization}
\label{sec:norm}

This section describes how to apply \emph{ExiMiR} to normalize raw miRNA
expression data obtained from the Affymetrix\textsuperscript{\textregistered} miRNA
Genechip\textsuperscript{\textregistered} or from the Exiqon miRCURY LNA\texttrademark\
arrays.
Under specific conditions, this is also possible for other raw data formats
(see Subsection \ref{subsec:othernorm}).
Notice that although these descriptions are generic, some of the
filenames given in the command-line examples might differ from case to case (e.g. GAL
filenames). Begin by launching an \verb+R+ session at the same level as the
\verb+Affymetrix+ and \verb+Exiqon+ directories created in Section \ref{sec:data}.


\subsection{Affymetrix: from CEL files to ExpressionSet objects}
\label{subsec:affynorm}

\emph{You don't need to run the following three commands if your} \verb+R+ \emph{installation can
connect the Bioconductor server} (\href{http://www.bioconductor.org/}{http://www.bioconductor.org/}).\\

The array annotation environment is created by loading the \emph{mirna10cdf} package:
\begin{Sinput}
R> library(mirna10cdf)
\end{Sinput}
Using the \verb+makecdfenv+ package, the above corresponds to the following
two commands
\begin{Sinput}
R> library(makecdfenv)
R> cdfenv <- make.cdf.env(cdf.path='Affymetrix', filename='miRNA-1_0.CDF')
\end{Sinput}
provided that the CDF file \verb+miRNA-1_0.CDF+ (formerly downloadable from the Affymetrix
website) is located in the \verb+Affymetrix+ folder.\\

Load the CEL file raw data into an AffyBatch object using the \emph{affy}
package:
\footnote{
Drop the argument \texttt{cdf.name='cdfenv'} if the environment \texttt{cdfenv} has not been
defined previously with \texttt{make.cdf.env}.
}
\begin{Sinput}
R> library(affy)
R> abatch <- ReadAffy(cdfname='cdfenv', celfile.path='Affymetrix')
\end{Sinput}
Raw data normalization is directly applied on \verb+abatch+ to create an
ExpressionSet object. For instance:
\footnote{This is equivalent to \texttt{eset.rma <- rma(abatch)}.}
\begin{Sinput}
R> eset.rma <- expresso(abatch, bgcorrect.method='rma',
                                normalize.method='quantiles',
                                pmcorrect.method='pmonly',
                                summary.method='medianpolish')
\end{Sinput}
As an alternative to the above RMA quantile normalization validated for gene (mRNA)
expression \cite{RMA}, the spike-in probe-based method implemented in the \emph{ExiMiR} function
\verb+NormiR+ might give better results for miRNA expression data \cite{bibi,sarkar}:

\begin{Sinput}
R> library(ExiMiR)
R> eset.spike <- NormiR(abatch, bgcorrect.method='rma',
                                normalize.method='spikein',
                                pmcorrect.method='pmonly',
                                summary.method='medianpolish')
\end{Sinput}
For the GSE19183 dataset, the assumptions allowing the application of the \verb+NormiR+
spike-in probe-based normalization are not satisfied and a default median normalization
is performed instead. Section \ref{sec:trouble} describes this safeguarding strategy
and the options allowing to deal with problematic cases. If the \verb+NormiR+
assumptions are satisfied, a series of control figures are generated. Their
description is given in Section \ref{subsec:figs} below.


\subsection{Exiqon: from TXT files to ExpressionSet objects}
\label{subsec:exinorm}


First load the \emph{ExiMiR} package:

\begin{Sinput}
R> library(ExiMiR)
\end{Sinput}
Then create the array annotation environment using the GAL file and the
\verb+make.gal.env+ function:

\begin{Sinput}
R>  make.gal.env(galname='galenv', gal.path='Exiqon')
\end{Sinput}
Read the raw data TXT files into an AffyBatch object using the \verb+ReadExi+ function:
\footnote{
\texttt{ReadExi} stores the dual-channel foreground and background intensity data as follows:
the foreground data matrix of the Hy3 channel (second column of \texttt{samplesinfo.txt})
and the foreground data matrix of the Hy5 channel (third column of \texttt{samplesinfo.txt})
are concatenated and stored in the \texttt{exprs} slot of the resulting AffyBatch object.
The same goes for the background data in the \texttt{se.exprs} slot.\label{footnote:readexi}
}

\begin{Sinput}
R> ebatch <- ReadExi(galname='galenv', txtfile.path='Exiqon')
\end{Sinput}
Similarly to the Affymetrix case in Subsection \ref{subsec:affynorm}, the raw data
normalization is applied on \verb+ebatch+ to create an ExpressionSet object. For
instance the RMA quantile normalization from the \emph{affy} package \cite{RMA}, using the option \verb+bg.correct=FALSE+, as recommended by a recent study\cite{lopez}:
\footnote{This command is not equivalent to \texttt{eset.rma <- rma(ebatch,background=FALSE)}
because \texttt{NormiR} applies the quantile normalization separately on the data
from the Hy3 and Hy5 channels (which have been concatenated in \texttt{ebatch} by
\texttt{ReadExi}) while \texttt{rma} mixes the data from two channels.\label{footnote:rma}
}
\begin{Sinput}
R> eset.rma <- NormiR(ebatch, bg.correct=FALSE,
                              normalize.method='quantile',
                              summary.method='medianpolish')
\end{Sinput}
However, the assumptions for applying the quantile normalization are not guaranteed to be satisfied in
the case of miRNA expression data \cite{bibi, sarkar}. It might be better to use
the spike-in probe-based method from \emph{ExiMiR}:

\begin{Sinput}
R> eset.spike <- NormiR(ebatch, bg.correct=FALSE,
                                normalize.method='spikein',
                                summary.method='medianpolish')
\end{Sinput}
If all the assumptions underlying the application of the \verb+NormiR+ spike-in probe-based
normalization are satisfied, a series of control figures are generated, which will be
explained in Subsection \ref{subsec:figs}. If one or more assumptions are not met,
then the median normalization is applied instead. However, \emph{ExiMiR} offers several
options to deal with such situations and still allow the application of the spike-in
probe-based method, as explained in Subsections \ref{subsec:problems} and
\ref{subsec:opt}.


\subsection{Other formats: reading with \emph{limma} and normalizing with \emph{ExiMiR}}
\label{subsec:othernorm}

The approach consists in reading the raw data using the \emph{limma} package and then
transforming the result into an AffyBatch object using the function \verb+creatAB+, as
illustrated on Figure \ref{fig0}. Therefore the
conditions for applying \emph{ExiMiR} on other raw data forms are
(1) the ability to load the raw data into \emph{limma} objects (EList or RGList), and
(2) the availability of probe sets that can be used as ''spike-in probes`` during
normalization. The parameter \verb+probeset.list+ of the function \verb+NormiR+ allows to
select the IDs of such ''spike-in probes`` for running the normalization method, as long
as they satisfy its underlying assumptions (see Subsections \ref{subsec:figs} and
\ref{subsec:opt}).\\

As a toy example, this approach is applied on the Exiqon raw data from Subsection
\ref{subsec:exinorm}, which satisfy the two conditions discussed above. You will need to
set the parameters of the function \verb+read.maimage+ according to your
particular case (see Chapter 4 of the
\href{www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf}{
\emph{limma} User's Guide} for more details).\\

First create the \verb+Targets.txt+ file in the \verb+Exiqon+ directory. In our case
this consists simply in replacing the header line of the file \verb+samplesinfo.txt+ by
''\verb+FileName+\hspace{0.5 cm}\verb+Cy3+\hspace{0.5 cm}\verb+Cy5+``
(TAB-separated) and saving the result as \verb+Targets.txt+. Then proceed by
reading the raw data and annotating the probe sets using the available GAL file:
\begin{Sinput}
R> library(limma)
R> targets <- readTargets(path='Exiqon')
R> RGList <- read.maimages(targets[,c('Cy3','Cy5')], source='imagene', path='Exiqon')
R> RGList$genes <- readGAL(path='Exiqon')
R> RGList$printer <- getLayout(RGList$genes)	# optional
\end{Sinput}
The last line enables plotting the spatial distributions of the intensities of the arrays
using the \verb+image+ method of the AffyBatch object created below. Running
\verb+getLayout+ is not always possible (see Section 4.7 of of the
\href{www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf}
{\emph{limma} User's Guide}) and not needed for the spike-in probe-based normalization.
The AffyBatch object is created using the function \verb+createAB+ of the \emph{ExiMiR}
package:
\footnote{
\texttt{createAB} stores dual-channel foreground and background intensity data in the same
way as \texttt{ReadExi}, see footenote \ref{footnote:readexi} in Subsection \ref{subsec:exinorm}.
}
\begin{Sinput}
R> library(ExiMiR)
R> obatch <- createAB(RGList)
\end{Sinput}
Next the ''spike-in probes`` used for normalization must be specified. In our toy example
they will consist in a subset of the available annotated spike-in probe sets, excluding
\verb+spike_control_f+. Other choices are possible, as long as \verb+NormiR+
can be run (see Subsections \ref{subsec:figs} and \ref{subsec:opt}). The raw
data normalization is then performed similarly as above (see Subsection
\ref{subsec:exinorm}):
\begin{Sinput}
R> spikein.all <- grep('^spike', featureNames(obatch), value=TRUE)
R> spikein.subset <- setdiff(spikein.all, 'spike_control_f')
R> eset.spike <- NormiR(obatch, bg.correct=FALSE,
                                normalize.method='spikein',
                                normalize.param=list(probeset.list=spikein.subset),
                                summary.method='medianpolish')
\end{Sinput}
The comments made after the execution of \verb+NormiR+ at the end of the
previous subsection remain valid in the present case.


\subsection{More information about \texttt{NormiR}}
\label{subsec:normir}

\emph{This paragraph provides additional information about the \texttt{NormiR} function
and can be skipped on first reading.}\\

The function \verb+NormiR+ has been designed to closely resemble the \verb+expresso+
function of the \emph{affy} package. Therefore their usages are very similar
(see Section 3.3.1 of the \emph{affy}
\href{http://www.bioconductor.org/packages/release/bioc/vignettes/affy/inst/doc/affy.pdf}
{vignette}):
\begin{Sinput}
ESet <- NormiR(AffyBatch,
             # background correction
               bg.correct=TRUE,
               bgcorrect.method='auto',
               bgcorrect.param=list(),
             # normalization
               normalize=TRUE,
               normalize.method='spikein',
               normalize.param=list(),
             # PM correction (enabled only when MM-values are available)
               pmcorrect.method='pmonly',
               pmcorrect.param=list(),
             # summarization
               summary.method='medianpolish',
               summary.param=list(),
             # miscellaneous
               verbose=TRUE
               ...)
\end{Sinput}
The methods proposed by \verb+NormiR+ for the first step of background
correction are provided by the \emph{affy} or \emph{limma} packages,
depending on whether the input AffyBatch object has been created with
\verb+ReadAffy+ or \verb+ReadExi+/\verb+createAB+, respectively. The
\emph{ExiMiR} function \verb+NormiR.bgcorrect.methods+ lists the
available methods, as for instance in the case of the AffyBatch
objects defined in Subsections \ref{subsec:affynorm} and
\ref{subsec:exinorm}:
\begin{Sinput}
R> NormiR.bgcorrect.methods(abatch)
[1] "bg.correct"	"mas"	"none"	"rma"
R> NormiR.bgcorrect.methods(ebatch)
[1] "edwards"	"half"	"minimum"	"movingmin" "none" "normexp" ...
\end{Sinput}
The \emph{limma} background correction methods requiring background intensity
values are provided the suitable input data, automatically extracted from the
AffyBatch object previously created with \verb+ReadExi+/\verb+creatAB+.
\footnote{
The way \texttt{ReadExi} and \texttt{creatAB} store dual-channel foreground
and background intensity data in an AffyBatch object has been described above,
see Footnote \ref{footnote:readexi}.
}
Therefore, applying for instance the \verb+normexp+ background correction on the
\verb+ebatch+ AffyBatch object is straightforward with \verb+NormiR+:
\begin{Sinput}
R> eset.spike <- NormiR(ebatch, bgcorrect.method='normexp'
			        bgcorrect.param=list(offset=50),
			        normalize.method='spikein',
                                summary.method='medianpolish')
\end{Sinput}
For the next step of normalization, the methods are independent of the
AffyBatch objects and explicitly enumerated by the \emph{ExiMiR} function
\verb+NormiR.normalize.methods+:
\begin{Sinput}
R> NormiR.normalize.methods()
[1] "mean"	"median"	"none"	"quantile"	"spikein"
\end{Sinput}
As explained above (see Footnote \ref{footnote:rma}), the quantile
normalization \verb+quantile+ treats separately the two channels
in case of dual-channel raw data processed by \verb+ReadExi+ or
\verb+creatAB+. When the spike-in probe-based normalization \verb+spikein+
is chosen, \verb+NormiR+ automatically detects the spike-in probes,
based on the Affymetrix or Exiqon annotations. If this detection is not
successful or if you want to use your own selection of ''spike-in probes``,
the parameter \verb+probeset.list+
in \verb+NormiR+ allows to enter the appropriate list of probe set IDs
(see also Subsection \ref{subsec:opt}). More generally, all the \verb+NormiR+
parameters enabling fine-tuning of the spike-in probe-based
normalization \verb+spikein+ are provided as a list with their default
values by the function \verb+NormiR.spikein.params+. You can modify this
list and submit it to \verb+NormiR+ using the parameter
\verb+normalize.param+.
\footnote{
Due to compatibility constraints with older versions of \texttt{NormiR},
the normalization parameters provided by \texttt{NormiR.spikein.params}
are also adjustable as explicit \texttt{NormiR} arguments, for instance
\texttt{NormiR(ebatch, normalize.method='spikein', normalize.param=list(figures.show=FALSE))}
is equivalent to
\texttt{NormiR(ebatch, method='spikein', figures.show=FALSE)}.
}
This constitutes the starting point for controlling the execution of \verb+NormiR+
(see Section \ref{sec:trouble} for details). For instance, using the subset
of spike-in probes \verb+spikein.subset+ defined in
Subsection \ref{subsec:othernorm}:
\begin{Sinput}
R> NormiR.spikein.params()
R> spikein.params <- list(probeset.list=spikein.subset,
			  loess.span=0.6,
			  force.zero=TRUE)
R> eset.spike <- NormiR(ebatch, bgcorrect.method='normexp'
			        bgcorrect.param=list(offset=50),
			        normalize.method='spikein',
			        normalize.param=spikein.params,
                                summary.method='medianpolish')
\end{Sinput}
The next step of PM correction is enabled only when the CDF environment
provides values for the MM probes in the AffyBatch object, which is not
the case of the Exiqon data considered here.\\
The last step is the probe summarization, for which \verb+NormiR+
proposes the methods of the \emph{affy} package:
\begin{Sinput}
R> NormiR.summary.methods()
[1] "avgdiff"	"liwong"	"mas"	"medianpolish"	"playerout"
\end{Sinput}
Actually, \verb+NormiR+ is a wrapper for three lower-level
functions, which all accept the corresponding \verb+NormiR+
parameters. Therefore the previous \verb+NormiR+ command is
equivalent to:
\begin{Sinput}
R> ebatch.tmp <- bg.correct.miR(ebatch, bgcorrect.method='normexp',
    			                bgcorrect.param=list(offset=50))
R> ebatch.tmp <- norm.miR(ebatch.tmp, normalize.method='spikein',
			              normalize.param=spikein.params)
R> eset.spike <- summarize.miR(ebatch.tmp, summary.method='medianpolish')
\end{Sinput}
Separating the normalization steps allows to more efficiently search
the most appropriate parameters for the spike-in probe-based
normalization without running the time-consumming summarization step
at every attempt (see Section \ref{sec:trouble} for details).\\

The low-level summary function \verb+summarize.miR+ can also be run in
combination with \verb+createAB+ in order to summarize a MAList object
obtained with a normalization method of the \emph{limma} package
(see Figure \ref{fig0}). For instance, using the RGList object defined in
Subsection \ref{subsec:othernorm}:
\begin{Sinput}
R> MAList.tmp <- normalizeWithinArrays(RGList, method='loess')
R> MAList <- normalizeBetweenArrays(MAList.tmp, method='Rquantile')
R> obatch <- createAB(MAList)
R> oset.rrma <- summarize.miR(obatch, summary.method='medianpolish')
\end{Sinput}


\section{Troubleshooting and fine-tuning options}
\label{sec:trouble}

\emph{ExiMiR} provides several functionalities to ensure a safe application of the
spike-in probe-based normalization. This section first describes the control figures
generated during the execution of the \verb+NormiR+ function (Subsection
\ref{subsec:figs}). They allow to identify the problems that may arise when applying
\emph{ExiMiR}. Subsection \ref{subsec:problems} will help understanding their origin
and deciding whether to still use \emph{ExiMiR} (with different parameters) or to
switch to another method like median normalization. The fine-tuning options offered
by \emph{ExiMiR} are explained in Subsection \ref{subsec:opt}.


\subsection{Control figures}
\label{subsec:figs}

In order to follow the execution of the spike-in probe-based normalization implemented
in \texttt{NormiR}, a series of three control figures are generated for each channel of
the input data. They allow to confirm the successful application of the normalization
method but also to detect possible anomalies, that can be then treated with the options
described in Subsection \ref{subsec:opt}. This feature runs by default and can be
deactivated by setting  \verb+figures.show=FALSE+ in \verb+NormiR+.\\

The three control figures generated for the Hy3 channel of the Exiqon example from
Subsections \ref{subsec:exidata} and \ref{subsec:exinorm} are briefly described
hereafter. For more details see \cite{bibi}.

\begin{figure}[t]
\begin{center}
\fbox{\includegraphics{fig1}}
\caption{Correction of the spike-in probe set intensities (Hy3 channel)}
\label{fig1}
\end{center}
\end{figure}


\begin{description}

\item[Correction of the spike-in probe set intensities] The four panels in Figure
\ref{fig1} show the successive steps in removing the array-dependent biases from the
spike-in probe set intensities. A meaningful application of \verb+NormiR+ indeed
requires that the spike-in probe set intensities display coherent deviations across the
arrays of the experiment. Such a behavior manifests itself by roughly parallel curves
on the upper-left panel and by collapsing ones on the upper-right panel. The
normalization correction consists first in subtracting this common variance (lower-left
panel) and second in transforming back to the original intensity range (lower-right
panel). Correcting the curves is proved efficient when the final ones appear
'straighter' than the initial ones.

\begin{figure}[t]
\begin{center}
\fbox{\includegraphics{fig2}}
\caption{Performance of the spike-in probe set intensity correction (Hy3 channel)}
\label{fig2}
\end{center}
\end{figure}

\item[Performance of the spike-in probe set intensity correction] Figure \ref{fig2}
contains two measures for quantitatively assessing the performance of the spike-in
probe set intensity correction used by \verb+NormiR+. The upper panel shows a heatmap
of the Pearson correlations between the array-dependent raw intensities of the spike-in
probe sets, i.e. between the curves displayed on the upper-left panel of Figure
\ref{fig1}. If the values are globally larger than 0.5, then the array-dependent biases
are sufficiently coherent and applying \verb+NormiR+ is justified. The lower panel
displays the variance ratio of the spike-in probe sets intensities before and after the
correction. They correspond to the curves in the upper-left and lower-right panels of
Figure \ref{fig1}. If these ratios are sufficiently low, then the \verb+NormiR+
approach was effective.

\begin{figure}[t]
\begin{center}
\fbox{\includegraphics{fig3}}
\caption{Intensity-dependent correction functions (Hy3 channel)}
\label{fig3}
\end{center}
\end{figure}

\item[Intensity-dependent correction functions for all probes] Figure \ref{fig3}
displays the intensity and array dependent correction functions that \verb+NormiR+
applies to all miRNA probes to perform the normalization. It is constructed based on the
spike-in probe corrections, already shown on Figures \ref{fig1} and \ref{fig2}. Several
requirements are necessary to ensure a stable coverage of the whole range of probe
intensities measured on the array. \emph{ExiMiR} automatically performs checks to
prevent critical situations where its meaningful application is not guaranteed.
Sometimes the constructed correction functions do not look good, even if \verb+NormiR+
ran smoothly. Dealing with such situations is also described in Section
\ref{subsec:opt} below.

\end{description}


\subsection{Possible problems}
\label{subsec:problems}

The application of \emph{ExiMiR} fundamentally assumes that the spike-in probes capture
the greatest part of the between-array technical variability in the miRNA expression data. This
is normally the case when the processing of the RNA samples prior to the addition of
the spike-in RNAs is suitably standardized and controlled. If this condition is
satisfied, then \emph{ExiMiR} requires three features from the spike-in control probes
to be meaningfully applied, see \cite{bibi}. These features are automatically tested by
the software. In case of failure, median normalization is used instead of spike-in
probe-based method. However the threshold values used in these tests can be changed to
force the application of the spike-in probe-based method. Its consequences can then be
investigated on the control figures described in Subsection \ref{subsec:figs} to decide
whether the application of \emph{ExiMiR} was justified or not. Other problems like
annotation conflicts are also supported by \emph{ExiMiR}. \\

Here is the list of the problematic situations covered by \emph{ExiMiR}, arranged by
potential order of appearance.

\begin{description} \item[Incompatibility between GAL and TXT files] If the array
annotation contained in the GAL file is not compatible with the one contained in the
TXT files, or if there is no GAL file available, then \verb+ReadExi+ directly generates
a default \verb+galenv+ environment from the annotation contained in the TXT files.

\item[Insufficient coherence between spike-in probe sets] If the raw intensities of the
spike-in probe sets are not sufficiently coherent across the arrays of the experiment,
i.e. if the mean of the off-diagonal elements of the Pearson correlation matrix shown
on the upper panel of Figure \ref{fig2} is smaller than 0.5, then a median
normalization is applied. The value can be changed by using the \verb+min.corr+ of
\verb+NormiR+.

\item[Specificity of the spike-in probeset intensities] If the spike-in probeset
intensities are not specific, i.e. if the intensity ranges covered by the probes
mapping to the same probe sets are too large, then computing the intensity-dependent
correction functions from Figure \ref{fig3} becomes problematic. The
intensity-independent median normalization is preferred in this case. The \verb+NormiR+
parameter \verb+max.log2span+ can be changed to allow for probeset intensity ranges larger
than the default value 1.

\item[Insufficient coverage of the probe intensity range] If the range
[$\sim$6,$\sim$16] of all array probe intensities is not appropriately covered by the
spike-in probe intensities, then computing the intensity dependence of the correction
functions from Figure \ref{fig3} becomes unstable. The \verb+NormiR+ parameter
\verb+cover.int+ tests the size of the largest intensity interval between two
consecutive spikes. Its default value is 1/3. The \verb+NormiR+ parameter
\verb+cover.ext+ tests the minimal ratio between the intensity range covered by the
spike-in probes and the one covered by all probes on the array. Its default value is
1/2. These two values can be changed but an eye must be kept on their consequences on
the correction functions from Figure \ref{fig3}, since the latter are not explicitly
tested by \emph{ExiMiR}. The \verb+NormiR+ options for computing these correction
functions are explained in Subsection \ref{subsec:opt} below.

\end{description}

\subsection{\emph{NormiR} options for computing the correction functions}
\label{subsec:opt}

The results for the spike-in probe-based correction functions displayed on Figure
\ref{fig3} are not tested automatically by \emph{ExiMiR} and might not be entirely
satisfactory. This might be due to multiple reasons, ranging from inhomogeneous
affinities across the spike-in probe sets to an inappropriate coverage of the probe
intensity range. \emph{ExiMiR} offers the possibility of fine-tuning the
parameters used by \verb+NormiR+ to improve the stability of the correction
functions (see also Subsection \ref{subsec:normir} for command line examples).

\begin{description}

\item[Selection of the spike-in probes] If the control figures show that only a
subset of the spike-in probe sets displays the appropriate behavior, it can be
manually selected for the calculation of the correction functions. Set the
\verb+NormiR+ parameter \verb+probeset.list+ to the list of the appropriate
spike-in probe sets IDs.
\footnote{
Actually \texttt{probeset.list} allows to select any set of probe set IDs and
tentatively run \texttt{NormiR}. However, the safeguarding strategy implemented
in \texttt{NormiR} prevents inappropriate choices of ''spike-in probes``.
}

\item[Overall LOESS smoothing] If the correction functions 'wiggle' too much, the
\verb+NormiR+ parameter \verb+loess.span+ can be set to higher values to better smooth the
resulting curves. By default, it takes the value 5/(number of spike-in probe sets), e.g.
5/10 in the Exiqon case. In the extreme cases of values close to 1, the intensity
dependence of the correction is lost and the results become very similar to a mean or a
median normalization.

\item[Low-intensity stabilization] If one correction function change its sign in the
low intensity range, then an inclusion into the LOESS smoothing of a zero value at the
intensity minimum will prevent this feature. Set the \verb+NormiR+ parameter
\verb+force.zero+ to \verb+TRUE+  to activate this functionality.

\item[High-intensity extrapolation] It often occurs that the largest spike-in probeset
intensities are lower than the largest probe intensities on the array. In this case
\verb+NormiR+ needs to include extrapolated values into the LOESS smoothing in order to
compute the correction functions in the high-intensity range. Fortunately this step is
quite stable thanks to the fact that high intensity values are less noisy. By default
\verb+NormiR+ uses the mean of the correction values of two spike-in probe sets with the
largest intensities. The parameter \verb+extrap.points+ allows to change the number of
spike-in probe sets used in the extrapolation and \verb+extrap.method+ determines the
extrapolation method.

\end{description}

\section{Concrete example with provided data}
\label{sec:example}
\emph{ExiMiR} provides datasets that allows one to test the functions described in this vignette. The test data are in the \verb+R+ objects obtained as
described in Section \ref{sec:data}. They can be used as follows,  which reproduces the commands explained in Section \ref{sec:norm}.

\subsection{Affymetrix}
Start by loading the \emph{ExiMiR} package and the AffyBatch object, which includes the appropriate annotation environment and corresponds to the data described in Section \ref{subsec:affydata}:
<<>>=
library(ExiMiR)
data(GSE19183)
@
Apply the RMA quantile normalization on the AffyBatch object \verb+GSE19183+ to create the ExpressionSet object \verb+eset.rma+ containing the normalized data:
<<>>=
library(affy)
eset.rma <- expresso(GSE19183, bgcorrect.method='rma',
			       normalize.method='quantiles',
			       pmcorrect.method='pmonly',
			       summary.method='medianpolish')

@
The spike-in probe-based normalization implemented in \emph{ExiMiR} can be applied similarly:
<<>>=
eset.spike <- NormiR(GSE19183, bgcorrect.method='rma',
			       normalize.method='spikein',
			       normalize.param=list(figures.show=FALSE),
			       pmcorrect.method='pmonly',
			       summary.method='medianpolish')
@
As explained at the end of Section \ref{subsec:affynorm}, the spike-in probe-based normalization method implemented in \verb+NormiR+ can not be applied with its default settings.
Use the \verb+figures.show=TRUE+ option to diagnose graphically the problem (see Section \ref{subsec:figs}). The safeguarding strategies are described in Section \ref{subsec:problems}
and the \verb+NormiR+ options described in Section \ref{subsec:opt}. Here \verb+NormiR+ cannot be
satisfactorily applied to \verb+GSE19183+ because the spike-in probe intensities do not appropriately cover the intensity range of all the array probes.


\subsection{Exiqon}
Load the \emph{ExiMiR} package, the GAL annotation environment and the AffyBatch objects corresponding to the data described in Section \ref{subsec:exidata}:
<<>>=
library(ExiMiR)
data(galenv)
data(GSE20122)
@
Apply the RMA quantile normalization on the AffyBatch object \verb+GSE20122+, using the \verb+rma+ option \verb+background=FALSE+ as recommended by a recent study\cite{lopez}.
This creates the ExpressionSet object \verb+eset.rma+ containing the normalized data:
<<>>=
eset.rma <- NormiR(GSE20122, bg.correct=FALSE,
			     normalize.method='quantile',
			     summary.method='medianpolish')
@
The spike-in probe-based normalization implemented in \emph{ExiMiR} is applied as follows:
<<>>=
eset.spike <- NormiR(GSE20122, bg.correct=FALSE,
			       normalize.method='spikein',
			       normalize.param=list(figures.show=FALSE),
			       summary.method='medianpolish')
@
To obtain the same control figures as the ones displayed in Section \ref{subsec:figs}, use the \verb+NormiR+ option \verb+figures.show=TRUE+.

\newpage
\begin{thebibliography}{2}

\bibitem{bibi}
Sewer A et \textit{al}., to be published.

\bibitem{sarkar}
Sarkar D et \textit{al}., Quality assessment and data analysis for miRNA expression
arrays, Nucleic Acids Research 2009, \textbf{37}(2).

\bibitem{RMA}
Irizarry RA et \textit{al}., Biostatistics 2003, \textbf{4}(2).

\bibitem{lopez}
L\'{o}pez-Romero P et \textit{al}., Procession of Agilent microRNA array data,
BMC Research Notes 2010, \textbf{3}:18.


\end{thebibliography}


\newpage
\appendix{}
\section{Content of the file ''samplesinfo.txt''}
\label{sec:app}
\scriptsize{
\begin{Sinput}
Names	Hy3	Hy5
1	GSM503402_Hy3_Exiqon_14114402_S01_Cropped.txt	GSM503402_Hy5_Exiqon_14114402_S01_Cropped.txt
2	GSM503403_Hy3_Exiqon_14114403_S01_Cropped.txt	GSM503403_Hy5_Exiqon_14114403_S01_Cropped.txt
3	GSM503404_Hy3_Exiqon_14114404_S01_Cropped.txt	GSM503404_Hy5_Exiqon_14114404_S01_Cropped.txt
4	GSM503405_Hy3_Exiqon_14114405_S01_Cropped.txt	GSM503405_Hy5_Exiqon_14114405_S01_Cropped.txt
5	GSM503406_Hy3_Exiqon_14114406_S01_Cropped.txt	GSM503406_Hy5_Exiqon_14114406_S01_Cropped.txt
6	GSM503407_Hy3_Exiqon_14114407_S01_Cropped.txt	GSM503407_Hy5_Exiqon_14114407_S01_Cropped.txt
7	GSM503408_Hy3_Exiqon_14114408_S01_Cropped.txt	GSM503408_Hy5_Exiqon_14114408_S01_Cropped.txt
8	GSM503409_Hy3_Exiqon_14114409_S01_Cropped.txt	GSM503409_Hy5_Exiqon_14114409_S01_Cropped.txt
9	GSM503410_Hy3_Exiqon_14114410_S01_Cropped.txt	GSM503410_Hy5_Exiqon_14114410_S01_Cropped.txt
10	GSM503411_Hy3_Exiqon_14114411_S01_Cropped.txt	GSM503411_Hy5_Exiqon_14114411_S01_Cropped.txt
11	GSM503412_Hy3_Exiqon_14114412_S01_Cropped.txt	GSM503412_Hy5_Exiqon_14114412_S01_Cropped.txt
12	GSM503413_Hy3_Exiqon_14114413_S01_Cropped.txt	GSM503413_Hy5_Exiqon_14114413_S01_Cropped.txt
\end{Sinput}
}

\end{document}