%\VignetteIndexEntry{ABarray gene expression}
%\VignetteKeywords{Applied Biosystems AB1700}
%\VignetteDepends{ABarray}
%\VignettePackage{ABarray}

\documentclass[10pt]{article}

\title{AB1700 Microarray Data Analysis}
\author{
        Yongming Andrew Sun, Applied Biosystems
		  \\
		  sunya@appliedbiosystems.com
}


\usepackage{graphicx}
\usepackage{fancyhdr}

\newcommand{\Rpackage}[1]{\textit{#1}}
\newcommand{\Rfunction}[1]{\textit{#1}}
\newcommand{\Robject}[1]{\texttt{#1}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\textit{#1}}}
\newcommand{\Rfunarg}[1]{{\textit{#1}}}

\oddsidemargin  -0.2in
\evensidemargin 0.0in
\textwidth      6.9in
\topmargin      -0.75in
\textheight     9.25in

\pagestyle{fancy}
\rhead{AB1700 Gene Expression}
\rfoot{ABarray package}

\begin{document}

\setkeys{Gin}{width=0.6\textwidth}

\maketitle
\tableofcontents

\section{ABarray Package Introduction}
The package \Rpackage(ABarray) is designed to work with Applied Biosystems
whole genome microarray platform, as well as any other platform whose data
can be tranformed into expression data matrix. In the following functions
described, items 1 and 2 are specific to AB1700 datasets. Items 3 to 9
can be applied to any datasets. However, for AB1700 data, filtering is
performed using S/N ratio threshold (default = 3), while filtering is not
applied to other data.

The package \Rpackage(ABarray) will perform the following functions:
\begin{enumerate}
\item
Read output from AB1700 software output (AB1700 specific)
\item
Perform analysis on hybridization control spike-ins (AB1700 specific)
\item
Perform raw data QA and associated plots (see \Rfunction{doPlotEset}) including:
\begin{itemize}
\item boxplot for signal distrubution range
\item MA plot for signal distribution and signal variability
\item CV plot for variation among hybridization replicates
\item Scatter plot for correlation between hybridization arrays
\item Correlation heatmap for visualization
\item S/N detection concordance
\end{itemize}
\item
Perform quantile normalization
\item
Repeat data QA after normalization. See \Rfunction{doPlotEset}
\item
Perform t test, fold change and ANOVA and produce graphics to visualize t test results.
The default t test assumes unequal variances. See \Rfunction{doPlotFCT} for more details.
\item
Peform LPE for low number of replicates. See \Rfunction{doLPE} 
for more details
\item
For more details on ANOVA analysis, see \Rfunction{doANOVA}
\item
For drawing Venn diagram, see \Rfunction{doVennDiagram}
\end{enumerate}

\subsection{Required Files and Format}
To take full advantage of automated process provided by package
\Rpackage{ABarray}, two files should be available before running
\Rpackage{ABarray}: a data file and an experiment design file.
\begin{itemize}
\item Experiment Design File. The rows of the file are samples 
or arrays. The first column should be sampleName. Perhaps, sampleName should be 
concise and no spaces between characters. Additional columns maybe 
assayName and arrayName (one of these). Additional columns should specify what 
type of samples. Note: It is best to have assayName the same as in dataFile. 
See an example of experiment design file (Table~\ref{tab:design1}).

\begin{table}[h]
\begin{center}
\caption{Experiment design and sample information.}
\label{tab:design1}
\begin{tabular}{rlll}
\hline
 & sampleName & assayName & tissue \\
\hline
1 & A1 & HB003U2\_9/2/05\_1:55\_PM & TissueA \\
2 & A2 & HB003U3\_9/2/05\_2:11\_PM & TissueA \\
3 & A3 & HB003U8\_9/2/05\_2:26\_PM & TissueA \\
4 & B1 & HB003TV\_9/2/05\_2:41\_PM & TissueB \\
5 & B2 & HB003TW\_9/2/05\_2:55\_PM & TissueB \\
6 & B3 & HB003TZ\_9/2/05\_3:10\_PM & TissueB \\
\hline
\end{tabular}\end{center}
\end{table}

In the example (Table~\ref{tab:design1}), assayName is included and arrayName 
is not. There is no need for both assayName and arrayName to be listed in the 
design file. Note: every column name in the header is ONE word, there needs 
NO spaces in each word. Make sure sampleName, assayName, arrayName spelled 
correctly. There is no need to worry about lower case or upper case.

\item Data File. 
The rows of the file represent probes. The first column is probeID (or Probe 
Name), the second column is geneID, next set of columns 
should contain Signal, S/N and FLAGS for all samples. Optionally columns
with Assay\_Normalized\_Signal, SDEV, CV can be included in the data file, but
these columns will be ignored in the analysis.
The columns are: probeID, geneID, Signal, S/N, Flags, and optionally SDEV, 
CV, and Assay\_Normalized\_Signal.

\item Control Probes. It is optional to have control probes. 
If they are present, plots will be generated for the control probes, 
but they will be removed for further analysis. 
\end{itemize}


\subsection{Data File and Experiment Design File}
The data file and experiment design file are essential. They should be either
tab-delimt txt file or comma-delimit csv file. It is very
important to make these files compliant. If array name in the header of
data file is present and arrayName is defined in experiment design
file, arrayName will be used to distinguish which column is for 
which hybridization sample. In the absence of arrayName in experiment
design file, assayName will be used to identify which column in 
data file is for which hybridization sample. It is important to 
spell assayName in the experiment design file the same as in data
file. For example, if the assayName in data file is called
\Rfunarg{HA004J7\_1A\_2}, which will show as 
\Rfunarg{Signal HA004J7\_1A\_2} in data file, the assayName in
experiment design file should be called exactly as 
\Rfunarg{HA004J7\_1A\_2}.

Note: assayName should be unique and not part of another assayName. For example,
assayName S1 is part of another assayName S11. In this case, S1 is better
renamed to S01.

Sometimes, there could be more assays in data file than we actually want 
to analyze. For example, we decided to ignore one or two of the arrays, 
because the images are bad and there are reasons to exlude these arrays. 
But instead to delete the related columns in data file, we simply change 
the experimental design file to just include those arrays we need to 
analyze. However, on the other hand, if there are more hybridization samples 
in experiment design file than in data file, it will produce an error, 
as this does not make sense.

\subsection{ABarray Function Details}

\subsubsection{Processing of control probes}
There are a number of internal control probes for AB1700 Expression Array
Systems. The QC analysis is performed on the following controls before any
data normalization.
\begin{enumerate}
\item Hybridization Controls.
\item IVT Labeling Controls.
\item RT Labeling Controls.
\item Negative Controls.
\end{enumerate}
All other controls are ignored in the analysis. The behavior of signals of the 
controls are plotted in the figures. The control probes and
associated measurements are removed before data normalization and transformation.

\subsubsection{Data normalization and transformation}
Once the control probes and associated measurements are removed, the quantile
normalization procedure is applied. The data is then transformed into log2 
scale. If the imputation is selected (default is to use average imputation),
the missing value imputation is applied after quantile normalization and log2
transformation. What is considered missing value? If the \textit{Flag} value of 
a probe is above 5000 (actually above $2^{12}= 4096$), the signal value of the 
probe is considered missing value and is replaced with NA. The imputation is
then performed on these NA values. If 'No imputation' is used, the probe
signal values are used as is (they are not replaced with NA, nor are they 
imputed) in the downstream analysis.

There are 2 choices in the program for imputation.
\begin{enumerate}
\item No imputation. Imputation is not performed, and the signal values are
used as is.
\item Average imputation. The probe signal values are averaged from other arrays
in the same subgroup. The missing values are replaced with the averaged probe.
If there is not enough values to average for certain probes, the signal values
of these probes remain missing (NA). 
\end{enumerate}

\subsubsection{Array QC Analysis}
A number of array QC analysis is performed and associated figures are 
generated. The QC analysis is performed on both the pre-normalized data and 
normalized data.
\begin{itemize}
\item boxplot
\item MA plot
\item scatter plot
\item correlation plot
\item CV plot
\item percentage of probes detected
\item probe detection concordance
\end{itemize}

\subsubsection{Statistical Analysis for Differential Expression}
A default two sample \textit{t} test is performed between any 2 subgroups. 
A probe filtering procedure precedes the \textit{t} test. If a probe is not
detected (S/N < 3) in both subgroups, it is not included in the \textit{t}
test. If a probe is not detected in more than 50\% of samples in a subgroup,
it is not detected in that subgroup. The 50\% is default, and can be set to 
a different threshold value.

The fold change (FC) and log2(FC) for the detected probes are also calculated.
The FC is divided into 5 FC bins. In addition, the \textit{p} values from the 
\textit{t} test are used to calculate adjusted \textit{p} value as FDR using 
Biocondutor package \Rpackage{multtest}. The FDR is calculated using 
Benjamini-Hochberg procedure. The results are written into files and plotted
into MA and volcano plots.

If there are more than 2 subgroups, ANOVA will also be performed. The probe 
detection threshold (same as \textit{t} test) is also applied.

\subsubsection{Clustering heatmap}
The clustering heatmap is also produced if the number of differentially expressed 
probes is less than 800.

\subsubsection{Description of Results}
There are 3 folders generated after the analysis for each group. 
\begin{itemize}
\item DataResult. This folder contains several text files. 
\begin{itemize} 
\item 'ControlRawData.csv' contains signals of control probes.
\item 'ProbeImputed\_' contains probeID whose expression measurement may be imputed.
\item 'QuantileNormalizedExpression' contains quantile normalized, log2
transformed, and imputed values.
\item 'RawExpressionData.csv' contains non-normalized, non-imputed raw measurement.
\item 'ST\_DetectPct' files contain statistical analysis results.
\end{itemize}
\item jpgFigure. This folder contains figures in jpg or bitmap format.
\item pdfFigure. This folder contains figures in pdf format.
\end{itemize}


\section{ABarray Function Demonstration}
After loading the data file and experiment design file, the ABarray functions
check to see if control signals are present in data file. If they are 
present, several plots for control assesement will be produced. The program
then removes these control probes and perform QA analysis on raw data. It also
produces a number of plots to visualize those QA assesement. After this,
it performs quantile normalization and repeat QA assessment for
quantile normalized data. Futhermore, fold changes between subgroups and 
\textit{t} test to identify differential expressed genes will be performed
(and ANOVA if there are more than 2 subgroups in the group).

\subsection{Required Libraries}
The bioconductor library \Rpackage{Biobase} and \Rpackage{multtest} are 
required to run ABarray. In addition, several other optional libraries
may be needed to perform various functions. \Rpackage{impute} is needed 
if KNN imputation will be performed, \Rpackage{limma} is needed if drawing 
Venn diagram function is performed.

\subsection{Preparing R for Data Analysis}
After launching R, we need to set the R working directory to inform R
where we find data file and design file. It also inform R where to store
the analysis results and associated figures from the analysis.

To set R working directory, choose \textsf{'Change dir...'} from 
\textsf{'File'} menu, and a dialog box appears to let you browse which directory 
(folder) to set as working directory (See Figure~\ref{fig:setWorkDir1} and
Figure~\ref{fig:setWorkDir2}).

\begin{figure}[htp]
\begin{minipage}{0.45\textwidth}
\includegraphics[width=1\textwidth]{Figure/dirChangeFig.pdf}
\caption{Change R working directory. Select Change dir... from File menu.}
\label{fig:setWorkDir1}
\end{minipage}
\hfill
\begin{minipage}{0.45\textwidth}
\includegraphics[width=1\textwidth]{Figure/dirSelectFig.pdf}
\caption{Browse the directory to set as working directory.}
\label{fig:setWorkDir2}
\end{minipage}
\end{figure}


\subsection{File List in Working Directory}
To see the content of the working directory, type the following command
\begin{Schunk}
\begin{Sinput}
> dir()
\end{Sinput}
\begin{Soutput}
[1] "ABarray_Example.tex"      "ExperimentDesign.csv"    "RawExpressionABData.txt"
\end{Soutput}
\end{Schunk}

\subsection{Loading ABarray library}
The \Rpackage{ABarray} has to be loaded to perform the automated analysis.
To load \Rpackage{ABarray}, type the following command after launching R.

\begin{Schunk}
\begin{Sinput}
> library(ABarray)
\end{Sinput}
\end{Schunk}

\subsection{Performing Automated Expression Analysis}
The automated gene expression analysis requires only one command. The function
takes 3 parameter arguments. The first argument is the name of the data file.
The second argument is the name of the expriment design file. The third
argument is the name of the experiment group defined in the experiment design
file on which t test and other analysis should be performed. It will generate
an \Robject{ExpressionSet} expression value object. 

In the following example, the name of the data file is called 'RawExpressionABData.txt'; 
the name of the experiment design file is called 'ExperimentDesign.csv'; and the 'tissue'
is the group name selected to perform analysis on. The expression value object
is assigned to \Robject{eset} object. Since the 'tissue' has 2 subgroup, ANOVA
will NOT be performed in this example.

\begin{Schunk}
\begin{Sinput}
> eset = ABarray("RawExpressionABData.txt", "ExperimentDesign.csv", "tissue")
\end{Sinput}
\begin{Soutput}
Using assayName to match experiment with signal in file: RawExpressionABData.txt 
The sample names are:
[1] "A1" "A2" "A3" "B1" "B2" "B3"
AssayNames in dataFile: RawExpressionABData.txt 
[1] "HB003U8_9/2/05_2:26_PM" "HB003U2_9/2/05_1:55_PM" "HB003TZ_9/2/05_3:10_PM" "HB003U3_9/2/05_2:11_PM"
[5] "HB003TW_9/2/05_2:55_PM" "HB003TV_9/2/05_2:41_PM"
Reading data from RawExpressionABData.txt ......
    This may take several minutes....
Checking data format:::::::::::::::::::||

The results will be in the folder: Result_tissue/ 
[1] "Creating plot for Signal Hybridization_Control ..."
[1] "Creating plot for Signal Negative_Control ..."
[1] "Creating plot for Signal IVT_Kit_Control_BIOB ..."
[1] "Creating plot for Signal IVT_Kit_Control_BIOC ..."
[1] "Creating plot for Signal IVT_Kit_Control_BIOD ..."
[1] "Creating plot for Signal RT_Kit_Control_DAP ..."
[1] "Creating plot for Signal RT_Kit_Control_LYS ..."
[1] "Creating plot for Signal RT_Kit_Control_PHE ..."

Perform basic analysis for non-normalized data ...
[1] "Creating barplot for probes detectable ... QC_DetectableProbeSN3"
[1] "Creating boxplot for Raw_tissue  ..."
[1] "Creating correlation matrix and plot Raw_tissue  ..."
[1] "Creating detection concordance plot Raw_tissue ..."
Perform Average imputation for probes with FLAG > 5000
[1] "Array A1 has 72 FLAGS > 5000" "Array A2 has 89 FLAGS > 5000" "Array A3 has 47 FLAGS > 5000"
[4] "Array B1 has 91 FLAGS > 5000" "Array B2 has 96 FLAGS > 5000" "Array B3 has 75 FLAGS > 5000"
The signals of each flagged probe were replaced with average signals
   from replicate arrays within the same subgroup (TissueA, TissueB).
The imputed probes were written to file: Result_tissue/DataResult/ProbeImputed_tissue.csv 

 Quantile normalized data was saved to:
     Result_tissue/DataResult/QuantileNormalizedExpression.csv 
The following probes still contain missing values even after imputation:
[1] "128976" "215914" "226588" "249634"

The expression data object was saved to R workspace file:
     ExperimentDesign_eset_28Mar2006.Rdata 

Perform data analysis on Quantile normalized data ...
[1] "Creating boxplot for Quantile_tissue  ..."
[1] "Creating MA plot for Quantile_tissue"
[1] "Creating MA Plot Quantile_tissue: TissueA ..."
[1] "Creating QC_CVplot_Quantile_tissue_TissueA ..."
[1] "Creating MA Plot Quantile_tissue: TissueB ..."
[1] "Creating QC_CVplot_Quantile_tissue_TissueB ..."
[1] "Creating scatter plot filtering S/N < 3  for Quantile_tissue  ..."
[1] "Creating correlation matrix and plot Quantile_tissue  ..."
[1] "Creating detection concordance plot Quantile_tissue ..."

Performing t test between TissueA and TissueB ...
1 probes had sdev = 0, they were removed in the t test
[1] "520680"
[1] "Creating volcano plot ..."
[1] "Writing t test result to file:  Result_tissue/DataResult/ST_DetectPct50FCpvalFDR_TissueA-TissueB.csv"

The t test was performed if the probe shows S/N >= 3 in 50% or more samples
    in either TissueA or TissueB
The number of probes after filtering is: 22323 

p=0.01, significant probes: 12611
[1] "Creating Fold Change plot for: TissueA-TissueB pVal 0.01  ..."
[1] "Clustering was not performed, because there are 12611 probes."
[1] "    The limit of probe counts for clustering is: 800"
p=0.05, significant probes: 16461
[1] "Creating Fold Change plot for: TissueA-TissueB pVal 0.05  ..."
[1] "Clustering was not performed, because there are 16461 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.01, significant probes: 10666
[1] "Creating Fold Change plot for: TissueA-TissueB FDR 0.01  ..."
[1] "Clustering was not performed, because there are 10666 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.05, significant probes: 15681
[1] "Creating Fold Change plot for: TissueA-TissueB FDR 0.05  ..."
[1] "Clustering was not performed, because there are 15681 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.1, significant probes: 17416
[1] "Creating Fold Change plot for: TissueA-TissueB FDR 0.1  ..."
[1] "Clustering was not performed, because there are 17416 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.25, significant probes: 19373
[1] "Creating Fold Change plot for: TissueA-TissueB FDR 0.25  ..."
[1] "Clustering was not performed, because there are 19373 probes."
[1] "    The limit of probe counts for clustering is: 800"
\end{Soutput}
\end{Schunk}

\section{Information about QC and Analysis Results}
The function \Rfunction{ABarray} produced a lot of results. In addition,
it returns an \Robject{ExpressionSet} object. This make it possible to use
vast amount of packages and functions in Bioconductor (www.bioconductor.com)
for further statistical analysis, e.g., classification, prediction.

The results will be saved in a folder within the R working directory. The name
of the folder begins with \textsf{Result\_} and the name of the group. In the
example run, the fold name is \textsf{Result\_tissue}. There will be 3 directories 
created inside this folder to store the analysis results: One \textsf{DataResult}
folder to store normalized signals, \textit{p} values and FDR from \textit{t}
test, and ANOVA \textit{p} values if ANOVA is performed. One \textsf{jpgFigure}
folder to store QC and analysis plots, and one \textsf{pdfFigure} folder. In
\textsf{pdfFigure}, the plots are the same as in \textsf{jpgFigure}, but in
higer resolution of pdf format.

Now, let's take a look at the \Robject{eset}.
\begin{Schunk}
\begin{Sinput}
> eset
\end{Sinput}
\begin{Soutput}
Expression Set (exprSet) with 
	32878 genes
	6 samples
		 phenoData object with 3 variables and 6 cases
	 varLabels
		sampleName: read from file
		assayName: read from file
		tissue: read from file
\end{Soutput}
\end{Schunk}

We can also check the content of the experiment design by method
in \Rclass{Bioconductor} package \Rpackage{Biobase}, \Rfunction{pData}.

\begin{Schunk}
\begin{Sinput}
> pData(eset)
\end{Sinput}
\begin{Soutput}
  sampleName              assayName  tissue
1         A1 HB003U2_9/2/05_1:55_PM TissueA
2         A2 HB003U3_9/2/05_2:11_PM TissueA
3         A3 HB003U8_9/2/05_2:26_PM TissueA
4         B1 HB003TV_9/2/05_2:41_PM TissueB
5         B2 HB003TW_9/2/05_2:55_PM TissueB
6         B3 HB003TZ_9/2/05_3:10_PM TissueB
\end{Soutput}
\end{Schunk}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%- Function doPlotEset
\section{Data Analysis on eSet object: doPlotEset}
In this example, we will show the utility of \Rfunction{doPlotEset}. This function perform
a number of data quality assessment and some statistical analysis (see \Rfunction{doPlotFCT}
for fold change and t test, \Rfunction{doANOVA} for one- or two-way ANOVA analysis).

\subsection{Data Quality Plots}
\begin{itemize}
\item
Boxplot showing distribution of signal range.
\item
MA plot showing data distribution and signal variability between hybridizations.
There might be several MA plots produced. All pairs of hybridizations within the group
will be plotted if number of pairs is less than 35 pairs. In addition, MA plots will
be produced for each pairs within subgroup of the group to see signal variability
between each individual members of the subgroup.
\item
CV plot showing amount of variation within replicate hybridizations of each subgroup.
\item
Scatterplot showing visually the correlation between any two hybridizations within
the group. Correlation coefficient is also calculated for each pair.
\item
Correlation heatmap showing correlation between any pairs of hybridization. The
heatmap is color coded to quickly visualize the extent of the correlation.
\item
S/N detection concordance. The signal is detected if the probe shows S/N ratio
equal to or above a user defined threshold (default is 3). The S/N detection
heatmap shows detection concordance between any pairs of hybridization.
\end{itemize}

\subsection{Statistical Analysis}
The function \Rfunction{doPlotEset} accepts a parameter \Rfunarg{test} either
\Rclass{TRUE} (the default) or \Rclass{FALSE}. If \Rfunarg{test} = \Rclass{TRUE}
(the default), regular t test will be performed. For details, please refer to
\Rfunction{doPlotFCT} in \Rpackage{ABarray} package. If there are more than
two subgroups within the group, pairwise t test will be performed on each pairs
of subgroup. In addition, one way ANOVA test will also be performed. The t test
results are written to files and several graphics plots will be produced. ANOVA
test results are written to file as well if it is performed.

\subsection{Requred Data and Function Arguments}
\begin{itemize}
\item
An \Robject{ExpressionSet} object. The S/N ratio information is stored in
\Robject{assayDataElement(eset, "snDetect")}. This will automatically be created from
\Rfunction{ABarray} in \Rpackage{ABarray} package. The analysis will filter probes based
on S/N >= 3 in 75\% of samples. If S/N is not available, all probes will be used for
analysis (no filtering will be applied).
\item
An \Rfunarg{group} parameter. The group is a factor that t test should be performed on.
The name of the group should correspond to one of the names in experiment design file
that define sample type. If there are 3 or more levels for the group, pairwise t test
and fold change will be peformed for all possible pairs.
\item
An optional \Rfunarg{name} parameter. The name will be used for output filenames to
distinguish different types of runs. For example, \Rmethod{quantile} to indicate
the results were produced after \Rmethod{quantile} process. \Rmethod{raw} to indicate
the results were produced with raw data. If \Rfunarg{name} is missing, the analysis
and output files will still be produced.
\item
An optional \Rfunarg{snThresh} parameter. If this is ommitted, default snThresh = 3
will be used. To use a different threshold, use \Rfunarg{snThresh} = \Rfunarg{newValue}.
\item
Parameter \Rfunarg{test}. By default, \Rfunarg{test} = \Rclass{TRUE}. This indicates to
perform statistical test (\textit{t} test and ANOVA if applicable). To avoid these test, use
\Rfunarg{test} = \Rclass{FALSE}. For more details about the \textit{t} test and fold change,
see \Rfunction{doPlotFCT} function. For more details for ANOVA analysis, see
\Rfunction{doANOVA}.
\end{itemize}

\subsection{doPlotEset Function Demonstration}
First let's see what the \Robject{eset} contains
\begin{Schunk}
\begin{Sinput}
> eset
\end{Sinput}
\begin{Soutput}
Expression Set (exprSet) with
	32878 genes
	6 samples
		 phenoData object with 3 variables and 6 cases
	 varLabels
		sampleName: read from file
		assayName: read from file
		tissue: read from file
\end{Soutput}
\end{Schunk}

Let's check the experiment design and pick a group for testing
\begin{Schunk}
\begin{Sinput}
> pData(eset)
\end{Sinput}
\begin{Soutput}
  sampleName              assayName  tissue
1         A1 HB003U2_9/2/05_1:55_PM TissueA
2         A2 HB003U3_9/2/05_2:11_PM TissueA
3         A3 HB003U8_9/2/05_2:26_PM TissueA
4         C1 HB003UA_9/2/05_3:24_PM TissueC
5         C2 HB003UB_9/2/05_3:44_PM TissueC
6         C3 HB003UD_9/2/05_3:58_PM TissueC
\end{Soutput}
\end{Schunk}

Let's begin to perform the analysis
\begin{Schunk}
\begin{Sinput}
> doPlotEset(eset, "tissue", name = "quantile")
\end{Sinput}
\begin{Soutput}
[1] "Creating boxplot for quantile_tissue  ..."
[1] "Creating MA plot for quantile_tissue"
[1] "Creating MA Plot quantile_tissue: TissueA ..."
[1] "Creating CVplot_quantile_tissue_TissueA ..."
[1] "Creating MA Plot quantile_tissue: TissueC ..."
[1] "Creating CVplot_quantile_tissue_TissueC ..."
[1] "Creating scatter plot for quantile_tissue  ..."
[1] "Creating scatter plot filtering S/N < 3  for quantile_tissue  ..."
[1] "Creating correlation matrix and plot quantile_tissue  ..."
[1] "Creating detection concordance plot quantile_tissue ..."

Performing t test between TissueA and TissueC ...
2 probes had sdev = 0, they were removed in the t test
[1] "207943" "520680"
[1] "Creating volcano plot ..."
[1] "Writing t test result to file:  Result_tissue/DataResult/DetectPct50FCpvalFDR_TissueA-TissueC.csv"

The t test was performed if the probe shows S/N >= 3 in 50% or more samples
    in either TissueA or TissueC
The number of probes after filtering is: 21628

p=0.01, significant probes: 3470
[1] "Creating Fold Change plot for: TissueA-TissueC pVal 0.01  ..."
[1] "Clustering was not performed, because there are 3470 probes."
[1] "    The limit of probe counts for clustering is: 800"
p=0.05, significant probes: 7595
[1] "Creating Fold Change plot for: TissueA-TissueC pVal 0.05  ..."
[1] "Clustering was not performed, because there are 7595 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.01, significant probes: 429
[1] "Creating Fold Change plot for: TissueA-TissueC FDR 0.01  ..."
[1] "Creating cluster heatmap for FDR 0.01 using Correlation ..."
[1] "Creating cluster heatmap for FDR 0.01 using Euclidean ..."
FDR=0.05, significant probes: 2700
[1] "Creating Fold Change plot for: TissueA-TissueC FDR 0.05  ..."
[1] "Clustering was not performed, because there are 2700 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.1, significant probes: 5577
[1] "Creating Fold Change plot for: TissueA-TissueC FDR 0.1  ..."
[1] "Clustering was not performed, because there are 5577 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.25, significant probes: 10894
[1] "Creating Fold Change plot for: TissueA-TissueC FDR 0.25  ..."
[1] "Clustering was not performed, because there are 10894 probes."
[1] "    The limit of probe counts for clustering is: 800"
\end{Soutput}
\end{Schunk}

%-  End funciton doPlotEset


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%- Function doPlotFCT
\section{Fold Change and t test: doPlotFCT}
In this example, we will show the utility of \Rfunction{doPlotFCT}. This function perform
calculation for fold change and \textit{t} test. The \textit{t} test \textit{p} values
are adjusted using Benjamini-Hochberg FDR at preset levels: 0.01, 0.05, 0.1, 0.25, and
raw \textit{p} values 0.01 (no FDR) and 0.05 (no FDR). The results are written to a cvs
file, whose name begin with DetectPctFCpvalFDR and the group name that the test was
performed on. In addition, fold change are plotted into graphics for various FDR levels,
and volcano plot of fold change and \textit{t} test \textit{p} values are plotted.

The analysis uses S/N ratio for filtering. By default, if a probe is detected in
50\% or more samples within any subgroup, it is included in the analysis. These thresholds
can be changed at run time with optional parameters.

If there are less than 800 probes for a particular threshold level, clustering heatmap
may also be produced. There will be 2 clustering figures, one uses correlation coefficient
for clustering, and the other one uses distance for clustering.

The \textit{t} test \textit{p} values will be calculated using R implementation of
\textit{t} test assuming unequal variances and using log2 transformed normalized data.
The fold changes will also be calculated. If a paired \textit{t} test is performed,
the fold changes will be average of paired fold changes. If not paired \textit{t}
test, the fold changes will be fold changes of averaged expression values.

\subsection{Required Data and Function Arguments}
\begin{itemize}
\item
An \Robject{ExpressionSet} object. The S/N ratio information is stored in
\Robject{assayDataElement(eset, "snDetect")}. This will automatically be created
from \Rfunction{ABarray} in \Rpackage{ABarray} package. The analysis will filter
probes based on S/N and percentage of samples detectable within subgroups. If S/N
is not available, all probes will be used for analysis (no filtering will be applied).
\item
An \Rfunarg{group} parameter. The group is a factor that \textit{t} test should be
performed on. The name of the group should correspond to one of the names in
experiment design file that define sample type. If there are 3 or more levels for
the group, pairwise \textit{t} test and fold change will be peformed for all possible pairs.
\item
An optional \Rfunarg{grpMember} parameter. If this is missing, all members in the
group will be used for analysis. If it is defined and passed to the \Rfunction{doPlotFCT}
analysis will be performed only on those members within the group.
\item{order1}{ optional, For a pairwise comparison the ordering of
  the first group of replicates}
\item{order2}{ optional, For a pairwise comparison the ordering of
  the second group of replicates}
\item{snThresh}{ optional S/N ratio threshold. Default = 3 }
\item{detectSample}{ Percentage of samples the probe is detected in order to consider
  in \textit{t} test. Default = 0.5}
\end{itemize}

\subsection{doPlotFCT Function Demonstration}
First let's see what the \Robject{eset} contains
\begin{Schunk}
\begin{Sinput}
> eset
\end{Sinput}
\begin{Soutput}
Expression Set (exprSet) with
	32878 genes
	6 samples
		 phenoData object with 3 variables and 6 cases
	 varLabels
		sampleName: read from file
		assayName: read from file
		tissue: read from file
\end{Soutput}
\end{Schunk}

Let's check the experiment design and pick a group for testing
\begin{Schunk}
\begin{Sinput}
> pData(eset)
\end{Sinput}
\begin{Soutput}
  sampleName              assayName  tissue
1         A1 HB003U2_9/2/05_1:55_PM TissueA
2         A2 HB003U3_9/2/05_2:11_PM TissueA
3         A3 HB003U8_9/2/05_2:26_PM TissueA
4         C1 HB003UA_9/2/05_3:24_PM TissueC
5         C2 HB003UB_9/2/05_3:44_PM TissueC
6         C3 HB003UD_9/2/05_3:58_PM TissueC
\end{Soutput}
\end{Schunk}

Let's begin to perform the analysis
\begin{Schunk}
\begin{Sinput}
> doPlotFCT(eset, "tissue")
\end{Sinput}
\begin{Soutput}
Performing t test between TissueA and TissueC ...
2 probes had sdev = 0, they were removed in the t test
[1] "207943" "520680"
[1] "Creating volcano plot ..."
[1] "Writing t test result to file:  Result_tissue/DataResult/DetectPct50FCpvalFDR_TissueA-TissueC.csv"

The t test was performed if the probe shows S/N >= 3 in 50% or more samples
    in either TissueA or TissueC
The number of probes after filtering is: 21628

p=0.01, significant probes: 3470
[1] "Creating Fold Change plot for: TissueA-TissueC pVal 0.01  ..."
[1] "Clustering was not performed, because there are 3470 probes."
[1] "    The limit of probe counts for clustering is: 800"
p=0.05, significant probes: 7595
[1] "Creating Fold Change plot for: TissueA-TissueC pVal 0.05  ..."
[1] "Clustering was not performed, because there are 7595 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.01, significant probes: 429
[1] "Creating Fold Change plot for: TissueA-TissueC FDR 0.01  ..."
[1] "Creating cluster heatmap for FDR 0.01 using Correlation ..."
[1] "Creating cluster heatmap for FDR 0.01 using Euclidean ..."
FDR=0.05, significant probes: 2700
[1] "Creating Fold Change plot for: TissueA-TissueC FDR 0.05  ..."
[1] "Clustering was not performed, because there are 2700 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.1, significant probes: 5577
[1] "Creating Fold Change plot for: TissueA-TissueC FDR 0.1  ..."
[1] "Clustering was not performed, because there are 5577 probes."
[1] "    The limit of probe counts for clustering is: 800"
FDR=0.25, significant probes: 10894
[1] "Creating Fold Change plot for: TissueA-TissueC FDR 0.25  ..."
[1] "Clustering was not performed, because there are 10894 probes."
[1] "    The limit of probe counts for clustering is: 800"
\end{Soutput}
\end{Schunk}

%- End function doPlotFCT

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%- Funciton doANOVA
\section{ANOVA analysis}
The function \Rfunction{doANOVA} in package \Rpackage{ABarray} is designed
to perform one way or two way ANOVA analysis. If only one factor is provided
in parameter, one way ANOVA is performed. If two factors are provided,
two way ANOVA is performed.

The S/N ratio will be used to perform filtering. If a probe is detected
(S/N >= 3, default value) in 50\% (default value) or more samples within
any subgroups, it is included in the ANOVA analysis. If a probe is not
detected in 50\% or more samples in all the subgroups, it is removed
from ANOVA analysis.

\subsection{Required Data Object}
The function \Rfunction{doANOVA} works on either an \Rclass{ExpressionSet} object
or expression \Robject{matrix}. If \Rclass{ExpressionSet} object is used, experiment
design information should be already in the \Rclass{ExpressionSet} object. If
expression \Robject{matrix} is used, rows of the matrix should be probes or genes,
and columns of the matrix should be hybridization (or arrays).

\begin{itemize}
\item \Robject{ExpressionSet} object. The object can be generated from function
\Rfunction{ABarray} in package \Rpackage{ABarray}. For additional information
about functionality of \Rfunction{ABarray}, refer to document from
\Rfunction{ABarray}. The object can also be generated from other packages
in \Rpackage{Bioconductor} project. See www.bioconductor.org for more
information about other packages.
\item factor1. Factor name that the ANOVA test will be performed on.
If an \Robject{ExpressionSet} object is used, either name or labels can be used.
The name should be present in experiment design informaton in the
\Robject{ExpressionSet} object. Function \Rfunction{ABarray} in package
\Rpackage{ABarray} automatically creates such information if appropriate experiment
design file is used when function \Rfunction{ABarray} was run. If expression matrix
is to be used, \Robject{lables} should be used. A \Robject{label} is a vector which
labels each hybridization (or array) into groups. For example, a \Robject{label} <- 
c('drug1', 'drug1', 'drug2', 'drug2', 'none', 'none') indicates samples for arrays 1
and 2 were treated with drug1, and samples for arrays 5 and 6 were not treated.
\item optional factor2. If the second factor is provided, two way ANOVA will
be performed.
\end{itemize}

\subsection{doANOVA Analysis Output}
If one way ANOVA is performed, a vector with ANOVA p values will be returned.
The names of the vector are probeIDs. If two way ANOVA is performed, a matrix
will be returned. The rows of the matrix are probeIDs and the columns are
the levels of factors and interactions of the levels.

The ANOVA results will be saved into file automatically.

\subsection{doANOVA Function Demonstration}

\subsubsection{Required Data}
First let's see what the \Robject{eset} contains
\begin{Schunk}
\begin{Sinput}
> eset
\end{Sinput}
\begin{Soutput}
Expression Set (exprSet) with
	33096 genes
	12 samples
		 phenoData object with 6 variables and 12 cases
	 varLabels
		sampleName: read from file
		assayName: read from file
		arrayName: read from file
		pool: read from file
		IVT: read from file
		multIVT: read from file
\end{Soutput}
\end{Schunk}

Let's check the experiment design and pick factors for testing
\begin{Schunk}
\begin{Sinput}
> pData(eset)
\end{Sinput}
\begin{Soutput}
   sampleName    assayName arrayName  pool  IVT multIVT
1     EL_1A_1 HA004J2_1A_1   HA004J2 pool1 IVT1    IVT1
2     EL_1A_2 HA004J7_1A_2   HA004J7 pool1 IVT1    IVT1
3     EL_1A_3 HA004JB_1A_3   HA004JB pool1 IVT1    IVT1
4     EL_1B_1 HA004JD_1B_1   HA004JD pool1 IVT2    IVT2
5     EL_1B_2 HA004JI_1B_2   HA004JI pool1 IVT2    IVT2
6     EL_1B_3 HA004JK_1B_3   HA004JK pool1 IVT2    IVT2
7     EL_2A_1 HA004JN_2A_1   HA004JN pool2 IVT1    IVT3
8     EL_2A_2 HA004JO_2A_2   HA004JO pool2 IVT1    IVT3
9     EL_2A_3 HA004JP_2A_3   HA004JP pool2 IVT1    IVT3
10    EL_2B_1 HA004JS_2B_1   HA004JS pool2 IVT2    IVT4
11    EL_2B_2 HA004JT_2B_2   HA004JT pool2 IVT2    IVT4
12    EL_2B_3 HA004JU_2B_3   HA004JU pool2 IVT2    IVT4
\end{Soutput}
\end{Schunk}

\subsubsection{One Way ANOVA}
Let's begin to perform the analysis. Here we take a subset of probes (the
first 20 probes) to perform the ANOVA. We chose \Robject{multIVT}, since
it has 4 levels: IVT1, IVT2, IVT3, IVT4.
\begin{Schunk}
\begin{Sinput}
> aResult <- doANOVA(eset[1:20, ], "multIVT")
\end{Sinput}
\begin{Soutput}
Performing one-way ANOVA for multIVT ...
    Probes with S/N < 3 in at least 50% of samples in all subgroups of multIVT were not considered.
[1] "One-way ANOVA results (9 probes) were written to file:
     Result_multIVT/DataResult/ANOVA_oneway_multIVT.csv"
\end{Soutput}
\end{Schunk}

The results of this ANOVA analysis were saved into a file called
\Robject{ANOVA\_oneway\_multIVT.txt}.

Since we chose to perform one way ANOVA, \Robject{aResult} should be
a vector of length 20 (we only used 20 probes).
\begin{Schunk}
\begin{Sinput}
> aResult
\end{Sinput}
\begin{Soutput}
      100002       100037       100039       100058
1.620190e-03 5.705200e-03 1.009991e-03 9.374965e-05
      100060       100079       100089       100027
1.016003e-01 5.021948e-06 1.462993e-04 1.652952e-01
      100052
5.952003e-01
\end{Soutput}
\end{Schunk}

To save the results into file, we use R \Rfunction{write.table} function.
It takes an object whos data need to be written to file. Here our object
is \Robject{aResult}. A file argument (i.e., the name of the file). In
the following command, \Rfunarg{sep = ``,''} tells it to save the file as
comma delimit format. Argument \Rfunarg{row.names = TRUE} tells it to
write probeID into file as well.
\begin{Schunk}
\begin{Sinput}
> write.table(aResult, file = "multIVT_oneWay_ANOVA.csv",
+     sep = ",", row.names = TRUE)
\end{Sinput}
\end{Schunk}

You can check the content of the file. The first column is probeID, the
second column is p values from one way ANOVA analysis.


\subsubsection{Two Way ANOVA}
Let's perform two way ANOVA. Here we pass 2 factors ('pool' and 'IVT'),
and as before we will use only 20 probes for demonstration purpose.
\begin{Schunk}
\begin{Sinput}
> bResult <- doANOVA(eset[1:20, ], "pool", "IVT")
\end{Sinput}
\begin{Soutput}
Performing two-way ANOVA for pool and IVT ...
    Probes with S/N < 3 in at least 50% of samples in all subgroups of pool and IVT were not considered.
[1] "Two-way ANOVA results (8 probes) were written to file: Result_pool_IVT/ANOVA_pool_IVT.csv"
\end{Soutput}
\end{Schunk}

The results of the ANOVA analysis were saved into file
\Robject{ANOVA\_pool\_IVT.txt}.

Let's check if the \Robject{bResult} is a matrix
\begin{Schunk}
\begin{Sinput}
> dim(bResult)
\end{Sinput}
\begin{Soutput}
[1] 8 3
\end{Soutput}
\end{Schunk}

Let's get the p values for the two way ANOVA
\begin{Schunk}
\begin{Sinput}
> bResult
\end{Sinput}
\begin{Soutput}
            p(pool)      p(IVT)  p(pool*IVT)
100002 4.215936e-03 0.002562953 3.098940e-02
100037 1.116917e-02 0.156183104 5.328180e-03
100039 5.423037e-04 0.173732795 5.336480e-03
100058 6.514385e-04 0.009500397 8.888504e-05
100060 7.294050e-01 0.051155428 1.058891e-01
100079 3.244984e-05 0.001540973 5.688470e-06
100089 6.787365e-04 0.961322381 8.243031e-05
100052 7.863245e-01 0.206235388 8.623153e-01
\end{Soutput}
\end{Schunk}

To save the result to file, we can use the following command
\begin{Schunk}
\begin{Sinput}
> write.table(bResult, file = "pool_IVT_twoway_ANOVA.csv",
+     sep = ",", row.names = TRUE, col.names = NA)
\end{Sinput}
\end{Schunk}

%- End function doANOVA

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%- Function doLPE
\section{LPE analysis}
The function \Rfunction{doLPE} in package \Rpackage{ABarray} is based
on \Rpackage{LPE} package from www.bioconductor.org.

The \Rpackage{LPE} package describes local-pooled-error (LPE)
test for identifying differentially expressed genes especially
for low number of replicates (2-3) (See Jain:2003).

The test statistics is calculated as follows, which calculates
difference in medians between the two experimental conditions:

\begin{huge}
$\textit{Z} = \frac{Med_{1} - Med_{2}}{\sigma_{pooled}}$
\end{huge}

where the variance is estimated as follows:

\begin{huge}
$\sigma^{2}_{pooled} = \frac{\pi}{2}[\frac{\sigma^{2}_{1}(Med_{1})}{n1} + \frac{\sigma^{2}_{2}(Med_{2})}{n2}]$
\end{huge}

The function \Rfunction{doLPE} will automatically adjust p values using
Benjamini-Hochberg FDR approach.

Jain et. al. (2003) Local pooled error test for identifying differentially
expressed genes with a small number of replicated microarrays. Bioinformatics,
19, 1945-1951

\subsection{Required Data Object}
The function \Rfunction{doANOVA} works on an \Rclass{ExpressionSet} object.

\begin{itemize}
\item \Robject{ExpressionSet} object. The object can be generated from function
\Rfunction{ABarray} in package \Rpackage{ABarray}. For additional information
about functionality of \Rfunction{ABarray}, refer to document from
\Rfunction{ABarray}. The object can also be generated from other packages
in \Rpackage{Bioconductor} project. See www.bioconductor.org for more
information about other packages.

The S/N ratio information is stored in \Robject{assayDataElement(eset, "snDetect")}.
This will automatically be created from \Rfunction{ABarray} in \Rpackage{ABarray}
package. The analysis will filter probes based on S/N >= 3 in 75\% of samples. If S/N
is not available, all probes will be used for analysis (no filtering will be applied).

\item \Rfunarg{group} parameter. The group is a factor that the test should be performed on.
The name of the group should correspond to one of the names in experiment design file
that define sample type. If there are 3 or more levels for the group, the members
of the group should be specified for LPE test.

\item \Rfunarg{member} parameter. Where there is more than 2 levels in the
\Rfunarg{group} factor, any two members that need to be compared should be
specified in this paramter. For example, \Rfunarg{member <- c('drug1', 'drug4')} will
indicate to perform LPE between \Rfunarg{drug1} and \Rfunarg{drug4} only.

\end{itemize}

\subsection{doLPE Analysis Output}
The function \Rfunction{doLPE} returns a dataframe with p values and FDR adjusted
p values for each probes (if S/N available, filtered probes). It also produces
two files for preset FDR = 0.01 and FDR = 0.05.

\subsection{doLPE Function Demonstration}

\subsubsection{Required Data}
First let's see what the \Robject{eset} contains
\begin{Schunk}
\begin{Sinput}
> eset
\end{Sinput}
\begin{Soutput}
Expression Set (exprSet) with
	32878 genes
	6 samples
		 phenoData object with 3 variables and 6 cases
	 varLabels
		sampleName: read from file
		assayName: read from file
		tissue: read from file
\end{Soutput}
\end{Schunk}

Let's check the experiment design and pick factors for testing
\begin{Schunk}
\begin{Sinput}
> pData(eset)
\end{Sinput}
\begin{Soutput}
  sampleName              assayName  tissue
1         A1 HB003U2_9/2/05_1:55_PM TissueA
2         A2 HB003U3_9/2/05_2:11_PM TissueA
3         A3 HB003U8_9/2/05_2:26_PM TissueA
4         B1 HB003TV_9/2/05_2:41_PM TissueB
5         B2 HB003TW_9/2/05_2:55_PM TissueB
6         B3 HB003TZ_9/2/05_3:10_PM TissueB
\end{Soutput}
\end{Schunk}

\subsubsection{LPE analysis}
Let's begin to perform the analysis.
\begin{Schunk}
\begin{Sinput}
> aLPE.result <- doLPE(eset, "tissue")
\end{Sinput}
\begin{Soutput}
Performing LPE analysis for tissue ...
[1] "LPE results were written to file Result_tissue/DataResult/LPE_tissue_TissueB_TissueA.csv"
\end{Soutput}
\end{Schunk}

Let's take a few entries in aLPE.result to see what the dataframe looks like.
\begin{Schunk}
\begin{Sinput}
> aLPE.result[1:4, ]
\end{Sinput}
\begin{Soutput}
         x.x.B1   x.x.B2   x.x.B3 median.1  std.dev.1   y.y.A1   y.y.A2   y.y.A3 median.2  std.dev.2 median.diff
520680 21.78399 21.78399 21.78399 21.78399 0.07155299 21.78399 21.78399 21.78399 21.78399 0.11310696   0.0000000
643139 21.62803 21.62801 21.62809 21.62803 0.07155299 21.62810 21.50560 21.43475 21.50560 0.11310696   0.1224310
701229 21.50559 21.50557 21.50566 21.50559 0.07155299 21.43456 21.62804 21.62819 21.62804 0.11310696  -0.1224466
128174 21.43441 21.43437 21.43454 21.43441 0.07155299 18.45148 18.63286 18.47317 18.47317 0.05467501   2.9612377
       pooled.std.dev abs.z.stats p.adj.adjp.rawp p.adj.adjp.BH p.adj.index    z.real
520680     0.09682188    0.000000       1.0000000     1.0000000       17602  0.000000
643139     0.09682188    1.264497       0.2060517     0.2422529       18865  1.264497
701229     0.09682188    1.264658       0.2059938     0.2421976        1972  1.264658
128174     0.06514452   45.456439       0.0000000     0.0000000        8984 45.456439
\end{Soutput}
\end{Schunk}


To selectively view the result
\begin{Schunk}
\begin{Sinput}
> aLPE.result[1:5, c("median.1", "median.2", "median.diff", "p.adj.adjp.rawp", "p.adj.adjp.BH")]
\end{Sinput}
\begin{Soutput}
       median.1 median.2 median.diff p.adj.adjp.rawp p.adj.adjp.BH
520680 21.78399 21.78399   0.0000000       1.0000000     1.0000000
643139 21.62803 21.50560   0.1224310       0.2060517     0.2422529
701229 21.50559 21.62804  -0.1224466       0.2059938     0.2421976
128174 21.43441 18.47317   2.9612377       0.0000000     0.0000000
703444 21.34799 19.30107   2.0469234       0.0000000     0.0000000
\end{Soutput}
\end{Schunk}

%- End function doLPE

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%- Function doVennDiagram
\section{Creating Venn Diagram: doVennDiagram}
Venn Diagrams are frequently used for examining similarities and differences
in gene lists generated from different analysis. It is made up of
two or more overlapping circles. The functionality of the \Rfunction{doVennDiagram}
in \Rpackage{ABarray} relies on some functions provided in \Rpackage{limma}
package.

\subsection{Required Data}
\Rfunction{doVennDiagram} accepts two or three vectors of gene lists. If two
gene lists are provided, two way Venn diagram is produced. If three gene lists
are provided, three way Venn diagram is produced.
\begin{itemize}
\item list1. The first vector of gene list.
\item list2. The second vector of gene list.
\item list3. Optional third vector of gene list.
\item names. The names for use for each list in diagram.
\end{itemize}

\subsection{doVennDiagram Function Demonstration}
We will demonstrate two-way and three-way Venn diagram by creating three
gene lists. Note: the result of joined lists are automatically saved
into file.
\begin{itemize}
\item list1 contains probe names from index 1 to 100
\item list2 contains probe names from index 61 to 160
\item list3 contains probe names from index 81 to 200
\end{itemize}

After these three gene lists are created, we perform \Rfunction{doVennDiagram}.

\begin{Schunk}
\begin{Sinput}
> list1 <- geneNames(eset)[1:100]
> list2 <- geneNames(eset)[61:160]
> list3 <- geneNames(eset)[81:200]
> length(list1)
\end{Sinput}
\begin{Soutput}
[1] 100
\end{Soutput}
\begin{Sinput}
> length(list2)
\end{Sinput}
\begin{Soutput}
[1] 100
\end{Soutput}
\begin{Sinput}
> length(list3)
\end{Sinput}
\begin{Soutput}
[1] 120
\end{Soutput}
\end{Schunk}

\subsubsection{Two Way Venn Diagram}
There are 40 probes common between \Robject{list1} and \Robject{list2}.
Let's create Venn diagram between \Robject{list1} and \Robject{list2}.

\begin{Schunk}
\begin{Sinput}
> doVennDiagram(list1, list2)
\end{Sinput}
\begin{Soutput}
[1] "List information is written to file Venn_list_1+2.csv"
\end{Soutput}
\end{Schunk}
\includegraphics[width=0.5\textwidth]{Figure/venn-twoWay}


\subsubsection{Three way Venn Diagram}
There are 40 probes common between \Robject{list1} and \Robject{list2},
and 20 probes between \Robject{list1} and \Robject{list3}, and 80 probes
between \Robject{list2} and \Robject{list3}. Let's create Venn diagram
between \Robject{list1} and \Robject{list2}.

\begin{Schunk}
\begin{Sinput}
> doVennDiagram(list1, list2, list3)
\end{Sinput}
\begin{Soutput}
[1] "List information is written to file Venn_list_1+2+3.csv"
\end{Soutput}
\end{Schunk}
\includegraphics[width=0.5\textwidth]{Figure/venn-threeWay}

%- End function doVennDiagram


\end{document}