%\VignetteIndexEntry{4. XPS Vignette: Function express()} %\VignetteDepends{xps} %\VignetteKeywords{Expression, Affymetrix, Oligonucleotide Arrays} %\VignettePackage{xps} \documentclass{article} \usepackage{hyperref} \usepackage[authoryear,round]{natbib} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rcode}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textsf{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\xps}{\Rpackage{xps}} \newcommand{\ROOT}{\Robject{ROOT}} \begin{document} %----------------------------------------------------------------------- \title{Introduction to the xps Package: Function \Rfunction{express()}} %----------------------------------------------------------------------- \date{September, 2010} \author{Christian Stratowa} \maketitle \tableofcontents %----------------------------------------------------------------------- \section{Introduction} %----------------------------------------------------------------------- This document describes how to use function \Rfunction{express} to combine different preprocessing methods, which are currently available in package \xps\, i.e. methods for (i) background correction, (ii) probe-level normalization, and (iii) summarization. These different methods, or a subset thereof, can be computed with a single call to function \Rfunction{express}, or in a stepwise manner by calling \Rfunction{express} consecutively with the result of one call used as input for the next call. Applying \Rfunction{express} stepwise has certain advantages, which will be explained later. Since \Rfunction{express} is a wrapper function to method \Rmethod{xpsPreprocess} both function calls can be used. \\ In addition to the mentioned preprocessing steps it is sometimes advisable to add a further preprocessing step after the summarization step, i.e. (iv) probeset-level normalization. One example is the MAS5 algorithm where the expression measures are usually scaled to a predefined target intensity \citep{affy:tech:2002}. This additional step is handled by function \Rfunction{normalize}, which is a wrapper to method \Rmethod{xpsNormalize}. \\ Using a subset of the Affymetrix Exon Array Data Set "Tissue Mixture" for the expression array "Human Genome U133 Plus 2.0", examples will be presented for all available algorithms for the different preprocessing methods. The aim of this document is to explain all parameters for all methods in detail based on the examples presented. {\bf Note:} In order to keep this document short, only example code will be presented here, the full source code is available in file "script4xpsPreprocess.R" located in directory "xps/examples". %----------------------------------------------------------------------- \section{Computing RMA using function \Rfunction{express()}} %----------------------------------------------------------------------- In the following subsections we will demonstrate how to compute RMA using the general function \Rfunction{express}. However, first it is necessary to load the \ROOT\ \Rclass{scheme} and \ROOT\ \Rclass{data} files, as explained in vignette "xps.pdf" (Overview): <>= scheme.u133p2 <- root.scheme(file.path(scmdir, "Scheme_HGU133p2_na28.root")) data.u133p2 <- root.data(scheme.u133p2, file.path(datdir, "HuTissuesU133P2_cel.root")) @ Usually, RMA is computed with a simple call to function \Rfunction{rma}: <>= data.rma <- rma(data.u133p2, "MixU133P2RMA", filedir=outdir) @ %- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \subsection{Compute RMA with a single call to \Rfunction{express}} %- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Using a single call to function \Rfunction{express} RMA is computed as follows: <>= expr.rma <- express(data.u133p2, "U133P2Exprs", filedir=outdir, tmpdir="", update=FALSE, bgcorrect.method="rma", bgcorrect.select="none", bgcorrect.option="pmonly:epanechnikov", bgcorrect.params=c(16384), normalize.method="quantile", normalize.select="pmonly", normalize.option="transcript:together:none", normalize.logbase="0", normalize.params=c(0.0), summarize.method="medianpolish", summarize.select="pmonly", summarize.option="transcript", summarize.logbase="log2", summarize.params=c(10, 0.01, 1.0)) @ Valid expression levels can be obtained by calling <>= expr <- validExpr(expr.rma) @ %- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \subsection{Compute RMA stepwise} %- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Alternatively, function \Rfunction{express} can be used to compute RMA step-by-step. This can have the following advantages: \begin{itemize} \item different algorithms can be tested at each step and combined w/o the need to re-calculate the earlier step(s) \item quality control tests for each array can be done separately at each step \item stepwise computation is useful when computing RMA for 10,000 CEL-files or more \\ \end{itemize} \begin{tabbing} {\bf 1. step:} background correction for \Rclass{DateTreeSet} \Robject{data.u133p2}: \end{tabbing} The code for computing \Rcode{rma} background correction is as follows: <>= bgrd.rma <- express(data.u133p2, "BgrdRMA", filedir=outdir, tmpdir="", update=FALSE, bgcorrect.method="rma", bgcorrect.select="none", bgcorrect.option="pmonly:epanechnikov", bgcorrect.params=c(16384)) @ {\bf Note:} For background correction is is not allowed to define a \Rcode{tmpdir}, otherwise all trees would be stored in a temporary \ROOT\ file and file \Rcode{BgrdRMA.root} would be empty. \\ Now we can do some quality controls such as drawing images, density plots and boxplots. \\ First we draw the images for background values and background-corrected intensities, respectively, for e.g. array "ProstateA": <>= bgrdnames <- colnames(validBgrd(bgrd.rma)) datanames <- colnames(validData(bgrd.rma)) image(bgrd.rma, bg=TRUE, transfo=log2, col=heat.colors(12), names=bgrdnames[4]) image(bgrd.rma, transfo=log2, col=heat.colors(12), names=datanames[4]) @ \begin{figure}[h] \begin{center} \includegraphics[width=70mm]{Image_BgrdRMA_bgrd4.png} \includegraphics[width=70mm]{Image_BgrdRMA_data4.png} % \caption{xxx} % \label{xxx} \end{center} \end{figure} \pagebreak Then we draw the density plots for raw and background-corrected intensities, respectively: <>= hist(data.u133p2, which="pm") hist(bgrd.rma, which="pm") @ \begin{figure}[h] \begin{center} \includegraphics[width=70mm]{DensityPlot_DataU133P2.png} \includegraphics[width=70mm]{DensityPlot_BgrdRMA.png} % \caption{xxx} % \label{xxx} \end{center} \end{figure} Finally we draw the boxplots for raw and background-corrected intensities, respectively: <>= boxplot(data.u133p2, which="pm") boxplot(bgrd.rma, which="pm") @ \begin{figure}[h] \begin{center} \includegraphics[width=70mm]{BoxPlot_DataU133P2.png} \includegraphics[width=70mm]{BoxPlot_BgrdRMA.png} % \caption{xxx} % \label{xxx} \end{center} \end{figure} Alternatively, it is also possible to draw boxplots for background values and background-corrected intensities, respectively, as follows: <>= bgrd <- validBgrd(bgrd.rma, which="pm") data <- validData(bgrd.rma, which="pm") boxplot(log2(bgrd), las=2) boxplot(log2(data), las=2) @ \begin{figure}[h] \begin{center} \includegraphics[width=70mm]{BoxPlot_BgrdRMA_bgrd.png} \includegraphics[width=70mm]{BoxPlot_BgrdRMA_data.png} % \caption{xxx} % \label{xxx} \end{center} \end{figure} \pagebreak \begin{tabbing} {\bf 2. step:} quantile normalization for resulting \Rclass{DateTreeSet} \Robject{bgrd.rma}: \end{tabbing} Using the background-corrected data \Rcode{bgrd.rma} as input the code for computing \Rcode{quantile} normalization is as follows: <>= norm.qu <- express(bgrd.rma, "NormQuan", filedir=outdir, tmpdir="", update=FALSE, normalize.method="quantile", normalize.select="pmonly", normalize.option="transcript:together:none", normalize.logbase="0", normalize.params=c(0.0)) @ {\bf Note:} For the normalization step is is also not allowed to define a \Rcode{tmpdir}, otherwise all trees would be stored in a temporary \ROOT\ file and file \Rcode{NormQuan.root} would be empty. \\ Now we draw density plots and boxplots of the normalized intensities as quality controls: <>= hist(norm.qu, which="pm") boxplot(norm.qu, which="pm") @ \begin{figure}[h] \begin{center} \includegraphics[width=70mm]{DensityPlot_NormQuant.png} \includegraphics[width=70mm]{BoxPlot_NormQuant.png} % \caption{xxx} % \label{xxx} \end{center} \end{figure} \pagebreak In addition we can draw scatter plots between normalized intensities at the probe-level: <>= data <- validData(norm.qu, which="pm") plot(log2(data[,1]), log2(data[,2]), xlab="BreastA", ylab="BreastB") plot(log2(data[,1]), log2(data[,4]), xlab="BreastA", ylab="ProstateA") @ \begin{figure}[h] \begin{center} \includegraphics[width=70mm]{ScatterPlot_NormQuant_PM_BrABrB.png} \includegraphics[width=70mm]{ScatterPlot_NormQuant_PM_BrAPrA.png} % \caption{xxx} % \label{xxx} \end{center} \end{figure} \begin{tabbing} {\bf 3. step:} median-polish summarization for resulting \Rclass{DateTreeSet} \Robject{norm.qu}: \end{tabbing} Finally we do \Rcode{medianpolish} summarization, using the normalized data \Rcode{norm.qu} as input: <>= expr.mp <- express(norm.qu, "ExprMedpol", filedir=outdir, tmpdir="", update=FALSE, summarize.method="medianpolish", summarize.select="pmonly", summarize.option="transcript", summarize.logbase="log2", summarize.params=c(10, 0.01, 1.0), bufsize=32000) @ {\bf Note 1:} For the summarization step is is possible to define a \Rcode{tmpdir}; when handling hundreds of trees on computers with 1 GB RAM only, it is even the recommended way for multichip summarization methods such as \Rcode{medianpolish}. \\ {\bf Note 2:} When using a multichip summarization method with data consisting of many thousand trees it could also be necessary to reduce the basket size of each \ROOT\ tree, e.g. to a size of \Rcode{bufsize=2000}. (Internally, each \ROOT\ tree consists of many tree baskets, and when reading a tree, one basket after the other is loaded into memory and not the whole tree, thus reducing memory consumption.) \\ Now we can draw density plots and boxplots of the final expression values: <>= hist(expr.mp) boxplot(expr.mp) @ \begin{figure}[h] \begin{center} \includegraphics[width=70mm]{DensityPlot_ExprMedpol.png} \includegraphics[width=70mm]{BoxPlot_ExprMedpol.png} % \caption{xxx} % \label{xxx} \end{center} \end{figure} \pagebreak In addition we can draw scatter plots between expression values at the transcript-level: <>= expr <- validData(expr.mp) plot(log2(expr[,1]), log2(expr[,2]), xlab="BreastA", ylab="BreastB") plot(log2(expr[,1]), log2(expr[,4]), xlab="BreastA", ylab="ProstateA") @ \begin{figure}[h] \begin{center} \includegraphics[width=70mm]{ScatterPlot_ExprMedpol_BrABrB.png} \includegraphics[width=70mm]{ScatterPlot_ExprMedpol_BrAPrA.png} % \caption{xxx} % \label{xxx} \end{center} \end{figure} \pagebreak %----------------------------------------------------------------------- \section{Background Correction using \Rfunction{express()}} %----------------------------------------------------------------------- Currently the following background correction methods are available as \Rcode{bgcorrect.method}: \begin{itemize} \item {\tt rma}: background is calculated using a global model for the distribution of probe intensities. \item {\tt sector}: a constant background is calculated for each sector separately and subtracted from intensities. \item {\tt weightedsector}: first, a constant background is calculated for each sector separately, then a weight is calculated and each background value multiplied with the weight to smooth the transition between sectors, and finally the smoothed background is subtracted from intensities. \item {\tt gccontent}: background is calculated based on the mean intensity of probes with identical GC content. \end{itemize} {\bf Note:} By default the summarization methods \Rcode{farms} and \Rcode{dfw} skip the background correction step. %- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \subsection{rma} %- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - The \Rcode{rma} background adjustment method is described in Irizarry \citep{iriz:etal:2003}. By default PM probe intensities are corrected by using a global model for the distribution of probe intensities. When using function \Rfunction{express()} the following \Rcode{bgcorrect} parameters can be used for \Rcode{bgcorrect.method="rma"}: \begin{tabbing} xxxxxxxxxxxxx\=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\kill Settings to be used for \Rcode{bgcorrect.select}: \\ {\tt none}: \> scheme mask is not changed by selector \end{tabbing} \begin{tabbing} xxxxxxxxxxxxx\=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\kill Settings to be used for \Rcode{bgcorrect.option = "