%\VignetteIndexEntry{LVSmiRNA} %\VignetteKeywords{Preprocessing, Agilent} %\VignetteDepends{LVSmiRNA} %\VignettePackage{LVSmiRNA} \documentclass{article} \usepackage{graphicx} \usepackage{subfigure} \usepackage{url} \usepackage{float} \usepackage{natbib} \usepackage{setspace} \setstretch{1.5} \makeatletter \newcommand\figcaption{\def\@captype{figure}\caption} \newcommand\tabcaption{\def\@captype{table}\caption} \makeatother \setlength{\textwidth}{6.5in} \setlength{\oddsidemargin}{0in} \setlength{\evensidemargin}{0in} \begin{document} \title{Quick start guide for using the \textit{LVSmiRNA} package in R} \author{Stefano Calza, Suo Chen and Yudi Pawitan} \maketitle \tableofcontents % \begin{abstract} % \end{abstract} \section{Overview} This document provides a brief guide to the \textit{LVSmiRNA} package, which is a package for normalization of microRNA (miRNA) microarray data. There are four components to this package. These are: i) Reading in data. ii) Identify a subset of miRNAs with the smallest array-to-array variation, called LVS miRNAs. iii) Normalization using the selected reference set. iv) Summarization The LVS normalization method \cite{Calza2007} builds upon the fact that the data-driven housekeeping miRNAs that are the least variant across samples might be a good reference set for normalization. The total information extracted from probe-level intensity data of all samples is modeled as a function of array and probe effects using robust linear fit (\textit{rlm}) \cite{Huber1964}. The method selects miRNAs according to the array-effect statistic and residual standard deviation (SD) from the model. The modified LVS also incorporates a more complex \textit{joint} model to identify the LVS. Instead of assuming constant residual variation in \textit{rlm}, the dispersion parameters are modeled as a function of array and probe effects \cite{Lee2006}. The advantage of LVS normalization is that it is robust against violation of standard assumptions in most methods: the majority of features do not vary between samples and the proportion of up and down regulated expression are approximately equal. \section{Getting started} To load the \textbf{LVSmiRNA} in your R session, type <>= library(LVSmiRNA) @ \subsection{Example data: Spike-in Agilent chip} We demonstrate the functionality of this R package using miRNA expression data from a spike-in experiment \cite{Willenbrock2009} which is included as part of the package. The input file \textbf{Comparison\_Array.txt} is a text file containing a header row, names of the samples in one column called 'Sample'. \begin{enumerate} \item To begin, users will have to save the relevant image processing output files and the file containing samples descriptions (e.g. \textbf{Comparison Array.txt} in a directory. The example data can be downloaded from the author website (\url{http://www.med.unibs.it/~calza/software/examples.tar.gz}). \item Read the samples description file (e.g. \texttt{Comparison\_Array.txt}) into \textsf{R}. <>= dir.files <- system.file("extdata", package="LVSmiRNA") taqman.data <- read.table(file.path(dir.files,"Comparison_Array.txt"),header=TRUE,as.is=TRUE) @ \item Read in the raw intensities data. Modify the \texttt{FileName} column in \texttt{taqman.data} to match the position of the exemple files. The current values assume that the files are located in the working directory. <>= here.files <- "some/path/to/files" ## NOT RUN MIR <- read.mir(taqman.data,path=here.files) @ \item A binary version of the data is already available in the package. <>= data("MIR-spike-in") @ \item Identify LVS miRNAs: calculate residual variance and array-to-array variability which is measured by a $\chi^{2}$ statistic for the raw data fitted by either standard rlm or joint model. <>= MIR.RA <- estVC(MIR) @ \item Again for semplicity the object can be directly loaded from the package <>= data("MIR_RA") @ \item Make a scatter plot to visualize the relationship between the square-root or logarithm of array effect versus the logrithm of the residual SD, called the 'RA-plot'. <>= plot(MIR.RA) @ \item Normalization: perform \textit{lvs} normalization based on \texttt{MIR.RA}. The default procedure will first summarize data (based on ``rlm'' method) and then normalize. <>= MIR.lvs <- lvs(MIR,RA=MIR.RA) @ \item Now have a look at the box plot of data after normalization. <>= boxplot(MIR.lvs) @ \item Summarization can also be performed without normalization, using different methods. <>= ex.1 <- summarize(MIR,RA=MIR.RA,method="rlm") ex.2 <- summarize(MIR,method="medianpolish") @ \item If no \texttt{RA} argument is supplied to the function \texttt{lvs}, the computation performed by \texttt{estVC} will be carried on within \texttt{lvs}. As this is the most computationally intensive step in the procedure we stringly suggest to perform it using \texttt{estVC} and saving the result for following steps. \end{enumerate} \section{Parallel computation} The bottle neck in \texttt{LVSmiRNA} in terms of speed is the computation of Array \& probes effects (function \texttt{estVC}). Therefore \texttt{LVSmiRNA} allows for paralle computation using either \texttt{multicore} or \texttt{snow}. To use either packages the user must load them and set up the cluster (if needed) manually. \subsection{Using \texttt{multicore}} \label{sec:mcore} The package \texttt{multicore} requires minimum user intervention. The only option is the choice of the number of cores to use. By default \texttt{multicore} would use all the available. Setting a different number can be done in the \texttt{options}. <>= require(multicore) options(cores=8) MIR.RA <- estVC(MIR) @ \subsection{Using \texttt{snow}} \label{sec:mcore} The package \texttt{snow} requires the user to set up a cluster object manually. The user must choose the number of clusters and the type. See \texttt{?makeCluster} gor more details. Here is an example using ``SOCK'' cluster type. <>= require(snow) cl <- makeCluster(8,"SOCK") MIR.RA <- estVC(MIR,clName=cl) stopCluster(cl) @ And here an example using ``MPI''. This would load the \texttt{Rmpi} package. <>= cl <- makeCluster(8,"MPI") MIR.RA <- estVC(MIR,clName=cl) stopCluster(cl) @ \bibliographystyle{plainnat} \bibliography{LVSmiRNA} \end{document}