%\VignetteIndexEntry{samroc - example} %\VignetteDepends{SAGx,multtest, hu6800.db} %\VignetteKeywords{Expression Analysis} %\VignettePackage{SAGx} \documentclass[11pt]{article} \usepackage{geometry}\usepackage{color} \definecolor{darkblue}{rgb}{0.0,0.0,0.75} \usepackage[% baseurl={http://www.bioconductor.org},% pdftitle={Samroc example},% pdfauthor={Per Broberg},% pdfsubject={SAGx},% pdfkeywords={Bioconductor},% pagebackref,bookmarks,colorlinks,linkcolor=darkblue,citecolor=darkblue,% pagecolor=darkblue,raiselinks,plainpages,pdftex]{hyperref} %------------------------------------------------------------ % newcommand %------------------------------------------------------------ \newcommand{\arsinh}{\mathop{\mathgroup\symoperators arsinh}\nolimits} \newcommand{\Robject}[1]{\texttt{#1}} \newcommand{\Rpackage}[1]{\textit{#1}} \newcommand{\Rclass}[1]{\textit{#1}} \newcommand{\Rfunction}[1]{{\small\texttt{#1}}} \newcommand{\myincfig}[3]{% \begin{figure}[htbp] \begin{center} \includegraphics[width=#2]{#1} \caption{\label{#1}#3} \end{center} \end{figure} } \usepackage[authoryear,round]{natbib} \usepackage{graphicx} \usepackage{amsfonts} \begin{document} %------------------------------------------------------------ \title{Samroc example} %------------------------------------------------------------ \author{Per Broberg} \maketitle \section*{Analysis of the data from Golub \textit{et al.}} Consider the microarray experiment in \cite{golubetal} where ALL and AML subtypes of leukemia are compared. The data are available within package \Rpackage{multtest}. We can analyse those data in \Rpackage{SAGx} with the function \textit{samrocNboot}. The ideas behind it are presented in \cite{broberg:2003}. Briefly, the method relies on a penalised \textit{t}-test statistica $ d = (\bar{x}_1 - \bar{x}_2)/(S + a)$ with fudge factor $a$ \cite{efron:2001}. In this case the effect estimated consists of a difference in group means. In general the method can estimate and test one such effect in the presence of explanatory variables such as AGE or GENDER using a linear model. In such a case the function \Rfunction{samrocN} provides a solution. Example code now follows. <<>>= library("SAGx") library("multtest") data(golub) set.seed(849867) samroc.res <- samrocN(data = golub, formula = ~as.factor(golub.cl)) show(samroc.res) @ The function \Rfunction{samrocN} is used to perform a penalised \textit{t}-test. Its value is an object of class \Rclass{samroc.result}. The functions \Rfunction{show} and \Rfunction{plot} are defined for such objects. In Figure \ref{density} the densities of the test statistic and its permutation null distribution are displayed. The graph was produced by invoking the \Rfunction{plot} function <>= plot(samroc.res) @ \begin{figure}[htbp] \centering <>= par(bg = "cornsilk") plot(samroc.res) @ \caption{Densities of the test statistic and of its permutation null distribution} \label{density} \end{figure} \begin{figure}[htbp] \centering The estimated proportion unchanged genes equals \Sexpr{format.pval(samroc.res@p0, digits = 2)}. The distribution of \textit{p}-values is shown in Figure~\ref{hist1}, which confirms that many genes are changed. Furthermore, using the function \textit{pava.fdr} we obtain estimates of the FDR and of the local FDR, see Figure~\ref{hist2}. This function is presented in \cite{broberg:2005} and combines the local FDR estimator of \cite{aubert:2004} with Poisson regression (see \cite{efron:2004}) and isotonic regression. <>= par(bg = "cornsilk") hist(samroc.res@pvalues, xlab = "p-value", main ="", col = 'orange', freq = F) print(abline(samroc.res@p0,0, col = 'red')) @ \caption{Histogram of the \textit{p}-values generated by function \textit{samrocNboot}} \label{hist1} \end{figure} \begin{figure}[htbp] \centering <>= par(bg = "cornsilk") fdrs <- pava.fdr(ps = samroc.res@pvalues) plot(samroc.res@pvalues, fdrs$pava.local.fdr, type = 'n', xlab = "p-value", ylab = "False Discovery Rate (FDR)") lines(lowess(samroc.res@pvalues, fdrs$pava.local.fdr), col = 'red') lines(lowess(samroc.res@pvalues, fdrs$pava.fdr), col = 'blue') legend(0.1,0.9,pch=NULL,col=c("red","blue"),c("pava local FDR","pava FDR"),lty = 1) @ \caption{Scatter plot of the local false discovery rate and the false discovery rate as estimated by function \textit{pava.fdr}} \label{hist2} \end{figure} One can also perform a simple Gene Set Enrichment Analysis based on the output from \Rfunction{samrocNboot} by invoking \Rfunction{GSEA.mean.t}, cf. \cite{lutian:2005} which describes a similar idea. The package \Rpackage{hu6800.db} maps KEGG pathways \cite{kegg:2000} onto probeset identifiers. The following code analyses one KEGG pathway ( 00970 Aminoacyl-tRNA biosynthesis) and outputs a p-value based on the average over the pathway of the absolute value of the test statistic $d$. The algorithm includes restandardization following \cite{efron:tibshirani:2006}. <<>>= library("hu6800.db") kegg <- as.list(hu6800PATH2PROBE) probeset <- golub.gnames[,3] GSEA.mean.t(samroc = samroc.res, probeset = probeset, pway = kegg[1], type = "original", two.side = FALSE) @ \newpage \section{On the calculation of p-values} Following \cite{tusher:2001}, \cite{broberg:2003} defines a permutation p-value for gene $i$ out of a total $N$ as \begin{equation}\label{pval1} p_{i} = \frac{\#\{d^{*k}(j)|: |d^{*k}(j)| > |d(i)|\} }{N \times B} \end{equation} , denoting by $d(i)$ the test statistic corresponding to gene $i$, and by $d^{*k}(i)$ the permutation null statistic in the $k^{th}$ iteration out of a total $B$. This has the unfortunate side effect of occasionally returning $p$-values equal to zero. To solve this the definition from \cite{davison:1997} is employed. Denote by $F_{n}$ the empirical distribution function of all $-|d^{*k}|$. The estimate then becomes: \begin{equation}\label{pval2} p_i = \frac{B\times N\times F_{n}(-|d(i)|)+ 1}{B\times N + 1} \end{equation} This follows from $\{t^{*} \geq t\} \Leftrightarrow \{ -t^{*} \leq -t\}$. Various functions from SAGx were used in \cite{StefanPierrou03152007}. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newpage \bibliographystyle{plainnat} \bibliography{samroc-ex} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \end{document}