%\VignetteIndexEntry{ABSSeq} %\VignettePackage{ABSSeq} \documentclass[a4paper]{article} \title{ABSSeq: a new RNA-Seq analysis method based on absolute expression differences and generalized Poisson model} \author{Wentao Yang} \begin{document} \maketitle \section{Introduction} This vignette is intended to give a brief introduction of the \verb'ABSSeq' \textsf{R} package by analyzing the simulated data from Soneson et al. \cite{soneson} . For details about the approach, consult Yang \cite{yang}. Currently, \verb'ABSSeq' can just be applied on pairwise study. We assume that we have counts data from an experiment, which consists of two conditions and several replicates for each condition in a matrix. The counts usually have enormous variation across genes and compared conditions. The reliable identification of differential expression (DE) genes from such data requires a probabilistic model to account for ambiguity caused by sample size, biological and technical variations, levels of expression and outliers. \verb'ABSSeq' infers differential expression by the absolute expression differences between conditions. It assumes that the absolute expression difference of each gene between conditions is contributed by two parts, the expression variation across samples and the differential expression. If one gene belongs to differential expression gene, its absolute expression difference should be larger than its expression variation and also relative large among the changes of all gene. Based on this hypothesis, \verb'ABSSeq' employs two generalized Poisson model to account for the variation across samples and overall changes. It calculates a pvalue accroding to bulit model. \verb'ABSSeq' tests null hypothesis which takes into account the magnitude of expression difference through two directions: samples and genes, and therefore detects differential expression genes which are closer to the biological concept of differential expression. \section{Pairwise study} We firstly import the \verb'ABSSeq' package. <<>>= library(ABSSeq) @ Then, we load a simulated data set. It is a list and contains three elements: the counts matrix, denoted by 'counts', the groups, denoted by 'groups' and differential expression genes, denoted by 'DEs'. <<>>= data(simuN5) names(simuN5) @ The data is simulated from Negative binomial distribution with means and variances from Pickrell's data \cite{pickrell} and added outliers randomly \cite{soneson}. This data includes group informtion. <<>>= simuN5$groups @ But we also can define groups as <<>>= conditions <- factor(c(rep(1,5),rep(2,5))) @ We construct an \verb'ABSDataSet' object by combining the counts matrix and defined groups with the \verb'ABSDataSet' function. <<>>= obj <- ABSDataSet(simuN5$counts, factor(simuN5$groups)) obj1 <- ABSDataSet(simuN5$counts, conditions) @ The default normalization method is \verb'quartile', used the up quantile of data. However, there are also other choices for users, that is, \verb'total' by total reads count, \verb'DESeq' from DESeq \cite{deseq} and \verb'User' through size factors provided by users. The normalization method can be checked and revised by \verb'normMethod'. <<>>= obj1 <- ABSDataSet(simuN5$counts, factor(simuN5$groups),"User",runif(10,1,2)) normMethod(obj1) normMethod(obj1) <- "DESeq" normMethod(obj1) @ Once we get the \verb'ABSDataSet' object, We can estimate the size factor for each sample by selected method as mentioned above used the function \verb'normalFactors'. And we can see the size factors by \verb'sizeFactors'. <<>>= obj=normalFactors(obj) sizeFactors(obj) @ Then, we can get the normalized counts by \verb'counts'. <<>>= head(counts(obj,norm=TRUE)) @ With the size factors, we can calculate the absolate difference between conditions, variances, log2 of fold-change for each gene. It can be done by function \verb'calPara' as <<>>= obj=calPara(obj) @ If we want to see correlation between the absolute difference and expression level, we can use function \verb'plotDifftoBase'. <>= plotDifftoBase(obj) @ \begin{figure}[!ht] \begin{center} <>= <> @ \caption{'Absolute difference against expression level'-plot for count data. We show the correlation by isoreg and marked genes with different color according to a given fold-change.} \label{plotDifftoBase} \end{center} \end{figure} In the end, we model the data with generalized Poisson distribution and calculate the pvalue for each gene based on the absolute difference. It can be done by the function \verb'GPTest', which reports pvalues as well as adjusted pvalue, which can be accessed by \verb'results' with names of \verb'pvalue' and \verb'adj.pvalue'. <<>>= obj <- GPTest(obj) head(results(obj,c("pvalue","adj.pvalue"))) @ The \verb'results' function can be used to access all information in an \verb'ABSDataSet'. <<>>= head(results(obj)) @ Besides, we can also get this result by the function \verb'ABSSeq', which perfoms a default analysis by calling above functions in order and returns a table with mean expression of each group, log2 fold-change, pvalue and adjusted pvalue. <<>>= data(simuN5) obj <- ABSDataSet(simuN5$counts, factor(simuN5$groups)) res <- ABSSeq(obj) head(res) @ \begin{thebibliography}{99} \bibitem{yang} Wentao Yang, Philip Rosenstielb and Hinrich Schulenburg. \textsl{ABSSeq: a new RNA-Seq analysis method based on absolute expression differences and generalized Poisson model.} (2014). \bibitem{soneson} Soneson C, Delorenzi M \textsl{A comparison of methods for differential expression analysis of RNA-seq data.} BMC Bioinformatics 2013, 14(1):91. \bibitem{pickrell} Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras J-B, Stephens M, Gilad Y, Pritchard JK \textsl{Understanding mechanisms underlying human gene expression variation with RNA sequencing} Nature 2010, 464(7289):768-772. \bibitem{deseq} Anders S, Huber W \textsl{Differential expression analysis for sequence count data.} Genome Biol 2010, 11(10):R106. \end{thebibliography} \end{document}