%\VignetteIndexEntry{Glimma Vignette} %\VignetteKeyword{RNA-Seq} %\VignetteKeyword{differential expression} %\VignetteKeyword{interactive graphics} %\VignettePackage{Glimma} \documentclass{article} <>= BiocStyle::latex() @ \usepackage{hyperref} \usepackage{graphicx} \begin{document} \title{Overview of the Glimma package} \author{Shian Su, Matthew E. Ritchie} \date{28 January 2016\\ Revised 27 February 2016} \maketitle \tableofcontents \section{Introduction} \Biocpkg{Glimma} is a \Bioconductor{} package for interactive visualization of the results from a differential expression analysis of RNA-sequencing data. This functionality is intended to enhance reporting capabilities so that results can be explored more easily by end-users. \Biocpkg{Glimma} (which loosely stands for {\it I}nteractive {\it G}raphics from the {\it limma} package) extends some of the popular plotting capabilities in \Biocpkg{limma} \cite{limma} such as multidimensional scaling (MDS) plots, that can be useful for visualising relationships between samples, and mean-difference plots (MD plots) for summarising results from comparisons of interest. The choice of displays and layouts used was inspired by visualisations in the Degust software \cite{Degust}. The \Biocpkg{Glimma} package is designed to handle RNA-seq differential expression results from the \Biocpkg{limma}, \Biocpkg{edgeR} \cite{edgeR} and \Biocpkg{DESeq2} \cite{DESeq2} packages. Figure~\ref{fig:overview} below provides an overview of this functionality. \begin{figure} \centerline{\includegraphics[width=0.65\linewidth]{GlimmaWorkFlow.pdf}} \caption{Overview of package workflow.} \label{fig:overview} \end{figure} In this vignette we demonstrate the two main plotting capabilities of this package using a published RNA-seq dataset \cite{Smchd1}. \section{Getting Started} We first perform a basic differential expression analysis on an RNA-seq data set involving Lymphoma cell-lines with either wild-type or null levels of the gene {\it Smchd1} \cite{Smchd1}. This experiment was analysed using a {\it limma-voom} pipeline \cite{voom, voomqwts} that tests for differential expression relative to a particular fold-change using treat \cite{treat}. The plots that \Biocpkg{Glimma} extends are shown in Figure~\ref{fig:static}. <>= library(edgeR) library(limma) library(Glimma) data(lymphomaRNAseq) x <- lymphomaRNAseq class(x) ## Filter out genes with low counts sel = rowSums(cpm(x$counts)>0.5)>=3 x = x[sel,] ## Make MDS plot genotype = relevel(x$samples$group, "Smchd1-null") par(mfrow=c(1,2)) plotMDS(x, label=1:ncol(x), main="MDS plot", col=as.numeric(genotype)) legend("topright",legend=c("Smchd1-null","Wild Type"), pch=20, col=1:2, text.col=1:2) ## Normalize the data using TMM x = calcNormFactors(x, method="TMM") ## Set up design matrix des = model.matrix(~genotype) des ## Apply voom with sample quality weights and fit linear model v=voomWithQualityWeights(x, design=des, normalization="none", plot=FALSE) vfit = lmFit(v,des) ## Apply treat relative to a fold-change of 1.5 vtfit=treat(vfit,lfc=log2(1.5)) vfit= eBayes(vfit) results = decideTests(vfit,p.value=0.01) summary(results) ## Make a mean-difference (MD) plot of the results plotMD(vfit,col=2, status=results[,2], hl.col=c("red", "blue"), legend="topright", main="MD plot: Wild-type vs Smchd1") @ \begin{figure} \centerline{\includegraphics[width=\linewidth]{Glimma-basicanalysis}} \caption{Examples of static MDS plot (left) and mean-difference plot (right) for the Lymphoma RNA-seq data set. \Biocpkg{Glimma} extends these functions, adding various interactive features to each.} \label{fig:static} \end{figure} \section{Interactive Multidimensional Scaling Plots} The first extension display extends \Rfunction{plotMDS} from \Biocpkg{limma}. Multidimensional scaling is a dimension reduction technique that is popular in gene expression analysis as a check that the samples are well behaved (i.e. whether replicate samples cluster together and to check for the presence of batch effects). The \Rfunction{glMDSPlot} function generates an html page with two panels, with an MDS plot on the left and a on the right and a bar plot of the eigenvalues (a measure of the magnitude of variation explained by each dimension) on the left. An example from the Lymphoma data set \cite{Smchd1} is given below, with Figure~\ref{fig:mds} showing a screen shot of the output. The interactive nature of this display allows users to hover over points to reveal the sample information for particular points (left panel) and to switch between dimensions using the bar plot in the right panel. The html page generated by \Rfunction{glMDSPlot} is intended to make it easier for end users look for relationship between samples in different dimensions. For further information on the arguments it accepts, see \Rfunction{?glMDSPlot}. By default, the data is saved in a directory named {\tt glimma-plots}, with the file {\tt MDS-Plot.html} opening the interactive display shown above. This function works with \Rfunction{DGEList} objects from \Biocpkg{edgeR} or a matrix of expression values and accepts labels and experimental group information so that samples of different types can be readily distinguished from one and other. <>= glMDSPlot(x, labels=1:7, groups=genotype, folder="Smchd1-Lymphoma", launch=FALSE) @ \begin{figure} \centerline{\includegraphics[width=0.75\linewidth]{glMDSPlot}} \caption{Screen shot of interactive MDS plot generated by the \Rfunction{glMDSPlot} function.} \label{fig:mds} \end{figure} \section{Interactive Mean-Difference Plots} A second extension extends \Rfunction{plotMD} from \Biocpkg{limma}. The \Rfunction{glMDPlot} function generates an html page with two panels and a search bar that allows individual genes to be looked up and highlighted on the summary mean-difference plot which shows average expression and log-fold-change in the first panel alongside the transformed counts in the second. Genes that pass a particular significance threshold can be highlighted on the mean-difference plot. An example from the Lymphoma data set \cite{Smchd1} is given below, with Figure~\ref{fig:md} showing a screenshot of the output. The plot on the right is the mean-difference plot with the same genes highlighted in Figure~\ref{fig:static}. The plot on the right shows the $\log_2$ transformed counts-per-million (CPM) by experimental group for the gene selected. The interactive nature of both plots allows users to hover over points to reveal the values and Gene IDs in the left panel and sample information in the right panel. The html page generated by \Rfunction{glMDPlot} is intended to make it easier for end users look up genes of interest in their differential expression results. For further information on the arguments it accepts, see \Rfunction{?glMDPlot}. By default, the data is saved in a directory named {\tt glimma-plots}, with the file {\tt MD-Plot.html} opening the interactive display shown above. This function works with output from \Biocpkg{limma}, \Biocpkg{edgeR} and \Biocpkg{DESeq2}. <>= glMDPlot(vfit, counts=x$counts, anno=x$genes, groups=genotype, samples=1:7, status=results[,2], display.columns=c("Symbols", "GeneID", "GeneName"), folder="Smchd1-Lymphoma", main="MD plot: Wild-type vs Smchd1", launch=FALSE) @ \begin{figure} \centerline{\includegraphics[width=\linewidth]{glMDPlot}} \caption{Screen shot of interactive mean-difference (MD) plot generated by the \Rfunction{glMDPlot}. The gene {\it Smchd1} has been searched for and highlighted in the left-most plot, and it's expression in sample 1 has been highlighted in the right panel.} \label{fig:md} \end{figure} A default version of \Rfunction{glMDPlot} accepts a \Robject{data.frame} of results (e.g. an unsorted top table from \Rpackage{limma} - note that the genes must be in the same order as they appear in the {\tt counts} and {\tt anno} objects). This is intended to allow results from any arbitrary differential expression analysis to be handled using this tool. The user can then specify the relevant columns to display in the summary plot. For example the code below makes a volcano plot by choosing the log-odds ({\tt B} versus the log-fold-change ({\tt logFC}) columns. <>= topt = topTable(vfit, coef=2, number=Inf, sort="none") glMDPlot(topt, xval="logFC", yval="B", counts=x$counts, anno=x$genes, groups=genotype, samples=1:7, status=results[,2], display.columns=c("Symbols", "GeneID", "GeneName"), folder="Smchd1-Lymphoma", html="volcano", launch=FALSE) @ \begin{figure} \centerline{\includegraphics[width=\linewidth]{glMDPlotCustom}} \caption{Screen shot of interactive volcano plot created using the \Rfunction{glMDPlot} by providing an unsorted {\tt data.frame} of results from a differential expression analysis ({\it limma-voom} in this case) and selecting the columns to plot via the {\tt xval} and {\tt yval} arguments.} \label{fig:volcano} \end{figure} \begin{thebibliography}{} \bibitem[1]{limma} Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies, {\it Nucleic Acids Research}, {\bf 43}(7):e47. \bibitem[2]{Degust} Powell DR. (2015) Degust: Visualize, explore and appreciate RNA-seq differential gene-expression data, \url{http://victorian-bioinformatics-consortium.github.io/degust/}. \bibitem[3]{edgeR} Robinson MD, McCarthy DJ, Smyth GK. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, {\it Bioinformatics}, {\bf 26}(1):139--40. \bibitem[4]{DESeq2} Love MI, Huber W, Anders S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, {\it Genome Biology}, {\bf 15}(12):550. \bibitem[5]{Smchd1} Liu R, Chen K, Jansz N, Blewitt ME, Ritchie, ME (2016) Transcriptional profiling of the epigenetic regulator Smchd1, {\it Genomics Data}, {\bf 7}:144--7. \bibitem[6]{voom} Law CW, Chen Y, Shi W, Smyth GK (2014) Voom: precision weights unlock linear model analysis tools for RNA-seq read counts, {\it Genome Biology}, {\bf 15}:R29. \bibitem[7]{voomqwts} Liu R, Holik AZ, Su S, Jansz N, Chen K, Leong HS, Blewitt ME, Asselin-Labat ML, Smyth GK, Ritchie ME (2015) Why weight? Combining voom with estimates of sample quality improves power in RNA-seq analyses, {\it Nucleic Acids Research}, {\bf 43}(15):e97. \bibitem[8]{treat} McCarthy DJ, Smyth GK (2009) Testing significance relative to a fold-change threshold is a TREAT, {\it Bioinformatics}, {\bf 25}(6):765-71. \end{thebibliography} <>= sessionInfo() @ \end{document}