% -*- mode: noweb; noweb-default-code-mode: R-mode; -*-
%\VignetteIndexEntry{splicegear Introduction}
%\VignetteDepends{splicegear, Biobase}
%\VignetteKeywords{Expression Analysis}
%\VignettePackage{splicegear}
\documentclass[12pt]{article}

\usepackage{amsmath}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}

\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}

\bibliographystyle{plainnat}

\title{Splicegear package}

\begin{document}

\maketitle

\section*{Introducing splicegear}

Microarrays have become an established technique for the analysis of
gene expression. With advances in the numerical processing of the data,
lower costs and well-defined experimental protocols, the reliability of
data analysis has increased.
The possibility of studying alternative splicing using microarrays has
appeared very recently in experimental publications. The consensus is
now that this is possible, and it has been reported for spotted
oligonucleotide arrays, for printed arrays and for arrays made by the
{\it Affymetrix} company. However, little has been done to quantify how
well the technique performs, how the existing oligonucleotide data could
have been influenced by alternative splicing, and how it could
contribute to the discovery of novel splice variants.

This package defines classes to handle probe expression values in an
alternative splicing context.
The class \Robject{SpliceExprSet} combines information about putative
splice sites on a sequence with probes matching the sequence and
corresponding probe intensities.
It consists of three attributes, \Robject{eset}, \Robject{probes} and
\Robject{spliceSites}, of class \Robject{ExpressionSet},
\Robject{Probes} and \Robject{SpliceSites} respectively.

The idea behind the clear separation between splice site positions,
probes and probe intensities is the dynamic nature of their
relationship. While the sequence of a probe is fixed, the target
sequences can vary slightly, and so can the position or presence of
splice sites (especially when working with putative splice sites).

The view chosen is appropriate for linking alternative splicing
information with microarray data, but is admittedly not very standard
on the alternative splicing side. Many representations of alternative
splicing show exons as boxes and draw broken lines between segments to
show the possible splice variants. The model chosen is not completely
incompatible with this view. We present how to use this representation
within the package in one of the last sections below.

The package is loaded by a simple call to
\begin{verbatim}
library(splicegear)
\end{verbatim}

<<>>=
library(splicegear)
@

\section*{Plotting methods}

Plotting methods are defined for the classes \Robject{SpliceSites} and
\Robject{SpliceExprSet}.

Exact matching of the reference sequences used by the database of
putative splice sites against the probes of {\it Affymetrix}
{\tt U95A} chips was performed.
The expression values from the {\it GeneLogic} `Dilution' dataset
(RNA extracted from cells from the central nervous system and from the
liver) were observed.

<<>>=
data(spsites)

print(spsites)

plot(spsites)
@

\section*{Import data}

The package has facilities to parse XML structures in a defined XML
format (see \url{http://palsdb.ym.edu.tw/index2.html}).
%(see Appendix~\ref{appendix:dtd}).
Motivations were discussed at length in a manuscript submitted earlier.
To summarize, we hoped to initiate efficient data exchange for
alternative splicing, in the spirit of the WDDX and SOAP formats.
The DTD we introduce is more a call for discussion by interested
parties. It is not fully compliant with WDDX or SOAP, although this may
come in the future.

The XML can be stored in a file. In {\bf R}, one can make an object of
class \Rclass{xml} very easily:
<<>>=
library(XML)

filename <- system.file("extdata", "example.xml", package="splicegear")

xml <- xmlTreeParse(filename, asTree=TRUE)
@
Further details concerning XML handling can be found in the
documentation for the package XML.

The XML structure can be converted to a list of \Rclass{SpliceSites}:
<<>>=
spsites <- buildSpliceSites(xml, verbose=FALSE)

length(spsites)

show(spsites[1:2])
@

The package is currently able to connect to the
`database of putative alternative splicing' {\it PALSdb}.
The typical way to obtain data from the web consists of two steps.
\begin{enumerate}
\item query a web site and obtain XML in return
\item build {\bf R} objects from the XML
\end{enumerate}

\begin{verbatim}
xml <- queryPALSdb("alcohol")

spsites <- buildSpliceSites(xml, verbose=FALSE)
\end{verbatim}

\section*{The class \Rclass{SpliceSites}}

The class \Rclass{SpliceSites} is a little complex. One should refer to
the relevant help file for details. We only introduce here a detailed
example of what can be performed.

<<>>=
## build SpliceSites
library(XML)
filename <- system.file("extdata", "example.xml", package="splicegear")
xml <- xmlTreeParse(filename, asTree=TRUE)
spsites <- buildSpliceSites(xml, verbose=FALSE)

## subset the second object in the list
my.spsites <- spsites[[2]]
@

<<>>=
plot(my.spsites)
@

As shown, most of the putative splice sites are supported by several
ESTs. One might want to see the tissue distribution of the matches.

\section*{Data in a \Rclass{data.frame}}

The \Rclass{data.frame} has a very important role in the S language.
A large number of functions are designed around this data structure.
We provide a way to link the class \Robject{SpliceExprSet} with this
data structure.

The function casting to \Rclass{data.frame} is named
\Rfunction{as.data.frame.SpliceExprSet}. Thanks to the S3 dispatch
system, a call to \Rfunction{as.data.frame} with a first argument of
class \Rclass{SpliceExprSet} should be enough.

<<>>=
data(spliceset)

dataf <- as.data.frame(spliceset)

colnames(dataf)
@

%<>=
<<>>=
## panel function: scatter plot with a least-squares regression line
lm.panel <- function(x, y, ...) {
  points(x, y, ...)
  p.lm <- lm(y ~ x)
  abline(p.lm)
}

## to plot probe intensity values conditioned by the position of the probes on
## the mRNA:
## (commented out to avoid a warning)
##coplot(log(exprs) ~ Material | begin, data=dataf, panel=lm.panel)
@

Further explanations about formulas and models in {\bf S-plus} and
{\bf R} can be found easily elsewhere.

\section*{Genomic view}

A popular representation of splice variants shows exons as boxes,
linked by broken lines to show which exons are skipped and which ones
are retained in each splice variant~\ref{fig:genomic.AS}.
In this context, type II and type III splice variants are not relevant:
the only event considered is the skipping of an exon.
The package features an experimental class that extends the class
\Rclass{SpliceExprSet} and gives compatibility with this representation.
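
The boxes-and-lines representation can be reduced to two plain {\bf R}
structures, which the following minimal sketch illustrates with base
functions only. The names \Robject{exon.pos} and
\Robject{known.variants} are hypothetical, and reading the matrix as one
row per exon with a begin and an end column is an assumption made for
this illustration, not a statement about the internals of the class.

\begin{verbatim}
## hypothetical toy data: one row per exon,
## columns = begin and end positions (assumed layout)
exon.pos <- matrix(c(1, 3.5, 5, 9,
                     3, 4,   8, 10),
                   ncol = 2,
                   dimnames = list(NULL, c("begin", "end")))

## each variant is described by the indices of the exons it retains
known.variants <- list(a = c(1, 2, 3, 4), b = c(1, 2, 3), c = c(1, 3, 4))

## exons skipped by each variant: all exon indices minus the retained ones
all.exons <- seq_len(nrow(exon.pos))
skipped <- lapply(known.variants, function(v) setdiff(all.exons, v))
\end{verbatim}

With these toy values, variant \Robject{a} retains every exon,
\Robject{b} skips exon 4 and \Robject{c} skips exon 2.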
%\begin{figure}[htbp]
%\begin{center}
%\includegraphics[width=\textwidth]{HASDB.spliceview}
%\includegraphics[width=\textwidth]{transcript.geneview.Hs.4291}
%\caption{\label{fig:genomic.AS}More common representation of alternative splicing.}
%\end{center}
%\end{figure}

<<>>=
## a 10 bp window
seq.length <- as.integer(10)
## positions of the exons
spsiteIpos <- matrix(c(1, 3.5, 5, 9, 3, 4, 8, 10), ncol=2)
## known variants
variants <- list(a=c(1,2,3,4), b=c(1,2,3), c=c(1,3,4))
## number of exons (used below to pick one color per exon)
n.exons <- nrow(spsiteIpos)

spvar <- new("SpliceSitesGenomic", spsiteIpos=spsiteIpos,
             variants=variants, seq.length=seq.length)
@

A plotting method (not unlike the display that can be found on the
TrEMBL website) is provided.

<<>>=
par(mfrow = c(3,1), mar = c(3.1, 2.1, 2.1, 1.1))

plot(spvar, split=TRUE, col.exon=rainbow(n.exons))
@

\section*{Combining alternative splicing information with probe intensities}

The class \Rclass{Probes} stores information relative to probes
matching a reference sequence, primarily their position on the
reference sequence. Additional data concerning the probes can be stored
in the slot {\it info}.

The class \Rclass{SpliceExprSet} is mainly an aggregation of an
instance of class \Rclass{SpliceSites}, an instance of class
\Rclass{Probes} and an instance of class \Rclass{ExpressionSet}.

The typical procedure is to build \Rclass{SpliceSites} and their
corresponding \Rclass{Probes} for a type of chip\footnote{See the
functions \Rfunction{queryPALSdb}, \Rfunction{buildSpliceSites} and the
Bioconductor package \Rpackage{matchprobes}.}.
Experimental hybridizations for the probes are stored in
\Rclass{ExpressionSet} objects.
This design facilitates the use of the package in different contexts.
For example, once the mapping of the probes has been performed and the
data relative to alternative splicing have been acquired, the analyst
can re-use these and combine them with data from different
hybridization experiments.
It also becomes possible to distribute mappings and splice variant
information for certain types of chips. This was done for
{\it Affymetrix} chips, and data packages providing the information
will be distributed shortly.

%\appendix

%\label{appendix:dtd}

\end{document}