%\VignetteEngine{knitr} %\VignetteIndexEntry{MSnbase development} %\VignetteKeywords{Mass Spectrometry, Proteomics, Infrastructure } %\VignettePackage{MSnbase-development} \documentclass[12pt,a4paper,english]{scrartcl} \usepackage{amsmath,amsfonts,amssymb} \usepackage{tikz} \usepackage{hyperref} \usepackage[authoryear,round]{natbib} \usepackage[auth-sc]{authblk} \usepackage{setspace} \onehalfspacing \newcommand{\R}{\texttt{R} } \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\mbox{\normalfont\textsf{#1}}}} \newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} %% colors \definecolor{Red}{rgb}{0.7,0,0} \definecolor{Blue}{rgb}{0,0,0.8} \usepackage{geometry} \geometry{verbose, tmargin = 2.5cm, bmargin = 2.5cm, lmargin = 3.5cm, rmargin = 3.5cm} \usepackage{hyperref} \usepackage{breakurl} \hypersetup{% pdfusetitle, bookmarks = {true}, bookmarksnumbered = {true}, bookmarksopen = {true}, bookmarksopenlevel = 2, unicode = {true}, breaklinks = {false}, hyperindex = {true}, colorlinks = {true}, linktocpage = {true}, plainpages = {false}, linkcolor = {Blue}, citecolor = {Blue}, urlcolor = {Red}, pdfstartview = {Fit}, pdfpagemode = {UseOutlines}, pdfview = {XYZ null null null} } \author{ Laurent Gatto\thanks{\email{lg390@cam.ac.uk}} } \affil{ Computational Proteomics Unit\\ University of Cambridge, UK } \begin{document} \title{\Rpackage{MSnbase} development} \maketitle %% Abstract and keywords %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \vskip 0.3in minus 0.1in \hrule \begin{abstract} This vignette describes the classes implemented in \Rpackage{MSnbase} package. It is intended as a starting point for developers or users who would like to learn more or further develop/extend \Robject{pSet}. \end{abstract} \textit{Keywords}: Mass Spectrometry (MS), proteomics, infrastructure. \vskip 0.1in minus 0.05in \hrule \vskip 0.2in minus 0.1in %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% <<'setup', include = FALSE, cache = FALSE>>= library("knitr") opts_chunk$set(fig.align = 'center', fig.show = 'hold', par = TRUE, prompt = FALSE, comment = NA) options(replace.assign = TRUE, width = 65) @ \tableofcontents <>= suppressPackageStartupMessages(library(MSnbase)) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% Section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \input{Foreword.tex} \input{Bugs.tex} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} This document is not a replacement for the individual manual pages, that document the slots of the \Rpackage{MSnbase} classes. It is a centralised high-level description of the package design. \Rpackage{MSnbase} aims at being compatible with the \Rpackage{Biobase} infrastructure \cite{Gentleman2004}. Many meta data structures that are used in \Robject{eSet} and associated classes are also used here. As such, knowledge of the \textit{Biobase development and the new eSet} vignette\footnote{%% The vignette can directly be accessed with \texttt{vignette("BiobaseDevelopment",package="Biobase")} once \Rpackage{Biobase} is loaded.} would be beneficial. The initial goal is to use the \Rpackage{MSnbase} infrastructure for labelled quantitation using reporter ions (iTRAQ \citep{Ross2004} and TMT \citep{Thompson2003}). Spectral counting should be trivial to apply with current features, as long as identification data is at hand. Currently, no effort is invested to streamline label-free quantitative proteomics, although some effort has been done to keep the infrastructure flexible enough to accommodate more designs. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{\Rpackage{MSnbase} classes} All classes have a \Robject{.\_\_classVersion\_\_} slot, of class \Robject{Versioned} from the \Robject{Biobase} package. This slot documents the class version for any instance to be used for debugging and object update purposes. Any change in a class implementation should trigger a version change. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{\Robject{pSet}: a virtual class for raw mass spectrometry data and meta data} \label{sec:pset} This virtual class is the main container for mass spectrometry data, i.e spectra, and meta data. It is based on the \Robject{eSet} implementation for genomic data. The main difference with \Robject{eSet} is that the \Robject{assayData} slot is an environment containing any number of \Robject{Spectrum} instances (see section \ref{sec:spectrum}). One new slot is introduced, namely \Robject{processingData}, that contains one \Robject{MSnProcess} instance (see section \ref{sec:msnprocess}). and the \Robject{experimentData} slot is now expected to contain \Robject{MIAPE} data (see section \ref{sec:miape}). The \Robject{annotation} slot has not been implemented, as no prior feature annotation is known in shotgun proteomics. <>= getClass("pSet") @ \paragraph{Future work} Currently, few setters have been implemented. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{\Robject{MSnExp}: a class for MS experiments} \label{sec:msnexp} \Robject{MSnExp} extends \Robject{pSet} to store MS experiments. It does not add any new slots to \Robject{pSet}. Accessors and setters are all inherited from \Robject{pSet} and new ones should be implemented for \Robject{pSet}. Methods that manipulate actual data in experiments are implemented for \Robject{MSnExp} objects. <>= getClass("MSnExp") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{\Robject{MSnSet}: a class for quantitative proteomics data} This class stores quantitation data and meta data after running \Robject{quantify} on an \Robject{MSnExp} object. The quantitative data is in form of a $n \times m$ matrix, where $m$ is the number of features/spectra originally in the \Robject{MSnExp} used as parameter in \Robject{quantify} and $m$ is the number of reporter ions (see section \ref{sec:reporterions}). This prompted to keep a similar implementation as the \Robject{ExpressionSet} class, while adding the proteomics-specific annotation slot introduced in the \Robject{pSet} class, namely \Robject{processingData} for objects of class \Robject{MSnProcess} (see section \ref{sec:msnprocess}). The \Robject{MSnSet} class extends the virtual \Robject{eSet} class to provide compatibility for \Robject{ExpressionSet}-like behaviour. The experiment meta-data in \Robject{experimentData} is also of class \Robject{MIAPE} (see section \ref{sec:miape}). The \Robject{annotation} slot, inherited from \Robject{eSet} is not used. <>= getClass("MSnSet") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{\Robject{MSnProcess}: a class for logging processing meta data} \label{sec:msnprocess} This class aims at recording specific manipulations applied to \Robject{MSnExp} or \Robject{MSnSet} instances. The \Robject{processing} slot is a \Robject{character} vector that describes major processing. Most other slots are of class \Robject{logical} that indicate whether the data has been centroided, smoothed, \ldots although many of the functionality is not implemented yet. Any new processing that is implemented should be documented and logged here. It also documents the raw data file from which the data originates (\Robject{files} slot) and the \Rpackage{MSnbase} version that was in use when the \Robject{MSnProcess} instance, and hence the \Robject{MSnExp}/\Robject{MSnSet} objects, were originally created. <>= getClass("MSnProcess") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{\Robject{MIAPE}: Minimum Information About a Proteomics Experiment} \label{sec:miape} The Minimum Information About a Proteomics Experiment \citep{Taylor2007, Taylor2008} \Robject{MIAPE} class describes the experiment, including contact details, information about the mass spectrometer and control and analysis software. %% Raw data is currently imported from \texttt{mzXML} files \citep{Pedrioli2004} < %% using the \Rfunction{xcms:::rampRawData} %% and \Rfunction{xcms:::rampRawDataMSn} functions from the %% \Rpackage{xcms} package \citep{Smith2006}. %% These functions do not give access to the meta data. New importer functions are %% under development (see for instance \Rpackage{mzR}\footnote{%% %% \url{https://github.com/sneumann/mzR/blob/master/DESCRIPTION}}) that will %% hopefully give programmatic access to meta data stored in the data file to %% populate the \Robject{MIAPE} object. <>= getClass("MIAPE") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{\Robject{Spectrum} \textit{et al.}: classes for MS spectra} \label{sec:spectrum} \Robject{Spectrum} is a virtual class that defines common attributes to all types of spectra. MS1 and MS2 specific attributes are defined in the \Robject{Spectrum1} and \Robject{Spectrum2} classes, that directly extend \Robject{Spectrum}. %% The choices of attributes has been dictated by the \Rfunction{xcms:::rampRawData} %% and \Rfunction{xcms:::rampRawDataMSn} functions and what data from the %% \texttt{mzXML} file they gave access to. %% It is expected that some hopefully minor changes might %% come up here when migrating to other data import packages, that allow random access %% to \texttt{mzXML} data and support \texttt{mzML} \citep{Martens2010}. <>= getClass("Spectrum") @ <>= getClass("Spectrum1") @ <>= getClass("Spectrum2") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{\Robject{ReporterIons}: a class for isobaric tags} \label{sec:reporterions} The iTRAQ and TMT (or any other peak of interest) are implemented \Robject{ReporterIons} instances, that essentially defines an expected MZ position for the peak and a width around this value as well a names for the reporters. <>= getClass("ReporterIons") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{\Robject{NAnnotatedDataFrame}: multiplexed \Robject{AnnotatedDataFrame}s} \label{sec:nannotateddataframe} The simple expansion of the \Robject{AnnotatedDataFrame} classes adds the \Robject{multiplex} and \Robject{multiLabel} slots to document the number and names of multiplexed samples. <>= getClass("NAnnotatedDataFrame") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Miscellaneous} \paragraph{Unit tests} \Rpackage{MSnbase} implements unit tests with the \Rpackage{testthat} package. \paragraph{Processing methods} Methods that process raw data, i.e. spectra should be implemented for \Robject{Spectrum} objects first and then \texttt{eapply}'ed (or similar) to the \Robject{assayData} slot of an \Robject{MSnExp} instance in the specific method. %% \paragraph{Why this MIAME slot?} \label{misc:whymiame} While it would have been %% possible to transfer all data stored in \Rpackage{Biobase}'s \Robject{MIAME} %% to a new \Robject{MIAPE} class and use the latter for \Robject{experimentData} slots in %% the \Robject{pSet} class, it would not have been possible to directly transfer this %% to \Robject{MSnSet} instances, as \Robject{MSnSet} classes directly inherit %% from the \Robject{ExpressionSet}, whose \Robject{experimentData} slot must be of class %% \Robject{MIAME}. %% \input{NoteAboutSpeedAndMemory.tex} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Session information} \label{sec:sessionInfo} <>= toLatex(sessionInfo()) @ \bibliographystyle{plainnat} \bibliography{MSnbase} \end{document}