%\VignetteIndexEntry{Risa: converts experimental metadata from ISA-tab into Bioconductor data structures} %\VignetteDepends{Biobase} %\VignetteKeywords{ImportData} %\VignettePackage{Risa} \documentclass[a4paper]{article} \usepackage{times} \usepackage{a4wide} \usepackage{verbatim} \newcommand{\Robject}[1]{\texttt{#1}} \newcommand{\Rpackage}[1]{\textit{#1}} \newcommand{\Rclass}[1]{\textit{#1}} \newcommand{\Rfunction}[1]{{\small\texttt{#1}}} \clearpage \SweaveOpts{keep.source=TRUE,eps=FALSE,include=FALSE,width=4,height=4.5} \begin{document} \SweaveOpts{concordance=TRUE} \title{Risa: Building R objects from local ISA-Tab files} \author{Alejandra Gonzalez-Beltran and Steffen Neumann and Audrey Kauffmann\\ and Gabriella Rustici and Philippe Rocca-Serra and Eamonn Maguire \\ and Susanna-Asunta Sansone\\ isatools@googlegroups.com } \maketitle \section{Introduction} The Risa package is part of the ISA infrastructure software suite (http://isa-tools.org). It provides funcitonality to read ISA-Tab datasets, described in the following section. The source code and latest version can be found in the GitHub repository https://github.com/ISA-tools/Risa. Please, submit all 'bugs' and feature requests through https://github.com/ISA-tools/Risa/issues. \section{ISA-Tab format} The Investigation / Study / Assay (ISA) Tab-delimited (Tab) format is a general purpose framework with which to collect and communicate complex metadata (i.e. sample characteristics, technologies used, type of measurements made) from experiments employing a combination of technologies (http://isa-tools.org). In particular, ISA-Tab has been developed for - but not limited to - experiments using genomics, transcriptomics, proteomics or metabol/nomics techniques (the 'omics'). ISA-Tab uses three types of file to capture the experimental metadata: \begin{itemize} \item \emph{Investigation file} \item \emph{Study file} \item \emph{Assay file} (with associated data files). \end{itemize} The Investigation file contains an overall description of an experiment while all experimental steps are described in the Study and in the Assay file(s). For each Investigation file there may be one or more Study files; for each Study file there may be one or more Assay files. \subsection{Investigation file} In this file, information is reported on a per-column basis and the fields are organized and divided in sections. The Investigation file is intended to meet three needs: \begin{itemize} \item to define key entities, such as factors, protocols, parameters, which may be referenced in the other files; \item to relate Assay files to Study files; and optionally, \item to relate each Study file to an Investigation (when two or more Study files need to be grouped). The declarative sections cover general information such as contacts, protocols and equipment, and also - where applicable - the description of terminologies (controlled vocabularies or ontologies) and other annotation resources that were used. \end{itemize} \subsection{Study file} In this file, information is structured on a per-row basis with the first row being used for column headers. The Study file contains contextualizing information for one or more assays, for example; the subjects studied; their source(s); the sampling methodology; their characteristics; and any treatments or manipulations performed to prepare the specimens. \subsection{Assay file} In this file, as for the Study file, fields are organized on a per-row basis with the first row being used for column headers. The Assay file represents a portion of the experimental graph (i.e., one part of the overall structure of the workflow); each Assay file must contain assays of the same type, defined by the type of measurement (i.e. gene expression) and the technology employed (i.e. DNA microarray). Assay-related information includes protocols, additional information relating to the execution of those protocols and references to data files (whether raw or processed). For easy transfer, ISA-Tab files and associated data files can be packaged into an ISArchive, using a standalone Java application named ISAcreator (http://isatab.sourceforge.net). In order to facilitate identification of ISA-Tab components in an ISArchive, specific extensions have been created as follows: \begin{itemize} \item \emph{i\_iname.txt} for identifying the Investigation file \item \emph{s\_sname.txt} for identifying Study file (s) \item \emph{a\_aname.txt} for identifying Assay file (s) \end{itemize} where 'iname', 'sname', 'aname' are the user-given names for the investigation, study/ies, assay(s), respectively. \section{The Risa package} The Risa package is used to build R objects from an ISA archive or dataset. The output is a list of objects containing, for example, the investigation, studies and assays filenames, the contents of their files, the list of samples, among other things. These objects can then be used by downstream Bioconductor packages for data analysis and visualization (i.e, xcms). The package currently includes the function processAssayXcmsSet that, for a specific mass spectrometry assay, builds an xcmsSet object. \subsection{Building an R object from a local ISA dataset} If you have your own ISA archive, you can use the function \Rfunction{readISAtab} to convert it into an R object. The arguments for the function \Rfunction{readISAtab} are: \begin{itemize} \item{path}{ the name of the directory containing ISAtab files. The default is the working directory.} \item{verbose}{ a boolean indicating to show messages for the different steps, if TRUE, or not to show them, if FALSE} \end{itemize} As an example, we can use the \emph{faahKO} dataset, whose version 1.2.11 contains an ISA dataset describing the experiment. First, it is required to load the \emph{Risa} package, and the \emph{faahKO} package must have been installed. <>= library(Risa) require(faahKO) @ Then, we read the ISA-Tab data set from the \emph{faahKO} package: <>= faahkoISA <- readISAtab(find.package("faahKO")) @ The object \Robject{faahkoISA} belongs to the \Rclass{ISAtab} class, and contains the following elements: \begin{itemize} \item path - the path of the ISA-Tab dataset, \item investigation.filename - the name of the Investigation file \item investigation.file - a data frame with the contents of the Investigation file \item study.identifiers - the list of study identifiers \item study.filenames - the names of the study files \item study.files - a list of data frames wiht the contents of the study files \item assay.filenames - the names of the assay files \item assay.filenames.per.study - the names of the assay files according to the study they belong to \item assay.files - a list of data frames with the contents of the assay files \item assay.files.per.study - a list of data frames with the contents of the assay files divided per study they belong to \item assay.technology.types - a list with the technology types corresponding to each assay \item assay.measurement.types - a list with the measurement types corresponding to each assay \item data.filenames - a list with the names of the data files \item samples - a list with the names of the samples \item samples.per.assay.filename - the samples classified according to the assay filename they belong to \item assay.filenames.per.sample - the names of the assay files classified per sample name \item sample.to.rawdatafile - the association between samples and raw data files \item sample.to.assayname - the association between samples and assay names \item rawdatafile.to.sample - the association between raw data files and samples \item assayname.to.sample - the association between assay names and samples \end{itemize} Additionally, the ISA dataset could be compressed in a .zip file. If that is the case, the function \Rfunction{readISAtab} can be used, passing the \Robject{zipfile} as parameter. The only condition is that the ISA-Tab files are contained directly into the zip file, i.e. not inside additional folders. In this case, the parameters for the function \Rfunction{readISAtab} will be: \begin{itemize} \item{zipfile}{ a zip archive containing ISAtab files.} \item{path}{ the name of the directory in which the files from the zip archive will be extracted. The default is the working directory.} \item{verbose}{ a boolean indicating to show messages for the different steps, if TRUE, or not to show them, if FALSE} \end{itemize} \section*{Building xcmsSets for mass spectrometry assays} The function \Rfunction{processAssayXcmsSet} allows to build an xcmsSet (object defined in the xcms package) from the information in an assay file. The parameters for this function are: \begin{itemize} \item{isa}: an ISA object, as retrieved by the function \Rfunction{readISAtab} \item{assay.filename} the name of the assay file with information about the relevant assay \item{...} extra arguments that can be passed down to the xcmsSet function from the xcms package \end{itemize} Using the \emph{faahKO} package as an example, we select the name of assay file, and use the \Rfunction{processAssayXcmsSet} to build a object of type \Rclass{xcmsSet}: <>= assay.filename <- faahkoISA["assay.filenames"][1] faahkoXset <- processAssayXcmsSet(faahkoISA, assay.filename) @ \section*{Augmenting the ISA-Tab dataset after analysis} The Risa package also provides the functionality to augment the original ISA-Tab dataset with more information after analysis. The function \Rfunction{updateAssayMetadata} allows to modify the metadata in a particular assay file. The arguments are: \begin{itemize} \item{isa}{ An isatab object, as retrieved by the \Rfunction{readISAtab} function.} \item{assay.filename}{ the filename of the assay file to be augmented/modified} \item{col.name}{ the name of the column of the assay file to be modified} \item{values}{ the values to be added to the column of the assay file: it could be a single value, and in this case the value is repeated across the column, or it could be a list of values (whose length must match the number of rows of the assay file) } \end{itemize} To continue with our example using the \emph{faahKO} data package, we will assume that the results of analysis are stored in the file \emph{faahkoDSDF.txt}. Then, we will update the ISA-Tab dataset adding the result file into the 'Derived Spectral Data File' column of the assay file. <>= updateAssayMetadata(faahkoISA, assay.filename,"Derived Spectral Data File","faahkoDSDF.txt" ) @ For an example for a real use case, please refer to https://github.com/sneumann/mtbls2/. \section*{Writing ISA-Tab datasets} The Risa package offers functions to write the whole ISA-Tab dataset or part of it back to disk. These functions are \Rfunction{write.ISAtab}, \Rfunction{write.investigation.file}, \Rfunction{write.study.file}, \Rfunction{write.assay.file}. So, after updating the assay file as indicated above, we can save it back to disk, using the following command: <>= temp = tempdir() write.ISAtab(faahkoISA, temp) #write.assay.file(faahkoISA, assay.filename, temp) @ \section*{Session Info} % <>= toLatex(sessionInfo()) @ % \section*{Further information} For further information about the ISA software infrastructure, please visit our website http://isa-tools.org. \end{document}