%\VignetteDepends{AffyCompatible}
%\VignetteIndexEntry{Annotation retrieval with NetAffxResource}
%\VignetteKeywords{tutorial, AffyCompatible, NetAffx, NetAffxResource}
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass{article}
\usepackage{hyperref}

\newcommand{\R}{{\textsf{R}}}
\newcommand{\code}[1]{{\texttt{#1}}}
\newcommand{\term}[1]{{\emph{#1}}}
\newcommand{\Rpackage}[1]{\textsf{#1}}
\newcommand{\Rfunction}[1]{\texttt{#1}}
\newcommand{\Robject}[1]{\texttt{#1}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\textit{#1}}}
\newcommand{\Rfunarg}[1]{{\textit{#1}}}

\newcommand{\Affy}{Affymetrix}
\newcommand{\Bioc}{Bioconductor}

\title{Annotations with NetAffx}
\author{Martin Morgan, Robert Gentleman}
\date{Created: 19 February 2008}

\begin{document}

\maketitle

\Affy{} provides annotations for all arrays they produce. The
annotations are made available in \Bioc{} through the
\Rclass{NetAffxResource} class in the \Rpackage{AffyCompatible}
package; additional packages complement \Affy{} annotation information
with data collected from other public repositories. This document
outlines a simple workflow to retrieve annotations available through
NetAffx.
<<>>=
library(AffyCompatible)
@
%%
To use these facilities, one must be a registered \Affy{} user; see
the \Affy{}
\href{https://www.affymetrix.com/site/login/login.affx}{user registration}
site for details.

The first step is to create an instance of the
\Rclass{NetAffxResource} class, using the
\Rfunction{NetAffxResource} function. Important arguments are
\Rfunarg{user} and \Rfunarg{password}, length-1 character vectors
containing the registered user name and password. The password is
printed, saved, and transmitted in clear text, and so is \textbf{not}
secure. An additional argument is \Rfunarg{directory}, which is the
location where the NetAffx database and downloaded files are stored.
\Rfunarg{directory} defaults to a session-specific temporary
directory, so if it is not supplied the database and any downloaded
annotations are removed when the R session ends. To create the
\Rclass{NetAffxResource} instance, evaluate a command like
<<>>=
password <- AffyCompatible:::acpassword
@
<<>>=
rsrc <- NetAffxResource(user="mtmorgan@fhcrc.org", password=password)
rsrc
@
%%
This creates the resource, but does not validate the user name and
password; these are verified when the NetAffx resource is first
retrieved from \Affy{}, typically the first time the code in the
following paragraph is evaluated.

A typical workflow involves querying \Robject{rsrc} for the names of
available arrays, and the descriptions of annotations available for an
array of interest:
<<>>=
head(names(rsrc))
affxDescription(rsrc[["Bovine"]])
@
%%
Annotations usually include a comma-separated value (CSV) file that
can be represented in R as a \Robject{data.frame}. The data frame
usually includes a probe identifier column, and columns of additional
information \Affy{} has collated from a variety of sources, as
described on the NetAffx site. Additional annotation files usually
include a MAGE-ML representation of the CSV file (physically much
larger, but containing comparable information), chip description
files (CDF), other files describing probes present on chips, probe
sequences in FASTA format, and possibly other files specific to the
chip platform.

An R representation of the annotations of a particular array can be
created with
<<>>=
annos <- rsrc[["Porcine"]]
annos
@
%%
A particular annotation can be selected from this using R commands to
navigate the implied class structure:
<<>>=
sapply(affxAnnotation(annos), force)[1:5]
anno <- affxAnnotation(annos)[[3]]
anno
@
%%
(The Porcine BLASTP Annotation file is chosen because it is small).
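
The CSV annotations described above are represented in R as a
\Robject{data.frame} with a probe identifier column. As a minimal
sketch of how such a table might be queried, the chunk below builds a
toy data frame and looks up rows by probe identifier. The column names
(\code{ProbeSetID}, \code{GeneSymbol}) and all values are hypothetical
stand-ins chosen for illustration; they are not taken from NetAffx,
and real annotation files may use different column names.
<<>>=
## Toy stand-in for a CSV annotation; column names and values are
## hypothetical and do not come from NetAffx.
toy <- data.frame(ProbeSetID = c("probe_1", "probe_2", "probe_3"),
                  GeneSymbol = c("geneA", NA, "geneC"),
                  stringsAsFactors = FALSE)
## Look up annotation rows for the probe identifiers of interest
ids <- c("probe_3", "probe_1")
hits <- toy[match(ids, toy$ProbeSetID), ]
hits$GeneSymbol
@
%%
\Rfunction{match} returns row positions in the order of the requested
identifiers, so the result lines up with \Robject{ids}; an identifier
absent from the table would yield a row of \code{NA} values.
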
The annotation file may also be obtained by subsetting the resource
with a second argument corresponding to the annotation description or
index:
<<>>=
anno <- rsrc[["Porcine", "Annotations, CSV format"]]
anno <- rsrc[["Porcine", 3]]
@

Annotation files can be retrieved with
<<>>=
df <- readAnnotation(rsrc, annotation=anno)
@
%%
This checks whether the relevant annotation file is in the directory
specified in the \Robject{rsrc} object. If the annotation file is not
present, it is retrieved from the \Affy{} site. The argument
\Rfunarg{update=TRUE} forces retrieval.

\Rfunction{readAnnotation} reads files of known type (e.g., CSV) into
appropriate R objects (e.g., data frames) and returns them. Some file
types (e.g., CDF) are not meant for representation as R objects, and
for these \Rfunction{readAnnotation} returns the (local) path to the
relevant file. For all annotations, the argument
\Rfunarg{content=FALSE} returns the local file path, without loading
the content of the file into R.

\Affy{} does not specify the format of all files, so some files might
reasonably be read into R even though \Rfunction{readAnnotation} is
not able to identify the appropriate format. The user is free to
explore these annotation files using standard R commands, e.g.,
<<>>=
anno <- rsrc[["Porcine", "PSI Library File"]]
fl <- readAnnotation(rsrc, annotation=anno, content=FALSE)
fl
## a zip file, containing 'Porcine.psi'
conn <- unz(fl, "Porcine.psi")
readLines(conn, n=6)
read.table(conn, header=FALSE, skip=1, sep="\t", nrows=5)
@

\end{document}