% -*- mode: noweb; noweb-default-code-mode: R-mode; -*- %\VignetteIndexEntry{Alternative CDF environments for 2(or more)-genomes chips} %\VignetteKeywords{Preprocessing, Affymetrix} %\VignetteDepends{altcdfenvs} %\VignettePackage{altcdfenvs} %documentclass[12pt, a4paper]{article} \documentclass[12pt]{article} \usepackage{amsmath} \usepackage{hyperref} \usepackage[authoryear,round]{natbib} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \author{Laurent Gautier} \title{Alternative CDF environments for 2(or more)-genomes chips} \begin{document} \maketitle \section*{Introduction} Let's start by loading the package: <<>>= library(altcdfenvs) @ The {\it Plasmodium} / {\it Anopheles} is taken as an example: <<>>= library(plasmodiumanophelescdf) @ One will adapt easily the code below for other chips. \section*{How to build a CdfEnvAffy object from the cdfenv package} The first step is to wrap the naked enviroment in the package \Rpackage{plasmodiumanophelescdf} in an object: <<>>= planocdf <- wrapCdfEnvAffy(plasmodiumanophelescdf, 712, 712, "plasmodiumanophelescdf") print(planocdf) @ The numbers $712$ and $712$ correspond to the dimension of the array. If you do not know these numbers for your chip, the easiest (for the moment) is to read CEL data in an \Rclass{AffyBatch} and call the function \Rfunction{print} on this object. \section*{How to create a CdfEnvAffy that is a subset of the 2-genomes one} If the identifiers starting with `Pf' correspond to plasmodium, it is an easy job to find them: <<>>= ids <- geneNames(planocdf) ids.pf <- ids[grep("^Pf", ids)] @ Subsetting the \Rclass{CdfEnvAffy} is also an easy task: <<>>= ## subset the object to only keep probe sets of interest plcdf <- planocdf[ids.pf] print(plcdf) @ However, this is not that simple:{\bf the environment created does not contain all the probe set ids from Plasmodium}. Unfortunately, one cannot rely on pattern matching on the probe set id to find all the probe set ids associated with Plasmodium. The list of plasmodium ids included in the package can let us build a Plasmodium-only CdfEnvAffy (contributed by Zhining Wang). <<>>= filename <- system.file("exampleData", "Plasmodium-Probeset-IDs.txt", package="altcdfenvs") ids.pf <- scan(file = filename, what = "") plcdf <- planocdf[ids.pf] print(plcdf) @ Before we eventually save our environment, we may want to give it an explicit name: <<>>= plcdf@envName <- "Plasmodium ids only" print(plcdf) @ \section*{Assign the new Cdf data to an AffyBatch} Handling of AffyCdfEnv directly in within an AffyBatch, or AffyBatch-like, structure is being completed\ldots in the meanwhile, the current mecanism for cdfenvs has to be used. If your CEL files were read into an AffyBatch named \Robject{abatch}. \begin{Scode} envplcdf <- as(plcdf, "environment") abatch@cdfName <- "plcdf" \end{Scode} From now on, \Robject{abatch} will only consider Cdf information from \Robject{plcdf}. If you want to save this further use, I would recommend to do: \begin{Scode} save(abatch, plcdf, envplcdf, file="where/to/save.rda") \end{Scode} \end{document}