%\VignetteIndexEntry{HowTo BGX} %\VignetteDepends{Biobase, affy, gcrma} %\VignetteKeywords{Microarray, DifferentialExpression} %\VignettePackage{gcrma} \documentclass[12pt, a4paper]{article} \usepackage{hyperref} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\textit{#1}}} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \author{Ernest Turro\\Imperial College London} \begin{document} \title{HowTo BGX} \date{April 25, 2007} \maketitle \section{Introduction} This vignette describes how to use \Rpackage{bgx}, a C++ implementation of a Bayesian hierarchical integrated approach to the modelling and analysis of Affymetrix GeneChip arrays. The model and methodology is described in Hein et al, 2005. There are two ways to run \Rpackage{bgx}: (1) through R and (2) as a standalone binary. Both ways make use of probe level GeneChip data, which you must obtain as GeneChip CEL files. \section{Reading in the CEL files} When you load \Rpackage{bgx}, several required packages from the Bioconductor\footnote{\url{http://bioconductor.org}} project are automatically loaded. <<>>= library(bgx) @ The \Rpackage{affy} package allows you to read CEL files into an \Robject{AffyBatch} object. This can be achieved by changing your working directory to wherever the CEL files are stored and executing: \begin{Sinput} > aData <- ReadAffy() \end{Sinput} This will read in the CEL files in alphabetical order and save the data in the \Robject{aData} object. Alternatively, you can specify the specific files you would like to read in by adding their paths to the argument list, for example: \begin{Sinput} > aData <- ReadAffy("CEL/choe/chipC-rep1.CEL", "CEL/choe/chipS-rep2.CEL") \end{Sinput} \section{Running BGX through R} A basic execution of the program can be performed by simply passing an \Robject{AffyBatch} object as a single parameter to the \Robject{bgx} function and saving the result in an \Robject{ExpressionSet} object. The result will hold array-specific gene expression values and their corresponding standard errors in \Robject{assayData(eset)\$exprs} and \Robject{assayData(eset)\$se.exprs} respectively. \begin{Sinput} > eset <- bgx(aData) \end{Sinput} A more elaborate scenario would involve splitting the arrays into a number of conditions using the \Rfunarg{samplesets} argument\footnote{Note that if your \Robject{AffyBatch} object contains information on the experimental design in the \Robject{phenoData} slot, you do not need to use the \Rfunarg{samplesets} argument.}; specifying which genes to analyse with the \Rfunarg{genes} argument; specifying whether to take into account probe affinity with \Rfunarg{probeAff}; setting the number of burn-in and post burn-in runs with the \Rfunarg{burnin} and \Rfunarg{iter} arguments respectively; setting the set of parameters to save with the \Rfunarg{output} argument\footnote{\Rfunarg{output} can be set to either "minimal", "trace" or "all". See the documentation for an explanation of what these levels mean}; and specifying where to save the runs with \Rfunarg{rundir}. Execute \Rfunction{help(bgx)} in R for a full explanation of all the parameters. As an example, let us analyse the \Robject{Dilution} data set and save the results in the current working directory ("."): <>= library(affydata) library(hgu95av2cdf) data(Dilution) eset <- bgx(Dilution, samplesets=c(2,2), probeAff=FALSE, burnin=2048, iter=8192,genes=c(12500:12599), output="all") @ The \Robject{eset} object will contain gene expression information for each gene under each condition (not necessarily each array). You may obtain the gene expression measure using the \Rfunction{exprs} function. For instance: <<>>= exprs(eset)[10:40,] # Shorthand for assayData(eset)\$exprs[10:40,] @ Run \Rfunction{help(ExpressionSet)} in R for more information. Note that \Rfunarg{samplesets} should be set to an array specifying the number of replicates in each condition. If set to \Rfunction{(3,2)}, \Robject{bgx} will treat the first three arrays read into R as replicates under condition 1 and the next two as replicates under condition 2. You should make sure that all condition 1 files are read in first and all condition 2 files are read in second by \Rfunction{ReadAffy()}. You may check the order of the samples in your \Robject{AffyBatch} object by using the \Rfunction{sampleNames} function: <<>>= sampleNames(Dilution) @ \section{Running BGX as a standalone binary} Occasionally it may be useful to run \Robject{bgx} as a standalone binary from the command line\footnote{You can compile it by tweaking 'src/Makefile.standalone' to your specifications and running `make -f Makefile.standalone` from the 'src' directory.}. In this case, you should use the \Robject{standalone.bgx} function instead of the \Robject{bgx} function. It takes the same arguments as \Robject{bgx}, with the addition of \Rfunarg{dirname}, which should specify where you would like to save the input files required by the standalone binary. \begin{Sinput} aData <- ReadAffy() # Read in 6 arrays across two conditions # in alphabetical order standalone.bgx(aData, samplesets=c(3,3), genes=c(1:650,1000:1200), burnin=16384, iter=65536, output="minimal", dirname="input-choe3replicates") \end{Sinput} Once you have saved the input files, you should locate the binary, make sure it is executable\footnote{Under Unix-like environments, you can type \texttt{chmod +x bgx} at the command prompt to do this.}, and pass the path to the newly created \texttt{infile.txt} file as a single argument. For example: \begin{verbatim} ./bgx ../input-choe3replicates/infile.txt \end{verbatim} \section{Detailed analysis of the output} If you wish to analyse the output in detail, you should first read the output into a list as follows: <>= bgxOutput <- readOutput.bgx("run.1") @ You may then pass the \Robject{bgxOutput} object to any of several analysis functions. For instance, to view the gene expression distributions under the various conditions for gene 10, you could do: <>= plotExpressionDensity(bgxOutput, gene=10) @ In order to get a list of ranked differential expression values, you could do: <>= rankedGeneList <- rankByDE(bgxOutput) print(rankedGeneList[1:25,]) # print top 25 DEG @ Run \Rfunction{help(analysis.bgx)} for more detailed usage instructions on the analysis functions. <>= unlink("run.1", recursive=TRUE) @ \end{document}