% \VignetteIndexEntry{Trigger Tutorial}
% \VignettePackage{trigger}
\documentclass[11pt]{article}

\usepackage{epsfig}
\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{amsxtra}
\usepackage{graphicx,subfigure}
\usepackage{vmargin}
\usepackage{amsthm}

\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\texttt{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{ \bm }[1]{ \mbox{\bf {#1}}}

\parindent 0in
\setpapersize{USletter}
\setmarginsrb{1truein}{0.5truein}{1truein}{0.5truein}{16pt}{30pt}{0pt}{20truept}
\setlength{\emergencystretch}{2em}
\usepackage{Sweave}

\begin{document}
\setkeys{Gin}{width=0.6\textwidth}

\title{Bioconductor's Trigger package}
\author{Lin Chen, Dipen Sangurdekar, and John D. Storey$^{\ddagger}$\\ $^{\ddagger}$Email: \texttt{jstorey@princeton.edu}}
\maketitle
\bibliographystyle{plain}
\tableofcontents

\section{Overview}

The \Rpackage{trigger} package guides an integrative genomic analysis. Integrative genomic data usually consists of genomic information from various sources, which includes genetic information (genotype), high-dimensional intermediate traits in the genome (e.g., gene expression, protein abundance) and/or higher-order traits (phenotypes) for an organism. In the following examples, we mainly discuss intermediate traits of gene expression.
It should be noted that this package can also be applied to protein abundance and/or other continuous trait expression.\\

The package contains functions to: (1) construct a global linkage map between genetic markers and gene expression; (2) analyze multiple-locus linkage (epistasis) for gene expression; (3) quantify the proportion of genome-wide variation explained by each locus and identify eQTL linkage hotspots; (4) estimate pairwise causal gene regulatory probabilities and construct gene regulatory networks; and (5) identify causal genes for a quantitative trait of interest. \\

This document provides a tutorial for using the \Rpackage{trigger} package. The package contains the following functions:

\begin{itemize}
\item \Rfunction{trigger.build}: Format the input data
\item \Rfunction{trigger.link}: Genome-wide eQTL analysis
\item \Rfunction{trigger.mlink}: Multi-locus linkage (epistasis) analysis
\item \Rfunction{trigger.eigenR2}: Estimate the proportion of genome-wide variation explained by each eQTL
\item \Rfunction{trigger.loclink} and \Rfunction{trigger.net}: Network-Trigger analysis
\item \Rfunction{trigger.netPlot2ps}: Write the network from a trigger probability matrix to a postscript file
\item \Rfunction{trigger.trait}: Trait-Trigger analysis
\end{itemize}

To view the help file for the function \Rfunction{trigger.link} within R, type \texttt{?trigger.link}. If you identify bugs related to basic usage, please contact the authors directly. Otherwise, any questions or problems regarding \Rpackage{trigger} should be sent to the Bioconductor mailing list. Please do not send requests for general usage to the authors.
\\

\section{A yeast data set}\label{yeast}

The basic input data of this package consists of (1) an $m_{m}\times n$ marker genotype matrix with $m_{m}$ marker genotypes in rows and $n$ samples/arrays in columns; (2) an $m_{e}\times n$ gene expression matrix (or intermediate trait expression matrix) with $m_{e}$ genes in rows and $n$ samples/arrays in columns; (3) an $m_{m} \times 2$ marker position matrix, of which the first column is the chromosome name and the second column is the position of each marker, with each row corresponding to one marker in the marker genotype matrix; and (4) an $m_{e} \times 3$ gene position matrix, of which the first column is the chromosome name and the second and third columns are the starting and ending coordinates of each gene, with each row corresponding to one gene in the expression matrix. Please code the names of autosomal chromosomes as integers and the name of the sex chromosome as ``X''. Also note that the same unit (e.g., base pair, kb, or cM) should be used for marker positions and gene positions.

To illustrate the input data format and the analyses offered in this package, we use a data set from a yeast eQTL study \cite{Brem2005,Storey2005}. In the study, a genetic cross of \textit{Saccharomyces cerevisiae} BY4716 and RM11-1a strains was used to generate 112 F1 recombinant segregants. Each individual strain was then genotyped, and gene expression was measured in a controlled growth environment. The data set consists of a list of four matrices:

\begin{itemize}
\item \Robject{marker}: A $3244 \times 112$ genotype matrix
\item \Robject{exp}: A $6216 \times 112$ gene expression matrix
\item \Robject{marker.pos}: A $3244 \times 2$ matrix of marker position information
\item \Robject{exp.pos}: A $6216 \times 3$ matrix of gene position information.
\end{itemize}

This yeast data set is included in the package as the dataset \Robject{yeast}.
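
Independently of the yeast data, the four-matrix input layout described at the start of this section can be sketched with a few toy matrices. The chunk below is only an illustration: the object names (\Robject{toy.marker}, \Robject{toy.exp}, \Robject{toy.marker.pos}, \Robject{toy.exp.pos}), the dimensions, and all values are made up, the genotypes are coded 1/2 purely for convenience, and the objects are not passed to any \Rpackage{trigger} function.

<<>>=
# Hypothetical toy input: 4 markers, 3 genes, 6 samples (values are made up)
toy.marker <- matrix(c(1, 2, 1, 2, 1, 2,
                       1, 1, 2, 2, 1, 2,
                       2, 2, 1, 1, 2, 1,
                       1, 2, 2, 1, 1, 2),
                     nrow = 4, byrow = TRUE)
toy.exp <- matrix(seq(-1, 1, length.out = 18), nrow = 3, byrow = TRUE)
# One row per marker: chromosome, position
toy.marker.pos <- cbind(chr = c(1, 1, 2, 2),
                        pos = c(100, 5000, 250, 9000))
# One row per gene: chromosome, start, end (same unit as marker positions)
toy.exp.pos <- cbind(chr = c(1, 2, 2),
                     start = c(400, 100, 7000),
                     end = c(900, 650, 8200))
# Consistency checks implied by the description of the input format
stopifnot(ncol(toy.marker) == ncol(toy.exp),
          nrow(toy.marker.pos) == nrow(toy.marker),
          nrow(toy.exp.pos) == nrow(toy.exp),
          ncol(toy.marker.pos) == 2, ncol(toy.exp.pos) == 3)
@

The toy names are deliberately distinct from \Robject{marker}, \Robject{exp}, \Robject{marker.pos}, and \Robject{exp.pos}, so they do not mask the attached yeast objects used in the rest of this section.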
To load the data, type \Rfunction{data(yeast)}, and to view a description of the data, type \texttt{?yeast}. Once the data is loaded, one can type \Rfunction{attach(yeast)} to attach the yeast data. After the analysis is done, type \Rfunction{detach(yeast)} to detach the data set. We use a randomly generated subset of the data for the purpose of this vignette.

<<>>=
library(trigger)
data(yeast)
names(yeast)
# reduce data size for vignette run time
set.seed(123)
# select subset of 400 traits
gidx = sort(sample(1:6216, size = 400))
yeast$exp = yeast$exp[gidx,]
yeast$exp.pos = yeast$exp.pos[gidx,]
# select subset of 500 markers
midx = sort(sample(1:3244, size = 500))
yeast$marker = yeast$marker[midx,]
yeast$marker.pos = yeast$marker.pos[midx,]
attach(yeast)
dim(exp)
@

The function \Rfunction{trigger.build} formats the input data and returns an S4 class object for the convenience of subsequent analyses. It converts the marker genotype matrix to a matrix of integers starting from 1 (a matrix of 1 or 2 for haploid genotypes, or 1, 2, or 3 for diploid genotypes).\\

<<>>=
trig.obj <- trigger.build(marker=marker, exp=exp,
                          marker.pos=marker.pos, exp.pos=exp.pos)
trig.obj
detach(yeast)
@

\section{Genome-wide eQTL analysis}\label{eqtl}

The function \Rfunction{trigger.link} computes a pairwise likelihood ratio statistic for linkage of each gene-marker pair in the genome. If there are markers on a sex chromosome, the \Rfunarg{gender} of each sample should be specified; a gender-specific mean is then computed for each genotype to obtain the likelihood ratio statistic. When the option \Rfunarg{norm} is \Rfunarg{TRUE}, each gene (row) of the expression data matrix is normalized to a standard $N(0, 1)$ distribution based on the ranks of the expression values for that gene. Since the likelihood ratio statistic follows a chi-square distribution under the null hypothesis, parametric p-values are computed from the observed statistics.
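
As a rough illustration of the rank-based normalization and the chi-square p-value just described, the chunk below works through a single hypothetical gene-marker pair. It is a sketch under stated assumptions rather than the code used by \Rfunction{trigger.link}: the transform $\Phi^{-1}\{(\mathrm{rank}-0.5)/n\}$ is one common convention, the model is a normal model with a common variance, and the degrees of freedom are taken to be the number of genotype classes minus one. All \Robject{toy.*} names and values are made up.

<<>>=
# Hypothetical data: 12 samples, one haploid marker coded 1/2 (made-up values)
toy.geno <- rep(1:2, times = 6)
toy.expr <- c(0.2, 1.5, -0.4, 0.9, 0.1, 1.8, -0.7, 1.1, 0.3, 2.0, -0.1, 0.8)
toy.n <- length(toy.expr)
# Rank-based transform to N(0,1) scores (assumed convention)
toy.z <- qnorm((rank(toy.expr) - 0.5) / toy.n)
# Residual sums of squares: one overall mean vs. one mean per genotype
toy.rss0 <- sum((toy.z - mean(toy.z))^2)
toy.rss1 <- sum((toy.z - ave(toy.z, toy.geno))^2)
# Likelihood ratio statistic under a normal model with common variance
toy.lr <- toy.n * log(toy.rss0 / toy.rss1)
# Parametric p-value from the chi-square null distribution (assumed df)
toy.df <- length(unique(toy.geno)) - 1
toy.p <- pchisq(toy.lr, df = toy.df, lower.tail = FALSE)
c(statistic = toy.lr, p.value = toy.p)
@
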
The function updates \Robject{trig.obj} with a matrix \Robject{stat} of likelihood ratio statistics and a matrix \Robject{pvalue} of p-values corresponding to gene-marker pairs in the genome, with genes in rows and markers in columns.\\

The function \Rfunction{plot} with the argument \Rfunarg{type = "link"} takes the matrix of p-values for linkage of each gene-marker pair, calls those below a \Rfunarg{cutoff} significant, and plots the significant gene-marker pairs in a genome-wide eQTL linkage map. Genes and markers are ordered according to their genome positions. Applying the functions to the yeast data set, in Figure \ref{fig:linkplot} we plot the genome-wide linkage map of eQTL and gene expression at a p-value cutoff of $3.3 \times 10^{-4}$, which corresponds to $5\%$ FDR~\cite{Storey2003}. Note that \Rfunction{plot} treats values of the significance measure \Robject{pvalue} below the cutoff as significant. To treat values above a threshold as significant instead, apply the function to the negated matrix of significance measures and choose a negative cutoff.

<<>>=
trig.obj = trigger.link(trig.obj, norm = TRUE)
@
<