%\VignetteIndexEntry{Estimate eQTL networks using qpgraph} %\VignetteKeywords{graphical Markov model, qp-graph, expression, genotyping, network, eQTL} %\VignettePackage{eQTLnetwork} \documentclass{article} <>= BiocStyle::latex(use.unsrturl=FALSE) @ \usepackage{natbib} \bioctitle[Estimate eQTL networks using \Biocpkg{qpgraph}]% {Estimate eQTL networks using {\tt qpgraph}} \author{Inma Tur$^{1,3}$, Alberto Roverato$^2$ and Robert Castelo$^1$} \begin{document} \SweaveOpts{eps=FALSE} \DefineVerbatimEnvironment{Sinput}{Verbatim} {formatcom = {\color{Sinput}}} \DefineVerbatimEnvironment{Soutput}{Verbatim} {formatcom = {\color{Soutput}}} \DefineVerbatimEnvironment{Scode}{Verbatim} {formatcom = {\color{Scode}}} \definecolor{Sinput}{rgb}{0.21,0.49,0.72} \definecolor{Soutput}{rgb}{0.32,0.32,0.32} \definecolor{Scode}{rgb}{0.75,0.19,0.19} <>= pdf.options(useDingbats=FALSE) options(width=80) rm(list=ls()) try(detach("package:mvtnorm"), silent=TRUE) try(detach("package:qtl"), silent=TRUE) @ \maketitle \begin{quote} {\scriptsize 1. Universitat Pompeu Fabra, Barcelona, Spain. \\ 2. Universit\`a di Bologna, Bologna, Italy. \\ 3. Now at Kernel Analytics, Barcelona, Spain. } \end{quote} \section{Introduction} In this vignette we introduce the functionality of the \Biocpkg{qpgraph} package to estimate eQTL networks from genetical genomics data. To meet the space and time constraints in building this vignette within the \Biocpkg{qpgraph} package, we are going to simulate genetical genomics data instead of using a real data set. For this purpose, we will use the functionality described in another vignette from this package, entitled ``Simulating molecular regulatory networks using qpgraph''. If you use the approach and functions described in this vignette in your own research, please cite the following article: \begin{quote} Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical Markov models. \textit{Genetics}, 198(4):1377-1393, 2014. \end{quote} \section{Simulating an eQTL network and data from it} We are going to simulate an eQTL network in the following steps: \begin{enumerate} \item Load the necessary packages. <<>>= library(GenomeInfoDb) library(qtl) library(qpgraph) @ \item Simulate a genetic map using the R/CRAN package \CRANpkg{qtl}, consisting of nine chromosomes, being 100 cM long with 10 markers equally spaced along each of them, no telomeric markers and no X sexual chromosome. <<>>= map <- sim.map(len=rep(100, times=9), n.mar=rep(10, times=9), anchor.tel=FALSE, eq.spacing=TRUE, include.x=FALSE) @ \item Create a first empty eQTL network as an empty \Rclass{eQTLcross} object using the previously simulated genetic map. \item Simulate an eQTL network consisting of 50 genes, where half of them have one \textit{cis}-acting (local) eQTL, there are 5 eQTL \textit{trans}-acting (distant) on 5 genes each and each gene is connected to 2 other genes (default). Each eQTL has an additive effect of $a=2$ and each gene-gene association has a marginal correlation $\rho=0.5$. We seed the random number generator to enable reproducing the same eQTL network employed in this vignette. A dot plot of the simualted eQTL associations is displayed in Figure~\ref{fig:simeqtlnet}. <<>>= set.seed(12345) sim.eqtl <- reQTLcross(eQTLcrossParam(map=map, genes=50, cis=0.5, trans=rep(5, 5)), a=2, rho=0.5) @ <>= plot(sim.eqtl, main="Simulated eQTL network") @ \item Simulate genotyping and expression data for 100 individuals from this eQTL network. We seed again the random number generator to enable random sampling the same data. <<>>= set.seed(12345) cross <- sim.cross(map, sim.eqtl, n.ind=100) cross @ \end{enumerate} \begin{figure} \centerline{\includegraphics[width=0.7\textwidth]{eQTLnetworks-simeqtlnet}} \caption{Dot plot of eQTL associations in a simulated eQTL network.} \label{fig:simeqtlnet} \end{figure} \section{Estimating an eQTL network from genetical genomics data} Here we briefly illustrate how to estimate an eQTL network from genetical genomics data stored as a R/CRAN \CRANpkg{qtl} \Rclass{cross} object. This object is the one we have simulated before. To use this functionality we need to provide an annotation for the genes we have in our data. This is retrieved from the simulated eQTL network object. <<>>= annot <- data.frame(chr=as.character(sim.eqtl$genes[, "chr"]), start=sim.eqtl$genes[, "location"], end=sim.eqtl$genes[, "location"], strand=rep("+", nrow(sim.eqtl$genes)), row.names=rownames(sim.eqtl$genes), stringsAsFactors=FALSE) @ For later visualization purposes, we also need a physical map, which we calculate assuming a constant Kb/cM rate of 5. We scale the gene annotations and chromosome lengths also using this Kb/cM rate. We create a \Rclass{Seqinfo} object storing the chromosome lengths of this simulated genome. <<>>= pMap <- lapply(map, function(x) x * 5) class(pMap) <- "map" annot$start <- floor(annot$start * 5) annot$end <- floor(annot$end * 5) genome <- Seqinfo(seqnames=names(map), seqlengths=rep(100 * 5, nchr(pMap)), NA, "simulatedGenome") @ The entire estimation procedure can be performed in the following steps. \begin{enumerate} \item Create a paramter object of class \Rclass{eQTLnetworkEstimationParam}. <<>>= param <- eQTLnetworkEstimationParam(cross, physicalMap=pMap, geneAnnotation=annot, genome=genome) @ \medskip \item Calculate all marginal associations between markers and genes. <<>>= eqtlnet.q0 <- eQTLnetworkEstimate(param, ~ marker + gene, verbose=FALSE) eqtlnet.q0 @ \medskip \item Obtain a first estimate of the eQTL network by selecting associations at FDR $< 0.05$. <<>>= eqtlnet.q0.fdr <- eQTLnetworkEstimate(param, estimate=eqtlnet.q0, p.value=0.05, method="fdr") eqtlnet.q0.fdr @ \medskip Display a comparison of the dot plot of the simulated eQTL associations with the ones estimated by marginal associations at FDR $< 0.05$. The result is shown in Figure~\ref{fig:simeqtlnetvsfdr}. <>= par(mfrow=c(1, 2)) plot(sim.eqtl, main="Simulated eQTL network") plot(eqtlnet.q0.fdr, main="Esiimated eQTL network") @ \begin{figure} \centerline{\includegraphics[width=0.9\textwidth]{eQTLnetworks-simeqtlnetvsfdr}} \caption{Dot plots of eQTL associations in a simulated eQTL network (left) and in an estimated eQTL network (right) selecting marginal associations at FDR $< 5\%$.} \label{fig:simeqtlnetvsfdr} \end{figure} \medskip \item Calculate non-rejection rate values with $q=3$ between markers and genes. <<>>= eqtlnet.q0.fdr.nrr <- eQTLnetworkEstimate(param, ~ marker + gene | gene(q=3), estimate=eqtlnet.q0.fdr, verbose=FALSE) eqtlnet.q0.fdr.nrr @ \medskip \item Obtain a second estimate of the eQTL network by selecting associations at FDR $< 0.05$ and with non-rejection rate value $\epsilon < 0.1$. <<>>= eqtlnet.q0.fdr.nrr <- eQTLnetworkEstimate(param, estimate=eqtlnet.q0.fdr.nrr, epsilon=0.1) eqtlnet.q0.fdr.nrr @ \medskip Display a comparison of the dot plot of the simulated eQTL associations with the ones estimated by marginal associations at FDR $< 0.05$ and non-rejection rates meeting a cutoff $\epsilon < 0.1$. The result is shown in Figure~\ref{fig:simeqtlnetvsnrr}. <>= par(mfrow=c(1, 2)) plot(sim.eqtl, main="Simulated eQTL network") plot(eqtlnet.q0.fdr.nrr, main="Esiimated eQTL network") @ \begin{figure} \centerline{\includegraphics[width=0.9\textwidth]{eQTLnetworks-simeqtlnetvsnrr}} \caption{Dot plots of eQTL associations in a simulated eQTL network (left) and in an estimated eQTL network (right) selecting marginal associations at FDR $< 5\%$ and non-rejection rate meeting a cutoff $\epsilon < 0.1$.} \label{fig:simeqtlnetvsnrr} \end{figure} \medskip Examine the median number of eQTLs per gene. <<>>= eqtls <- alleQTL(eqtlnet.q0.fdr.nrr) median(sapply(split(eqtls$QTL, eqtls$gene), length)) @ \item Note that while we have simulated at most one eQTL per gene, we have currently estimated a median of 6 eQTLs per gene. This leads to the horizontal patterns in the dot plot where multiple markers in the same chromosome target the same gene and are the result of independently mapping eQTLs that are tagging the same causal one. To remove these redundant eQTL associations we perform a forward selection procedure at a nominal significance level $\alpha < 0.05$, as follows: <<>>= eqtlnet.q0.fdr.nrr.sel <- eQTLnetworkEstimate(param, estimate=eqtlnet.q0.fdr.nrr, alpha=0.05) eqtlnet.q0.fdr.nrr.sel @ \end{enumerate} In Figure~\ref{fig:simeqtlnetvsnrrsel} we can see a comparison between the dot plots of the simulated eQTL network and the final estimate obtained by first selecting marginal associations at FDR $< 0.05$, discarding those that did not meet a NRR cutoff $\epsilon < 0.1$ and further performing a forward selection procedure at a significance level $\alpha < 0.05$ among eQTLs within the same chromosomes targeting a common gene. Observe that in this final eQTL network estimate many of the redundant eQTL associations have been effectively discarded. <>= par(mfrow=c(1, 2)) plot(sim.eqtl, main="Simulated eQTL network") plot(eqtlnet.q0.fdr.nrr.sel, main="Esiimated eQTL network") @ \begin{figure} \centerline{\includegraphics[width=0.9\textwidth]{eQTLnetworks-simeqtlnetvsnrrsel}} \caption{Dot plots of eQTL associations in a simulated eQTL network (left) and in an estimated eQTL network (right) selecting marginal associations at FDR $< 5\%$ and non-rejection rate with $\epsilon < 0.1$.} \label{fig:simeqtlnetvsnrrsel} \end{figure} Finally, the \Biocpkg{qpgraph} package provides functionality to ease the visualization of the eQTL network, going beyond the dot plot to display not only eQTL associations, but also the gene-gene associations where one of the two genes has at least one eQTL. This functionality is based on the concept of hive plot \citep{Krzywinski:2012} and has been adapted from the code provided by the \CRANpkg{HiveR} package \citep{Hanson:2014} to display eQTL networks. It uses the \CRANpkg{grid} package for plotting purposes and the code below illusrates how to produce the hive plots in Figure~\ref{fig:hiveplot}, which shows a hive plot per chromosome of the final estimated eQTL network. The fact that in hive plots vertex (node) positions are fixed eases the task of comparing them. In our context, this facilitates the comparison of the genetic control of gene expression across chromosomes. <>= library(grid) library(graph) grid.newpage() pushViewport(viewport(layout=grid.layout(3, 3))) for (i in 1:3) { for (j in 1:3) { chr <- (i-1) * 3 + j pushViewport(viewport(layout.pos.col=j, layout.pos.row=i)) plot(eqtlnet.q0.fdr.nrr.sel, type="hive", chr=chr) grid.text(paste0("chr", as.roman(chr)), x=unit(0.05, "npc"), y=unit(0.9, "npc"), just="left") grid.text("genes", x=unit(0.08, "npc"), y=unit(0.1, "npc"), just="left", gp=gpar(cex=0.9)) grid.text("all chr", x=unit(0.92, "npc"), y=unit(0.2, "npc"), just="right", gp=gpar(cex=0.9)) grid.text("genes", x=unit(0.92, "npc"), y=unit(0.1, "npc"), just="right", gp=gpar(cex=0.9)) grid.text("markers", x=unit(0.5, "npc"), y=unit(0.95, "npc"), just="centre", gp=gpar(cex=0.9)) popViewport(2) } } @ \begin{figure} \centerline{\includegraphics[width=0.7\textwidth]{eQTLnetworks-hiveplot}} \caption{Hive plots of an eQTL network estimated from simulated data, involving only connected components with at least one eQTL association. For each chromosome, the hive plot shows three axes, where markers and genes are ordered from the center according to their genomic location. Vertical and left axes represent the chromosome in the corresponding plot, while the right axis represents the entire genome alternating black and gray along consecutive chromosomes. Edges between genes axes correspond to gene-gene associations.} \label{fig:hiveplot} \end{figure} \section{Session information} <>= toLatex(sessionInfo()) @ \bibliography{qpgraph} \bibliographystyle{apalike} \end{document}