%\VignetteIndexEntry{General Manual} %\VignetteIndexEntry{Global visualization tool of genomic data} %\VignetteDepends{stats, utils, graphics, grDevices, datasets, base, biomaRt} %\VignetteKeywords{Global, Kariogram} %\VignettePackage{chromPlot} \documentclass[a4paper]{article} \usepackage[utf8]{inputenc} \usepackage{graphicx} \usepackage[english]{babel} \usepackage[utf8]{inputenc} \usepackage{hyperref} \parskip 3ex % espacio entre parrafos. \renewcommand{\baselinestretch}{1.5} \title{The chromPlot user's guide} \author{Karen Y. Oróstica and Ricardo A. Verdugo} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\R}[0]{{\textit{R}}} \newcommand{\inclfig}[3]{% \begin{figure}[htbp] \begin{center} \includegraphics[width=#2]{#1} \caption{\label{#1}#3} \end{center} \end{figure} } % margen izquierdo \usepackage{Sweave} \begin{document} \SweaveOpts{concordance=TRUE} \maketitle \tableofcontents %\tableofcontents \newpage \section{Introduction} Visualization is an important step in data analysis workflows for genomic data. Here, we introduce the use of \texttt{chromPlot}, an R package for global visualization of genome-wide data. \texttt{chromPlot} is suitable for any organism with linear chromosomes. Data is visualized along chromosomes in a variety of formats such as segments, histograms, points and lines. One plot may include multiple tracks of data, which can be placed inside or on either side of the chromosome body representation. The package has proven to be useful in a variety of applications, for instance, detecting chromosomal clustering of differentially expressed genes, combining diverse information such as genetic linkage to phenotypes and gene expression, quality controlling genome resequencing experiments, visualizing results from genome-wide scans for positive selection, synteny between two species, among others. \section{Creating a plot with genomic coordinates} The \texttt{gaps} argument is used to tell \texttt{chromPlot} what system of coordinates to use. The information is provided as a table following the format for the `Gap' track in the Table Browser of the UCSC website\footnote{https://genome.ucsc.edu/}. From this table, \texttt{chromPlot} extracts the number of chromosomes, chromosomes names and lengths, and the position of centromeres (shown as solid circles). The tables for the latest genome build of human and mouse are provided with package (\texttt{hg\_gap} and \texttt{mm10\_gap}) and are loaded by \texttt{data()}). The user can use tables downloaded from the UCSC Table Browser for other genomes. If no data is provided to \texttt{gaps}, plotting is still possible as long as one of \texttt{annot1}, \texttt{bands} or \texttt{org} arguments is provided. The information will be taken from those objects, in that preference order, except for centromers which will not be plotted. In this example, we will plot the chromosomes in the hg19 human genome. \texttt{chromPlot} returns some messages when doing calculations. Here, it just retrieves the number of bases in each chromosomes. Messages will be omitted in next examples. \clearpage <>= library("chromPlot") data(hg_gap) head(hg_gap) chromPlot(gaps=hg_gap) @ \section{Input data} \texttt{chromPlot} has 8 arguments that can take objects with genomic data: (\texttt{annot1}, \texttt{annot2}, \texttt{annot3}, \texttt{annot4}, \texttt{segment}, \texttt{segment2}, \texttt{stat} and \texttt{stat2}. Data provided to these arguments are internally converted to data tracks that can be plotted. These arguments take their input in any of these formats: \begin{enumerate} \item A string with a filename or URL \item A data frame \item A GRanges object (\texttt{GenomicRanges} package) \end{enumerate} Additionally, the user may obtain a list of all ensemble genes by providing and organism name to the \texttt{org} argument (ignored if data is provided to \texttt{annot1}). The data provided as objects of class data.frame must follow the BED format in order to be used as tracks by \texttt{chromPlot}\footnote{https://genome.ucsc.edu/FAQ/FAQformat.html\# format1}. However, as opposed to the files in BED format, track must have column names. The columns Chrom (character class), Start (integer class) and End (integer class) are mandatory. \texttt{chromPlot} can work with categorical or quantitative data. The categorical data must have a column called Group (character class), which represents the categorical variable to classify each genomic element. In the case of quantitative data, the user must indicate the column name with the score when calling \texttt{chromPlot()} by setting the \texttt{statCol} parameter.\\ Examples of different data tables will be shown throughout this tutorial. All data used in this vignette are included in \texttt{chromPlot} (inst/extdata folder). In order to keep the package size small, we have included only a few chromosomes in each file. We use mostly public data obtained from the UCSC Genome Browser\footnote{http://genome.ucsc.edu/} or from The 1000 Genomes Selection Browser 1.0\footnote{http://hsb.upf.edu/}, i.e. the iHS, Fst and xpehh tables shown below. In the following example code, an annotation package from Bioconductor to display the density of all transcripts in the genome. We load a TxDb object (inherit class from AnnotationDb) with all known gene transcripts in the hg19 human genome. We extract the transcripts for this gene definition and plot them genome-wide. The transcripts object(\texttt{txgr}) has \texttt{GRanges} class, from \emph{GenomicRanges} package. The The \emph{GenomeFeatures} package is required to extract the transcripts from the annotation object. % <>= library("TxDb.Hsapiens.UCSC.hg19.knownGene") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene library(GenomicFeatures) txgr <- transcripts(txdb) txgr @ \clearpage <>= chromPlot(gaps=hg_gap, annot1=txgr) @ \section{Types of data visualization} %Gráficos en el cuerpo del cromosoma \subsection{Chromosomes banding} \subsubsection{Plotting G banding} The \texttt{chromPlot} package can create idiograms by providing a `cytoBandIdeo' table taken from the Table Browser at the UCSC Genome Browser website. These tables are provided with the package for human and mouse (\texttt{hg\_cytoBandIdeo} and \texttt{mm10\_cytoBandIdeo}).\\ In the next code, we show how to obtain an idiogram with a subset of chromosomes for human: % <>= data(hg_cytoBandIdeo) head(hg_cytoBandIdeo) @ \clearpage You can choose chromosomes using \texttt{chr} parameter, which receives a vector with the name of the chromosomes. % <>= chromPlot(bands=hg_cytoBandIdeo, gaps=hg_gap, chr=c("1", "2", "3", "4", "5", "6"), figCols=6) @ \clearpage \subsubsection{Genomic elements} \texttt{chromplot} can plot the location of genomic elements in the chromosomal body. For this example, we will use a table of refSeq genes taken from the UCSC Genome Browser. The file included in the package contains only chromosomes 19 to 21 to keep the package's size small. % <>= data_file1 <- system.file("extdata", "hg19_refGeneChr19-21.txt", package = "chromPlot") refGeneHg <- read.table(data_file1, sep="\t", header=TRUE, stringsAsFactors=FALSE) refGeneHg$Colors <- "red" head(refGeneHg) @ \clearpage <>= chromPlot(gaps=hg_gap, bands=refGeneHg, chr=c(19, 20, 21), figCols=3) @ \clearpage \subsubsection{Assigning different colors} \label{sec:arbitrarycolors} It is possible to use different colors for each genomic element. However, you should keep in main that humans can only distinguish a limited number of colors in a plot. Therefore, for continuous variables, it is useful to create bins of data and assign colors to each bin. <>= data_file2 <- system.file("extdata", "Fst_CEU-YRI-W200Chr19-21.bed", package = "chromPlot") fst <- read.table(data_file2, sep="\t", stringsAsFactors=FALSE, header=TRUE) head(fst) fst$Colors <- ifelse(fst$win.FST >= 0 & fst$win.FST < 0.025, "gray66", ifelse(fst$win.FST >= 0.025 & fst$win.FST < 0.05, "grey55", ifelse(fst$win.FST >= 0.05 & fst$win.FST < 0.075, "grey35", ifelse(fst$win.FST >= 0.075 & fst$win.FST < 0.1, "black", ifelse(fst$win.FST >= 0.1 & fst$win.FST < 1, "red","red"))))) head(fst) @ \clearpage <>= chromPlot(gaps=hg_gap, chr=c(19, 20, 21), bands=fst, figCols=3) @ \clearpage \subsubsection{Grouping elements by category} \label{sec:groupband} If elements are assigned to categories in the Group column of the track, \texttt{chromplot} creates a legend. If the Colors column is available, it will use custom colors, otherwise it assigns arbitrary colors. <>= fst$Group <- ifelse(fst$win.FST >= 0 & fst$win.FST < 0.025, "Fst 0-0.025", ifelse(fst$win.FST >= 0.025 & fst$win.FST < 0.05, "Fst 0.025-0.05", ifelse(fst$win.FST >= 0.05 & fst$win.FST < 0.075, "Fst 0.05-0.075", ifelse(fst$win.FST >= 0.075 & fst$win.FST < 0.1, "Fst 0.075-0.1", ifelse(fst$win.FST >= 0.1 & fst$win.FST < 1, "Fst 0.1-1","na"))))) head(fst) @ \clearpage <>= chromPlot(gaps=hg_gap, chr=c(19, 20, 21), bands=fst, figCols=3) @ \clearpage \subsubsection{Synteny} This package is able of represent genomic regions that are conserved between two species. \texttt{chromplot} can work with AXT alignment files \footnote{https://genome.ucsc.edu/goldenPath/help/axt.html}. Each alignment block in an AXT file contains three lines: a summary line (alignment information) and 2 sequence lines: 0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500\\ TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA\\ TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA\\ 1 chr19 3008279 3008357 chr11 70573976 70574054 - 3900\\ CACAATCTTCACATTGAGATCCTGAGTTGCTGATCAGAATGGAAGGCTGAGCTAAGATGAGCGA\\ CACAGTCTTCACATTGAGGTACCAAGTTGTGGATCAGAATGGAAAGCTAGGCTATGATGAGGGA\\ Moreover, \texttt{chromplot} is able to work with BED format. In the next example, we show how to graph sinteny between human and mouse from BED file. % <>= data_file3 <- system.file("extdata", "sinteny_Hg-mm10Chr19-21.txt", package = "chromPlot") sinteny <- read.table(data_file3, sep="\t", stringsAsFactors=FALSE, header=TRUE) head(sinteny) @ \clearpage <>= chromPlot(gaps=hg_gap, bands=sinteny, chr=c(19:21), figCols=3) @ \clearpage \subsection{Histograms} \subsubsection{Single histogram} The user can generate a histogram for any of the following tracks: \texttt{annot1}, \texttt{annot2}, \texttt{annot3}, \texttt{annot4}, \texttt{segment}, and \texttt{segment2}. Histograms are created when the number of genomic elements in a track exceeds a maximum set by the \texttt{maxSegs} argument (200 by default) or the maximum size of the elements is < \texttt{bin} size (1 Mb by default). Histograms can be plotted on either side of each chromosome. The side can be set for each track independently (see section~\ref{sec:chrSide}). The following example represents all annotated genes in the human genome \footnote{https://genome.ucsc.edu/cgi-bin/hgTables}. You can also use BiomaRt package \footnote{http://bioconductor.org/packages/2.3/bioc/html/biomaRt.html} to get annotated information remotely.\\ <>= refGeneHg$Colors <- NULL head(refGeneHg) @ \clearpage <>= chromPlot(gaps=hg_gap, bands=hg_cytoBandIdeo, annot1=refGeneHg, chr=c(19:21), figCols=3) @ Using biomaRt package: \begin{Schunk} \begin{Sinput} > chromPlot(bands=hg_cytoBandIdeo, gaps=hg_gap, org="hsapiens") \end{Sinput} \end{Schunk} (Same figure as above).\\ \clearpage \subsubsection{Stacked histograms: multiple files} It is possible to superimpose multiple histograms. This feature can be useful to represent processed data, obtained after of several stages of filtering or selection. For example, in microarray experiments, different colors of each histogram bar can represent the total number of genes (red), genes represented on the array (yellow), differentially over-expressed genes (green) and differentially sub-expressed genes (blue) in that order. The \texttt{annot3} and \texttt{annot4} parameters receive filtered and selected subsets of data array respectively. Given that both \texttt{annot4} and \texttt{annot3} contain information that has been 'selected' and 'filtered', the resulting histogram is quite small compared to gene density (red histogram). % <>= data_file4 <-system.file("extdata", "mm10_refGeneChr2-11-17-19.txt", package= "chromPlot") ref_mm10 <-read.table(data_file4, sep="\t", stringsAsFactors=FALSE, header =TRUE) data_file5 <- system.file("extdata", "arrayChr17-19.txt", package = "chromPlot") array <- read.table(data_file5, sep="\t", header=TRUE, stringsAsFactors=FALSE) head(ref_mm10) head(array, 4) @ \clearpage Now, we will load the GenesDE object, and then we will obtain a subset of them, that it will contain over-expressed (nivel column equal to +) and sub-expressed (nivel column equal to -) genes. % <>= data(mm10_gap) data_file6 <- system.file("extdata", "GenesDEChr17-19.bed", package = "chromPlot") GenesDE <- read.table(data_file6, sep="\t", header=TRUE, stringsAsFactors=FALSE) head(GenesDE) DEpos <- subset(GenesDE, nivel%in%"+") DEneg <- subset(GenesDE, nivel%in%"-") head(DEpos, 4) head(DEneg, 4) @ \clearpage <>= chromPlot(gaps=mm10_gap, bands=mm10_cytoBandIdeo, annot1=ref_mm10, annot2=array, annot3=DEneg, annot4=DEpos, chr=c( "17", "18", "19"), figCols=3, chrSide=c(-1, -1, -1, 1, -1, 1, -1, 1), noHist=FALSE) @ \clearpage \subsubsection{Stacked histograms: single file} \texttt{chromplot} can also show stacked histograms from a data.frame with a `Group' column containing category for each genomic elements. The \texttt{segment} and \texttt{segment2} arguments can take this type of input. As an example, we will plot differentially expressed genes classified by monocytes subtypes (Classical-noClassical and intermediate) on the right side of the chromosome, and ta histogram of refSeq genes on the left side. <>= data_file7 <- system.file("extdata", "monocitosDEChr19-21.txt", package = "chromPlot") monocytes <- read.table(data_file7, sep="\t", header=TRUE, stringsAsFactors=FALSE) head(monocytes) @ \clearpage <>= chromPlot(gaps=hg_gap, bands=hg_cytoBandIdeo, annot1=refGeneHg, segment=monocytes, chrSide=c(-1,1,1,1,1,1,1,1), figCols=3, chr=c(19:21)) @ \clearpage \subsection{XY plots} The arguments \texttt{stat} and \texttt{stat2} can take tracks of genomic elements associated with numeric values. The user can choose between lines or points for representing each data point along chromosomes by using the \texttt{statTyp} parameter (p = point, l = line). The \texttt{statCol} parameter must contain the name of the column containing continuous values in \texttt{stat} (use \texttt{statCol2} for \texttt{stat2}). It is possible to apply a statistical function (mean, median, sum etc) to the data using \texttt{statSumm} parameters ('none' by default). If the value is 'none', chromPlot will not apply any statistical function. <>= head(fst) @ \clearpage \subsubsection{Using points} % <>= chromPlot(bands=hg_cytoBandIdeo, gaps=hg_gap, stat=fst, statCol="win.FST", statName="win.FST", statTyp="p", chr=c(19:21), figCols=3, scex=0.7, spty=20, statSumm="none") @ or calculating a mean of each value per bin by giving setting \texttt{statSum="mean"}. <>= chromPlot(bands=hg_cytoBandIdeo, gaps=hg_gap, stat=fst, statCol="win.FST", statName="win.FST", statTyp="p", chr=c(19:21), figCols=3, scex=0.7, spty=20, statSumm="mean") @ \subsubsection{Using connected lines} % <>= chromPlot( bands=hg_cytoBandIdeo, gaps=hg_gap, stat=fst, statCol="win.FST", statName="win.FST", statTyp="l", chr=c(19:21), figCols=3, statSumm="none") @ Here, we can smooth the graph by using a mean per bin: <>= chromPlot( bands=hg_cytoBandIdeo, gaps=hg_gap, stat=fst, statCol="win.FST", statName="win.FST", statTyp="l", chr=c(19:21), figCols=3, statSumm="mean") @ Note that the \texttt{statSumm} argument can receive any function name ("none" is the default). No sanity check is performed, and thus the user is responsible to make sure that using that function makes sense for the data at hand. \subsubsection{Coloring by datapoints exceeding a threshold} We will plot to two tracks of data with continuous values simultaneously using the (\texttt{stat} and \texttt{stat2} arguments. A third one will be shown on the chromosomal body after being categorized in arbitrary bins (see section~\ref{sec:groupband}). The values on both tracks of continuous data will be colored according to a threshold provided by the user in the \texttt{statThreshold} and \texttt{statThreshold2} parameters, which are applied for the \texttt{stat} and \texttt{stat2} tracks, respectively. <>= data_file8 <- system.file("extdata", "iHS_CEUChr19-21", package = "chromPlot") ihs <- read.table(data_file8, sep="\t", stringsAsFactors=FALSE, header=TRUE) head(ihs) data_file9 <-system.file("extdata", "XPEHH_CEU-YRIChr19-21", package="chromPlot") xpehh <-read.table(data_file9, sep="\t", stringsAsFactors=FALSE, header=TRUE) head(xpehh) @ We can label any data point by providing and an 'ID' column with labels. ID values of NA, NULL, or empty ("") are ignored. Here, we will only label single data point with the maximum XP value. <>= xpehh$ID <- "" xpehh[which.max(xpehh$XP),"ID"] <- xpehh[which.max(xpehh$XP),"Name"] head(xpehh) @ \clearpage <>= chromPlot(gaps=hg_gap, bands=fst, stat=ihs, stat2=xpehh, statCol="iHS", statCol2="XP", statName="iHS", statName2="normxpehh", colStat="red", colStat2="blue", statTyp="p", scex=2, spty=20, statThreshold=1.2, statThreshold2=1.5, chr=c(19:21), bin=1e6, figCols=3, cex=0.7, statSumm="none", legChrom=19, stack=FALSE) @ \clearpage \subsubsection{Plotting LOD curves} A potential use of connected lines is plotting the results from QTL mapping. Here we show a simple example of how to plot the LOD curves from a QTL mapping experiment in mice along a histogram of gene density. For demonstration purposes, we use a simple formula for converting cM to bp. A per-chromosome map or an appropriate online tool (\url{http://cgd.jax.org/mousemapconverter/}) should be used in real applications. <>= library(qtl) data(hyper) hyper <- calc.genoprob(hyper, step=1) hyper <- scanone(hyper) QTLs <- hyper colnames(QTLs) <- c("Chrom", "cM", "LOD") QTLs$Start <- 1732273 + QTLs$cM * 1895417 chromPlot(gaps=mm10_gap, bands=mm10_cytoBandIdeo, annot1=ref_mm10, stat=QTLs, statCol="LOD", chrSide=c(-1,1,1,1,1,1,1,1), statTyp="l", chr=c(2,17:18), figCols=3) @ \subsubsection{Plotting a map with IDs} In the previous section, we used an ID to highly one point from a track with continuous values. However, \texttt{chromPlot} can display many IDs, while trying to avoid overlapping of text labels. Points are ordered by position and the overlapping labels are moved downwards. This is useful for displaying maps, e.g. genetic of physical maps of genetic markers. For this the user must ensure that the table contains the ID column. The values in that column will be plotted as labels next to the data point. In the following example we show the IDs of of a small panel of 150 SNPs. We will use a different color for known (rs) and novel (non rs) SNPs. By setting statType="n" we avoid plotting the actual data point. <>= data_file10 <- system.file("extdata", "CLG_AIMs_150_chr_hg19_v2_SNP_rs_rn.csv", package = "chromPlot") AIMS <- read.csv(data_file10, sep=",") head(AIMS) @ \clearpage <>= chromPlot(gaps=hg_gap, bands=hg_cytoBandIdeo, stat=AIMS, statCol="Value", statName="Value", noHist=TRUE, figCols=4, cex=0.7, chr=c(1:8), statTyp="n", chrSide=c(1,1,1,1,1,1,-1,1)) @ \clearpage \subsection{Segments} \subsubsection{Large stacked segments} \texttt{chromplot} allows for the user represent large segments as vertical bars on either side of the chromosomal bodies. If the maximum segment size of segments is smaller than \texttt{bin} (1 Mb by default), or there are more segments than \texttt{maxSegs} (200 by default), they will be plotted as a histogram. However, the user can change this behavior by setting the \texttt{noHist} parameter to TRUE. If a 'Group' column is present in the table of segments, it is used as a category variable and different colors are used for segments in each category. The user can set the colors to be used in the \texttt{colSegments} and \texttt{colSegments2} arguments.\\ This type of graph is useful for displaying, for instance, QTLs (quantitative trait locus), due to the fact that they cover large genomic regions. Here we show how to graph segments on the side of the chromosomal body. By setting \texttt{stack=TRUE} (default), drawing space is saved by plotting all nonoverlapping segments at the minimum possible distance from the chromosome. Otherwise, they are plotted at increasing distance from the chromosome, regardless of whether they overlap or not. % <>= data_file12 <-system.file("extdata", "QTL.csv", package = "chromPlot") qtl <-read.table(data_file12, sep=",", header =TRUE, stringsAsFactors=FALSE) head(qtl) @ \clearpage <>= chromPlot(gaps=mm10_gap, segment=qtl, noHist=TRUE, annot1=ref_mm10, chrSide=c(-1,1,1,1,1,1,1,1), chr=c(2,11,17), stack=TRUE, figCol=3, bands=mm10_cytoBandIdeo) @ \clearpage \subsubsection{Large stacked segments groupped by two categories} When the segments have more than one category (up to two supported), they are differentiated by a combination of color and shape for a point plotted in the middle of the segment. The segment itself is shown in gray. The first category is taken from the 'Group' column and establishes the color of the symbol. The second category is taken from the 'Group2' column and determines the symbol shape.\\ In the following example, we use data for SNPs associated with phenotypes and ethnicity, taken from phenoGram website \footnote{http://visualization.ritchielab.psu.edu/phenograms/examples}. <>= data_file11 <- system.file("extdata", "phenogram-ancestry-sample.txt", package = "chromPlot") pheno_ancestry <- read.csv(data_file11, sep="\t", header=TRUE) head(pheno_ancestry) @ \clearpage <>= chromPlot(bands=hg_cytoBandIdeo, gaps=hg_gap, segment=pheno_ancestry, noHist=TRUE, chr=c(3:5), figCols=3, legChrom=5) @ \clearpage Since the data contain SNPs positions, the segments are only 1 bp long and the resulting lines are too small to be seen. For display purposes, we will increase the segments' sizes by adding a 500Kb pad to either side of each SNP. <>= pheno_ancestry$Start<-pheno_ancestry$Start-5e6 pheno_ancestry$End<-pheno_ancestry$End+5e6 head(pheno_ancestry) @ \clearpage <>= chromPlot(bands=hg_cytoBandIdeo, gaps=hg_gap, segment=pheno_ancestry, noHist=TRUE, chr=c(3:5), figCols=3, legChrom=5) @ \clearpage \subsubsection{Large non-overlapping segments} \texttt{chromplot} can categorize genomic regions (Group column) and then represent them with different colors. Also the package is capable of showing not-overlapping regions along the chromosome. The following example shows the ancestry of each chromosomal region. The user can obtain the annotation data updated through the biomaRt package. % <>= data_file13 <- system.file("extdata", "ancestry_humanChr19-21.txt", package = "chromPlot") ancestry <- read.table(data_file13, sep="\t",stringsAsFactors=FALSE, header=TRUE) head(ancestry) @ \clearpage <>= chromPlot(gaps=hg_gap, bands=hg_cytoBandIdeo, chrSide=c(-1,1,1,1,1,1,1,1), noHist=TRUE, annot1=refGeneHg, figCols=3, segment=ancestry, colAnnot1="blue", chr=c(19:21), legChrom=21) @ \clearpage \subsection{Multiple data types} The \texttt{chromPlot} package is able to plot diverse types of tracks simultaneously. % <>= chromPlot(stat=fst, statCol="win.FST", statName="win.FST", gaps=hg_gap, bands=hg_cytoBandIdeo, statTyp="l", noHist=TRUE, annot1=refGeneHg, chrSide=c(-1, 1, 1, 1, 1, 1, 1, 1), chr = c(19:21), figCols=3, cex=1) @ \clearpage Here we show a figure from in Verdugo et al. (2010), to represent the association between the genetic divergence regions (darkred regions in the body of the chromosomes), the QTLs (color bars on the right of the chromosome), and the absence of association with gene density shown (histogram on the left side of the chromosomes). % <>= options(stringsAsFactors = FALSE); data_file14<-system.file("extdata", "donor_regions.csv", package = "chromPlot") region<-read.csv(data_file14, sep=",") region$Colors <- "darkred" head(region) head(qtl) @ \clearpage <>= chromPlot(gaps=mm10_gap, segment=qtl, noHist=TRUE, annot1=ref_mm10, chrSide=c(-1,1,1,1,1,1,1,1), chr=c(2,11,17), stack=TRUE, figCol=3, bands=region, colAnnot1="blue") @ \clearpage \section{Graphics settings} \subsection{Choosing side} \label{sec:chrSide} The user can choose a chromosome side for any track of data, except if given to the \texttt{bands} argument, in which case it is plotten on the body of the chromosome. The \texttt{chrSide} parameter receives a vector with values 1 or -1 for each genomic tracks (\texttt{annot1}, \texttt{annot2}, \texttt{annot3}, \texttt{annot4}, \texttt{segment}, \texttt{segment2}, \texttt{stat} and \texttt{stat2} placing them to the right (if -1) or to the left (if 1) of the chromosomes. For demonstration, here we show the same track of data on two different sides. <>= chromPlot(gaps=mm10_gap, bands=mm10_cytoBandIdeo, annot1=ref_mm10, annot2=ref_mm10, chrSide=c(-1, 1, 1, 1, 1, 1, 1, 1), chr=c(17:19), figCols=3) @ \clearpage \subsection{Choosing colors} For each parameter that received a data.frame, the user can specify a color for plotting. If the data will be plotted as segments, the user can specify a vector of colors. The color will be assigned in the order provided to each level of a category (when a Group columns is present in the data table). The color parameters and their respective data tracks are as follows: \begin{enumerate} \item \texttt{colAnnot1: annot1} \item \texttt{colAnnot2: annot2} \item \texttt{colAnnot3: annot3} \item \texttt{colAnnot4: annot4} \item \texttt{colSegments: segment} \item \texttt{colSegments2: segment2} \item \texttt{colStat: stat} \item \texttt{colStat2: stat2} \end{enumerate} For data that are plotted individually, i.e. bands, segments, points in XY, or data labels, it is possible to set an arbitrary color for each element by providing a color name in a column called ``Colors'' in the data table. Setting a value in this way overrides any color provided in the above arguments for a given track. The user is responsible for providing color names that R understands. No check is done by chromPlot, but R will complain if a wrong name is used. For un example of this use, see section~\ref{sec:arbitrarycolors}. \subsection{Placement of legends} \texttt{chromPlot} places the legends under the smallest or second smallest chromosome, depending on the number of legends needed. The legend for the second category of a segments track is placed in the middle-right of the plotting area of the smallest chromosome. These choices were made because they worked in most cases that we tested. However, the placement of legends in R not easily automated to produce optimal results in all situations. Depending on the particular conditions of a plot such as data density, chromosomes chosen, font size and the size of the plotting device, the the legend by block viewing some data. When not pleased with the result of chromPlot's placing of legends, the user has two options: \begin{enumerate} \item setting the \texttt{legChrom} argument to an arbitrary chromosome name. The legend will be placed under that chromosome. If more than one legend is needed the first one will be placed under the chromosome before the chosen chromosome, unless only one chromosome is plotted. \item setting the \texttt{legChrom} to \texttt{NA} to omit plotting a legend. The user can use the \texttt{legend()} function to create a custom legend and can choose the best location by trial and error. \end{enumerate} \clearpage \section{Acknowledgments} This work was funded by a the FONDECYT Grant 11121666 by CONICYT, Government of Chile. We acknowledge all the users who have tested the software and provided valuable feedback. Particularly, Alejandro Blanco and Paloma Contreras. \section{REFERENCES} Verdugo, Ricardo A., Charles R. Farber, Craig H. Warden, and Juan F. Medrano. 2010. “Serious Limitations of the QTL/Microarray Approach for QTL Gene Discovery.” BMC Biology 8 (1): 96. doi:10.1186/1741-7007-8-96. \end{document}