\documentclass[a4paper, 12pt]{article} \author{\href{mailto:m.robinson@garvan.org.au}{Mark Robinson} \href{mailto:a.statham@garvan.org.au}{Aaron Statham} \href{mailto:d.strbenac@garvan.org.au}{Dario Strbenac}} \usepackage[pdftex]{hyperref} \usepackage{amsmath} \usepackage{amssymb} \usepackage{amscd} %\usepackage{attachfile} \usepackage{graphicx} \usepackage[tableposition=top]{caption} \usepackage{ifthen} \usepackage[utf8]{inputenc} \topmargin -.5in \headheight 0in \headsep 0in \oddsidemargin -.5in \evensidemargin -.5in \textwidth 176mm \textheight 245mm \usepackage{color} \usepackage{Sweave} \begin{document} \SweaveOpts{engine=R} %\VignetteIndexEntry{Using Repitools for Epigenomic Sequencing Data} \title{Integrative Analysis of Epigenomic sequencing (and microarray) data with \texttt{Repitools}} \date{} \maketitle \begin{center} Last compiled on: \today \end{center} \section{Introduction} \texttt{Repitools} is a package that allows exploratory as well as targeted statistical analysis of absolute and differential binding for ChIP-seq and MeDIP-seq data types, and gives visual summaries in a variety of formats. Some basic quality checking utilities are available for sequencing data. Much of the functionality available is implemented for both tiling microarrays and sequencing data, with very similar function calls for both types of data. \\ In this vignette, we highlight various features within the package. Further description of the package can be found in the associated Bioinformatics Applications Note\footnote{\href{http://bioinformatics.oxfordjournals.org/content/26/13/1662.abstract}{Repitools: an R package for the analysis of enrichment-based epigenomic data}} as well as in the help documents. \\ To start with, set a random seed, and load the \texttt{Repitools} package: <>= options(prompt = " ", continue = " ") set.seed(4) library(Repitools) @ \section{Example Datasets} \input{datasets} \section{Quality Checking} \input{qc} \section{Analyses} \input{analyses1} \subsection{Domains of Concordant Change} Another analysis of interest is the detection of {\em regions} where changes in expression (or an epigenetic mark, etc.) occur on a particular chromosome. The function \texttt{findClusters} addresses this need. The method of determining clusters requires a search through the column of scores (e.g. t-statistics) for a persistent change. Significance of clusters is determined by randomization. The order of the statistics is permuted a large number of times and the number of clusters found in the true statistics column and the permuted statistics columns is counted, ranging from a loose cutoff to a tight cutoff. A cutoff is chosen to control the user-specified FDR. Importantly, the table must be pre-sorted in positional order. This allows the user to use whatever definition of position they want. Note that the distance between features is not taken into account in this implementation. <>= stats.table <- data.frame(chr = "chr1", pos = 1:500, t.stat = rnorm(500)) stats.table[100:104, "t.stat"] <- rnorm(5, 5) stats.table[200:204, "t.stat"] <- rnorm(5, -5) stats.clustered <- findClusters(stats.table, score.col = 3, w.size = 5, n.med = 2, n.consec = 3, cut.samps = seq(-2, -10, -2), maxFDR = 0.05, trend = "down", n.perm = 10) cluster.1 <- which(stats.clustered$cluster == 1) stats.clustered[cluster.1, ] @ \noindent In this example, a running window of 5 consecutive genes is calculated along each chromosome; the median value of those 5 genes is assigned to the middle gene. If, in the 5-gene window, there are at least 2 genes that have an assigned median above the cutoff being used (cutoffs of -2, -4, -6, -8, and -10 are tried), then those genes are candidate cluster-generating genes. Starting from a candidate gene, and working outwards until encountering a positive t-statistic, if a consecutive run of at least 3 genes with t-statistic being negative could be made, then this forms a cluster. The default estimated FDR of 0.05 is used. \input{analyses2} \section{Visualisations} \input{visualisations} \section{Utility Functions} The function described in this section perform useful tasks that are commonly made with epigenetic data. \subsection{Windows and Counts} Often, it is required to create a set of windows covering the entire genome, for some analysis. The function \texttt{genomeBlocks} does this. <>= chrs <- c(50000, 10000) names(chrs) <- c("chr1", "chr2") genomeBlocks(chrs, width = 5000) @ \noindent This example makes a \texttt{GRanges} object of 5 kb windows along both example chromosomes. \ \\ \ \\ \input{utilities} \section{Summary} Repitools has a number of useful functions for quality checking, analysis, and comparison of trends. Many of the functions work seamlessly on array data, as well as sequencing data. Also, there are numerous utility functions, that perform some common task in the investigation of epigenomic data. Consult the package documentation for instructions on how to use functions that were not demonstrated by this vignette. \ \\ \ \\ \section{Environment} This vignette was created in: \input{sInfo} \end{document}