---
title: "Basepair-Resolution Analysis with BRGenomics"
subtitle: "_Straightforward tools for high-resolution genomics data_"
author:
  name: Mike DeBerardine
  email: mike.deberardine@gmail.com
package: BRGenomics
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
  BiocStyle::pdf_document:
    toc: true
abstract: |
  BRGenomics is designed to help users avoid code repetition by providing 
  efficient and tested functions to accomplish common, discrete tasks in the 
  analysis of high-throughput sequencing data. The included functions are geared 
  toward analyzing basepair-resolution sequencing data, the properties of which 
  are exploited to increase performance and user-friendliness. We leverage 
  standard Bioconductor methods and classes to maximize compatibility with its 
  rich ecosystem of bioinformatics tools, and we aim to make BRGenomics 
  sufficient for most post-alignment data processing. Common data processing and
  analytical steps are turned into fast-running one-liners that can be 
  simultaneously applied across numerous datasets. BRGenomics is 
  fully documented, and we aim for it to be beginner-friendly.
  
vignette: |
  %\VignetteIndexEntry{Overview}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

# Motivation

This package is designed to:

* Replace the use of command-line utilities for most post-alignment processing, 
e.g. `bedtools` and `deeptools`
* Be easy to use and easy to install, without requiring external dependencies, 
e.g. `HTSlib` or the Kent source utilities from the UCSC Genome Browser
* Allow users to string together common analysis pipelines with simple, 
fast-running one-liners
* Avoid code repetition by providing tested and validated code
* Exploit the properties of basepair-resolution data to optimize performance and 
increase user-friendliness
* Use process forking to take advantage of multicore processors
* Maximize compatibility with Bioconductor's rich ecosystem of analysis 
software, in addition to leveraging the traditional strengths of R in statistics 
and data visualization
* Fully replace the `bigWig` R package

# Features

* Process and import bedGraph, bigWig, and BAM files quickly and easily, with 
several pre-configured defaults for typical uses
* Count and filter spike-in reads
* Calculate spike-in normalization factors using several methods and options, 
including options for batch normalization
* Count reads by regions of interest
* Count reads at positions within regions of interest, at single-base resolution 
or in larger bins, and generate count matrices for heatmapping
* Calculate bootstrapped signal (e.g. readcount) profiles with confidence 
intervals (i.e. meta-profiles)
* Modify gene regions (e.g. extract promoters or genebody regions) using a 
single, straightforward function
* Conveniently and efficiently call `DESeq2` to calculate differential 
expression in a manner that is robust to global changes^[This avoids the default 
behavior of estimating genewise dispersions across all samples present, which is 
invalid if any experimental condition causes broad changes.]
  + Use non-contiguous genes in `DESeq2` analysis, e.g. to exclude specific 
  sites/peaks from the analysis (not usually supported by `DESeq2`)
  + Efficiently generate results across a list of comparisons
* Support blacklisting throughout, with proper accounting of blacklisted 
sites in relevant calculations
* Users interact with an intuitive and computationally efficient data structure 
(the "basepair resolution `GRanges`" object), which is already supported by a 
rich, user-friendly suite of tools that greatly simplify working with datasets 
and annotations

# Coming Soon

Data processing:

* Summarizing and plotting replicate correlations
* A function that uses random read sampling to assess whether sequencing depth 
is sufficient to stabilize arbitrary calculations (users can supply an anonymous 
function to calculate things like rank expression, power analysis, differential 
expression by DESeq2, pausing indices, etc.)

Signal counting and analysis:

* Two-stranded meta-profile calculations
* Automated generation of a list of DESeq2 comparisons using all possible 
combinations; all possible permutations; or by defining a simple hierarchy of 
each-vs-one comparisons