%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{1. Introduction to Bioconductor}
# Introduction to Bioconductor
useR! 2014
Author: Martin Morgan (mtmorgan@fhcrc.org), Sonali Arora
Date: 30 June, 2014
```{r setup, echo=FALSE}
knitr::opts_chunk$set(cache=TRUE)
```
## R
Language and environment for statistical computing and graphics
- Full-featured programming language
- Interactive and *interpreted* -- convenient and forgiving
- Coherent, extensive documentation
- Statistical, e.g. `factor()`, `NA`
- Extensible -- CRAN, Bioconductor, github, ...
Vector, class, object
- Efficient _vectorized_ calculations on 'atomic' vectors `logical`,
`integer`, `numeric`, `complex`, `character`, `raw`
- Atomic vectors are building blocks for more complicated _objects_
    - `matrix` -- atomic vector with 'dim' attribute
    - `data.frame` -- list of equal length atomic vectors
- Formal _classes_ represent complicated combinations of vectors,
e.g., the return value of `lm()`, below
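
The relationships in this list can be checked directly at the R prompt. The following is a minimal sketch using only base R; the variable names `v`, `m`, and `scores` are made up for illustration.

```r
# Atomic vector: every element has the same type
v <- c(1.5, 2, 3.25)
typeof(v)            # storage type of the vector
v * 2                # vectorized arithmetic, no explicit loop

# A matrix is an atomic vector with a 'dim' attribute
m <- 1:6
dim(m) <- c(2, 3)    # 2 rows, 3 columns
is.matrix(m)
attributes(m)

# A data.frame is a list of equal-length vectors
scores <- data.frame(id = 1:3, score = v)
is.list(scores)
lengths(scores)      # length of each column
```
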
Function, generic, method
- Functions transform inputs to outputs, perhaps with side effects,
e.g., `rnorm(1000)`
- Argument matching first by name, then by position
- Functions may define (some) arguments to have default values
- _Generic_ functions dispatch to specific _methods_ based on class of
argument(s), e.g., `print()`.
- Methods are functions that implement specific generics, e.g.,
`print.factor`; methods are invoked _indirectly_, via the generic.
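
Argument matching and method dispatch are easiest to see with a small experiment. This sketch assumes nothing beyond base R; `rescale()` and `describe()` are hypothetical names invented for the illustration and use the S3 system, like `print()`.

```r
# Hypothetical function; 'center' and 'scale' have default values
rescale <- function(x, center = 0, scale = 1) {
    (x - center) / scale
}
rescale(10, scale = 2)      # 'scale' by name, 'x' by position, 'center' defaulted
rescale(scale = 2, 10, 4)   # named first; then 10 -> x, 4 -> center

# Hypothetical S3 generic with two methods
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "some object"
describe.factor <- function(x, ...) paste("factor with", nlevels(x), "levels")

describe(factor(c("a", "b", "a")))   # dispatches to describe.factor
describe(1:3)                        # no integer method, so describe.default
```

The methods are never called by name here; the generic selects one from the class of its first argument.
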
Introspection
- General properties, e.g., `class()`, `str()`
- Class-specific properties, e.g., `dim()`
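
As a quick illustration of the difference between general and class-specific introspection, the sketch below uses the `cars` data set from the datasets package that ships with R; it is one possible example, not the only way to explore an object.

```r
class(cars)    # general: works on any object
str(cars)      # general: compact summary of the structure
dim(cars)      # class-specific: rows and columns of a data.frame
dim(1:3)       # a plain vector has no 'dim' attribute, so this is NULL
```
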
Help
- `?print`: help on the generic print
- `?print.data.frame`: help on print method for objects of class
data.frame.
Example
```{r}
x <- rnorm(1000) # atomic vectors
y <- x + rnorm(1000, sd=.5)
df <- data.frame(x=x, y=y) # object of class 'data.frame'
plot(y ~ x, df) # generic plot, method plot.formula
fit <- lm(y ~ x, df) # object of class 'lm'
methods(class=class(fit)) # introspection
```
## Bioconductor
Analysis and comprehension of high-throughput genomic data
- Statistical analysis: large data, technological artifacts, designed
experiments; rigorous
- Comprehension: biological context, visualization, reproducibility
- High-throughput
    - Sequencing: RNASeq, ChIPSeq, variants, copy number, ...
    - Microarrays: expression, SNP, ...
    - Flow cytometry, proteomics, images, ...
Packages, vignettes, workflows
![Alt Sequencing Ecosystem](SequencingEcosystem.png)
- 824 packages
- Discover and navigate via [biocViews][]
- Package 'landing page'
    - Title, author / maintainer, short description, citation,
      installation instructions, ..., download statistics
- All user-visible functions have help pages, most with runnable
examples
- 'Vignettes' are an important feature in Bioconductor -- narrative
documents illustrating how to use the package, with integrated code
- 'Release' (every six months) and 'devel' branches
Objects
- Represent complicated data types
- Foster interoperability
- S4 object system
    - Introspection: `getClass()`, `showMethods(..., where=search())`,
      `selectMethod()`
    - 'accessors' and other documented functions / methods for
      manipulation, rather than direct access to the object structure
    - Interactive help
        - `method?"substr,<tab>"` to select help on methods, `class?D<tab>`
          for help on classes (`<tab>` means press the Tab key to
          complete the name)
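
To make the S4 vocabulary concrete, here is a minimal sketch built only on the methods package. The class `Probe`, its slots, and the accessor `probeStart()` are hypothetical and do not come from any Bioconductor package; real packages define much richer classes, but the introspection calls are the same.

```r
library(methods)

# Hypothetical S4 class with two typed slots
setClass("Probe", slots = c(id = "character", start = "integer"))

# Accessor: a generic plus a method, so users never touch p@start directly
setGeneric("probeStart", function(x, ...) standardGeneric("probeStart"))
setMethod("probeStart", "Probe", function(x, ...) x@start)

p <- new("Probe", id = "probe1", start = 42L)
probeStart(p)                      # documented accessor instead of slot access

# Introspection
getClass("Probe")                  # slots and inheritance
showMethods("probeStart")          # methods defined on the generic
selectMethod("probeStart", "Probe")  # the method chosen for class 'Probe'
```

Using an accessor rather than `@` means the class author can change the internal representation later without breaking user code.
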
Example
```{r Biostrings, message=FALSE}
library(Biostrings) # Biological sequences
data(phiX174Phage) # sample data, see ?phiX174Phage
phiX174Phage
m <- consensusMatrix(phiX174Phage)[1:4,] # nucl. x position counts
polymorphic <- which(colSums(m != 0) > 1)
m[, polymorphic]
```
```{r showMethods, eval=FALSE}
showMethods(class=class(phiX174Phage), where=search())
```
Exercise
1. Load the Biostrings package and phiX174Phage data set. What class
is phiX174Phage? Find the help page for the class, and identify
interesting functions that apply to it.
2. Discover vignettes in the Biostrings package with
`vignette(package="Biostrings")`. Add another argument to the
`vignette` function to view the 'BiostringsQuickOverview' vignette.
3. Navigate to the Biostrings landing page on
http://bioconductor.org. Do this by visiting the biocViews
page. Can you find the BiostringsQuickOverview vignette on the web
site?
4. The following code loads some sample data, 6 versions of the
phiX174Phage genome as a DNAStringSet object.
```{r phiX}
library(Biostrings)
data(phiX174Phage)
```
Explain what the following code does, and how it works.
```{r consensusMatrix}
m <- consensusMatrix(phiX174Phage)[1:4,]
polymorphic <- which(colSums(m != 0) > 1)
mapply(substr, polymorphic, polymorphic, MoreArgs=list(x=phiX174Phage))
```
## Summary
Bioconductor is a large collection of R packages for the analysis and
comprehension of high-throughput genomic data. Bioconductor relies on
formal classes to represent genomic data, so it is important to
develop a rudimentary comfort with classes, including seeking help for
classes and methods. Bioconductor uses vignettes to augment
traditional help pages; these can be very valuable in illustrating
overall package use.
[biocViews]: http://bioconductor.org/packages/release/BiocViews.html