---
title: "`Pbase` example data"
output:
  BiocStyle::html_document:
    toc: true
---

```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```

**Package:** [`Pbase`](http://bioconductor.org/packages/devel/bioc/html/Pbase.html)
**Authors:** [Laurent Gatto](http://cpu.sysbiol.cam.ac.uk/) and [Sebastian Gibb](http://sebastiangibb.de/research.html)
**Last compiled:** `r date()`
**Last modified:** `r file.info("Pbase-data.Rmd")$mtime`

```{r env, echo=FALSE, message=FALSE, warning=FALSE}
library("Pbase")
```

## Introduction

This vignette briefly introduces the central data object of the `Pbase` package, namely `Proteins` instances, as depicted below. They contain a set of proteins (10 in the figure below), described by their sequences (grey boxes) and annotation data (table on the left). Each protein links to a set of experimentally observed peptides (also in grey) that are decorated with their own annotation data. The figure also shows the accessors for the different data slots, which are detailed in `?Proteins`.

```{r pplot, echo=FALSE, fig.width=8.5, fig.height=8.5}
Pbase:::pplot()
```

`Proteins` objects are populated with protein sequences read from a fasta file, and the peptides originate from an LC-MSMS experiment. The original data used below is a 10 fmol [Peptide Retention Time Calibration Mixture](http://www.piercenet.com/product/peptide-retention-time-calibration-mixture) spiked into 50 ng HeLa background, acquired on a Thermo Orbitrap Q Exactive instrument. A restricted set of high scoring human proteins from the UniProt release `2015_02` was searched using the `MSGF+` search engine.

## The fasta database

```{r fa, cache=TRUE}
library("Biostrings")
## Path to the fasta file shipped with Pbase
fafile <- system.file("extdata/HUMAN_2015_02_selected.fasta", package = "Pbase")
## Read the protein sequences as an AAStringSet
fa <- readAAStringSet(fafile)
fa
```

## The PSM data

```{r psm, cache=TRUE}
library("mzID")
## Path to the identification results shipped with Pbase
idfile <- system.file("extdata/Thermo_Hela_PRTC_selected.mzid", package = "Pbase")
## Parse the mzIdentML file and flatten the result into a table
id <- flatten(mzID(idfile))
dim(id)
head(id)
```

## The Proteins object

```{r p, cache=TRUE}
library("Pbase")
## Create the object from the fasta file, then add the identification data
p <- Proteins(fafile)
p <- addIdentificationData(p, idfile)
p
```

A `Proteins` object is composed of a set of protein sequences, accessible with the `aa` accessor, as well as an optional set of peptide features that are mapped as coordinates along the proteins, available with `pranges`.
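
To make the idea of peptides "mapped as coordinates along the proteins" more concrete, here is a minimal base R sketch. It does not use or reproduce the `Pbase` implementation: the sequences and the names `protein`, `peptides` and `coords` are made up for illustration, and the block is shown for reading only rather than evaluated as part of the vignette.

```r
## Toy illustration in base R only; these sequences are invented and are
## not taken from the Pbase example data.
protein <- "MKWVTFISLLFLFSSAYSRGVFRR"
peptides <- c("WVTFISLLFLFSSAYSR", "GVFR")

## regexpr() returns the 1-based start of the first match of each peptide
## in the protein (or -1 if the peptide is absent).
starts <- vapply(peptides,
                 function(pep) as.integer(regexpr(pep, protein, fixed = TRUE)),
                 integer(1), USE.NAMES = FALSE)
## Inclusive end position of each peptide
ends <- starts + nchar(peptides) - 1L

## One row per peptide: its start and end coordinates along the protein
coords <- data.frame(peptide = peptides, start = starts, end = ends)

## Subsetting the protein with these coordinates recovers the peptides
recovered <- substring(protein, coords$start, coords$end)
stopifnot(identical(recovered, peptides))
coords
```

Storing only start and end positions is enough to recover each peptide sequence from its parent protein, which is the relationship between coordinates and sequences that the paragraph above describes.
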
The actual peptide sequences can be extracted with `pfeatures`.

```{r paccess}
aa(p)
pranges(p)
pfeatures(p)
```

A `Proteins` instance is further described by general `metadata`. Protein sequence and peptide feature annotations can be accessed with `ametadata` and `pmetadata` (or `acols` and `pcols`) respectively.

```{r metadata}
metadata(p)
head(acols(p))
head(pcols(p))
```

Specific proteins can be extracted by index or name using `[`, and proteins and their peptide features can be plotted with the default plot method.

```{r plot, fig.align='center', cache=TRUE}
seqnames(p)
plot(p[c(1,9)])
```

More details can be found in `?Proteins`. The object generated above is also directly available as `data(p)`.

## Session information

```{r si}
sessionInfo()
```