---
title: "Description and usage of MsBackendMgf"
output:
BiocStyle::html_document:
toc_float: true
vignette: >
%\VignetteIndexEntry{Description and usage of MsBackendMgf}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\VignettePackage{Spectra}
%\VignetteDepends{Spectra,BiocStyle,BiocParallel}
---
```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```
**Package**: `r Biocpkg("MsBackendMgf")`
**Authors**: `r packageDescription("MsBackendMgf")[["Author"]] `
**Last modified:** `r file.info("MsBackendMgf.Rmd")$mtime`
**Compiled**: `r date()`
```{r, echo = FALSE, message = FALSE}
library(Spectra)
library(BiocStyle)
```
# Introduction
The `r Biocpkg("Spectra")` package provides a central infrastructure for the
handling of Mass Spectrometry (MS) data. The package supports
interchangeable use of different *backends* to import MS data from a
variety of sources (such as mzML files). The `MsBackendMgf` package
allows the import of MS/MS data from mgf ([Mascot Generic
Format](http://www.matrixscience.com/help/data_file_help.html))
files. This vignette illustrates the usage of the `MsBackendMgf`
package.
# Installation
To install this package, start `R` and enter:
```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MsBackendMgf")
```
This will install this package and all eventually missing dependencies.
# Importing MS/MS data from mgf files
Mgf files store one to multiple spectra, typically centroided and of
MS level 2. In our short example below, we load 2 mgf files which are
provided with this package. Below we first load all required packages
and define the paths to the mgf files.
```{r load-libs}
library(Spectra)
library(MsBackendMgf)
fls <- dir(system.file("extdata", package = "MsBackendMgf"),
full.names = TRUE, pattern = "mgf$")
fls
```
MS data can be accessed and analyzed through `Spectra` objects. Below
we create a `Spectra` with the data from these mgf files. To this end
we provide the file names and specify to use a `MsBackendMgf()`
backend as *source* to enable data import. Note that below we also disable
parallel processing by specifically *registering* the serial processing as the
default. See `?bpparam` for more details on parallel processing options with the
`r Biocpkg("BiocParallel")` package.
```{r import}
library(BiocParallel)
register(SerialParam())
sps <- Spectra(fls, source = MsBackendMgf())
```
With that we have now full access to all imported spectra variables
that we list below.
```{r spectravars}
spectraVariables(sps)
```
Besides default spectra variables, such as `msLevel`, `rtime`,
`precursorMz`, we also have additional spectra variables such as the
`TITLE` of each spectrum in the mgf file.
```{r instrument}
sps$rtime
sps$TITLE
```
By default, fields in the mgf file are mapped to spectra variable names using
the mapping returned by the `spectraVariableMapping()` function:
```{r spectravariables}
spectraVariableMapping()
```
The names of this `character` vector are the spectra variable names (such as
`"rtime"`) and the field in the mgf file that contains that information are the
values (such as `"RTINSECONDS"`). Note that it is also possible to overwrite
this mapping (e.g. for certain mgf *dialects*) or to add additional
mappings. Below we add the mapping of the mgf field `"TITLE"` to a spectra
variable called `"spectrumName"`.
```{r map}
map <- c(spectrumName = "TITLE", spectraVariableMapping())
map
```
We can then pass this mapping to the `backendInitialize` method, or the
`Spectra` constructor.
```{r import2}
sps <- Spectra(fls, source = MsBackendMgf(), mapping = map)
```
We can now access the spectrum's title with the newly created spectra variable
`"spectrumName"`:
```{r spectrumName}
sps$spectrumName
```
In addition we can also access the m/z and intensity values of each
spectrum.
```{r mz}
mz(sps)
intensity(sps)
```
The `MsBackendMgf` backend allows also to export data in mgf format. Below we
export the data to a temporary file. We hence call the `export` function on our
`Spectra` object specifying `backend = MsBackendMgf()` to use this backend for
the export of the data. Note that we use again our custom mapping of variables
such that the spectra variable `"spectrumName"` will be exported as the
spectrums' title.
```{r export}
fl <- tempfile()
export(sps, backend = MsBackendMgf(), file = fl, mapping = map)
```
We next read the first lines from the exported file to verify that the title was
exported properly.
```{r export-check}
readLines(fl)[1:12]
```
Note that the `MsBackendMgf` exports all spectra variables as fields in the mgf
file. To illustrate this we add below a new spectra variable to the object and
export the data.
```{r}
sps$new_variable <- "A"
export(sps, backend = MsBackendMgf(), file = fl)
readLines(fl)[1:12]
```
We can see that also our newly defined variable was exported. Also, because we
did not provide our custom variable mapping this time, the variable
`"spectrumName"` was **not** used as the spectrum's title.
Sometimes it might be required to not export all spectra variables since some
exported fields might not be recognized/supported by external tools. Using the
`selectSpectraVariables` function we can reduce our `Spectra` object to export
to contain only relevant spectra variables. Below we restrict the data to only
m/z, intensity, retention time, acquisition number, precursor m/z and precursor
charge and export these to an mgf file. Also, some external tools don't support
the `"TITLE"` field in the MGF file. To disable export of the spectrum ID/title
`exportTitle = FALSE` can be used.
```{r}
sps_ex <- selectSpectraVariables(sps, c("mz", "intensity", "rtime",
"acquisitionNum", "precursorMz",
"precursorCharge"))
export(sps_ex, backend = MsBackendMgf(), file = fl, exportTitle = FALSE)
readLines(fl)[1:12]
```
# Session information
```{r}
sessionInfo()
```