---
title: "MQmetrics"
author: "Alvaro Sanchez-Villalba"
output: BiocStyle::html_document
vignette: >
    %\VignetteIndexEntry{MQmetrics}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---
  
```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    warning = FALSE,
    comment = "#>"
)
```

  
# Overview

The package MQmetrics (MaxQuant metrics) provides a workflow to analyze the 
quality and reproducibility of your proteomics mass spectrometry analysis from 
MaxQuant. Input data are extracted from several 
[MaxQuant](https://pubmed.ncbi.nlm.nih.gov/27809316/) output 
tables ([MaxQUant Summer School Output tables](https://www.youtube.com/watch?v=wEIWRFKlzJ0)),
and produces a pdf report. 
It includes several visualization tools to check numerous parameters regarding
the quality of the runs.
It also includes two functions to visualize the 
[iRT peptides from Biognosys](https://biognosys.com/product/irt-kit/) 
in case they were spiked in the samples.

# Workflow

## Install the package


You can install MQmetrics from Biocodunctor with:

```r
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")


BiocManager::install("MQmetrics")
```

You can install the development version from 
[GitHub](https://github.com/svalvaro/) with:

``` {r install, eval = FALSE}
# install.packages("devtools")
devtools::install_github("svalvaro/MQmetrics")
```

# Data Input

After your MaxQuant run has finished, a folder named **combined** has been 
created. This folder should have at least two other folders within: 

../combined/txt/ Containing all the tables.txt
../combined/proc/ Containing #runningTimes.txt

You just need the path to the **combined** folder and you will be able to 
start using MQmetrics.

```{r MQPathCombined}
MQPathCombined <- '/home/alvaro/Documents/MaxQuant/example5/combined/'
```


## Generate a report

First you need to load the library.

```{r report, message=FALSE}
library(MQmetrics)
```

Then you just need to use the `generateReport()` function. This function 
has parameters to control each of the function that it aggregates. You can 
read more about those parameters by using:
  
```{r report_question_mark, eval=FALSE}
?generateReport
```

Though, the most important parameters are the following:

```{r parameters, eval=FALSE}
generateReport(MQPathCombined = , # directory to the combined folder
               output_dir = , # directory to store the resulting pdf
               long_names = , # If your samples have long names set it to TRUE
               sep_names = , # Indicate the separator of those long names
               UniprotID = , # Introduce the UniprotID of a protein of interest
               intensity_type = ,) # Intensity or LFQ

# The only mandatory parameter is MQPathCombined, the rest are optional.
```


Is as simple as this to use MQmetrics:
```{r example,eval=FALSE}
# If you're using a Unix-like OS use forward slashes.
MQPathCombined <- '/home/alvaro/Documents/MaxQuant/example5/combined/'

# If you're using Windows you can also use forward slashes:
MQPathCombined <- "D:/Documents/MaxQuant_results/example5/combined/"

#Use the generateReport function.
generateReport(MQPathCombined)
```


## Visualization plots

If you are only interested in a few plots from the `generateReport()`
function, you can do it.
You only need to have access to each file independently.

### Load the data

MQmetrics requires 8 tables from the MaxQuant analysis and the #runningTimes 
file.
If you want to learn more about the information of each of these tables, 
you can do so
in the 
[MaxQuant Summer School videos](https://www.youtube.com/watch?v=wEIWRFKlzJ0).

```{r, message=FALSE, warning=FALSE}

# To create the vignettes and examples I use data that is in the package itself:
MQPathCombined <- system.file('extdata/combined/', package = 'MQmetrics')


# make_MQCombined will read the tables needed for creating the outputs.
MQCombined <- make_MQCombined(MQPathCombined, remove_contaminants = TRUE) 

```


### MaxQuant Analysis Parameters

```{r ExperimentDuration, comment= NA}
MaxQuantAnalysisInfo(MQCombined) 
```

# Visualizations.

## Proteins Identified

The function `PlotProteinsIdentified()`, will take as input the 
proteinGroups.txt table and show the number of proteins and NAs in each sample.
It can differentiate two types of intensities: 'Intensity' or 'LFQ'.

```{r PlotProteins,fig.height=5, fig.width=7.2}
PlotProteinsIdentified(MQCombined,
                       long_names = TRUE,
                       sep_names = '_',
                       intensity_type = 'LFQ',
                       palette = 'Set2')
```

## Peptides Identified

The function `PlotPeptidesIdentified()`,  will take as input the summary
table and show the number of peptides sequences identified in each sample.

```{r PeptidesIdentified, warning=FALSE , fig.height=5, fig.width=7.2}
PlotPeptidesIdentified(MQCombined, 
                       long_names = TRUE,
                       sep_names = '_', 
                       palette = 'Set3')
```

## Proteins versus peptide/protein ratio

The function `PlotIdentificationRatio()`,  will take as input the summary and
proteinGroups tables and plot the number of protein found vs the ratio of 
peptides/proteins found in each Experiment.

```{r ratio, warning=FALSE , fig.height=5, fig.width=7.2}

PlotProteinPeptideRatio(MQCombined,
                      intensity_type = 'LFQ',
                      long_names = TRUE,
                      sep_names =  '_')

```

## MS/MS submitted versus identified

The function `PlotMsMs()`,  will take as input the summary.txt table and
show the number of MS/MS Submitted and identified in each sample.
```{r PlotMSMS, fig.height=5, fig.width=7.2}
PlotMsMs(MQCombined,
       long_names = TRUE, 
       sep_names = '_',
       position_dodge_width = 1,
       palette = 'Set2')
```

## Peaks submitted versus identified

The function `PlotPeaks()`, will take as input the summary.txt table and 
show the number peaks detected and sequenced in each sample.

```{r PlotPeaks, fig.height=5, fig.width=7.2}
PlotPeaks(MQCombined,
        long_names = TRUE,
        sep_names = '_',
        palette = 'Set2')
```

## Isotope patterns detected and sequenced

The function `PlotIsotopePattern()`,will take as input the summary.txt table
and show the number isotope patterns detected and sequenced in each sample.
```{r isotope, fig.height=5, fig.width=7.2}
PlotIsotopePattern(MQCombined,
                 long_names = TRUE,
                 sep_names = '_',
                 palette = 'Set2')
```


## Charge-state of the precursor ions

The function `PlotCharge()`, will take as input the evidence.txt table and
show the charge-state of the precursor ion in each sample.

```{r Charg, warning=FALSE, message=FALSE, fig.height=10, fig.width=7.2}
PlotCharge(MQCombined, 
         palette = 'Set2',
         plots_per_page = 6)
```

## Protease Specificity

The function `PlotProteaseSpecificity()`, will take as input the summary.txt
table and show the number peaks detected and sequenced in each sample.

```{r missed_cleavages,fig.height=10, fig.width=7.2}
PlotProteaseSpecificity(MQCombined, 
                      palette = 'Set2',
                      plots_per_page = 6)
```


## Peptide Hydrophobicity

The function `PlotHydrophobicity()`, takes as input the peptides.txt table
and returns the distribution of GRAVY score.

```{r PlotHydrophobicity, warning= F, message= F,fig.height=10, fig.width=7.2}
PlotHydrophobicity(MQCombined,
                 show_median =  TRUE,
                 binwidth = 0.1, 
                 size_median = 1.5, 
                 palette = 'Set2',
                 plots_per_page = 6)

```

## Andromeda Score

The function `PlotAndromedaScore()`, takes as input the peptides.txt table
and returns the distribution of MaxQuant's Andromeda Score.

```{r AndromedaScore, message = F, warning = FALSE,fig.height=10, fig.width=7.2}
PlotAndromedaScore(MQCombined,
                 palette = 'Set2',
                 plots_per_page = 6)
```


## Protein intensities comparison

The function `PlotIntensity()`, takes as input the  proteinGroups.txt table 
and returns a violin plot for those intensities. If the 'LFQ' intensities are
in the proteinGroups.txt table, it will by default split the violion into "LFQ'
and 'Intensity'. The parameter split_violin_intensity, can be set to FALSE and
then select wether you would like to visualize the 'Intensity' or 'LFQ'
intensity individually.
If split_violin_intensity is set TRUE, but no LFQ intensities are not present,
it will automatically show the normal Intensities.

```{r PlotIntensity, warning = FALSE, fig.height=5, fig.width=7.2}
PlotIntensity(MQCombined, 
            split_violin_intensity = TRUE, 
            intensity_type = 'LFQ', 
            log_base = 2, 
            long_names = TRUE, 
            sep_names = '_', 
            palette = 'Set2')

```


## Protein PCA

The function `PlotPCA()` takes as input the proteinGroups.txt table and 
creates a Principal Componente Analysis plot of each Experiment.

```{r PlotPCA, warning = FALSE, fig.height=5, fig.width=7.2}
PlotPCA(MQCombined,
      intensity_type = 'LFQ',
      palette = 'Set2')
```

## Dynamic Range (Combined)

The function `PlotCombinedDynamicRange()` takes as input the proteinGroups.txt
table and returns the dynamic range of all experiments combined. If the
parameter show_shade is used, a square will appear showing the percent_proteins 
selected and the orders of abundance.

```{r DynamicRange, fig.height=5, fig.width=7.2}
PlotCombinedDynamicRange(MQCombined,
                       show_shade = TRUE,
                       percent_proteins = 0.90)
```


## Dynamic Range (Individual)

The function `PlotAllDynamicRange()` takes as input the proteinGroups.txt
table and returns the dynamic range of all experiments separated. If the 
parameter show_shade is used, a square will appear showing the
*percent_proteins* selected and the orders of abundance.

```{r DynamicRangeAll, fig.height=10, fig.width=7.2}
PlotAllDynamicRange(MQCombined,
                  show_shade = TRUE, 
                  percent_proteins = 0.90)
```

## Protein Overlap

The function `PlotProteinOverlap()` takes as input the proteinGroups.txt
table and returns a plot that shows the number of proteins identified in the
samples.

```{r protein_coverage_all, fig.height=7.2, fig.width=7.2}
PlotProteinOverlap(MQCombined)
```


## Protein Coverage

The function `PlotProteinCoverage()` takes as input the peptides.txt and
proteinGroups.txt tables and one or multiple protein UniprotID(s). It shows,
if present, the coverage of that protein in each of the samples.

```{r protein_degradation,fig.height=12, fig.width=12, message=FALSE, warning=F}
PlotProteinCoverage(MQCombined,
                  UniprotID =  c('P55072', 'P13639'),
                  log_base = 2,
                  segment_width = 1,
                  palette = 'Set2',
                  plots_per_page = 6)
```


## iRT peptides (1)

The function `PlotiRT()` takes as input the evidence.txt table and returns,
if found the iRT peptides from Biognosys. Their retention time and intensity.

```{r irt_peps1, warning = FALSE, message= FALSE, fig.height=10, fig.width=7.2}
PlotiRT(MQCombined, 
      show_calibrated_rt = FALSE,
      tolerance = 0.001,
      plots_per_page = 6)
```

## iRT peptides (2)

The function `PlotiRTScore()` takes as input the evidence.txt table and
returns, if found, a linear regression of the retention times of the iRT
peptides of Biognosys.

```{r irt_peps2, warning = FALSE, message= FALSE, fig.height=10, fig.width=7.2}
PlotiRTScore(MQCombined, 
           tolerance = 0.001,
           plots_per_page = 6)
```

## Total Ion Current

The function `PlotTotalIonCurrent()` takes as input the msmsScans.txt, and
returns a plot showing the TIC values vs the retention time of each sample.
It can show as well the maximum value of each sample.

```{r TotalIonCurrent, fig.height=10, fig.width=7.2}
PlotTotalIonCurrent(MQCombined,
                  show_max_value = TRUE,
                  palette = 'Set2',
                  plots_per_page = 6)
```


## Acquisition Cycle

The function `PlotAcquisitionCycle` takes as input the msScans.txt table
and returns the cycle time and MS/MS count vs the retention time of each sample.
**Note**: msScans.ttxt is not generated automatically by MaxQuant, the user must
select it in MaxQuant: Global Parameters --> tables 

```{r Pn, fig.height= 10, warning=F, message=F,fig.height=10, fig.width=7.2}
PlotAcquisitionCycle(MQCombined, 
                   palette = 'Set2',
                   plots_per_page = 6)
```


## Post-translational modifications

The function `PlotPTM()`, takes as input the modificationSpecificPeptides.txt
table and returns the main modifications found at the peptide level.
The parameters can be adjusted to select the minimum number of peptides
modified per group, and whether or not you would like to visualize the 
Unmodified peptides.

```{r PTM, fig.height=8, fig.width=7.2}
PlotPTM(MQCombined,
      peptides_modified = 1,
      plot_unmodified_peptides = FALSE,
      palette = 'Set2',
      aggregate_PTMs = TRUE,
      plots_per_page = 6)
```


## Post-translational modifications across samples

The function `PlotPTMAcrossSamples()`, takes as input 
the modificationSpecificPeptides.txt table and one or multiple modification of 
interest. It returns a plot showing the intensities of that given PTM across 
the samples. It is important to specify correctly the name of the PTM, the same
way as MaxQuant:


```{r ptm_single, fig.height=8, fig.width=7.2}

PlotPTMAcrossSamples(MQCombined, 
                     PTM_of_interest = c('Acetyl (Protein N-term)',
                                         'Oxidation (M)'),
                     long_names = TRUE,
                     sep_names = '_')

```


The parameters can be adjusted to select the minimum number of peptides
modified per group, and whether or not you would like to visualize the 
Unmodified peptides.


# Helper Functions
This package provides two extra functions to helps to analyze the proteomics 
data from MaxQuant:

The function `make_MQCombined()` takes as input the path to the *combined*
folder resulting from MaxQuant analysis. It will read the tables needed and by
default remove the potential contaminants, Reverse, and proteins identified
only by site.

The function `ReportTables()` takes as input the path to the **combined**
folder, and returns tables with information needed to create some of the most
important plots in this package.

```{r report_tables, message=FALSE}

ReportTables(MQCombined,
           log_base = 2,
           intensity_type = 'Intensity')

```


# Session Info

```{r add_session_info}
sessionInfo()
```