--- title: "MSstatsLiP Workflow: An example workflow and analysis of the MSstatsLiP package" author: "Devon Kohler ()" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{MSstatsLiP Workflow: An example workflow and analysis of the MSstatsLiP package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=8, fig.height=8 ) ``` ## MSstatsLiP Workflow Vignette This Vignette provides an example workflow for how to use the package MSstatsLiP. ## Installation To install this package, start R (version "4.1") and enter: ``` {r, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("MSstatsLiP") ``` ```{r,include=TRUE,results="hide",message=FALSE,warning=FALSE} library(MSstatsLiP) library(tidyverse) library(data.table) ``` ## Workflow ### 1. Preprocessing ### 1.1 Raw Data Format The first step is to load in the raw dataset for both the PTM and Protein datasets. This data can then be converted into MSstatsLiP format with one of the built in converters, or manually converted. In this case we use the converter for Spectronaut. ``` {r} ## Read in raw data files head(LiPRawData) head(TrPRawData) ``` ### 1.2 Converter ``` {r} ## Run converter MSstatsLiP_data <- SpectronauttoMSstatsLiPFormat(LiPRawData, "../inst/extdata/ExampleFastaFile.fasta", TrPRawData, use_log_file = FALSE, append = FALSE) head(MSstatsLiP_data[["LiP"]]) head(MSstatsLiP_data[["TrP"]]) ## Make conditions match MSstatsLiP_data[["LiP"]][MSstatsLiP_data[["LiP"]]$Condition == "Control", "Condition"] = "Ctrl" MSstatsLiP_data[["TrP"]]$Condition = substr(MSstatsLiP_data[["TrP"]]$Condition, 1, nchar(MSstatsLiP_data[["TrP"]]$Condition)-1) ``` In the case above the SpectronauttoMSstatsLiPFormat was ran to convert the data into the format required for MSstatsLiP. Note that the condition names did not match between the LiP and TrP datasets. Here we edit the conditions in each dataset to match. Additionally, it is important that the column "FULL_PEPTIDE" in the LiP dataset contains both the Protein and Peptide information, seperated by an underscore. This allows us to summarize up to the peptide level, while keeping important information about which protein the peptide corresponds to. ### 2. Summarization The next step in the MSstatsLiP workflow is to summarize feature intensities per run into one value using the dataSummarizationLiP function. This function takes as input the formatted data from the converter. #### 2.1 Summarization Function ``` {r} MSstatsLiP_Summarized <- dataSummarizationLiP(MSstatsLiP_data, MBimpute = FALSE, use_log_file = FALSE, append = FALSE) names(MSstatsLiP_Summarized[["LiP"]]) lip_protein_data <- MSstatsLiP_Summarized[["LiP"]]$ProteinLevelData trp_protein_data <- MSstatsLiP_Summarized[["TrP"]]$ProteinLevelData head(lip_protein_data) head(trp_protein_data) ``` Again the summarization function returns a list of two dataframes one each for LiP and TrP. Each LiP and TrP is also a list with additional summary information. This summarized data can be used as input into some of the plotting functions included in the package. #### 2.2 Tryptic barplot MSstatsLiP has a wide variety of plotting functionality to analysis and assess the results of experiments. Here we plot the number of half vs fully tryptic peptides per replicate. ``` {r} trypticHistogramLiP(MSstatsLiP_Summarized, "../inst/extdata/ExampleFastaFile.fasta", color_scale = "bright", address = FALSE) ``` #### 2.3 Run Correlation Plot MSstatsLiP also provides a function to plot run correlation. ``` {r} correlationPlotLiP(MSstatsLiP_Summarized, address = FALSE) ``` #### 2.4 Coefficient of Variation Here we provide a simple script to examine the coefficient of variance between conditions ``` {r} MSstatsLiP_Summarized[["LiP"]]$FeatureLevelData %>% group_by(PEPTIDE, GROUP) %>% summarize(cv = sd(INTENSITY) / mean(INTENSITY)) %>% ggplot() + geom_violin(aes(x = GROUP, y = cv, fill = GROUP)) + labs(title = "Coefficient of Variation between Condtions", y = "Coefficient of Variation", x = "Conditon") ``` #### 2.5 QCPlot The following plots are used to view the summarized data and check for potential systemic issues. ``` {r} ## Quality Control Plot dataProcessPlotsLiP(MSstatsLiP_Summarized, type = 'QCPLOT', which.Peptide = "allonly", address = FALSE) ``` #### 2.6 Profile Plot ``` {r} dataProcessPlotsLiP(MSstatsLiP_Summarized, type = 'ProfilePlot', which.Peptide = c("P14164_ILQNDLK"), address = FALSE) ``` #### 2.7 PCA Plot In addition, Priciple Component Analysis can also be done on the summarized dataset. Three different PCA plots can be created one each for: Percent of explained variance per component, PC1 vs PC2 for peptides, and PC1 vs PC2 for conditions. ``` {r} PCAPlotLiP(MSstatsLiP_Summarized, bar.plot = FALSE, protein.pca = FALSE, comparison.pca = TRUE, which.comparison = c("Ctrl", "Osmo"), address=FALSE) PCAPlotLiP(MSstatsLiP_Summarized, bar.plot = FALSE, protein.pca = TRUE, comparison.pca = FALSE, which.pep = c("P14164_ILQNDLK", "P17891_ALQLINQDDADIIGGRDR"), address=FALSE) ``` #### 2.8 Calculate Trypticity Finally, the trypticity of a peptide can also be calculated and added to any dataframe with the ProteinName and PeptideSequence column. ``` {r} feature_data <- data.table::copy(MSstatsLiP_Summarized$LiP$FeatureLevelData) data.table::setnames(feature_data, c("PEPTIDE", "PROTEIN"), c("PeptideSequence", "ProteinName")) feature_data$PeptideSequence <- substr(feature_data$PeptideSequence, 1, nchar(as.character( feature_data$PeptideSequence)) - 2) calculateTrypticity(feature_data, "../inst/extdata/ExampleFastaFile.fasta") MSstatsLiP_Summarized$LiP$FeatureLevelData%>% rename(PeptideSequence=PEPTIDE, ProteinName=PROTEIN)%>% mutate(PeptideSequence=substr(PeptideSequence, 1, nchar(as.character(PeptideSequence))-2) ) %>% calculateTrypticity("../inst/extdata/ExampleFastaFile.fasta") ``` ### 3. Modeling The modeling function groupComparisonLiP takes as input the output of the summarization function dataSummarizationLiP. #### 3.1 Function ```{r} MSstatsLiP_model <- groupComparisonLiP(MSstatsLiP_Summarized, fasta = "../inst/extdata/ExampleFastaFile.fasta", use_log_file = FALSE, append = FALSE) lip_model <- MSstatsLiP_model[["LiP.Model"]] trp_model <- MSstatsLiP_model[["TrP.Model"]] adj_lip_model <- MSstatsLiP_model[["Adjusted.LiP.Model"]] head(lip_model) head(trp_model) head(adj_lip_model) ## Number of significant adjusted lip peptides adj_lip_model %>% filter(adj.pvalue < .05) %>% nrow() ``` The groupComparisonLiP function outputs a list with three separate models. These models are as follows: LiP model, TrP model, and adjusted LiP model. #### 3.2 Volcano Plot ``` {r} groupComparisonPlotsLiP(MSstatsLiP_model, type = "VolcanoPlot", address = FALSE) ``` #### 3.3 Heatmap ``` {r} groupComparisonPlotsLiP(MSstatsLiP_model, type = "HEATMAP", numProtein=50, address = FALSE) ``` #### 3.4 Barcode Here we show a barcode plot, showing the coverage of LiP and TrP peptides. This function requires the data in MSstatsLiP format and the path to a fasta file. ```{r} StructuralBarcodePlotLiP(MSstatsLiP_model, "../inst/extdata/ExampleFastaFile.fasta", model_type = "Adjusted", which.prot = c("P53858"), address = FALSE) ```