---
title: "In-silico cleavage of polypeptides using the cleaver package"
author:
- name: Sebastian Gibb
  affiliation: Department of Anesthesiology and Intensive Care, University Medicine Greifswald, Germany.
package: cleaver
abstract: >
  This vignette describes the in-silico cleavage of polypeptides using the
  `cleaver` package.
output:
  BiocStyle::html_document:
    toc_float: TRUE
    tidy: TRUE
bibliography: cleaver.bib
vignette: >
  %\VignetteIndexEntry{In-silico cleavage of polypeptides}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteKeywords{Proteomics, Bioinformatics, Cleavage, Polypeptides}
  %\VignetteEncoding{UTF-8}
  %\VignettePackage{cleaver}
---

```{r environment, echo=FALSE, message=FALSE, warning=FALSE}
library("cleaver")
library("UniProt.ws")
library("BRAIN")
```

# Introduction

Most proteomics experiments need protein (peptide) separation and cleavage
procedures before these molecules can be analyzed or identified by mass
spectrometry or other analytical tools.

`r BiocStyle::Biocpkg("cleaver")` allows in-silico cleavage of polypeptide
sequences, for example to create theoretical mass spectrometry data.

The cleavage rules are taken from the
[ExPASy PeptideCutter tool](https://web.expasy.org/peptide_cutter/peptidecutter_enzymes.html)
[@peptidecutter].

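
To illustrate what a cleavage rule amounts to, the following base-R sketch applies a deliberately simplified trypsin-like rule: cleave C-terminal to K or R unless the next residue is P. The function name `simpleTrypsin` and the regular expression are assumptions made for this illustration only. They are not the rules that `cleaver` takes from PeptideCutter, which may cover further exceptions.

```r
## hypothetical helper, not part of cleaver: a simplified trypsin-like rule
simpleTrypsin <- function(x) {
  ## positions of K or R that are not followed by P (perl-style lookahead)
  hits <- as.integer(gregexpr("[KR](?!P)", x, perl=TRUE)[[1]])
  ## drop the "no match" marker (-1) and a hit at the very last residue
  sites <- hits[hits > 0 & hits < nchar(x)]
  ## cut the sequence after each remaining site
  substring(x, c(1, sites + 1), c(sites, nchar(x)))
}

simpleTrypsin("LAAGKVEDSD")
## [1] "LAAGK" "VEDSD"
```

The block is shown as a static example so that it does not affect the evaluated chunks of this vignette.
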
# Simple Usage

Loading the `r BiocStyle::Biocpkg("cleaver")` package:

```{r}
library("cleaver")
```

Getting help and listing all available cleavage rules:

```{r, eval=FALSE}
help("cleave")
```

Cleaving of *Gastric juice peptide 1 (P01358)* using *Trypsin*:

```{r}
## cleave it
cleave("LAAGKVEDSD", enzym="trypsin")
## get the cleavage ranges
cleavageRanges("LAAGKVEDSD", enzym="trypsin")
## get only cleavage sites
cleavageSites("LAAGKVEDSD", enzym="trypsin")
```

Sometimes cleavage is not perfect and the enzyme misses some cleavage positions:

```{r}
## miss one cleavage position
cleave("LAAGKVEDSD", enzym="trypsin", missedCleavages=1)
cleavageRanges("LAAGKVEDSD", enzym="trypsin", missedCleavages=1)
## miss zero or one cleavage positions
cleave("LAAGKVEDSD", enzym="trypsin", missedCleavages=0:1)
cleavageRanges("LAAGKVEDSD", enzym="trypsin", missedCleavages=0:1)
```

Combine `r BiocStyle::Biocpkg("cleaver")` and
`r BiocStyle::Biocpkg("Biostrings")` [@Biostrings]:

```{r}
## create AAStringSet object
p <- AAStringSet(c(gaju="LAAGKVEDSD", pnm="AGEPKLDAGV"))

## cleave it
cleave(p, enzym="trypsin")
cleavageRanges(p, enzym="trypsin")
cleavageSites(p, enzym="trypsin")
```

# Insulin \& Somatostatin Example

Downloading *Insulin (P01308)* and *Somatostatin (P61278)* sequences from the
[UniProt](http://www.uniprot.org) [@uniprot] database using
`r BiocStyle::Biocpkg("UniProt.ws")` [@UniProt.ws].

```{r}
## load UniProt.ws library
library("UniProt.ws")

## select species Homo sapiens
UniProt.ws <- UniProt.ws(taxId=9606)

## download sequences of Insulin/Somatostatin
s <- select(UniProt.ws,
            keys=c("P01308", "P61278"),
            columns=c("SEQUENCE"))

## fetch only sequences
sequences <- setNames(s$SEQUENCE, s$UNIPROTKB)

## remove whitespace
sequences <- gsub(pattern="[[:space:]]", replacement="", x=sequences)
```

Cleaving using *Pepsin*:

```{r}
cleave(sequences, enzym="pepsin")
```

# Isotopic Distribution Of Tryptic Digested Insulin

A common use case of in-silico cleavage is the calculation of the isotopic
distribution of peptides (which were enzymatically digested in the in-vitro
experimental workflow).
Here `r BiocStyle::Biocpkg("BRAIN")` [@BRAIN; @BRAIN2] is used to
calculate the isotopic distribution of
`r BiocStyle::Biocpkg("cleaver")`'s output.
Please note that this is only a toy example; for instance, the relative
intensities between peptides are not correct.

```{r}
## load BRAIN library
library("BRAIN")

## cleave insulin
cleavedInsulin <- cleave(sequences[1], enzym="trypsin")[[1]]

## create empty plot area
plot(NA, xlim=c(150, 4300), ylim=c(0, 1),
     xlab="mass", ylab="relative intensity",
     main="tryptic digested insulin - isotopic distribution")

## loop through peptides
for (i in seq_along(cleavedInsulin)) {
  ## count C, H, N, O, S atoms in current peptide
  atoms <- BRAIN::getAtomsFromSeq(cleavedInsulin[[i]])
  ## calculate isotopic distribution
  d <- useBRAIN(atoms)
  ## draw peaks
  lines(d$masses, d$isoDistr, type="h", col=2)
}
```

# Session Information

```{r sessioninfo, echo=FALSE}
sessionInfo()
```

# References