---
title: "Data Generation for topdownr"
author:
- name: Pavel V. Shliaha
  affiliation: Department of Biochemistry and Molecular Biology,
    University of Southern Denmark, Denmark.
- name: Sebastian Gibb
  affiliation: Department of Anesthesiology and Intensive Care,
    University Medicine Greifswald, Germany.
- name: Ole Nørregaard Jensen
  affiliation: Department of Biochemistry and Molecular Biology,
    University of Southern Denmark, Denmark.
package: topdownr
abstract: >
    This vignette describes the setup and the data preparation
    needed to create the input files for an analysis with
    the `topdownr` package.
output:
  BiocStyle::html_document:
    toc_float: TRUE
    tidy: TRUE
bibliography: topdownr.bib
vignette: >
    %\VignetteIndexEntry{Data Generation for topdownr}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteKeywords{Mass Spectrometry, Proteomics, Bioinformatics}
    %\VignetteEncoding{UTF-8}
    %\VignettePackage{topdownr}
---

```{r environment, echo=FALSE, message=FALSE, warning=FALSE}
library("topdownr")
library("BiocStyle")
```

# Foreword {-}

`r BiocStyle::Biocpkg("topdownr")` is free and open-source software.
If you use it, please support the project by citing it in publications:

```{r citation, echo=FALSE, results="asis"}
ct <- format(citation("topdownr"), "textVersion")
cat(gsub("DOI: *(.*)$", "DOI: [\\1](https://doi.org/\\1)", ct), "\n")
```

# Questions and bugs {-}

For bugs, typos, suggestions or other questions, please file an issue
in our tracking system (https://github.com/sgibb/topdownr/issues),
providing as much information as possible, a reproducible example and
the output of `sessionInfo()`.

If you don't have a GitHub account or wish to reach a broader audience
for general questions about proteomics analysis using R, you may want to
use the Bioconductor support site: https://support.bioconductor.org/.

# Introduction

## The `topdownr` Data Generation Workflow

# Installation of Additional Software

## Setup the Thermo Software

To create methods, you first have to install and modify the
Orbitrap Fusion LUMOS workstation software:

1. Request `TribridSeriesWorkstationSetup-v3.2.exe` from Thermo Scientific.
2. Install the workstation by running `TribridSeriesWorkstationSetup-v3.2.exe`.

## Setup XMLMethodChanger

*XMLMethodChanger* is needed to convert the XML methods into `.meth` files.
It can be found at https://github.com/thermofisherlsms/meth-modifications

You have to download and compile it yourself
(or request it from Thermo Scientific as well).
You need at least the *3.2 beta* version.

## Setup Operating System

In order to use *XMLMethodChanger*, the operating system has to use
the `.` (dot) as decimal mark and the `,` (comma) as digit group separator
(one thousand point two should be formatted as `1,000.2`).

In *Windows 7* the settings are located at
`Windows Control Panel > Region and Language > Formats`.
Choose *English (USA)* here or use the *Additional settings* button to
change it manually.

## Setup ScanHeadsman

After data acquisition, `topdownr` needs the header information from the
`.raw` files. The *ScanHeadsman* software is used to extract it.
It can be downloaded from https://bitbucket.org/caetera/scanheadsman

It requires Microsoft **.NET 4.5** or later (it is often preinstalled on a
typical modern Windows system or can be found in Microsoft's Download Center,
e.g. https://www.microsoft.com/en-us/download/details.aspx?id=30653).

Additionally, you need Thermo's *MS File Reader*, which can be downloaded
free of charge (registration is required) from the Thermo FlexNet website:
https://thermo.flexnetoperations.com/

*ScanHeadsman* was created by Vladimir Gorshkov.

# Creating Methods

Importantly, *XMLMethodChanger* does not create methods *de novo*,
but modifies pre-existing methods (supplied with *XMLMethodChanger*)
using modifications described in XML files.
Thus the whole process of creating user-specified methods consists of
two parts:

1. Construction of XML files with all possible combinations of fragmentation
   parameters (see `topdownr::createExperimentsFragmentOptimisation` and
   `topdownr::writeMethodXmls` below).
2. Submitting the constructed XML files together with a template `.meth` file
   to *XMLMethodChanger*.

We chose to use targeted MS2 scans (TMS2) as a way to store the
fragmentation parameters.
Each TMS2 is stored in a separate experiment. Experiments do not overlap.

![Method Editor](images/methodeditor-exp-tms.png)

# Data preparation with `topdownr`

Shown below is the process of creating XML files and using them to modify the
*TMS2IndependentTemplateForTD.meth* template file.

```{r writeMethodXml, eval=FALSE}
library("topdownr")

## Create MS1 settings
ms1 <- expandMs1Conditions(
    FirstMass=400,
    LastMass=1200,
    Microscans=as.integer(10)
)

## Set TargetMass
targetMz <- cbind(mz=c(560.6, 700.5, 933.7), z=rep(1, 3))

## Set common settings
common <- list(
    OrbitrapResolution="R120K",
    IsolationWindow=1,
    MaxITTimeInMS=200,
    Microscans=as.integer(40),
    AgcTarget=c(1e5, 5e5, 1e6)
)

## Create settings for different fragmentation conditions
cid <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="CID",
    CIDCollisionEnergy=seq(7, 35, 7)
)
hcd <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="HCD",
    HCDCollisionEnergy=seq(7, 35, 7)
)
etd <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="ETD",
    ETDReactionTime=as.double(1:2)
)
etcid <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="ETD",
    ETDReactionTime=as.double(1:2),
    ETDSupplementalActivation="ETciD",
    ETDSupplementalActivationEnergy=as.double(1:2)
)
uvpd <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="UVPD"
)

## Create experiments with all combinations of the above settings
## for fragment optimisation
exps <- createExperimentsFragmentOptimisation(
    ms1=ms1, cid, hcd, etd, etcid, uvpd,
    groupBy=c("AgcTarget", "replication"),
    nMs2perMs1=10,
    scanDuration=0.5,
    replications=2,
    randomise=TRUE
)

## Write experiments to xml files
writeMethodXmls(exps=exps)

## Run XMLMethodChanger
runXmlMethodChanger(
    modificationXml=list.files(pattern="^method.*\\.xml$"),
    templateMeth="TMS2IndependentTemplateForTD.meth",
    executable="path\\to\\XmlMethodChanger.exe"
)
```

# Data Acquisition

After setting up direct infusion, make sure that the MS1 spectrum yields the
expected protein mass after deconvolution by *Xtract*.
Shown below is a deconvoluted MS1 spectrum for myoglobin.
The dominant mass corresponds to myoglobin with Met removed.

![Xtract myoglobin](images/xtract-myo.png)

# Data Preparation

Prior to the `R` analysis of protein fragmentation data we have to convert the
`.raw` files.

## Extracting Header Information

Some of the information (SpectrumId, Ion Injection Time (ms),
Orbitrap Resolution, targeted Mz, ETD reaction time, CID activation and
HCD activation) is stored in the scan headers, while other information
(ETD reagent target and AGC target) is only available in the method table.

You can run *ScanHeadsman* from the command line
(`ScanHeadsman.exe --noMS --methods:CSV`) or use the function provided by
`topdownr`:

```{r ScanHeadsman, eval=FALSE}
runScanHeadsman(
    path="path\\to\\raw-files",
    executable="path\\to\\ScanHeadsman.exe"
)
```

*ScanHeadsman* will generate a `.txt` (scan header table) and a `.csv`
(method table) file for each `.raw` file.

## Convert .raw files into mzML

The spectra have to be charge state deconvoluted with the *Xtract* node in
*Proteome Discoverer 2.1*. The software returns deconvoluted spectra in
mzML format.

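
After both conversion steps, each `.raw` file should be accompanied by a
`.csv`, a `.txt` and a `.mzML` file. The following is a minimal sketch of how
this could be checked in base `R`. The helper `checkCompanionFiles` is
hypothetical and not part of `topdownr`. It also assumes that the companion
files sit in the same directory as the `.raw` file and that their names start
with its base name followed by a dot; adapt the matching if your files are
named differently.

```r
## Hypothetical helper (not part of topdownr): report which companion files
## exist for each .raw file in a directory.
checkCompanionFiles <- function(path = ".") {
    files <- list.files(path)
    raw <- grep("\\.raw$", files, ignore.case = TRUE, value = TRUE)
    stems <- sub("\\.raw$", "", raw, ignore.case = TRUE)

    ## Assumption: a companion file is named "<stem>.<anything>.<ext>" or
    ## "<stem>.<ext>", i.e. it starts with the .raw base name plus a dot.
    hasExt <- function(stem, ext) {
        any(
            startsWith(files, paste0(stem, ".")) &
            grepl(paste0("\\.", ext, "$"), files, ignore.case = TRUE)
        )
    }

    data.frame(
        raw = raw,
        csv = vapply(stems, hasExt, logical(1), ext = "csv"),
        txt = vapply(stems, hasExt, logical(1), ext = "txt"),
        mzML = vapply(stems, hasExt, logical(1), ext = "mzML"),
        row.names = NULL
    )
}
```

Calling `checkCompanionFiles("path\\to\\raw-files")` returns one row per
`.raw` file; any `FALSE` entry points to a conversion step that still has to
be run for that file.
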
![Proteome Discoverer](images/proteomediscoverer.png)

Once a `.csv`, `.txt`, and `.mzML` file has been produced for each `.raw`
file, we can start the analysis using `topdownr`.
Please see the *analysis* vignette
(`vignette("analysis", package="topdownr")`) for an example.

# Session Info

```{r sessioninfo}
sessionInfo()
```

# References