---
title: "XCMS Parameter Optimization with IPO"
author: "Gunnar Libiseller, Thomas Riebenbauer
JOANNEUM RESEARCH Forschungsgesellschaft m.b.H., Graz, Austria"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{XCMS Parameter Optimization with IPO}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r, echo = FALSE, message = FALSE, warning = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(faahKO)
```
## Introduction
This document describes how to use the R-package `IPO`
to optimize `xcms` parameters. Code examples on how to
use `IPO` are provided. Additional to `IPO` the R-packages
`xcms` and `rsm` are required. The R-package `msdata` and`mtbls2`
are recommended. The optimization process looks as following:
IPO optimization process
## Installation ```{r install_IPO, eval=FALSE} # try http:// if https:// URLs are not supported if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("IPO") ``` Installing main suggested packages ```{r install_IPO_suggestions, eval=FALSE} # for examples of peak picking parameter optimization: BiocManager::install("msdata") # for examples of optimization of retention time correction and grouping # parameters: BiocManager::install("faahKO") ``` ## Raw data `xcms` handles the file processing hence all files can be used that can be processed by `xcms`. ```{r file_choosing} datapath <- system.file("cdf", package = "faahKO") datafiles <- list.files(datapath, recursive = TRUE, full.names = TRUE) ``` ## Optimize peak picking parameters To optimize parameters different values (levels) have to tested for these parameters. To efficiently test many different levels design of experiment (DoE) is used. Box-Behnken and central composite designs set three evenly spaced levels for each parameter. The method `getDefaultXcmsSetStartingParams` provides default values for the lower and upper levels defining a range. Since the levels are evenly spaced the middle level or center point is calculated automatically. To edit the starting levels of a parameter set the lower and upper level as desired. If a parameter should not be optimized, set a single default value for `xcms` processing, do not set this parameter to NULL. The method `getDefaultXcmsSetStartingParams` creates a list with default values for the optimization of the peak picking methods `centWave` or `matchedFilter`. To choose between these two method set the parameter accordingly. The method `optimizeXcmsSet` has the following parameters: - files: the raw data which is the basis for optimization. This does not necessarly need to be the whole dataset, only quality controls should suffice. - params: a list consisting of items named according to `xcms` peak picking methods parameters. A default list is created by `getDefaultXcmsSetStartingParams()`. - BPPARAM: a `BiocParallelParam`-object (see `?BiocParallel::BiocParallelParam`) to controll the use of parallelisation of `xcms`. Defaults to `bpparam()`. - nSlaves: the number of experiments of an DoE processed in parallel - subdir: a directory where the response surface models are stored. Can also be `NULL` if no rsm's should be saved. The optimization process starts at the specified levels. After the calculation of the DoE is finished the result is evaluated and the levels automatically set accordingly. Then a new DoE is generated and processed. This continues until an optimum is found. The result of peak picking optimization is a list consisting of all calculated DoEs including the used levels, design, response, rsm and best setting. Additionally the last list item is a list (`\$best_settings`) providing the optimized parameters (`\$parameters`), an xcmsSet object (`\$xset`) calculated with these parameters and the response this `xcms`-object gives. ```{r load_IPO, message=FALSE} library(IPO) ``` ```{r optimize_peak_picking, fig.height=7, fig.width=7, warning=FALSE} peakpickingParameters <- getDefaultXcmsSetStartingParams('matchedFilter') #setting levels for step to 0.2 and 0.3 (hence 0.25 is the center point) peakpickingParameters$step <- c(0.2, 0.3) peakpickingParameters$fwhm <- c(40, 50) #setting only one value for steps therefore this parameter is not optimized peakpickingParameters$steps <- 2 time.xcmsSet <- system.time({ # measuring time resultPeakpicking <- optimizeXcmsSet(files = datafiles[1:2], params = peakpickingParameters, nSlaves = 1, subdir = NULL, plot = TRUE) }) ``` ```{r optimize_peak_picking_result} resultPeakpicking$best_settings$result optimizedXcmsSetObject <- resultPeakpicking$best_settings$xset ``` The response surface models of all optimization steps for the parameter optimization of peak picking are shown above. Currently the `xcms` peak picking methods `centWave` and `matchedFilter` are supported. The parameter `peakwidth` of the peak picking method `centWave` needs two values defining a minimum and maximum peakwidth. These two values need separate optimization and are therefore split into `min_peakwidth` and `max_peakwidth` in `getDefaultXcmsSetStartingParams`. Also for the `centWave` parameter prefilter two values have to be set. To optimize these use set `prefilter` to optimize the first value and `prefilter_value` to optimize the second value respectively. ## Optimize retention time correction and grouping parameters Optimization of retention time correction and grouping parameters is done simultaneously. The method `getDefaultRetGroupStartingParams` provides default optimization levels for the `xcms` retention time correction method `obiwarp` and the grouping method `density`. Modifying these levels should be done the same way done for the peak picking parameter optimization. The method `getDefaultRetGroupStartingParams` only supports one retention time correction method (`obiwarp`) and one grouping method (`density`) at the moment. The method `optimizeRetGroup` provides the following parameter: - xset: an `xcmsSet`-object used as basis for retention time correction and grouping. - params: a list consisting of items named according to `xcms` retention time correction and grouping methods parameters. A default list is created by `getDefaultRetGroupStartingParams`. - nSlaves: the number of experiments of an DoE processed in parallel - subdir: a directory where the response surface models are stored. Can also be NULL if no rsm's should be saved. A list is returned similar to the one returned from peak picking optimization. The last list item consists of the optimized retention time correction and grouping parameters (`\$best_settings`). ```{r optimize_retcor_group, fig.height=7, fig.width=7, warning = FALSE} retcorGroupParameters <- getDefaultRetGroupStartingParams() retcorGroupParameters$profStep <- 1 retcorGroupParameters$gapExtend <- 2.7 time.RetGroup <- system.time({ # measuring time resultRetcorGroup <- optimizeRetGroup(xset = optimizedXcmsSetObject, params = retcorGroupParameters, nSlaves = 1, subdir = NULL, plot = TRUE) }) ``` The response surface models of all optimization steps for the retention time correction and grouping parameters are shown above. Currently the `xcms` retention time correction method `obiwarp` and grouping method `density` are supported. ## Display optimized settings A script which you can use to process your raw data can be generated by using the function `writeRScript`. ```{r display_settings} writeRScript(resultPeakpicking$best_settings$parameters, resultRetcorGroup$best_settings) ``` ## Running times and session info Above calculations proceeded with following running times. ```{r times} time.xcmsSet # time for optimizing peak picking parameters time.RetGroup # time for optimizing retention time correction and grouping parameters sessionInfo() ```