--- title: "An introduction to biodbKegg" author: "Pierrick Roger" date: "`r BiocStyle::doc_date()`" package: "`r BiocStyle::pkg_ver('biodbKegg')`" vignette: | %\VignetteIndexEntry{Introduction to the biodbKegg package.} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: toc: yes toc_depth: 4 toc_float: collapsed: false BiocStyle::pdf_document: default bibliography: references.bib --- # Introduction biodbKegg is a *biodb* extension package that implements a connector to KEGG Compound database [@kanehisa2000_KEGG]. # Installation Install using Bioconductor: ```{r, eval=FALSE} if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install('biodbKegg') ``` # Initialization The first step in using *biodbKegg*, is to create an instance of the biodb class `BiodbMain` from the main *biodb* package. This is done by calling the constructor of the class: ```{r, results='hide'} mybiodb <- biodb::newInst() ``` During this step the configuration is set up, the cache system is initialized and extension packages are loaded. We will see at the end of this vignette that the *biodb* instance needs to be terminated with a call to the `terminate()` method. # Creating a connector to KEGG Compound database In *biodb* the connection to a database is handled by a connector instance that you can get from the factory. biodbKegg implements a connector to a remote database. Here is the code to instantiate a connector: ```{r} kegg.comp.conn <- mybiodb$getFactory()$createConn('kegg.compound') ``` # Accessing entries To retrieve entries, use: ```{r} entries <- kegg.comp.conn$getEntry(c('C00133', 'C00751')) entries ``` To convert a list of entries into a dataframe, run: ```{r} x <- mybiodb$entriesToDataframe(entries, compute=FALSE) x ``` # Search for compounds of a certain mass ```{r} ids <- kegg.comp.conn$searchForEntries(list(monoisotopic.mass=list(value=64, delta=2.0)), max.results=10) entries <- mybiodb$getFactory()$getEntry('kegg.compound', ids) ``` # Add information to a data frame containing KEGG Compound IDs If you have a data frame containing a column with KEGG Compound IDs, you can add information such as associated KEGG Enzymes, associated KEGG Pathways and KEGG Modules to your data frame, for a specific organism. For the example we use the list of compound IDs we already have, to construct a data frame: ```{r Data frame with a column containing KEGG Compound IDs} kegg.comp.ids <- c('C06144', 'C06178', 'C02659') mydf <- data.frame(kegg.ids=kegg.comp.ids) ``` Using the `addInfo()` method of `KeggCompoundConn` class, we add information about pathways, enzymes and modules for these compounds: ```{r Calling addInfo()} kegg.comp.conn$addInfo(mydf, id.col='kegg.ids', org='mmu') ``` Note that, by default, the number of values for each field is limited to 3. Please see the help page of `KeggCompoundConn` for more information about `addInfo()`, and a description of all parameters. The list of organisms is available at . # Getting pathways related to compounds using KEGG databases In this example we will look for pathways related to specified compounds, count to how many pathways each compound is related, build a pathway graph, and create a decorated pathway graph picture. For that we will start from a given list of KEGG compound IDs, and explore KEGG to find to which organisms they can be related and try to discover links between them through KEGG pathways. As an example, we will start from a predefined list of KEGG compound IDs, and focus on one organism, the mouse. ## Look for pathways related to specified compounds Given a list of compounds and an organism, we can look for related pathways in a single command: ```{r} kegg.comp.ids <- c('C06144', 'C06178', 'C02659') pathways <- kegg.comp.conn$getPathwayIds(kegg.comp.ids, 'mmu') pathways ``` ## Count to how many pathways each compound is related With another function we can get the pathways found for each compound: ```{r} path.per.comp <- kegg.comp.conn$getPathwayIdsPerCompound(kegg.comp.ids, 'mmu') fct <- function(i) { if (i %in% names(path.per.comp)) length(path.per.comp[[i]]) else 0 } nb_mmu_gene_pathways <- vapply(kegg.comp.ids, fct, FUN.VALUE=0) names(nb_mmu_gene_pathways) <- kegg.comp.ids ``` Here, in the final table, we list the number of pathways for each KEGG compound: ```{r} nb_mmu_gene_pathways ``` ## Build a pathway graph To build a pathway graph, we need a connector to the KEGG Pathway database: ```{r Getting a connector to the KEGG Pathway database} kegg.path.conn <- mybiodb$getFactory()$getConn('kegg.pathway') ``` Building list of edges and vertices for pathways is done by calling buildPathwayGraph(): ```{r Building a pathway graph} kegg.path.conn$buildPathwayGraph(pathways[[1]]) ``` The object returned is a list whose names are the pathway IDs submitted, and the values are lists containing two data frames (edges and vertices). We can also get an igraph object for the a pathway (or a list of pathways): ```{r Getting an igraph for the pathway} ig <- kegg.path.conn$getPathwayIgraph(pathways[[1]]) ``` And we plot it: ```{r Plotting the pathway igraph} vert <- igraph::as_data_frame(ig, 'vertices') shapes <- vapply(vert[['type']], function(x) if (x == 'reaction') 'rectangle' else 'circle', FUN.VALUE='', USE.NAMES=FALSE) colors <- vapply(vert[['type']], function(x) if (x == 'reaction') 'yellow' else 'red', FUN.VALUE='', USE.NAMES=FALSE) plot(ig, vertex.shape=shapes, vertex.color=colors, vertex.label.dist=1, vertex.size=4, vertex.size2=4) ``` ## Create a decorated pathway graph picture We will now use a KEGG pathway picture and highlight some of the enzymes and compounds on it. For this, we first get the enzymes related to the compounds: ```{r Getting the enzymes} kegg.enz.ids <- mybiodb$entryIdsToSingleFieldValues(kegg.comp.ids, db='kegg.compound', field='kegg.enzyme.id') kegg.enz.ids ``` define the colors we want to apply: ```{r Attributing colors to entries} color2ids <- list(yellow=kegg.enz.ids, red=kegg.comp.ids) ``` Then we call the method that builds the highlighted image and print it: ```{r Creating a decorated pathway picture, message=FALSE} kegg.path.conn$getDecoratedGraphPicture(pathways[[1]], color2ids=color2ids) ``` # Terminate the biodb instance When done with your *biodb* instance you have to terminate it, in order to ensure release of resources (file handles, database connection, etc): ```{r} mybiodb$terminate() ``` # Session information ```{r} sessionInfo() ``` # References