---
title: "Provide PathbankDb databases for AnnotationHub"
author: "Kozo Nishida"
graphics: no
package: AHPathbankDbs
output:
    BiocStyle::html_document:
        toc_float: true
vignette: >
    %\VignetteIndexEntry{Provide PathbankDb databases for AnnotationHub}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
    %\VignetteDepends{AnnotationHub}
---

```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```

# Fetch PathBank databases from `AnnotationHub`

The `AHPathbankDbs` package provides the metadata for all PathBank tibble
databases in `r Biocpkg("AnnotationHub")`. First we load/update the
`AnnotationHub` resource.

```{r load-lib, message = FALSE}
library(AnnotationHub)
ah <- AnnotationHub()
```

Next we list all PathBank entries from `AnnotationHub`.

```{r list-pathbankdb}
query(ah, "pathbank")
```

We can confirm the metadata in AnnotationHub in Bioconductor S3 bucket
with `mcols()`.

```{r confirm-metadata}
mcols(query(ah, "pathbank"))
```

We query only the PathBank tibble for species *Escherichia coli*.

```{r query-ecoli}
qr <- query(ah, c("pathbank", "Escherichia coli"))
qr
```
There are two types of tibble in the result, metabolites and proteins.
Let's get a tibble of metabolites here.

```{r load-ecolitbl}
ecolitbl <- qr[[1]]
ecolitbl
```

Each row shows information for one metabolite.
This tibble indicates which pathway of PathBank has those metabolites.
Each metabolite has a the name, HMDB ID, KEGG ID, ChEBI ID, DrugBank ID, CAS,
Formula, IUPAC, SMILES, InChi, and InChI Key as well as the pathway information
to which it belongs.

To get the metabolites defined for *TCA Cycle* we can call.

```{r get-metabolites4TCA}
ecolitbl[ecolitbl$`Pathway Name`=="TCA Cycle", ]
```

# Creating PathBank tibbles

This section describes the automated way to create PathBank tibble databases
using [PathBank pathways CSV](https://pathbank.org/downloads).

## Creating PathBank tibble databases

To create the databases we use the `createPathbankMetabolitesDb` and
`createPathbankProteinsDb` functions. These function downloads the "Metabolite
names linked to PathBank pathways CSV" and "Protein names linked to PathBank
pathways CSV". Then, those CSVs are divided into tables for each species
and tibbleed.

These functions have no parameters.
In other words, it does not have the function of making tibble only for a
specific species, but makes tibble for all species in PathBank CSV.

```{r create-rda, eval = FALSE}
library(AHPathbankDbs)
scr <- system.file("scripts/make-data.R", package = "AHPathBankDbs")
source(scr)
createPathbankMetabolitesDb()
createPathbankProteinsDb()
```

The each tibble is stored in the rda file and saved in the current working
directory.