---
title: "An introduction to biodbNci"
author: "Pierrick Roger"
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('biodbNci')`"
abstract: |
  How to use the NCI CACTUS connector and its methods.
vignette: |
  %\VignetteIndexEntry{Introduction to the biodbNci package.}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    toc: yes
    toc_depth: 4
    toc_float:
      collapsed: false
  BiocStyle::pdf_document: default
bibliography: references.bib
---

# Purpose

*biodbNci* is a *biodb* extension package that implements a connector to the
National Cancer Institute (USA) CACTUS API [@nci2022_CACTUS].

# Installation

Install using Bioconductor:
```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbNci')
```

# Initialization

The first step in using *biodbNci* is to create an instance of the biodb
class `Biodb` from the main *biodb* package. This is done by calling the
constructor of the class:
```{r, results='hide'}
mybiodb <- biodb::newInst()
```
During this step the configuration is set up, the cache system is initialized
and extension packages are loaded.

We will see at the end of this vignette that the *biodb* instance needs to be
terminated with a call to the `terminate()` method.

# Creating a connector to CACTUS

In *biodb*, the connection to a database is handled by a connector instance
that you get from the factory.
*biodbNci* implements a connector to a remote database.
Here is the code to instantiate this connector:
```{r}
conn <- mybiodb$getFactory()$createConn('nci.cactus')
```

For this vignette, we avoid downloading the full NCI CACTUS database and use
instead an extract, shipped with the package, that contains a few entries:
```{r}
dbExtract <- system.file("extdata", 'generated', "cactus_extract.txt.gz",
    package="biodbNci")
conn$setPropValSlot('urls', 'db.gz.url', dbExtract)
```

# Accessing entries

To get some of the first entry IDs (accession numbers) from the database, run:
```{r}
ids <- conn$getEntryIds(2)
ids
```

To retrieve entries, use:
```{r}
entries <- conn$getEntry(ids)
entries
```

To convert a list of entries into a data frame, run:
```{r}
x <- mybiodb$entriesToDataframe(entries)
x
```

# Chemical Identifier Resolver web service

Here is an example of calling the Chemical Identifier Resolver to convert a
SMILES string into an InChI:
```{r}
conn$wsChemicalIdentifierResolver(structid='C=O', repr='InChI')
```
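
For context, the public Chemical Identifier Resolver is addressed through URLs
of the form `https://cactus.nci.nih.gov/chemical/structure/<identifier>/<representation>`.
The base R sketch below only builds such a URL and sends no request. The
helper name `buildResolverUrl` is hypothetical, the representation token
`stdinchi` is the one used by the public service rather than the `repr` value
passed to the connector method, and this is not meant to describe how the
connector builds its requests internally:
```{r}
# Hypothetical helper, not part of biodbNci: build a Chemical Identifier
# Resolver URL from a structure identifier and a representation token.
buildResolverUrl <- function(structid, repr,
    baseUrl='https://cactus.nci.nih.gov/chemical/structure') {
    # Percent-encode reserved characters such as '=' or '#' found in SMILES.
    encoded <- utils::URLencode(structid, reserved=TRUE)
    paste(baseUrl, encoded, repr, sep='/')
}

resolverUrl <- buildResolverUrl('C=O', 'stdinchi')
resolverUrl
```
Encoding the identifier matters for SMILES, since characters like `=` and `#`
are reserved in URLs.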

# Convert CAS IDs

The connector currently provides two methods for converting CAS IDs to
InChIs or InChIKeys:
```{r}
conn$convCasToInchi('87605-72-9')
conn$convCasToInchikey('87605-72-9')
```

Both conversions rely on the Chemical Identifier Resolver web service.
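
Since each conversion triggers a call to a web service, it can be worth
rejecting malformed CAS IDs beforehand. The following base R sketch checks the
layout and the check digit of a single CAS Registry Number. The helper
`isValidCas` is hypothetical and not part of *biodbNci*, and it says nothing
about whether the identifier is actually known to NCI CACTUS:
```{r}
# Hypothetical helper, not part of biodbNci: check the format and the check
# digit of one CAS Registry Number given as a character string.
isValidCas <- function(cas) {
    # Expected layout: 2 to 7 digits, then 2 digits, then 1 check digit.
    if (!grepl('^[0-9]{2,7}-[0-9]{2}-[0-9]$', cas))
        return(FALSE)
    digits <- as.integer(strsplit(gsub('-', '', cas), '')[[1]])
    check <- digits[length(digits)]
    # Weight the remaining digits 1, 2, 3, ... starting from the right.
    body <- rev(digits[-length(digits)])
    sum(body * seq_along(body)) %% 10 == check
}

isValidCas('87605-72-9')
isValidCas('87605-72-8')
```
The first call uses the CAS ID from the example above and passes the check,
while the second one, with an altered last digit, fails it.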

# Closing biodb instance

When you are done with your *biodb* instance, you have to terminate it in
order to release its resources (file handles, database connections, etc.):
```{r}
mybiodb$terminate()
```
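
In scripts, one way to make sure `terminate()` is always called, even when an
error occurs, is to wrap the work in a function that registers the call with
`on.exit()`. The sketch below is only an illustration: `withInstance` is a
hypothetical helper, not part of *biodb* or *biodbNci*, and its `newInst`
argument exists solely so that another constructor can be substituted:
```{r}
# Hypothetical helper, not part of biodb: create an instance, run `fn` on it
# and terminate the instance whatever the outcome of `fn`.
withInstance <- function(fn, newInst=biodb::newInst) {
    inst <- newInst()
    # Registered immediately, so it also runs if `fn` raises an error.
    on.exit(inst$terminate(), add=TRUE)
    fn(inst)
}

# Possible usage, assuming biodb and biodbNci are installed:
# withInstance(function(b) b$getFactory()$createConn('nci.cactus'))
```
The value returned by `fn` is passed through unchanged, so the helper can wrap
any of the calls shown in this vignette.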

# Session information

```{r}
sessionInfo()
```

# References