Querying the UniChem Database

Jermiah Joseph, Shahzada Muhammad Shameel Farooq, and Christopher Eeles

Introduction to the UniChem API

The UniChem database provides a publicly available REST API for programmatically retrieving mappings from standardized structural compound identifiers to the unique compound IDs used by large online cheminformatics databases such as PubChem, ChEMBL, DrugBank and many more. The service accepts POST requests at two endpoints: /compound and /connectivity. Both endpoints take their query parameters as JSON in the POST body. The /compound API returns exact matches for the queried compound, while the /connectivity API uses layers of the International Chemical Identifier (InChI) of the query compound to return exact matches as well as structurally related compounds such as isomers, salts, ionized forms and more. [@UniChemBeta; @chambersUniChemUnifiedChemical2013]
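
As a rough illustration of what a JSON POST body of this kind could look like, the sketch below serializes a query with the jsonlite package. The field names `type` and `compound` are assumptions chosen to mirror the arguments of queryUnichemCompound described later in this vignette; they are not taken from the API specification, and the step that actually sends the request is left out.

```r
library(jsonlite)

# Assumed field names ("type", "compound"); they mirror the arguments of
# queryUnichemCompound() and are not confirmed against the API specification.
body <- list(
  type = "inchikey",
  compound = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
)

# auto_unbox = TRUE writes length-one vectors as JSON scalars instead of arrays
body_json <- toJSON(body, auto_unbox = TRUE)
print(body_json)
```

The AnnotationGx functions build and send such requests for you, so this step is only needed if you want to talk to the service directly.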

The functions in AnnotationGx have been designed to let package users query UniChem resources without any prior knowledge of HTTP requests or the API specifications. In doing so, we hope to provide an R-native interface for mapping between various cheminformatics databases, accessible to anyone familiar with using R functions!

Licensing

UniChem is provided under the EMBL-EBI Terms of Use. Source: https://www.ebi.ac.uk/licencing/

library(AnnotationGx)

Available Databases

To see a table of database identifiers available via UniChem, you can call the getUnichemSources function. By default, only the database short name (“Name”) and UniChem’s ID for it (“SourceID”) are returned. To return all columns, pass the all_columns = TRUE argument.

getUnichemSources()
#>                 Name SourceID
#>               <char>    <int>
#>  1: probes_and_drugs       49
#>  2:          pubchem       22
#>  3:        bindingdb       31
#>  4:        lipidmaps       33
#>  5:           fdasrs       14
#>  6:      nmrshiftdb2       24
#>  7:      drugcentral       34
#>  8:           chembl        1
#>  9:         rcsb_pdb        3
#> 10:             rhea       38
#> 11:       surechembl       15
#> 12:           brenda       37
#> 13:      swisslipids       41
#> 14:             CCDC       50
#> 15:          molport       28
#> 16:           gtopdb        4
#> 17:            chebi        7
#> 18:         drugbank        2
#> 19:             hmdb       18
#> 20:             pdbe        5
#> 21:          comptox       32
#> 22:   clinicaltrials       46
#>                 Name SourceID
#>               <char>    <int>

When mapping with the queryUnichemCompound function, the sources in the table above are both the databases whose identifiers can be used as query input and the databases for which compound mappings are returned.
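
A small sketch of looking up a source ID by its short name may make this concrete. The three rows are copied from the sources table printed above so that the example runs without network access; in practice the table would come from getUnichemSources(). The helper `source_id_for` is hypothetical and not part of AnnotationGx.

```r
# Rows copied from the sources table printed above; in practice this table
# would be the result of getUnichemSources()
sources <- data.frame(
  Name = c("chembl", "drugbank", "pubchem"),
  SourceID = c(1L, 2L, 22L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of AnnotationGx): resolve a database short
# name to its UniChem source ID, failing loudly on unknown names
source_id_for <- function(name, table = sources) {
  id <- table$SourceID[table$Name == name]
  if (length(id) != 1L) {
    stop("Unknown or ambiguous source name: ", name)
  }
  id
}

pubchem_id <- source_id_for("pubchem")
print(pubchem_id)
```

An ID resolved this way is the kind of value that the sourceID argument of queryUnichemCompound expects when type is “sourceID”.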

Querying the UniChem Compound API

The queryUnichemCompound function queries the UniChem Compound API to retrieve mappings for a given compound identifier. It takes two mandatory arguments: compound, the identifier to be queried, and type, the kind of identifier being supplied. Valid values for type are “uci”, “inchi”, “inchikey”, and “sourceID”. A third argument, sourceID, is optional and is required only when type is “sourceID”.
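
For the identifier types other than “uci”, calls might look like the sketch below. It assumes the argument names described above and reuses values that appear elsewhere in this vignette: the InChIKey from the uci example output, and the ChEMBL identifier CHEMBL25 with source ID 1 (chembl in the sources table). Both calls contact the UniChem service, so they need network access.

```r
library(AnnotationGx)

# Query by InChIKey; the key is the one shown in the uci example output
by_inchikey <- queryUnichemCompound(
  compound = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
  type = "inchikey"
)

# Query by a database-specific identifier; sourceID = 1 is chembl in the
# sources table, so the compound is given as a ChEMBL ID
by_source <- queryUnichemCompound(
  compound = "CHEMBL25",
  type = "sourceID",
  sourceID = 1
)

# Inspect the top-level structure of one result
names(by_inchikey)
```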

The function returns a list with two elements:

  1. “External_Mappings”: a data.table containing the mappings to other databases, with the following columns:
    1. “compoundID” (character): the compound identifier in the external database
    2. “Name” (character): the short name of the database
    3. “NameLong” (character): the long name of the database
    4. “sourceID” (integer): the UniChem source ID
    5. “sourceURL” (character): the URL of the compound record in the source database
  2. “UniChem_Mappings”: a list of the following six elements, each named with a “UniChem.” prefix in the returned list (for example “UniChem.UCI”):
    1. “UCI” (numeric): the UniChem Identifier
    2. “InchiKey” (character): the InChIKey
    3. “Inchi” (character): the InChI
    4. “formula” (character): the molecular formula
    5. “connections” (character): the connectivity representation, e.g. “1-6(10)13-8-5-3-2-4-7(8)9(11)12”
    6. “hAtoms” (character): the hydrogen atom connections, e.g. “2-5H,1H3,(H,11,12)”
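
In the example values shown in this vignette, the “formula”, “connections” and “hAtoms” entries match the corresponding layers of the InChI string. The sketch below illustrates that relationship by splitting the InChI from the example output in base R. It is not a description of how the package or UniChem derives these fields, and the variable names are illustrative.

```r
# InChI string copied from the example output in this vignette
inchi <- "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

# Layers are separated by "/": the first piece is the version prefix and
# the second is the molecular formula
layers <- strsplit(inchi, "/", fixed = TRUE)[[1]]
formula <- layers[2]

# The remaining layers carry a one-letter prefix: "c" marks the connectivity
# layer and "h" the hydrogen layer
rest <- layers[-(1:2)]
connections <- sub("^c", "", rest[startsWith(rest, "c")])
h_atoms <- sub("^h", "", rest[startsWith(rest, "h")])

print(formula)
print(connections)
print(h_atoms)
```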

Example: Searching by uci (UniChem Identifier)

Note: This type of query requires you to know the UniChem Identifier for the compound.

queryUnichemCompound(compound = "161671", type = "uci")
#> $External_Mappings
#>        compoundID             Name                            NameLong sourceID
#>            <char>           <char>                              <char>    <int>
#>    1:    CHEMBL25           chembl                              ChEMBL        1
#>    2:     DB00945         drugbank                            DrugBank        2
#>    3:         AIN         rcsb_pdb                            RCSB PDB        3
#>    4:        4139           gtopdb               Guide to Pharmacology        4
#>    5:         AIN             pdbe         Protein Data Bank in Europe        5
#>   ---                                                                          
#> 1011: NCT06451198   clinicaltrials                     Clinical Trials       46
#> 1012: NCT06468202   clinicaltrials                     Clinical Trials       46
#> 1013: NCT06478537   clinicaltrials                     Clinical Trials       46
#> 1014:    PD002467 probes_and_drugs                        Probes&Drugs       49
#> 1015:      ACSALA             CCDC CSD (Cambridge Structural Database)       50
#>                                                                         sourceURL
#>                                                                            <char>
#>    1:                    https://www.ebi.ac.uk/chembldb/compound/inspect/CHEMBL25
#>    2:                                       https://go.drugbank.com/drugs/DB00945
#>    3:                                             https://www.rcsb.org/ligand/AIN
#>    4: https://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=4139
#>    5:           https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/AIN
#>   ---                                                                            
#> 1011:                                https://clinicaltrials.gov/study/NCT06451198
#> 1012:                                https://clinicaltrials.gov/study/NCT06468202
#> 1013:                                https://clinicaltrials.gov/study/NCT06478537
#> 1014:                             https://www.probes-drugs.org/compounds/PD002467
#> 1015:     https://www.ccdc.cam.ac.uk/structures/search?sid=UNICHEM&pid=csd:ACSALA
#> 
#> $UniChem_Mappings
#> $UniChem_Mappings$UniChem.UCI
#> [1] 161671
#> 
#> $UniChem_Mappings$UniChem.InchiKey
#> [1] "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
#> 
#> $UniChem_Mappings$UniChem.Inchi
#> [1] "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
#> 
#> $UniChem_Mappings$UniChem.formula
#> [1] "C9H8O4"
#> 
#> $UniChem_Mappings$UniChem.connections
#> [1] "1-6(10)13-8-5-3-2-4-7(8)9(11)12"
#> 
#> $UniChem_Mappings$UniChem.hAtoms
#> [1] "2-5H,1H3,(H,11,12)"

sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.18.2.1 AnnotationGx_0.99.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] crayon_1.5.3    cli_3.6.6       knitr_1.51      rlang_1.2.0    
#>  [5] xfun_0.57       otel_0.2.0      jsonlite_2.0.0  glue_1.8.1     
#>  [9] backports_1.5.1 htmltools_0.5.9 sass_0.4.10     rappdirs_0.3.4 
#> [13] rmarkdown_2.31  evaluate_1.0.5  jquerylib_0.1.4 fastmap_1.2.0  
#> [17] yaml_2.3.12     lifecycle_1.0.5 httr2_1.2.2     memoise_2.0.1  
#> [21] compiler_4.6.0  digest_0.6.39   R6_2.6.1        curl_7.0.0     
#> [25] parallel_4.6.0  magrittr_2.0.5  bslib_0.10.0    checkmate_2.3.4
#> [29] withr_3.0.2     tools_4.6.0     xml2_1.5.2      cachem_1.1.0