The UniChem database provides a publicly available REST API for programmatic retrieval of mappings from standardized structural compound identifiers to unique compound IDs across a range of large online cheminformatic databases such as PubChem, ChEMBL, DrugBank and many more. The service accepts POST requests to two different endpoints: /compound and /connectivity. Both endpoints accept query parameters via the POST body in JSON format. The /compound API returns exact matches for the queried compound, while the /connectivity API uses layers of the International Chemical Identifier (InChI) of the query compound to return exact matches as well as structurally related compounds such as isomers, salts, ionizations and more. [@UniChemBeta; @chambersUniChemUnifiedChemical2013]
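To make the idea of InChI layers concrete, the sketch below splits an InChI string on "/" using base R only. It uses the aspirin InChI that appears in the query output later in this document. The helper name split_inchi_layers is hypothetical and is not part of AnnotationGx or UniChem; it only illustrates how the string is organised.

```r
# Hypothetical helper (not part of AnnotationGx): split a standard InChI
# into its version prefix, molecular formula, and prefixed layers.
split_inchi_layers <- function(inchi) {
  parts <- strsplit(sub("^InChI=", "", inchi), "/", fixed = TRUE)[[1]]
  out <- list(version = parts[1], formula = parts[2])
  # Remaining layers start with a one-letter prefix, e.g. "c" or "h".
  # Simplification: a repeated prefix would overwrite the earlier layer.
  for (layer in parts[-(1:2)]) {
    out[[substr(layer, 1, 1)]] <- substring(layer, 2)
  }
  out
}

aspirin <- "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
layers <- split_inchi_layers(aspirin)

layers$formula  # molecular formula
layers$c        # connection layer
layers$h        # hydrogen atom layer
```

The c and h layers obtained this way correspond to the connections and hAtoms values that UniChem reports for a compound.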
The functions in AnnotationGx have been designed to allow package users to easily query UniChem resources without any pre-existing knowledge of HTTP requests or the API specifications. In doing so we hope to provide an R-native interface for mapping between various cheminformatic databases, accessible to anyone familiar with using R functions!
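For context, a hand-written request of the kind these functions are meant to spare the user might look roughly like the sketch below, built with httr2. The full URL (including the route name) and the body field names are assumptions based on the public UniChem API, not taken from the AnnotationGx source, so they should be checked against the UniChem documentation. The request is only constructed here and never sent.

```r
library(httr2)

# Assumed URL for the compound endpoint; verify before relying on it.
unichem_url <- "https://www.ebi.ac.uk/unichem/api/v1/compounds"

# JSON body for an exact-match lookup by InChIKey (field names are assumptions).
body <- list(
  type = "inchikey",
  compound = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
)

# Build the request; attaching a body means httr2 will send it as a POST.
req <- request(unichem_url) |>
  req_body_json(body)

# Nothing is sent until the request is performed, for example:
# resp <- req_perform(req)
# mappings <- resp_body_json(resp)
```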
UniChem is provided under the EMBL-EBI Terms of Use. Source: https://www.ebi.ac.uk/licencing/
To see a table of database identifiers available via UniChem, call the getUnichemSources function. By default, only the database short name (“Name”) and UniChem’s ID for it (“SourceID”) are returned. To return all columns, pass the argument all_columns = TRUE.
getUnichemSources()
#> Name SourceID
#> <char> <int>
#> 1: probes_and_drugs 49
#> 2: pubchem 22
#> 3: bindingdb 31
#> 4: lipidmaps 33
#> 5: fdasrs 14
#> 6: nmrshiftdb2 24
#> 7: drugcentral 34
#> 8: chembl 1
#> 9: rcsb_pdb 3
#> 10: rhea 38
#> 11: surechembl 15
#> 12: brenda 37
#> 13: swisslipids 41
#> 14: CCDC 50
#> 15: molport 28
#> 16: gtopdb 4
#> 17: chebi 7
#> 18: drugbank 2
#> 19: hmdb 18
#> 20: pdbe 5
#> 21: comptox 32
#> 22: clinicaltrials 46
#> Name SourceID
#> <char> <int>
When mapping with the queryUnichemCompound function, these are the sources that can be queried from and the databases for which compound mappings are returned.
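Because later queries refer to databases by their SourceID, it can help to look IDs up by name. The sketch below does this on a small data.table whose rows are copied from the printed table above, so that it runs without network access; in practice the table would come from getUnichemSources().

```r
library(data.table)

# A few rows copied from the printed table above; in practice use
# sources <- getUnichemSources()
sources <- data.table(
  Name = c("pubchem", "chembl", "drugbank", "chebi"),
  SourceID = c(22L, 1L, 2L, 7L)
)

# Look up the UniChem SourceID for one database by its short name.
chembl_id <- sources[Name == "chembl", SourceID]

# Or build a named lookup vector for several databases at once.
id_by_name <- setNames(sources$SourceID, sources$Name)
id_by_name[c("drugbank", "pubchem")]
```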
The queryUnichemCompound function allows you to query the UniChem Compound API to retrieve mappings for a given compound identifier. The function takes two mandatory arguments. The first is the compound argument which is the compound identifier to be queried. The second is the type argument which is the type of compound identifier to search for. Options are “uci”, “inchi”, “inchikey”, and “sourceID”. The sourceID argument is optional and is only required if the type argument is “sourceID”.
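The argument rules above can be sketched as one argument set per identifier type. All four sets describe the same compound (aspirin), with values taken from the output shown in this document. The helper args_ok is hypothetical and only mirrors the rule stated above; it is not part of AnnotationGx.

```r
# One argument set per identifier type, all describing aspirin.
queries <- list(
  uci = list(compound = "161671", type = "uci"),
  inchikey = list(compound = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", type = "inchikey"),
  inchi = list(
    compound = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)",
    type = "inchi"
  ),
  # ChEMBL has SourceID 1, so a ChEMBL ID must be paired with sourceID = 1.
  sourceID = list(compound = "CHEMBL25", type = "sourceID", sourceID = 1)
)

# Hypothetical check: the type must be a known option, and sourceID must be
# supplied whenever type is "sourceID".
args_ok <- function(q) {
  q$type %in% c("uci", "inchi", "inchikey", "sourceID") &&
    (q$type != "sourceID" || !is.null(q$sourceID))
}
valid <- vapply(queries, args_ok, logical(1))

# With AnnotationGx installed and network access, each set can be passed on:
# results <- lapply(queries, function(q) do.call(queryUnichemCompound, q))
```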
The function returns a list of two elements:
- External_Mappings: a data.table containing the mappings to other databases, with the following columns:
  - compoundID (character): The compound identifier
  - Name (character): The name of the database
  - NameLong (character): The long name of the database
  - sourceID (integer): The UniChem Source ID
  - sourceURL (character): The URL of the source
- UniChem_Mappings: a list of the following six mappings:
  - UniChem.UCI (numeric): The UniChem Identifier
  - UniChem.InchiKey (character): The InChIKey
  - UniChem.Inchi (character): The InChI
  - UniChem.formula (character): The molecular formula
  - UniChem.connections (character): The connection representation, e.g. “1-6(10)13-8-5-3-2-4-7(8)9(11)12”
  - UniChem.hAtoms (character): The hydrogen atom connections, e.g. “2-5H,1H3,(H,11,12)”
uci (UniChem Identifier)
Note: This type of query requires you to know the UniChem Identifier for the compound.
queryUnichemCompound(compound = "161671", type = "uci")
#> $External_Mappings
#> compoundID Name NameLong sourceID
#> <char> <char> <char> <int>
#> 1: CHEMBL25 chembl ChEMBL 1
#> 2: DB00945 drugbank DrugBank 2
#> 3: AIN rcsb_pdb RCSB PDB 3
#> 4: 4139 gtopdb Guide to Pharmacology 4
#> 5: AIN pdbe Protein Data Bank in Europe 5
#> ---
#> 1011: NCT06451198 clinicaltrials Clinical Trials 46
#> 1012: NCT06468202 clinicaltrials Clinical Trials 46
#> 1013: NCT06478537 clinicaltrials Clinical Trials 46
#> 1014: PD002467 probes_and_drugs Probes&Drugs 49
#> 1015: ACSALA CCDC CSD (Cambridge Structural Database) 50
#> sourceURL
#> <char>
#> 1: https://www.ebi.ac.uk/chembldb/compound/inspect/CHEMBL25
#> 2: https://go.drugbank.com/drugs/DB00945
#> 3: https://www.rcsb.org/ligand/AIN
#> 4: https://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=4139
#> 5: https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/AIN
#> ---
#> 1011: https://clinicaltrials.gov/study/NCT06451198
#> 1012: https://clinicaltrials.gov/study/NCT06468202
#> 1013: https://clinicaltrials.gov/study/NCT06478537
#> 1014: https://www.probes-drugs.org/compounds/PD002467
#> 1015: https://www.ccdc.cam.ac.uk/structures/search?sid=UNICHEM&pid=csd:ACSALA
#>
#> $UniChem_Mappings
#> $UniChem_Mappings$UniChem.UCI
#> [1] 161671
#>
#> $UniChem_Mappings$UniChem.InchiKey
#> [1] "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
#>
#> $UniChem_Mappings$UniChem.Inchi
#> [1] "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
#>
#> $UniChem_Mappings$UniChem.formula
#> [1] "C9H8O4"
#>
#> $UniChem_Mappings$UniChem.connections
#> [1] "1-6(10)13-8-5-3-2-4-7(8)9(11)12"
#>
#> $UniChem_Mappings$UniChem.hAtoms
#> [1] "2-5H,1H3,(H,11,12)"
sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.18.2.1 AnnotationGx_0.99.2
#>
#> loaded via a namespace (and not attached):
#> [1] crayon_1.5.3 cli_3.6.6 knitr_1.51 rlang_1.2.0
#> [5] xfun_0.57 otel_0.2.0 jsonlite_2.0.0 glue_1.8.1
#> [9] backports_1.5.1 htmltools_0.5.9 sass_0.4.10 rappdirs_0.3.4
#> [13] rmarkdown_2.31 evaluate_1.0.5 jquerylib_0.1.4 fastmap_1.2.0
#> [17] yaml_2.3.12 lifecycle_1.0.5 httr2_1.2.2 memoise_2.0.1
#> [21] compiler_4.6.0 digest_0.6.39 R6_2.6.1 curl_7.0.0
#> [25] parallel_4.6.0 magrittr_2.0.5 bslib_0.10.0 checkmate_2.3.4
#> [29] withr_3.0.2 tools_4.6.0 xml2_1.5.2 cachem_1.1.0