1 Introduction

The tRNAdb and mttRNAdb (Jühling et al. 2009) are compilations of tRNA sequences and tRNA genes. They are a follow-up to the database of Sprinzl et al. (Sprinzl and Vassilenko 2005).

Using tRNAdbImport, the tRNAdb can be accessed as outlined on the website http://trna.bioinf.uni-leipzig.de/; the results are returned as a GRanges object.

2 Importing as GRanges

library(tRNAdbImport)
# accessing tRNAdb
# tRNA from yeast for Alanine and Phenylalanine
gr <- import.tRNAdb(organism = "Saccharomyces cerevisiae",
                    aminoacids = c("Phe","Ala"))
# get a Phenylalanine tRNA from yeast
gr <- import.tRNAdb.id(tdbID = gr[gr$tRNA_type == "Phe",][1L]$tRNAdb_ID)
# find the same tRNA via blast
gr <- import.tRNAdb.blast(blastSeq = gr$tRNA_seq)
# accessing mttRNAdb
# get the mitochondrial tRNA for Alanine in Bos taurus
gr <- import.mttRNAdb(organism = "Bos taurus", 
                      aminoacids = "Ala")
# get one mitochondrial tRNA in Bos taurus
gr <- import.mttRNAdb.id(mtdbID = gr[1L]$tRNAdb_ID)
# check that the result has the appropriate columns
istRNAdbGRanges(gr)
## [1] TRUE
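
Because every import function queries a remote server, a script may want to guard against the server being unreachable. The following is a minimal sketch of one way to do that with base R's tryCatch; the variable name gr_phe is chosen for illustration only, and the handling shown is an assumption about what a caller might want, not behaviour provided by tRNAdbImport itself.

```r
library(tRNAdbImport)

# Wrap the query so that a network or server failure yields NULL
# instead of stopping the script.
gr_phe <- tryCatch(
  import.tRNAdb(organism = "Saccharomyces cerevisiae",
                aminoacids = "Phe"),
  error = function(e) {
    message("tRNAdb query failed: ", conditionMessage(e))
    NULL
  }
)

# Only use the result if the query succeeded and the expected
# tRNAdb columns are present.
if (!is.null(gr_phe) && istRNAdbGRanges(gr_phe)) {
  print(table(gr_phe$tRNA_type))
}
```

If the query fails, gr_phe is NULL and the rest of the script can decide whether to retry or stop.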

3 Importing as GRanges from the RNA database

The tRNAdb offers two different sets of data, one containing DNA sequences and one containing RNA sequences. Depending on the database selected (DNA is the default), the GRanges object will contain a DNAStringSet or a ModRNAStringSet as the tRNA_seq column. Because the RNA sequences can contain modified nucleotides, the ModRNAStringSet class is used instead of the RNAStringSet class, so that the sequences are stored with all information intact.

gr <- import.tRNAdb(organism = "Saccharomyces cerevisiae",
                    aminoacids = c("Phe","Ala"),
                    database = "RNA")
gr$tRNA_seq
##   A ModRNAStringSet instance of length 3
##     width seq                                               names               
## [1]    76 GGGCGUGUKGCGUAGDCGGDAGC...TPCGAUUCCGGACUCGUCCACCA tdbR00000012
## [2]    76 GCGGAUUUALCUCAGDDGGGAGA...TPCG"UCCACAGAAUUCGCACCA tdbR00000083
## [3]    76 GCGGACUUALCUCAGDDGGGAGA...TPCG"UCCACAGAGUUCGCACCA tdbR00000084

The special characters in the sequence might not exactly match the ones shown on the website, since they are sanitized internally to a unified dictionary defined in the Modstrings package. However, the type of modification encoded will remain the same (see the Modstrings package for more details).
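
That the modification type, rather than the display character, is what matters can be illustrated with a small sketch. It uses a toy five-nucleotide sequence that is not taken from tRNAdb and assumes the Modstrings functions modifyNucleotides and separate behave as described in that package's documentation.

```r
library(Biostrings)
library(Modstrings)

# toy RNA sequence for illustration, not a tRNAdb entry
r <- RNAString("ACGUG")

# mark position 5 as 7-methylguanosine, addressed by its short name
mr <- modifyNucleotides(r, 5L, "m7G")

# printing shows a one-letter code from the Modstrings alphabet
# at position 5
mr

# the short name of the modification is recovered from the sequence,
# independent of the character used for display
mods <- separate(mr)
mods$mod
```

The same reasoning applies to sequences imported from the RNA database: the characters printed in tRNA_seq are only an encoding, and the modification names can always be recovered from them.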

The information on the position and type of the modifications can also be converted into a tabular format using the separate function from the Modstrings package.

separate(gr$tRNA_seq)
## GRanges object with 38 ranges and 1 metadata column:
##            seqnames    ranges strand |         mod
##               <Rle> <IRanges>  <Rle> | <character>
##    [1] tdbR00000012         9      + |         m1G
##    [2] tdbR00000012        16      + |           D
##    [3] tdbR00000012        20      + |           D
##    [4] tdbR00000012        26      + |       m2,2G
##    [5] tdbR00000012        34      + |           I
##    ...          ...       ...    ... .         ...
##   [34] tdbR00000084        46      + |         m7G
##   [35] tdbR00000084        49      + |        f5Cm
##   [36] tdbR00000084        54      + |         m5U
##   [37] tdbR00000084        55      + |           Y
##   [38] tdbR00000084        58      + |         m1A
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths
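
Once the modifications are in tabular form, they can be summarised with ordinary R tools. The sketch below does not query tRNAdb; it uses a hand-built stand-in GRanges containing only the ten rows visible in the printed output above, and the object names (mods, mod_table, mod_counts, positions_by_tRNA) are illustrative.

```r
library(GenomicRanges)

# hand-built stand-in holding the ten rows visible in the printed output
mods <- GRanges(
  seqnames = rep(c("tdbR00000012", "tdbR00000084"), each = 5),
  ranges = IRanges(start = c(9, 16, 20, 26, 34, 46, 49, 54, 55, 58),
                   width = 1),
  strand = "+",
  mod = c("m1G", "D", "D", "m2,2G", "I", "m7G", "f5Cm", "m5U", "Y", "m1A")
)

# plain data.frame with the columns of interest
mod_table <- as.data.frame(mods)[, c("seqnames", "start", "mod")]

# number of occurrences of each modification type
mod_counts <- table(mod_table$mod)
mod_counts

# modified positions grouped by tRNA
positions_by_tRNA <- split(mod_table$start, mod_table$seqnames)
positions_by_tRNA
```

With real data, the result of separate(gr$tRNA_seq) would take the place of the stand-in mods object.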

4 Further analysis

The output can be saved or directly used for further analysis.

library(Biostrings)
library(rtracklayer)
# saving the tRNA sequences as fasta file
writeXStringSet(gr$tRNA_seq, filepath = tempfile())
# converting tRNAdb information to GFF compatible values
gff <- tRNAdb2GFF(gr)
gff
## GRanges object with 3 ranges and 20 metadata columns:
##           seqnames    ranges strand |   source     type     score     phase
##              <Rle> <IRanges>  <Rle> | <factor> <factor> <integer> <integer>
##   [1] tdbR00000012      1-76      * |   tRNAdb     tRNA      <NA>      <NA>
##   [2] tdbR00000083      1-76      * |   tRNAdb     tRNA      <NA>      <NA>
##   [3] tdbR00000084      1-76      * |   tRNAdb     tRNA      <NA>      <NA>
##                 ID        no tRNA_length   tRNA_type tRNA_anticodon
##        <character> <integer>   <integer> <character>    <character>
##   [1] tdbR00000012         1          76         Ala            IGC
##   [2] tdbR00000083         2          76         Phe            #AA
##   [3] tdbR00000084         3          76         Phe            #AA
##                     tRNA_seq               tRNA_str tRNA_CCA.end      tRNAdb
##                  <character>            <character>    <logical> <character>
##   [1] GGGCGUGUKGCGUAGDCGGD.. <<<<<.<..<<<<.........         TRUE         RNA
##   [2] GCGGAUUUALCUCAGDDGGG.. <<<<<<<..<<<<.........         TRUE         RNA
##   [3] GCGGACUUALCUCAGDDGGG.. <<<<<<<..<<<<.........         TRUE         RNA
##          tRNAdb_ID        tRNAdb_organism tRNAdb_strain tRNAdb_taxonomyID
##        <character>            <character>   <character>       <character>
##   [1] tdbR00000012 Saccharomyces cerevi..                            4932
##   [2] tdbR00000083 Saccharomyces cerevi..                            4932
##   [3] tdbR00000084 Saccharomyces cerevi..                            4932
##       tRNAdb_verified       tRNAdb_reference     tRNAdb_pmid
##             <logical>        <CharacterList> <CharacterList>
##   [1]            TRUE J.R.PENSWICK, R.MART..                
##   [2]            TRUE P.E.NIELSEN, V.LEICK..                
##   [3]            TRUE G.KEITH, G.DIRHEIMER..                
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths
# saving the information as GFF3 file
export.gff3(gff, con = tempfile())
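
To check that saved files can be read back, a round trip can be sketched as follows. The example is self-contained and uses two made-up DNA sequences and a matching toy annotation instead of tRNAdb results; the names tRNA_a and tRNA_b are hypothetical. Reading sequences with modified nucleotides back in is not covered here.

```r
library(Biostrings)
library(GenomicRanges)
library(rtracklayer)

# toy DNA sequences (hypothetical, not tRNAdb entries)
seqs <- DNAStringSet(c(tRNA_a = "GCGGATTTAGCTCAGTTGGGAGAGC",
                       tRNA_b = "GGGCGTGTGGCGTAGTCGGTAGCGC"))

# write the sequences to a FASTA file and read them back
fasta_file <- tempfile(fileext = ".fasta")
writeXStringSet(seqs, filepath = fasta_file)
seqs_back <- readDNAStringSet(fasta_file)

# toy annotation with one full-length range per sequence
ann <- GRanges(seqnames = names(seqs),
               ranges = IRanges(start = 1, width = width(seqs)),
               strand = "*",
               type = "tRNA",
               ID = names(seqs))

# write the annotation to a GFF3 file and read it back;
# the format is detected from the file extension
gff_file <- tempfile(fileext = ".gff3")
export.gff3(ann, con = gff_file)
ann_back <- import(gff_file)
ann_back
```

Writing to named files rather than to a bare tempfile() call keeps the paths available for the subsequent import.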

Please have a look at the tRNA package for further analysis of the tRNA sequences.

5 Session info

sessionInfo()
## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] rtracklayer_1.60.0   tRNAdbImport_1.18.0  tRNA_1.18.0         
##  [4] Structstrings_1.16.0 Modstrings_1.16.0    Biostrings_2.68.0   
##  [7] XVector_0.40.0       GenomicRanges_1.52.0 GenomeInfoDb_1.36.0 
## [10] IRanges_2.34.0       S4Vectors_0.38.0     BiocGenerics_0.46.0 
## [13] BiocStyle_2.28.0    
## 
## loaded via a namespace (and not attached):
##  [1] SummarizedExperiment_1.30.0 gtable_0.3.3               
##  [3] rjson_0.2.21                xfun_0.39                  
##  [5] bslib_0.4.2                 ggplot2_3.4.2              
##  [7] lattice_0.21-8              Biobase_2.60.0             
##  [9] vctrs_0.6.2                 tools_4.3.0                
## [11] bitops_1.0-7                generics_0.1.3             
## [13] curl_5.0.0                  parallel_4.3.0             
## [15] tibble_3.2.1                fansi_1.0.4                
## [17] pkgconfig_2.0.3             Matrix_1.5-4               
## [19] lifecycle_1.0.3             GenomeInfoDbData_1.2.10    
## [21] compiler_4.3.0              stringr_1.5.0              
## [23] Rsamtools_2.16.0            munsell_0.5.0              
## [25] codetools_0.2-19            htmltools_0.5.5            
## [27] sass_0.4.5                  RCurl_1.98-1.12            
## [29] yaml_2.3.7                  pillar_1.9.0               
## [31] crayon_1.5.2                jquerylib_0.1.4            
## [33] BiocParallel_1.34.0         DelayedArray_0.26.0        
## [35] cachem_1.0.7                tidyselect_1.2.0           
## [37] digest_0.6.31               stringi_1.7.12             
## [39] dplyr_1.1.2                 restfulr_0.0.15            
## [41] bookdown_0.33               fastmap_1.1.1              
## [43] grid_4.3.0                  colorspace_2.1-0           
## [45] cli_3.6.1                   magrittr_2.0.3             
## [47] XML_3.99-0.14               utf8_1.2.3                 
## [49] scales_1.2.1                rmarkdown_2.21             
## [51] httr_1.4.5                  matrixStats_0.63.0         
## [53] evaluate_0.20               knitr_1.42                 
## [55] BiocIO_1.10.0               rlang_1.1.0                
## [57] glue_1.6.2                  BiocManager_1.30.20        
## [59] xml2_1.3.3                  jsonlite_1.8.4             
## [61] R6_2.5.1                    MatrixGenerics_1.12.0      
## [63] GenomicAlignments_1.36.0    zlibbioc_1.46.0

References

Jühling, Frank, Mario Mörl, Roland K. Hartmann, Mathias Sprinzl, Peter F. Stadler, and Joern Pütz. 2009. “tRNAdb 2009: Compilation of tRNA Sequences and tRNA Genes.” Nucleic Acids Research 37: D159–D162. https://doi.org/10.1093/nar/gkn772.

Sprinzl, Mathias, and Konstantin S. Vassilenko. 2005. “Compilation of tRNA Sequences and Sequences of tRNA Genes.” Nucleic Acids Research 33: D139–D140. https://doi.org/10.1093/nar/gki012.