--- title: "Annotation resources - `ensembldb`" author: "Johannes Rainer
Eurac Research, Bolzano, Italy
johannes.rainer@eurac.edu - github: jorainer - twitter: jo_rainer" date: "CSAMA 2019" output: ioslides_presentation: widescreen: false fig_width: 7 fig_height: 5 fig_retina: 2 fig_caption: false transition: faster css: jostyle.css --- ## Annotation of genomic regions - Annotations for genomic features provided by `TxDb` (`GenomicFeatures`) and `EnsDb` (`ensembldb`) databases. - `EnsDb`: - Designed for Ensembl - One database per species and Ensembl release - Extract data using methods: `genes`, `transcripts`, `exons`, `txBy`, `exonsBy`, ... - Results returned as `GRanges`, `GRangesList` or `DataFrame`. ## Annotation of genomic regions {.build} - Example: get all gene annotations from an `EnsDb`: ```{r, message = FALSE} library(EnsDb.Hsapiens.v86) edb <- EnsDb.Hsapiens.v86 genes(edb) ``` ## Filtering annotation resources - Extracting the full data not always required: filter database. - `AnnotationFilter`: provides *concepts* for filtering data resources. - One filter class for each annotation type/database column. ## Filtering annotation resources {.build} - Example: create filters ```{r} GeneNameFilter("BCL2", condition = "!=") AnnotationFilter(~ gene_name != "BCL2") AnnotationFilter(~ seq_name == "X" & gene_biotype == "lincRna") ``` ## Filtering `EnsDb` databases {.build} - Example: what filters can we use? ```{r} supportedFilters(edb) ``` ## Filtering `EnsDb` databases {.build} - Example: get all protein coding transcripts for the gene *BCL2*. ```{r} transcripts(edb, filter = ~ gene_name == "BCL2" & tx_biotype == "protein_coding") ``` ## Filtering `EnsDb` databases {.build} - Example: *filter* the whole database ```{r, message = FALSE} library(magrittr) edb %>% filter(~ genename == "BCL2" & tx_biotype == "protein_coding") %>% transcripts ``` ## Additional `ensembldb` capabilities - `EnsDb` contain also protein annotation data: - Protein sequence. - Mapping of transcripts to proteins. - Annotation to Uniprot accessions. - Annotation of all protein domains within protein sequences. - Functionality to map coordinates: - `genomeToTranscript`, `genomeToProtein`, - `transcriptToGenome`, `transcriptToProtein`, - `proteinToGenome`, `proteinToTranscript`. ## Where to find `EnsDb` databases? {.build} - `AnnotationHub`! ```{r, message = FALSE} library(AnnotationHub) query(AnnotationHub(), "EnsDb") ``` ## Finally *Thank you for your attention!*