--- title: "Mapping protein features to splicing events" shorttitle: "Mapping protein features to splicing events" author: "Diogo Veiga" affiliation: The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA" email: Diogo.Veiga@jax.org package: maser date: "`r Sys.Date()`" output: BiocStyle::html_document: toc_float: true vignette: > %\VignetteIndexEntry{Mapping protein features to splicing events} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", dpi = 100 ) ``` # Overview of protein annotation In this vignette, we describe a `r Rpackage("maser")` workflow for annotation and visualization of protein features affected by splicing. Integration of protein features to splicing events may reveal the impact of alternative splicing on protein function. We developed `r Rpackage("maser")` to enable systematic mapping of protein annotation from [UniprotKB](http://www.uniprot.org/) to splicing events. Protein features can be annotated and visualized along with transcripts affected by the splice event. In this manner, `r Rpackage("maser")` can identify whether the splicing is affecting regions of interest containing known domains or motifs, mutations, post-translational modification and other described protein structural features. # Annotation of protein features ## Creating the maser object We illustrate the workflow using the hypoxia dataset from the previous vignette. ```{r, warning = FALSE, message = FALSE} library(maser) library(rtracklayer) # Creating maser object using hypoxia dataset path <- system.file("extdata", file.path("MATS_output"), package = "maser") hypoxia <- maser(path, c("Hypoxia 0h", "Hypoxia 24h")) # Remove low coverage events hypoxia_filt <- filterByCoverage(hypoxia, avg_reads = 5) ``` ## Query available protein features at Uniprot Available UniprotKB protein annotation can be queried using `availableFeaturesUniprotKB()`. Currently there are 30 distinct features grouped into broader categories, including *Domain and Sites, PTM (Post-translational modifications), Molecule Processing, Topology, Mutagenesis and Structural Features*. ```{r, warning = FALSE, message = FALSE, echo=FALSE} knitr::kable( head(availableFeaturesUniprotKB(), 10) ) ``` ## Steps for annotation Protein feature annotation of splicing events is performed in two steps. 1. Use `mapTranscriptsToEvents()` to add both transcript and protein IDs to all events in the maser object. 2. Use `mapProteinFeaturesToEvents()` for annotation specifying UniprotKB features or categories. `mapTranscriptsToEvents()` identifies transcripts compatible with the splicing event by overlapping exons involved in splicing to the gene models provided in the Ensembl GTF. Each type of splice event applies a specific overlapping rule (described in the Introduction vignette). The function also maps transcripts to their corresponding protein identifiers in Uniprot when available. `mapTranscriptsToEvents()` requires an Ensembl or Gencode GTF using the hg38 build of the human genome. Ensembl GTFs can be retrieved using `r Biocpkg("AnnotationHub")` or imported using `import.gff()` from the `r Biocpkg("rtracklayer")` package. Several GTF releases are available, and `r Rpackage("maser")` is compatible with any version using the hg38 build. We are using reduced GTF extracted from Ensembl Release 85 for running examples. ```{r, warning = FALSE, message = FALSE} ## Ensembl GTF annotation gtf_path <- system.file("extdata", file.path("GTF","Ensembl85_examples.gtf.gz"), package = "maser") ens_gtf <- rtracklayer::import.gff(gtf_path) ``` In the second step, `mapProteinFeaturesToEvents()` retrieves data from UniprotKB and overlaps splicing events to genomic coordinates of protein features. ## SRSF6 example The splicing factor SRSF6 undergoes splicing during hypoxia by expressing an alternative exon. We will annotate the exon skipping event with domain, sites and topology information. The first step is to obtain a maser object containing SRSF6 splicing information, and then map transcripts to splicing events. ```{r, warning = FALSE, message = FALSE} # Retrieve gene specific splicing events srsf6_events <- geneEvents(hypoxia_filt, "SRSF6") srsf6_events ``` ```{r, warning = FALSE, message = FALSE} # Map transcripts to splicing events srsf6_mapped <- mapTranscriptsToEvents(srsf6_events, ens_gtf) ``` If transcript mapping worked correctly, Ensembl and Uniprot identifiers will be added to splicing events. Possible `NA` values indicates non-protein coding transcripts. In this case, the splicing involves two Ensembl transcripts coding for the Q13247 isoform of SRSF6. ```{r, warning = FALSE, message = FALSE, eval=FALSE} head(annotation(srsf6_mapped, "SE")) ``` ```{r, warning = FALSE, message = FALSE, echo=FALSE} knitr::kable( head(annotation(srsf6_mapped, "SE")) ) ``` Now we are ready to call `mapProteinFeaturesToEvents()` for annotation. Feature annotation can be interactively displayed in a web browser using `display()` or retrieved as a `data.frame` using `annotation()`. `mapProteinFeaturesToEvents()` will add extra columns describing the feature name, feature description and protein identifiers for which the annotation has been assigned. Possible `NA` values indicate the particular feature is not annotated for the splice event. ```{r, warning = FALSE, message = FALSE} # Annotate splicing events with protein features srsf6_annot <- mapProteinFeaturesToEvents(srsf6_mapped, c("Domain_and_Sites", "Topology"), by="category") ``` ```{r, warning = FALSE, message = FALSE, eval=FALSE} head(annotation(srsf6_annot, "SE")) ``` ```{r, warning = FALSE, message = FALSE, echo=FALSE} knitr::kable( head(annotation(srsf6_annot, "SE")) ) ``` By inspecting the results, we see that the SRSF6 exon skipping event is annotated with the Uniprot features **domain, chain and mod-res (modidifed residue)**. Visualization of the splice event, transcripts and protein features is performed with `plotUniprotKBFeatures()`. In this example, exons in the splice event overlap the Serine/arginine-rich splicing factor 6 region of the protein, while the upstream exon and downstream exons are overlapping the RRM1 and RRM2 domains of SRSF6, respectively. ```{r, warning = FALSE, message = FALSE} # Plot splice event, transcripts and protein features plotUniprotKBFeatures(srsf6_mapped, "SE", event_id = 33209, gtf = ens_gtf, features = c("domain", "chain"), show_transcripts = TRUE) ``` ## RIPK2 example RIPK2 has an exon skipping event in the hypoxia dataset. Following the example above, we map transcripts to splicing events and annotate protein features overlapping the splice event. We find out that the alternative exon overlaps the kinase domain of the protein, thus possibly changing the configuration of this domain during hypoxia. The ATP and proton acceptor binding sites are overlapping exons flanking the alternative exon. ```{r, warning = FALSE, message = FALSE} ripk2_events <- geneEvents(hypoxia_filt, "RIPK2") ripk2_mapped <- mapTranscriptsToEvents(ripk2_events, ens_gtf) ripk2_annot <- mapProteinFeaturesToEvents(ripk2_mapped, tracks = c("Domain_and_Sites"), by = "category") ``` ```{r, warning = FALSE, message = FALSE} plotUniprotKBFeatures(ripk2_annot, type = "SE", event_id = 14319, features = c("domain", "binding", "act-site"), gtf = ens_gtf, zoom = FALSE, show_transcripts = TRUE) ``` # Session info Here is the output of `sessionInfo()` on the system on which this document was compiled: ```{r, echo=FALSE} sessionInfo() ```