---
title: "svaRetro Quick Overview"
author: "Ruining Dong"
date: "`r Sys.Date()`"
output: 
  BiocStyle::html_document
  # html_document:
  #   toc: yes
  #   toc_float:
  #     collapsed: yes
  #     smooth_scroll: yes
vignette: >
  %\VignetteIndexEntry{svaRetro Package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r setup, include=FALSE, echo=FALSE}
options(width=1000)
knitr::opts_chunk$set(
  collapse = TRUE, 
  comment = "#>")
```

# Introduction
This vignette outlines a workflow of detecting retrotransposed transcripts (RTs) 
from Variant Call Format (VCF) using the `svaRetro` package. 

# Installation
The `svaRetro` package can be installed from *Bioconductor* as follows:
```{r, eval=FALSE}
if(!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("svaRetro")
```


# Using GRanges for structural variants: a breakend-centric data structure

This package uses a breakend-centric event notation adopted from the 
`StructuralVariantAnnotation` package. In short, breakends are stored in a 
GRanges object with strand used to indicate breakpoint orientation, where 
breakpoints are represented using a `partner` field containing the name of the 
breakend at the other side of the breakend. This notation was chosen as it 
simplifies the annotations of RTs which are detected at breakend-level.

# Workflow
## Loading data from VCF

VCF data is parsed into a `VCF` object using the `readVCF` function from the
Bioconductor package `VariantAnnotation`. Simple filters could be applied to a 
`VCF` object to remove unwanted calls. The `VCF` object is then converted to a 
`GRanges` object with breakend-centric notations using 
`StructuralVariantAnnotation`. More information about `VCF` objects and 
breakend-centric GRanges object can be found by
consulting the vignettes in the corresponding packages with 
`browseVignettes("VariantAnnotation")` and 
`browseVignettes("StructuralVariantAnnotation")`.

```{r input, include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(StructuralVariantAnnotation)
library(VariantAnnotation)
library(svaRetro)

RT_vcf <- readVcf(system.file("extdata", "diploidSV.vcf", package = "svaRetro"))
```

```{r, include=TRUE,results="hide",message=FALSE,warning=FALSE}
RT_gr <- StructuralVariantAnnotation::breakpointRanges(RT_vcf, 
                                                       nominalPosition=TRUE)
head(RT_gr)
```
Note that `StructuralVariantAnnotation` requires the `GRanges` object to be 
composed entirely of valid breakpoints. Please consult the vignette of the 
`StructuralVariantAnnotation` package for ensuring breakpoint consistency.

## Identifying Retrotransposed Transcripts
The package provides `rtDetect` to identify RTs using the provided SV calls. 
This is achieved by detecting intronic deletions, which are breakpoints at 
exon-intron (and intron-exon) boundaries of a transcript. Fusions consisting of 
an exon boundary and a second genomic location are reported as potential 
insertion sites. Due to the complexity of RT events, insertion sites can be 
discovered on both left and right sides, only one side, or none at all.

```{r, include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(dplyr)
hg19.genes <- TxDb.Hsapiens.UCSC.hg19.knownGene

RT <- rtDetect(RT_gr, hg19.genes, maxgap=10, minscore=0.8)
```

The output is a list of `GRanges` object consisting of two sets of `GRanges` 
calls, `insSite` and `junctions`, containing candidate insertion sites and 
exon-exon junctions respectively. Candidate insertion sites are annotated by 
the source transcripts and whether exon-exon junctions are detected for the 
source transcripts. RT junction breakends are annotated by the UCSC exon IDs, 
corresponding transcripts, and NCBI gene symbols.

```{r}
RT$SKA3
```


# Visualising breakpoint pairs via circos plots

One way of visualising RT is by circos plots. Here we use the package
[`circlize`](https://doi.org/10.1093/bioinformatics/btu393) to demonstrate 
the visualisation of insertion site and exon-exon junctions. 

To generate a simple circos plot of RT event with SKA3 transcript:
```{r, include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(circlize)
rt_chr_prefix <- c(RT$SKA3$junctions, RT$SKA3$insSite)
seqlevelsStyle(rt_chr_prefix) <- "UCSC"
pairs <- breakpointgr2pairs(rt_chr_prefix)
pairs
```
To see supporting breakpoints clearly, we generate the circos plot according 
to the loci of event.
```{r}
circos.initializeWithIdeogram(
    data.frame(V1=c("chr13", "chr11"),
               V2=c(21720000,108585000),
               V3=c(21755000,108586000),
               V4=c("q12.11","q24.3"),
               V5=c("gneg","gpos50")))
circos.genomicLink(as.data.frame(S4Vectors::first(pairs)), 
                   as.data.frame(S4Vectors::second(pairs)))
circos.clear()
```

# SessionInfo
```{r}
sessionInfo()
```