--- title: "svaNUMT Quick Overview" author: "Ruining Dong" date: "`r Sys.Date()`" output: html_document: toc: yes toc_float: collapsed: yes smooth_scroll: yes vignette: > %\VignetteIndexEntry{svaNUMT Package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(#echo = TRUE, collapse = TRUE, comment = "#>") ``` ## Introduction This vignette outlines a workflow of detecting nuclear-mitochondrial DNA fusions from Variant Call Format (VCF) using the `svaNUMT` package. ## Using GRanges for structural variants: a breakend-centric data structure This package uses a breakend-centric event notation adopted from the `StructuralVariantAnnotation` package. In short, breakends are stored in a GRanges object with strand used to indicate breakpoint orientation. where breakpoints are represented using a `partner` field containing the name of the breakend at the other side of the breakend. This notation was chosen as it simplifies the annotations of RTs which are detected at breakend-level. ## Workflow ### Loading data from VCF VCF data is parsed into a `VCF` object using the `readVCF` function from the Bioconductor package `VariantAnnotation`. Simple filters could be applied to a `VCF` object to remove unwanted calls. The `VCF` object is then converted to a `GRanges` object with breakend-centric notations using `StructuralVariantAnnotation`. More information about `VCF` objects and breakend-centric GRanges object can be found by consulting the vignettes in the corresponding packages with `browseVignettes("VariantAnnotation")` and `browseVignettes("StructuralVariantAnnotation")`. ```{r input, include=TRUE,results="hide",message=FALSE,warning=FALSE} library(StructuralVariantAnnotation) library(VariantAnnotation) library(svaNUMT) vcf <- readVcf(system.file("extdata", "chr1_numt_pe_HS25.sv.vcf", package = "svaNUMT")) gr <- breakpointRanges(vcf) ``` Note that `StructuralVariantAnnotation` requires the `GRanges` object to be composed entirely of valid breakpoints. Please consult the vignette of the `StructuralVariantAnnotation` package for ensuring breakpoint consistency. ### Identifying Nuclear-mitochondrial Genome Fusion Events Function `svaNUMT` searches for NUMT events by identifying breakends supporting the fusion of nuclear chromosome and mitochondrial genome. `svaNUMT` returns identified breakends supporting candidate NUMTs in 2 lists of list of GRanges, grouped by chromosome and insertion sites. ```{r, include=TRUE,results="hide",message=FALSE,warning=FALSE} library(readr) numtS <- read_table(system.file("extdata", "numtS.txt", package = "svaNUMT"), col_names = FALSE) colnames(numtS) <- c("bin", "seqnames", "start", "end", "name", "score", "strand") numtS <- `seqlevelsStyle<-`(GRanges(numtS), "NCBI") library(BSgenome.Hsapiens.UCSC.hg19) genome <- BSgenome.Hsapiens.UCSC.hg19 genomeMT <- genome$chrMT ``` ```{r} NUMT <- numtDetect(gr, numtS, genomeMT, max_ins_dist = 20) ``` The breakends supporting the insertion sites and the MT sequence are arranged by the order of events. Below is an example of a detected NUMT event, where MT sequence `MT:15737-15836` followed by polyadenylation is inserted between `chr1:1688363-1688364`. ```{r} GRangesList(NU=NUMT$MT$NU$`1`[[1]], MT=NUMT$MT$MT$`1`[[1]]) ``` Below is an example to subset the detected NUMTs by a genomic region given `seqnames`, `start`, and `end`. For region `chr1:1000000-3000000`, there are 3 NUMTs detected. ```{r} seqnames = 1 start = 1000000 end = 3000000 i <- sapply(NUMT$MT$NU[[seqnames]], function(x) sum(countOverlaps(x, GRanges(seqnames = seqnames, IRanges(start, end))))>0) list(NU=NUMT$MT$NU[[seqnames]][i], MT=NUMT$MT$MT[[seqnames]][i]) ``` ## Visualising breakpoint pairs via circos plots One way of visualising paired breakpoints is by circos plots. Here we use the package [`circlize`](https://doi.org/10.1093/bioinformatics/btu393) to demonstrate breakpoint visualisation. The `bedpe2circos` function takes BEDPE-formatted dataframes (see `breakpointgr2bedpe()`) and plotting parameters for the `circos.initializeWithIdeogram()` and `circos.genomicLink()` functions from `circlize`. To generate a simple circos plot of one candidate NUMT event: ```{r, include=TRUE,results="hide",message=FALSE,warning=FALSE} library(circlize) numt_chr_prefix <- c(NUMT$MT$NU$`1`[[2]], NUMT$MT$MT$`1`[[2]]) GenomeInfoDb::seqlevelsStyle(numt_chr_prefix) <- "UCSC" pairs <- breakpointgr2pairs(numt_chr_prefix) pairs ``` To see supporting breakpoints clearly, we generate the circos plot according to the loci of event. ```{r} circos.initializeWithIdeogram( data.frame(V1=c("chr1", "chrM"), V2=c(1791073,1), V3=c(1791093,16571), V4=c("p15.4",NA), V5=c("gpos50",NA)), sector.width = c(0.2, 0.8)) #circos.initializeWithIdeogram() circos.genomicLink(as.data.frame(S4Vectors::first(pairs)), as.data.frame(S4Vectors::second(pairs))) circos.clear() ``` ## SessionInfo ```{r} sessionInfo() ```