--- title: "VaSP: Quantification and Visulization of Variations of Splicing in Population" author: "*Huihui Yu, Qian Du and Chi Zhang*" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{Variations of Splicing in Population} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} options(tinytex.verbose = TRUE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## 1. Introduction {.tabset .tabset-fade .tabset-pills} **VaSP** is an R package for the discovery of genome-wide variable splicing events from short-read RNA-seq data. Based on R package **Ballgown**, VaSP calculates the Single Splicing Strength (3S) score of an intron by the junction count normalized by the gene-level average read coverage (score=junction count/gene-level average read coverage). The 3S scores can be used for further analysis, such as differential splicing analysis between two sample groups and sQTL (splicing Quantitative Trait Locus) identification in a large population. The VaSP package provides a function to find large-effect differential splicing events without the need of genotypic information in an inbred plant population, so called genotype-specific splicing (GSS). Integrated with functions from R package **Sushi**, VaSP package also provides a function to visualize gene splicing information for publication-quality multi-panel figures. ## 2. Installation Start R (>= 4.0) and run: ```{r,eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") # The following initializes usage of Bioc devel BiocManager::install(version='devel') BiocManager::install("vasp", build_vignettes=TRUE) vignette('vasp') ``` If you use an older version of R (>=3.5.0), enter: ```{r,eval=FALSE} BiocManager::install("yuhuihui2011/vasp@R3", build_vignettes=TRUE) vignette('vasp') ``` ## 3. Data input Users need to follow the manual of R package Ballgown () to create a ballgown object as an input for the VaSP package. See `?ballgown` for detailed information on creating Ballgown objects. The object can be stored in a `.RDate` file by `save()` . Here is an example of constructing rice.bg object from HISAT2+StringTie output ```{r,eval=FALSE} library(vasp) ?ballgown path<-system.file('extdata', package='vasp') rice.bg<-ballgown(samples = list.dirs(path = path,recursive = F) ) ``` ## 4. Quick start Calculate 3S (Single Splicing Strength) scores, find GSS (genotype-specific splicing) events and display the splicing information. * Calculating 3S scores: ```{r} library(vasp) data(rice.bg) ?rice.bg rice.bg score<-spliceGene(rice.bg, gene="MSTRG.183", junc.type = "score") tail(round(score,2),2) ``` * Discovering GSS: ```{r} gss <- BMfinder(score, cores = 1) gss ``` * Extracing intron information ```{r} gss_intron<-structure(rice.bg)$intron (gss_intron<-gss_intron[gss_intron$id%in%rownames(gss)]) range(gss_intron) ``` * Showing the splicing information ```{r splicePlot, fig.width=8, fig.height=5} splicePlot(rice.bg,gene='MSTRG.183',samples = sampleNames(rice.bg)[c(1,3,5)], start = 1179000, end = 1179300) ``` ## 5. Functions Currently, there are 6 functions in VaSP: ***getDepth***: Get read depth from a BAM file (in bedgraph format) ***getGeneinfo***: Get gene informaton from a ballgown object ***spliceGene***: Calculate 3S scores for one gene ***spliceGenome***: Calculate genome-wide splicing scores ***BMfinder***: Discover bimodal distrubition features ***splicePlot***: Visualization of read coverage, splicing information and gene information in a gene region ### 5.1 getDepth Get read depth from a BAM file (in bedgraph format) and return a data.frame in bedgraph file format which can be used as input for `plotBedgraph` in the **SuShi** package. ```{r plotBedgraph, fig.height=4, fig.width=8} path <- system.file("extdata", package = "vasp") bam_files <- list.files(path, "*.bam$") bam_files depth <- getDepth(file.path(path, bam_files[1]), "Chr1", start = 1171800, end = 1179400) head(depth) library(Sushi) par(mar=c(3,5,1,1)) plotBedgraph(depth, "Chr1", chromstart = 1171800, chromend = 1179400,yaxt = "s") mtext("Depth", side = 2, line = 2.5, cex = 1.2, font = 2) labelgenome("Chr1", 1171800, 1179400, side = 1, scipen = 20, n = 5,scale = "Kb") ``` ### 5.2 getGeneinfo Get gene informaton from a ballgown object by genes or by genomic regions and return a data.frame in bed-like file format that can be used as input for `plotGenes` in the **SuShi** package ```{r plotGenes, fig.height=5,fig.width=8} unique(geneIDs(rice.bg)) gene_id <- c("MSTRG.181", "MSTRG.182", "MSTRG.183") geneinfo <- getGeneinfo(genes = gene_id, rice.bg) trans <- table(geneinfo$name) # show how many exons each transcript has trans chrom = geneinfo$chrom[1] chromstart = min(geneinfo$start) - 1e3 chromend = max(geneinfo$stop) + 1e3 color = rep(SushiColors(2)(length(trans)), trans) par(mar=c(3,1,1,1)) p<-plotGenes(geneinfo, chrom, chromstart, chromend, col = color, bheight = 0.2, bentline = FALSE, plotgenetype = "arrow", labeloffset = 0.5) labelgenome(chrom, chromstart , chromend, side = 1, n = 5, scale = "Kb") ``` ### 5.3 spliceGene Calculate 3S Scores from ballgown object for a given gene. This function can only calculate one gene. Please use function `spliceGenome` to obtain genome-wide 3S scores. ```{r} rice.bg head(geneIDs(rice.bg)) score <- spliceGene(rice.bg, "MSTRG.183", junc.type = "score") count <- spliceGene(rice.bg, "MSTRG.183", junc.type = "count") ## compare tail(score) tail(count) ## get intron structrue intron <- structure(rice.bg)$intron intron[intron$id %in% rownames(score)] ``` ### 5.4 spliceGenome Calculate 3S scores from ballgown objects for all genes and return a list of two elements: "score' is a matrix of intron 3S scores with intron rows and sample columns and "intron" is a `GRanges` object of intron structure. ```{r} rice.bg splice <- spliceGenome(rice.bg, gene.select = NA, intron.select = NA) names(splice) head(splice$score) splice$intron ``` ### 5.5 BMfinder Find bimodal distrubition features and divide the samples into 2 groups by k-means clustering and return a matrix with feature rows and sample columns. ```{r} score <- spliceGene(rice.bg, "MSTRG.183", junc.type = "score") score <- round(score, 2) as <- BMfinder(score, cores = 1) # 4 bimodal distrubition features found ## compare as score[rownames(score) %in% rownames(as), ] ``` ### 5.6 spliceplot Visualization of read coverage, splicing information and gene information in a gene region. This function is a wrapper of `getDepth`, `getGeneinfo`, `spliceGene`, `plotBedgraph` and `plotGenes`. ```{r, fig.width=8, fig.height=4.4} samples <- paste("Sample", c("027", "102", "237"), sep = "_") bam.dir <- system.file("extdata", package = "vasp") ## plot the whole gene region without junction lables splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", junc.text = FALSE, bheight = 0.2) ## plot the alternative splicing region with junction splicing scores splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", start = 1179000) ``` If the bam files are provided (`bam.dir` is not NA), the read depth for each sample is plotted. Otherwise (`bam.dir=NA`), the conserved exons of the samples are displayed by rectangles (an example is the figure in **4. Quick start**). And by default (`junc.type = 'score'`, `junc.text = TRUE`), the junctions (represented by arcs) are labeled with splicing scores. You can change the argument `junc.text = FALSE` to unlabel the junctions or change the argument `junc.type = 'count'` to label with junction read counts. ```{r, fig.width=8, fig.height=4.4} splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", junc.type = 'count', start = 1179000) ``` There are other more options to modify the plot, please see the function `?splicePlot` for details. ## 6. Session Information ```{r} sessionInfo() ```