---
title: 'NxtIRFdata: a data package for NxtIRF'
author: "Alex Chit Hei Wong"
date: "`r format(Sys.Date(), '%m/%d/%Y')`"
abstract: >
    Intron Retention (IR) is a form of alternative splicing whereby the intron 
    is retained (i.e. not spliced) in final messenger RNA. Although many 
    bioinformatics tools are available to quantitate other forms of alternative
    splicing, dedicated tools to quantify Intron Retention are limited. 
    Quantifying IR requires not only measurement of spliced transcripts (often 
    using mapped splice junction reads), but also measurement of the coverage of
    the putative retained intron. The latter requires adjustment for the fact 
    that many introns contain repetitive regions as well as other RNA expressing
    elements. IRFinder corrects for many of these complexities; however its 
    dependencies on Linux and STAR limits its wider usage. Also, IRFinder does
    not calculate other forms of splicings besides IR. Finally, IRFinder 
    produces text-based output, requiring an established understanding of the
    data produced in order to interpret its results.
    
    NxtIRF overcomes the above limitations. Firstly, NxtIRF incorporates the
    IRFinder C++ routines, allowing users to run the IRFinder algorithm in the
    R/Bioconductor environment on multiple platforms. NxtIRF is a full pipeline
    that quantifies IR (and other alternative splicing) events, organises the
    data and produces relevant visualisation. Additionally, NxtIRF offers an
    interactive graphical interface that allows users to explore the data.
    
    NxtIRFdata is a data package containing ready-made BED files of Mappability
    exclusion genomic regions. It also contains a fully-functioning example
    data set with a "mock" genome and genome annotation to demonstrate the
    functionalities of NxtIRF
output:
    rmarkdown::html_document:
        highlight: pygments
        toc: true
        toc_float: true
vignette: >
    %\VignetteIndexEntry{NxtIRFdata: a data package for NxtIRF}
    %\VignetteEngine{knitr::rmarkdown}
    %\usepackage[utf8]{inputenc}
---

# 1 Installation

To install this package, start R (version "4.1") and enter: 

```{r eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("NxtIRFdata")
```

Start using NxtIRFdata:

```{r}
library(NxtIRFdata)
```

# 2 Obtaining the example NxtIRF genome and gene annotation files

Examples in NxtIRF are demonstrated using an artificial genome and gene
annotation. A synthetic reference, with genome sequence (FASTA) and gene 
annotation (GTF) files are provided, based on the genes SRSF1, SRSF2, SRSF3, 
TRA2A, TRA2B, TP53 and NSUN5. These genes, each with an additional 100 flanking 
nucleotides, were used to construct an artificial "chromosome Z" (chrZ). 
Gene annotations, based on release-94 of Ensembl GRCh38 (hg38), were modified 
with genome coordinates corresponding to this artificial chromosome.

These files can be accessed as follows:

```{r}
example_fasta = chrZ_genome()
example_gtf = chrZ_gtf()
```

# 3 Obtaining the example BAM file dataset for NxtIRF

The set of 6 BAM files used in the NxtIRF vignette / example code can be
downloaded to a path of the user's choice using the following function:

```{r, results = FALSE, message=FALSE}
bam_paths = example_bams(path = tempdir())
```

Note that this downloads BAM files and not their respective BAI (BAM file 
indices). This is because NxtIRF reads BAM files natively and does not require
RSamtools. BAI files are provided with BAM files in their respective
ExperimentHub entries for users wishing to view these files using RSamtools.

# 4 Obtaining the Mappability Exclusion BED files

NxtIRFdata retrieves the relevant records from AnnotationHub and makes a local
copy of the BED file. This BED file is used to produce a genome reference
for NxtIRF.

Note that this function is intended to be called internally by NxtIRF. Users
interested in the format or nature of the Mappability BED file can call this
function to examine the contents of the BED file

```{r, results = FALSE, message=FALSE}

# To get the MappabilityExclusion for hg38 as a GRanges object
gr = get_mappability_exclusion(genome_type = "hg38", as_type = "GRanges")

# To get the MappabilityExclusion for hg38 as a locally-copied gzipped BED file
bed_path = get_mappability_exclusion(genome_type = "hg38", as_type = "bed.gz",
	path = tempdir())

# Other `genome_type` values include "hg19", "mm10", and "mm9"
```

# 5 Accessing NxtIRFdata via ExperimentHub

The data deposited in ExperimentHub can be accessed as follows:

```{r results = FALSE, message=FALSE}
library(ExperimentHub)
```

```{r eval = FALSE}
eh = ExperimentHub()
NxtIRF_hub = query(eh, "NxtIRF")

NxtIRF_hub

temp = eh[["EH6792"]]
temp

temp = eh[["EH6787"]]
temp
```

# 5 Information about the example BAM files

For more information about the example BAM files, refer to the NxtIRFdata
package documentation:

```{r eval=FALSE}
?`NxtIRFdata-package`
```

# SessionInfo

```{r}
sessionInfo()
```