---
title: "miRmine dataset as RangedSummarizedExperiment"
author:
    name: Dusan Randjelovic
    email: dusan.randjelovic@sbgenomics.com
package: miRmine
output:
    BiocStyle::html_document
abstract: |
    miRmine is a data package built for easier use of the miRmine dataset 
    (Panwar et al (2017) miRmine: A Database of Human miRNA Expression. 
    Bioinformatics, btx019. doi: 10.1093/bioinformatics/btx019). 
    In its current version miRmine contains 304 publicly available 
    experiments from NCBI SRA. Annotations used are from miRBase v21 
    (miRBase: Annotating high confidence microRNAs using deep sequencing data.
    Kozomara A, Griffiths-Jones S. NAR 2014 42:D68-D73). 
vignette: |
    %\VignetteIndexEntry{miRmine}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

# Data preparation

The miRmine dataset contains rich metadata for 304 selected, publicly available 
miRNA-Seq experiments. The authors processed the data with miRDeep2 using 
annotation files from miRBase v21. To prepare the dataset as a 
RangedSummarizedExperiment, this metadata is used as colData and the miRBase 
annotations, as GRanges, are used as rowRanges. The data used for preprocessing 
and constructing the `miRmine` RangedSummarizedExperiment are available in the 
`extdata` folder. Details of this process are described in the 
data help file: `?miRmine`.


```{r}
#library(GenomicRanges)
#library(rtracklayer)
#library(SummarizedExperiment)
#library(Biostrings)
#library(Rsamtools)

ext.data <- system.file("extdata", package = "miRmine")
list.files(ext.data)
```
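
As a rough illustration of how counts, sample metadata and genomic ranges fit 
together in a RangedSummarizedExperiment, the sketch below builds a tiny object 
from made-up values. The ranges, sample names and counts (`toy.ranges`, 
`toy.counts`, `toy.coldata`) are hypothetical and are not taken from miRmine or 
miRBase; the actual preparation steps are documented in `?miRmine`.

```{r}
library(GenomicRanges)
library(SummarizedExperiment)

# Hypothetical annotation: three made-up ranges, not real miRBase coordinates.
toy.ranges <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(100, 500, 300), width = 22),
    strand = c("+", "-", "+")
)
names(toy.ranges) <- c("mir-a", "mir-b", "mir-c")

# Hypothetical expression matrix: rows match the ranges, columns are samples.
toy.counts <- matrix(
    c(10, 0, 5, 3, 8, 1),
    nrow = 3,
    dimnames = list(names(toy.ranges), c("S1", "S2"))
)

# Hypothetical sample metadata: one row per column of the matrix.
toy.coldata <- DataFrame(
    Tissue = c("Lung", "Saliva"),
    row.names = c("S1", "S2")
)

# Supplying rowRanges yields a RangedSummarizedExperiment.
toy.se <- SummarizedExperiment(
    assays = SimpleList(counts = toy.counts),
    rowRanges = toy.ranges,
    colData = toy.coldata
)
toy.se
```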

The number of ranges in the miRBase GFF and the number of features output 
by miRDeep2 are not the same (2813 vs. 2822). On closer inspection, 
2 rows from either **tissues** or **cell.lines** data are duplicated 
(with the same mature miRNA and the same precursor miRNA) and 7 rows don't 
correspond to any miRNA/precursor combination in miRBase v21. Together these 
account for the 9 extra rows, which were removed for all samples, as seen in 
`?miRmine`.
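
The kind of filtering described above can be sketched in base R on toy data. 
This is only an illustration of the idea, assuming rows are identified by a 
mature/precursor pair; the identifiers, the `toy.annot` and `toy.expr` tables 
and the `pair.key` helper are all hypothetical, and the code actually used for 
the package is the one referenced in `?miRmine`.

```{r}
# Hypothetical annotation table (made-up identifiers).
toy.annot <- data.frame(
    mature = c("miR-1", "miR-2", "miR-3"),
    precursor = c("pre-1", "pre-2", "pre-3"),
    stringsAsFactors = FALSE
)

# Hypothetical expression table with one duplicated row and one
# mature/precursor pair that is absent from the annotation.
toy.expr <- data.frame(
    mature = c("miR-1", "miR-2", "miR-2", "miR-3", "miR-9"),
    precursor = c("pre-1", "pre-2", "pre-2", "pre-3", "pre-3"),
    rpm = c(12.5, 3.1, 3.1, 40.0, 0.7),
    stringsAsFactors = FALSE
)

# Hypothetical helper: one key per mature/precursor combination.
pair.key <- function(df) paste(df$mature, df$precursor, sep = "|")

# Flag repeated keys and keys that have no match in the annotation.
is.duplicate <- duplicated(pair.key(toy.expr))
in.annotation <- pair.key(toy.expr) %in% pair.key(toy.annot)

# Keep the first copy of each annotated pair.
toy.filtered <- toy.expr[!is.duplicate & in.annotation, ]
toy.filtered
```

Note that this sketch keeps the first copy of a duplicated pair; whether the 
package keeps or drops such rows should be checked in `?miRmine`.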


# Usage

To load this dataset use:

```{r}
library("miRmine")
data(miRmine)
miRmine
```

You may want to subset the data further on some of the many colData features 
(Tissue, Cell Line, Disease, Sex, Instrument) or output specifics of 
particular experiments (Accession #, Description, Publication):

```{r}
adenocarcinoma = miRmine[ , miRmine$Disease %in% c("Adenocarcinoma")]
adenocarcinoma
as.character(adenocarcinoma$Sample.Accession)
```
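
Before filtering, it can help to check which metadata columns and values are 
actually present. The sketch below uses only the `Tissue` and `Disease` 
columns, which appear elsewhere in this vignette; the particular combination 
of values is chosen for illustration and may match few or no samples.

```{r}
library("miRmine")
library(SummarizedExperiment)
data(miRmine)

# List the metadata columns available for filtering.
colnames(colData(miRmine))

# Count samples per tissue to see which values can be used in a filter.
table(miRmine$Tissue)

# Combine two conditions; %in% treats missing values as non-matches.
lung.adeno <- miRmine[, miRmine$Tissue %in% "Lung" &
                        miRmine$Disease %in% "Adenocarcinoma"]
ncol(lung.adeno)
```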

The rowRanges data is also rich in metadata, containing all the features from 
miRBase hsa.gff3, with the addition of the actual miRNA sequence as a DNAString 
instance. For example, to read the sequence of the most highly expressed miRNA 
over a subset of samples:

```{r}
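# sort() is ascending by default, so request decreasing order to get the top miRNA
top.mirna = names(sort(rowSums(assays(adenocarcinoma)$counts), decreasing = TRUE)[1])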
rowRanges(adenocarcinoma)$mirna_seq[[top.mirna]]
```
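
To look at more than one miRNA, the same idea extends to a ranked list. The 
following is a minimal sketch that ranks miRNAs by their summed expression 
across the adenocarcinoma samples; summing RPM values across samples is just 
one possible ranking criterion, and the names `adeno`, `totals` and `top5` are 
introduced here for illustration.

```{r}
library("miRmine")
library(SummarizedExperiment)
data(miRmine)

# Same subset as above, recreated so this chunk runs on its own.
adeno <- miRmine[, miRmine$Disease %in% "Adenocarcinoma"]

# Total expression per miRNA across the selected samples.
totals <- rowSums(assay(adeno, "counts"))

# Sort in decreasing order and keep the five highest totals.
top5 <- head(sort(totals, decreasing = TRUE), 5)
top5
```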

`miRmine` can be used directly with DESeq2. Note that the expression values are 
RPM, not raw read counts, so they are rounded up to integers here:

```{r}
library("DESeq2")

mirmine.subset = miRmine[, miRmine$Tissue %in% c("Lung", "Saliva")]
mirmine.subset = SummarizedExperiment(
    assays = SimpleList(counts=ceiling(assays(mirmine.subset)$counts)), 
    colData=colData(mirmine.subset), 
    rowRanges=rowRanges(mirmine.subset),
    rowData=NULL
)

ddsSE <- DESeqDataSet(mirmine.subset, design = ~ Tissue)
ddsSE <- ddsSE[ rowSums(counts(ddsSE)) > 1, ]

dds <- DESeq(ddsSE)
res <- results(dds)
res
```
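
A results table is often easier to read once it is ordered by adjusted 
p-value. The sketch below shows this on simulated data generated by DESeq2's 
`makeExampleDESeqDataSet()`, so that it runs independently of the chunk above; 
the data are not from miRmine, and the 0.05 cutoff is an arbitrary choice for 
illustration.

```{r}
library("DESeq2")

set.seed(1)
# Simulated counts for two conditions; not miRmine data.
toy.dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
toy.dds <- DESeq(toy.dds)
toy.res <- results(toy.dds)

# Order by adjusted p-value (NA values sort last).
toy.ordered <- toy.res[order(toy.res$padj), ]
head(toy.ordered)

# Number of features below the chosen adjusted p-value cutoff.
sum(toy.res$padj < 0.05, na.rm = TRUE)
```

The same two steps can be applied to `res` from the miRmine analysis, keeping 
in mind that its input was rounded RPM rather than raw counts.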

# Session info {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```