---
title: "RNAmodR: creating classes for additional modification detection from high throughput sequencing."
author: "Felix G.M. Ernst"
date: "`r Sys.Date()`"
package: RNAmodR
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    df_print: paged
vignette: >
  %\VignetteIndexEntry{RNAmodR - creating new classes for a new detection strategy}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown(css.files = c('custom.css'))
```

# Introduction

Users interested in the general use of an `RNAmodR`-based package should consult
the [main vignette](RNAmodR.html) of the package.

This vignette is aimed at developers and researchers who want to use the 
functionality of the `RNAmodR` package to develop a new modification detection
strategy based on high-throughput sequencing data.

```{r, echo=FALSE}
suppressPackageStartupMessages({
  library(RNAmodR)
})
```
```{r, eval=FALSE}
library(RNAmodR)
```

Two classes have to be considered to establish a new analysis pipeline using
`RNAmodR`. These are the `SequenceData` and the `Modifier` class.

# A new `SequenceData` class

First, the `SequenceData` class has to be considered. The following classes are
already implemented:

 * `End5SequenceData`
 * `End3SequenceData`
 * `EndSequenceData`
 * `ProtectedEndSequenceData`
 * `CoverageSequenceData`
 * `PileupSequenceData`
 * `NormEnd5SequenceData`
 * `NormEnd3SequenceData`

If none of these can be reused, a new class can be implemented quite easily.
First, the `SequenceDataFrame` subclass, the `SequenceData` subclass and their
constructors have to be defined. The only values that have to be provided are a
default `minQuality` integer value and some basic information, such as the
`dataDescription`.

```{r}
# DataFrame class used for the unlistData slot of the new SequenceData class
setClass(Class = "ExampleSequenceDataFrame",
         contains = "SequenceDFrame")
# constructor for the DataFrame class
ExampleSequenceDataFrame <- function(df, ranges, sequence, replicate,
                                      condition, bamfiles, seqinfo){
  RNAmodR:::.SequenceDataFrame("Example", df, ranges, sequence, replicate,
                               condition, bamfiles, seqinfo)
}
# SequenceData class; the prototype sets the default minQuality and description
setClass(Class = "ExampleSequenceData",
         contains = "SequenceData",
         slots = c(unlistData = "ExampleSequenceDataFrame"),
         prototype = list(unlistData = ExampleSequenceDataFrame(),
                          unlistType = "ExampleSequenceDataFrame",
                          minQuality = 5L,
                          dataDescription = "Example data"))
# constructor for the SequenceData class
ExampleSequenceData <- function(bamfiles, annotation, sequences, seqinfo, ...){
  RNAmodR:::SequenceData("Example", bamfiles = bamfiles, 
                         annotation = annotation, sequences = sequences,
                         seqinfo = seqinfo, ...)
}
```

Second, the `getData` function has to be implemented. It is used to load
the data from the bam files and must return a named list containing one
`IntegerList`, `NumericList` or `CompressedSplitDataFrameList` per file.

```{r}
setMethod("getData",
          signature = c(x = "ExampleSequenceData",
                        bamfiles = "BamFileList",
                        grl = "GRangesList",
                        sequences = "XStringSet",
                        param = "ScanBamParam"),
          definition = function(x, bamfiles, grl, sequences, param, args){
            ###
          }
)
```
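
The shape of the expected return value can be sketched as follows. This is only
an illustration of the structure described above, not the output of a real
`getData` method: the list names (`treated.1`, `control.1`), the transcript
names (`tx1`, `tx2`) and the counts are made up for this example.

```{r}
library(IRanges)
# hypothetical per-position counts for two transcripts; one list element per
# bam file (all names and values are assumptions made for this sketch)
counts_per_file <- list(
  treated.1 = IntegerList(tx1 = c(0L, 3L, 5L), tx2 = c(1L, 1L)),
  control.1 = IntegerList(tx1 = c(1L, 2L, 4L), tx2 = c(0L, 2L))
)
# every element is an IntegerList with one entry per transcript
vapply(counts_per_file, is, logical(1), class2 = "IntegerList")
lengths(counts_per_file[["treated.1"]])
```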

Third, the `aggregateData` function has to be implemented. This function is used
to aggregate data over replicates for all or one of the conditions. The
resulting data is passed on to the `Modifier` class.

```{r}
setMethod("aggregateData",
          signature = c(x = "ExampleSequenceData"),
          function(x, condition = c("Both","Treated","Control")){
            ###
          }
)
```
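
What aggregating over replicates could look like is sketched below in base R
for the data of a single transcript. The helper `aggregate_replicates`, the
replicate column names (`treated.1`, ...) and the result column names
(`means.*`, `sds.*`) are assumptions made for this sketch; a real method would
operate on the data stored in the `SequenceData` object.

```{r}
# hypothetical per-position values of one transcript, one column per replicate
counts <- data.frame(treated.1 = c(0, 3, 5),
                     treated.2 = c(2, 5, 7),
                     control.1 = c(1, 2, 4))
# hypothetical helper: mean and standard deviation per position and condition
aggregate_replicates <- function(df, condition = c("Both","Treated","Control")){
  condition <- match.arg(condition)
  keep <- switch(condition,
                 Both = c("treated", "control"),
                 Treated = "treated",
                 Control = "control")
  out <- lapply(keep, function(cond){
    cols <- df[, startsWith(colnames(df), cond), drop = FALSE]
    if(ncol(cols) == 0L) return(NULL)
    m <- as.matrix(cols)
    res <- data.frame(rowMeans(m), apply(m, 1L, sd))
    colnames(res) <- paste0(c("means.", "sds."), cond)
    res
  })
  do.call(cbind, Filter(Negate(is.null), out))
}
agg <- aggregate_replicates(counts, "Both")
agg
```

With a single replicate, as for the control columns here, `sd` returns `NA`, so
a real implementation may want to handle that case explicitly.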

# A new `Modifier` class

A new `Modifier` class is probably the main class that needs to be 
implemented. Three variables have to be set. `mod` must be a single element of
`Modstrings::shortName(Modstrings::ModRNAString())`. `score` is the name of the
default score, which is used by several functions. A column with this name
should be returned from the `aggregateData` function. `dataType` defines the
`SequenceData` class to be used. `dataType` can contain the names of multiple
`SequenceData` classes, which are then combined to form a `SequenceDataSet`. 

```{r}
setClass("ModExample",
         contains = c("RNAModifier"),
         prototype = list(mod = "X",
                          score = "score",
                          dataType = "ExampleSequenceData"))
ModExample <- function(x, annotation, sequences, seqinfo, ...){
  RNAmodR:::Modifier("ModExample", x = x, annotation = annotation,
                     sequences = sequences, seqinfo = seqinfo, ...)
}
```
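
Whether a candidate value for `mod` is among the known short names can be
checked with the call mentioned above. The candidate values are purely
illustrative: `"I"` is assumed to be the short name used by `ModInosine` and
`"X"` is the placeholder from the class definition.

```{r}
# short names known to Modstrings for RNA modifications
valid_mods <- Modstrings::shortName(Modstrings::ModRNAString())
# illustrative candidates for the `mod` slot
candidates <- c("I", "X")
setNames(candidates %in% valid_mods, candidates)
```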

`dataType` can also be a `list` of `character` vectors, which then leads to the
creation of a `SequenceDataList`. However, for now this is a hypothetical case
and should only be used if the detection of a single modification requires bam
files from two or more different methods.

The `settings<-` function can be amended to save specific settings.
`.norm_example_args` must be defined separately and can normalize the input
arguments in any way one sees fit.

```{r}
setReplaceMethod(f = "settings", 
                 signature = signature(x = "ModExample"),
                 definition = function(x, value){
                   x <- callNextMethod()
                   # validate special setting here
                   x@settings[names(value)] <- unname(.norm_example_args(value))
                   x
                 })
```
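
A minimal sketch of such a normalization function is shown below. The setting
name `minScore` and its validation rule are assumptions made for this example;
the function returns the input as a list with the same names and order, so that
it fits the assignment in the replacement method above.

```{r}
# hypothetical normalization of settings specific to ModExample
.norm_example_args <- function(input){
  input <- as.list(input)
  # `minScore` is an assumed setting used only for illustration
  if(!is.null(input[["minScore"]])){
    minScore <- input[["minScore"]]
    if(!is.numeric(minScore) || length(minScore) != 1L ||
       is.na(minScore) || minScore < 0){
      stop("'minScore' must be a single non-negative numeric value.",
           call. = FALSE)
    }
    input[["minScore"]] <- as.numeric(minScore)
  }
  input
}
.norm_example_args(list(minScore = 5L))
```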

The `aggregateData` function is used to take the aggregated data from the 
`SequenceData` object and to calculate the specific scores, which are then 
stored in the `aggregate` slot.

```{r}
setMethod(f = "aggregateData", 
          signature = signature(x = "ModExample"),
          definition = 
            function(x, force = FALSE){
              # return aggregated data with one element per transcript
            }
)
```

The `findMod` function takes the aggregate data and searches for modifications,
which are then returned as a GRanges object and stored in the `modifications`
slot.

```{r}
setMethod("findMod",
          signature = c(x = "ModExample"),
          function(x){
            # return a GRanges object with one element per modification found
          }
)
```
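
How positions could be selected and converted into a `GRanges` object is
sketched below for a single transcript. The scores, the threshold `min_score`,
the transcript name `tx1` and the choice of metadata columns are assumptions
made for this example and do not reflect the complete set of columns a real
`findMod` method may need to return.

```{r}
library(GenomicRanges)
# hypothetical per-position scores of transcript "tx1"
scores <- c(0.2, 7.5, 1.1, 9.3)
# hypothetical threshold above which a position is called as modified
min_score <- 5
hits <- which(scores >= min_score)
# one range of width 1 per modification found
mods <- GRanges(seqnames = "tx1",
                ranges = IRanges(start = hits, width = 1L),
                strand = "+",
                mod = rep("X", length(hits)),
                score = scores[hits])
mods
```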

## A new `ModifierSet` class

The `ModifierSet` class is implemented very easily by defining the class and
the constructor. The functionality is defined by the `Modifier` class.

```{r}
setClass("ModSetExample",
         contains = "ModifierSet",
         prototype = list(elementType = "ModExample"))
ModSetExample <- function(x, annotation, sequences, seqinfo, ...){
  RNAmodR:::ModifierSet("ModExample", x = x, annotation = annotation,
                        sequences = sequences, seqinfo = seqinfo, ...)
}
```

# Visualization functions

Additional functions that need to be implemented are `getDataTrack` for the 
new `SequenceData` and `Modifier` classes and `plotData`/`plotDataByCoord` for
the new `Modifier` and `ModifierSet` classes. `name` defines a transcript name
found in `names(ranges(x))` and `type` is the data type, typically the name of
a column in the `aggregate` slot.

```{r}
setMethod(
  f = "getDataTrack",
  signature = signature(x = "ExampleSequenceData"),
  definition = function(x, name, ...) {
    ###
  }
)
setMethod(
  f = "getDataTrack",
  signature = signature(x = "ModExample"),
  definition = function(x, name, type, ...) {
  }
)
setMethod(
  f = "plotDataByCoord",
  signature = signature(x = "ModExample", coord = "GRanges"),
  definition = function(x, coord, type = "score", window.size = 15L, ...) {
  }
)
setMethod(
  f = "plotData",
  signature = signature(x = "ModExample"),
  definition = function(x, name, from, to, type = "score", ...) {
  }
)
setMethod(
  f = "plotDataByCoord",
  signature = signature(x = "ModSetExample", coord = "GRanges"),
  definition = function(x, coord, type = "score", window.size = 15L, ...) {
  }
)
setMethod(
  f = "plotData",
  signature = signature(x = "ModSetExample"),
  definition = function(x, name, from, to, type = "score", ...) {
  }
)
```

If unsure how to modify these functions, have a look at the code in the 
`Modifier-Inosine-viz.R` file of this package.

# Summary

As suggested directly above, for a more detailed example have a look at the 
`ModInosine` class source code found in the `Modifier-Inosine-class.R` and
`Modifier-Inosine-viz.R` files of this package.

# Session info

```{r}
sessionInfo()
```