---
title: "Motif import, export, and manipulation"
shorttitle: "Motif manipulation"
date: 17 October 2021
author:
- Benjamin Jean-Marie Tremblay^[benjamin.tremblay@uwaterloo.ca]
bibliography: universalmotif.bib
abstract: >
  The universalmotif package offers a number of functions to manipulate motifs. These are introduced and explored here, including those relating to: import, export, motif modification, creation, visualization, and other miscellaneous utilities.
vignette: >
  %\VignetteIndexEntry{Motif import, export, and manipulation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  bookdown::pdf_document2
---

```{r setup, echo=FALSE}
knitr::opts_chunk$set(collapse=TRUE,comment = "#>")
suppressPackageStartupMessages(library(universalmotif))
suppressMessages(suppressPackageStartupMessages(library(MotifDb)))
suppressMessages(suppressPackageStartupMessages(library(TFBSTools)))
data(examplemotif)
data(MA0003.2)
```

# Introduction

This vignette will introduce the `universalmotif` class and its structure, the import and export of motifs in R, basic motif manipulation, creation, and visualization. For an introduction to sequence motifs, see the [introductory](IntroductionToSequenceMotifs.pdf) vignette. For sequence-related utilities, see the [sequences](SequenceSearches.pdf) vignette. For motif comparisons and P-values, see the [motif comparisons and P-values](MotifComparisonAndPvalues.pdf) vignette.

# The universalmotif class and conversion utilities

## The universalmotif class

The `universalmotif` package stores motifs using the `universalmotif` class. The most basic `universalmotif` object exposes the `name`, `alphabet`, `type`, `type`, `strand`, `icscore`, `consensus`, and `motif` slots; furthermore, the `pseudocount` and `bkg` slots are also stored but not shown. `universalmotif` class motifs can be PCM, PPM, PWM, or ICM type.

```{r}
library(universalmotif)
data(examplemotif)
examplemotif
```

A brief description of all the available slots:

* `name`: motif name
* `altname`: (optional) alternative motif name
* `family`: (optional) a word representing the transcription factor or matrix family
* `organism`: (optional) organism of origin
* `motif`: the actual motif matrix
* `alphabet`: motif alphabet
* `type`: motif 'type', one of PCM, PPM, PWM, ICM; see the [introductory](IntroductionToSequenceMotifs.pdf) vignette
* `icscore`: (generated automatically) Sum of information content for the motif
* `nsites`: (optional) number of sites the motif was created from
* `pseudocount`: this value to added to the motif matrix during certain type conversions; this is necessary to avoid `-Inf` values from appearing in PWM type motifs
* `bkg`: a named vector of probabilities which represent the background letter frequencies
* `bkgsites`: (optional) total number of background sequences from motif creation
* `consensus`: (generated automatically) for DNA/RNA/AA motifs, the motif consensus
* `strand`: strand motif can be found on
* `pval`: (optional) P-value from _de novo_ motif search
* `qval`: (optional) Q-value from _de novo_ motif search
* `eval`: (optional) E-value from _de novo_ motif search
* `multifreq`: (optional) higher-order motif representations.
* `extrainfo`: (optional) any extra motif information that cannot fit in the existing slots

The other slots will be shown as they are filled.

```{r}
library(universalmotif)
data(examplemotif)

## The various slots can be accessed individually using `[`

examplemotif["consensus"]

## To change a slot, use `[<-`

examplemotif["family"] <- "My motif family"
examplemotif
```

Though the slots can easily be changed manually with `[<-`, a number of safeguards have been put in place for some of the slots which will prevent incorrect values from being introduced.

```{r,error=TRUE}
library(universalmotif)
data(examplemotif)

## The consensus slot is dependent on the motif matrix

examplemotif["consensus"]

## Changing this would mean it no longer matches the motif

examplemotif["consensus"] <- "GGGAGAG"

## Another example of trying to change a protected slot:

examplemotif["strand"] <- "x"
```

Below the exposed metadata slots, the actual 'motif' matrix is shown. Each position is its own column: row names showing the alphabet letters, and the column names showing the consensus letter at each position.

## Converting to and from another package's class

The `universalmotif` package aims to unify most of the motif-related Bioconductor packages by providing the `convert_motifs()` function. This allows for easy transition between supported packages (see `?convert_motifs` for a complete list of supported packages). Should you ever come across a motif class from another Bioconductor package which is not supported by the `universalmotif` package, but believe it should be, then feel free to bring it up with me.

The `convert_motifs` function is embedded in most of the `universalmotif` functions, meaning that compatible motif classes from other packages can be used without needed to manually convert them first. However keep in mind some conversions are final. Furthermore, internally, all motifs regardless of class are handled as `universalmotif` objects, even if the returning class is not. This will result in at times slightly different objects (though usually no information should be lost).

```{r}
library(universalmotif)
library(MotifDb)
data(examplemotif)
data(MA0003.2)

## convert from a `universalmotif` motif to another class

convert_motifs(examplemotif, "TFBSTools-PWMatrix")

## convert to universalmotif

convert_motifs(MA0003.2)

## convert between two packages

convert_motifs(MotifDb[1], "TFBSTools-ICMatrix")
```

# Importing and exporting motifs

## Importing

The `universalmotif` package offers a number of `read_*()` functions to allow for easy import of various motif formats. These include:

* `read_cisbp()`: CIS-BP [@cisbp]
* `read_homer()`: `HOMER` [@homer]
* `read_jaspar()`: JASPAR [@jaspar]
* `read_matrix()`: generic reader for simply formatted motifs
* `read_meme()`: `MEME` [@meme]
* `read_motifs()`: native `universalmotif` format (not recommended; use `saveRDS()` instead)
* `read_transfac()`: TRANSFAC [@transfac]
* `read_uniprobe()`: UniPROBE [@uniprobe]

These functions should work natively with these formats, but if you are generating your own motifs in one of these formats than it must adhere quite strictly to the format. An example of each of these is included in this package (see `system.file("extdata", package="universalmotif")`). If you know of additional motif formats which are not supported in the `universalmotif` package that you believe should be, or of any mistakes in the way the `universalmotif` package parses supported formats, then please let me know.

## Exporting

Compatible motif classes can be written to disk using:

* `write_homer()`
* `write_jaspar()`
* `write_matrix()`
* `write_meme()`
* `write_motifs()`
* `write_transfac()`

The `write_matrix()` function, similar to its `read_matrix()` counterpart, can write motifs as simple matrices with an optional header. Additionally, please keep in mind format limitations. For example, multiple `MEME` motifs written to a single file will all share the same alphabet, with identical background letter frequencies.

# Modifying motifs and related functions

## Converting motif type

Any `universalmotif` object can transition between PCM, PPM, PWM, and ICM types seamlessly using the `convert_type()` function. The only exception to this is if the ICM calculation is performed with sample correction, or as relative entropy. If this occurs, then back conversion to another type will be inaccurate (and `convert_type()` would not warn you, since it won't know this has taken place).

```{r}
library(universalmotif)
data(examplemotif)

## This motif is currently a PPM:

examplemotif["type"]
```
When converting to PCM, the `nsites` slot is needed to tell it how many sequences it originated from. If empty, 100 is used.
```{r}
convert_type(examplemotif, "PCM")
```
For converting to PWM, the `pseudocount` slot is used to determine if any correction should be applied:
```{r}
examplemotif["pseudocount"]
convert_type(examplemotif, "PWM")
```
You can either change the `pseudocount` slot manually beforehand, or pass one to `convert_type()`.
```{r}
convert_type(examplemotif, "PWM", pseudocount = 1)
```
There are a couple of additional options for ICM conversion: `nsize_correction` and `relative_entropy`. The former uses the `TFBSTools:::schneider_correction()` function (and thus requires that the `TFBSTools` package be installed) for sample size correction. The latter uses the `bkg` slot to calculate information content. See the IntroductionToSequenceMotifs vignette for an overview on the various types of ICM calculations.
```{r}
examplemotif["nsites"] <- 10
convert_type(examplemotif, "ICM", nsize_correction = FALSE)

convert_type(examplemotif, "ICM", nsize_correction = TRUE)

examplemotif["bkg"] <- c(A = 0.4, C = 0.1, G = 0.1, T = 0.4)
convert_type(examplemotif, "ICM", relative_entropy = TRUE)
```

## Merging motifs

The `universalmotif` package includes the `merge_motifs()` function to combine motifs. Motifs are first aligned, and the best match found before the motif matrices are averaged. The implementation for this is identical to that used by `compare_motifs()` (see the [motif comparisons vignette](MotifComparisonAndPvalues.pdf) for more information).

```{r, fig.height=4, fig.width=5}
library(universalmotif)

m1 <- create_motif("TTAAACCCC", name = "1")
m2 <- create_motif("AACC", name = "2")
m3 <- create_motif("AACCCCGG", name = "3")

view_motifs(c(m1, m2, m3),
  show.positions.once = FALSE, show.names = FALSE)
```
```{r,fig.height=2,fig.width=5}
view_motifs(merge_motifs(c(m1, m2, m3), method = "PCC"))
```

This functionality can also be automated to reduce the number of overly similar motifs in larger datasets via the `merge_similar()` function.

```{r}
library(universalmotif)
library(MotifDb)

motifs <- filter_motifs(MotifDb, family = "bHLH")[1:100]
length(motifs)

motifs <- merge_similar(motifs)
length(motifs)
```

Comparison and merging parameters can be fine-tuned as users wish. See the `compare_motifs()` and `merge_motifs()` documentation for more details, as well as the "Motif comparison and P-values" vignette.

## Motif reverse complement

Get the reverse complement of a motif.

```{r}
library(universalmotif)
data(examplemotif)

## Quickly switch to the reverse complement of a motif

## Original:

examplemotif

## Reverse complement:

motif_rc(examplemotif)
```

## Switching between DNA and RNA alphabets

Since not all motif formats or programs support RNA alphabets by default, the `switch_alph()` function can quickly go between DNA and RNA motifs.

```{r}
library(universalmotif)
data(examplemotif)

## DNA --> RNA

switch_alph(examplemotif)

## RNA --> DNA

motif <- create_motif(alphabet = "RNA")
motif

switch_alph(motif)
```

## Motif trimming

Get rid of low information content edges on motifs, such as `NNCGGGCNN` to `CGGGC`. The 'amount' of trimming can also be controlled by setting a minimum required information content, as well as the direction of trimming (by default both edges are trimmed).

```{r}
library(universalmotif)

motif <- create_motif("NNGCSGCGGNN")
motif

trim_motifs(motif)
trim_motifs(motif, trim.from = "right")
```

## Rounding motifs

Round off near-zero probabilities.

```{r, fig.height=3.5, fig.width=5}
motif1 <- create_motif("ATCGATGC", pseudocount = 10, type = "PPM", nsites = 100)
motif2 <- round_motif(motif1)
view_motifs(c(motif1, motif2))
```

# Motif creation

Though `universalmotif` class motifs can be created using the `new` constructor, the `universalmotif` package provides the `create_motif()` function which aims to provide a simpler interface to motif creation. The `universalmotif` class was initially designed to work natively with DNA, RNA, and amino acid motifs. Currently though, it can handle any custom alphabet just as easily. The only downsides to custom alphabets is the lack of support for certain slots such as the `consensus` and `strand` slots.

The `create_motif()` function will be introduced here only briefly; see `?create_motif` for details.

## From a PCM/PPM/PWM/ICM matrix

Should you wish to make use of the `universalmotif` functions starting from a motif class unsupported by `convert_motifs()`, you can instead manually create `universalmotif` class motifs using the `create_motif()` function and the motif matrix.

```{r}
motif.matrix <- matrix(c(0.7, 0.1, 0.1, 0.1,
                         0.7, 0.1, 0.1, 0.1,
                         0.1, 0.7, 0.1, 0.1,
                         0.1, 0.7, 0.1, 0.1,
                         0.1, 0.1, 0.7, 0.1,
                         0.1, 0.1, 0.7, 0.1,
                         0.1, 0.1, 0.1, 0.7,
                         0.1, 0.1, 0.1, 0.7), nrow = 4)

motif <- create_motif(motif.matrix, alphabet = "RNA", name = "My motif",
                      pseudocount = 1, nsites = 20, strand = "+")

## The 'type', 'icscore' and 'consensus' slots will be filled for you

motif
```

As a brief aside: if you have a motif formatted simply as a matrix, you can still use it with the `universalmotif` package functions natively without creating a motif with `create_motif()`, as `convert_motifs()` also has the ability to handle motifs formatted simply as matrices. However it is much safer to first format the motif beforehand with `create_motif()`.

## From sequences or character strings

If all you have is a particular consensus sequence in mind, you can easily create a full motif using `create_motif()`. This can be convenient if you'd like to create a quick motif to use with an external program such as from the `MEME` suite or `HOMER`. Note that ambiguity letters can be used with single strings.

```{r}
motif <- create_motif("CCNSNGG", nsites = 50, pseudocount = 1)

## Now to disk:
## write_meme(motif, "meme_motif.txt")

motif
```

## Generating random motifs

If you wish to, it's easy to create random motifs. The values within the motif are generated using `rgamma()` to avoid creating low information content motifs. If background probabilities are not provided, then they are generated with `rpois()`.

```{r}
create_motif()
```
You can change the probabilities used to generate the values within the motif matrix:
```{r}
create_motif(bkg = c(A = 0.2, C = 0.4, G = 0.2, T = 0.2))
```
 With a custom alphabet:
```{r}
create_motif(alphabet = "QWERTY")
```

# Motif visualization

## Motif logos

There are several packages which offer motif visualization capabilities, such as `seqLogo`, `motifStack`, and `ggseqlogo`. The `universalmotif` package has its own implementation via the function `view_motifs()`, which renders motifs using the `ggplot2` package (similar to `ggseqlogo`). Here I will briefly show how to use these to visualize `universalmotif` class motifs.

```{r, fig.height=2, fig.width=5}
library(universalmotif)
data(examplemotif)

## With the native `view_motifs` function:
view_motifs(examplemotif)
```

The `view_motifs()` function generates `ggplot` objects; feel free to manipulate them as such. For example, flipping the position numbers for larger motifs (where the text spacing can become tight):

```{r, fig.height=2.5, fig.width=5}
view_motifs(create_motif(15)) +
  ggplot2::theme(
    axis.text.x = ggplot2::element_text(angle = 90, hjust = 1)
  )
```

A large number of options are available for tuning the way motifs are plotted in `view_motifs()`. Visit the documentation for more information.

Using the other Bioconductor packages to view `universalmotif` motifs is fairly easy as well:

```{r, fig.height=2.5, fig.width=5}
## For all the following examples, simply passing the functions a PPM is
## sufficient
motif <- convert_type(examplemotif, "PPM")
## Only need the matrix itself
motif <- motif["motif"]

## seqLogo:
seqLogo::seqLogo(motif)

## motifStack:
motifStack::plotMotifLogo(motif)

## ggseqlogo:
ggseqlogo::ggseqlogo(motif)
```

## Stacked motif logos

The `motifStack` package allows for a number of different motif stacking visualizations. The `universalmotif` package, while not capable of emulating most of these, still offers basic stacking via `view_motifs()`. The motifs are aligned using `compare_motifs()`.

```{r, fig.height=5, fig.width=5}
library(universalmotif)
library(MotifDb)

motifs <- convert_motifs(MotifDb[50:54])
view_motifs(motifs, show.positions.once = FALSE, names.pos = "right")
```

## Plot arbitrary text logos

The logo plotting capabilities of `view_motifs()` can be used for any kind of arbitrary text logo. All you need is a numeric matrix (the heights of the characters), with the desired characters as row names. The following example is taken from the `view_logo()` documentation.

```{r, fig.height=2.5, fig.width=5}
library(universalmotif)
data(examplemotif)

## Start from a numeric matrix:
toplot <- examplemotif["motif"]

# Adjust the character heights as you wish (negative values are possible):
toplot[4] <- 2
toplot[20] <- -0.5

# Mix and match the number of characters per letter/position:
rownames(toplot)[1] <- "AA"

toplot <- toplot[c(1, 4), ]

toplot

view_logo(toplot)
```

# Higher-order motifs

Though PCM, PPM, PWM, and ICM type motifs are still widely used today, a few 'next generation' motif formats have been proposed. These wish to add another layer of information to motifs: positional interdependence. To illustrate this, consider the following sequences:

 # | Sequence
-- | --------
 1 | CAAAACC
 2 | CAAAACC
 3 | CAAAACC
 4 | CTTTTCC
 5 | CTTTTCC
 6 | CTTTTCC
: (\#tab:seqs2) Example sequences.

This becomes the following PPM:

Position |   1 |   2 |   3 |   4 |   5 |   6 |   7
-------- | --- | --- | --- | --- | --- | --- | ---
       A | 0.0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.0 | 0.0
       C | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0
       G | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
       T | 0.0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.0 | 0.0
: (\#tab:ppm2) Position Probability Matrix.

Based on the PPM representation, all three of CAAAACC, CTTTTCC, and CTATACC are equally likely. Though looking at the starting sequences, should CTATACC really be considered so? For transcription factor binding sites, this sometimes is not the case. By incorporating this type of information into the motif, it can allow for increased accuracy in motif searching. A few example implementations of this include: TFFM by @tffm, BaMM by @bamm, and KSM by @ksm.

The `universalmotif` package implements its own, rather simplified, version of this concept. Plainly, the standard PPM has been extended to include `k`-letter frequencies, with `k` being any number higher than 1. For example, the 2-letter version of the table \@ref(tab:ppm2) motif would be:

Position |   1 |   2 |   3 |   4 |   5 |   6 
-------- | --- | --- | --- | --- | --- | --- 
      AA | 0.0 | 0.5 | 0.5 | 0.5 | 0.0 | 0.0
      AC | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0
      AG | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      AT | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      CA | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      CC | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0
      CG | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      CT | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      GA | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      GC | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      GG | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      GT | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      TA | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      TC | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0
      TG | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
      TT | 0.0 | 0.5 | 0.5 | 0.5 | 0.0 | 0.0
: (\#tab:multi) 2-letter probability matrix.

This format shows the probability of each letter combined with the probability of the letter in the next position. The seventh column has been dropped, since it is not needed: the information in the sixth column is sufficient, and there is no eighth position to draw 2-letter probabilities from. Now, the probability of getting CTATACC is no longer equal to CTTTTCC and CAAAACC. This information is kept in the `multifreq` slot of `universalmotif` class motifs. To add this information, use the `add_multifreq()` function.

```{r}
library(universalmotif)

motif <- create_motif("CWWWWCC", nsites = 6)
sequences <- DNAStringSet(rep(c("CAAAACC", "CTTTTCC"), 3))
motif.k2 <- add_multifreq(motif, sequences, add.k = 2)

## Alternatively:
# motif.k2 <- create_motif(sequences, add.multifreq = 2)

motif.k2
```

To plot these motifs, use `view_motifs()`:

```{r, fig.height=2.5, fig.width=5}
view_motifs(motif.k2, use.freq = 2)
```

This information is most useful with functions such as `scan_sequences()` and `enrich_motifs()`. Though other tools in the `universalmotif` can work with `multifreq` motifs (such as `motif_pvalue()`, `compare_motifs()`), keep in mind they are not as well supported as regular motifs (getting P-values from `multifreq` motifs is exponentially slower, and P-values from using `compare_motifs()` for `multifreq` motifs are not available by default). See the [sequences](SequenceSearches.pdf) vignette for using `scan_sequences()` with the `multifreq` slot.

# Tidy motif manipulation with the `universalmotif_df` data structure

For those who enjoy using the tidyverse functions for data handling, motifs can additionally represented as the modified `data.frame` format: `universalmotif_df`. This format allows one to modify motif slots for multiples motifs simultaneously using the `universalmotif_df` columns, and then return to a list of motifs afterwards to resume use with `universalmotif` package functions. A few key functions have been provided in relation to this format:

* `to_df()`: Generate a `universalmotif_df` object from a list of motifs.
* `update_motifs()`: After modifying the `universalmotif_df` object, apply these modifications to the actual `universalmotif` objects (contained within the `motif` column).
* `to_list()`: Return to a list of `universalmotif` objects for use with `universalmotif` package functions. Note that it is not required to use `update_motifs()` before using `to_list()`, as modifications will be checked for and applied if found.
* `requires_update()`: Boolean check as to whether the `universalmotif` objects and the `universalmotif_df` columns differ and require either a `update_motifs()` or `to_list()` call to re-sync them.

```{r}
library(universalmotif)
library(MotifDb)

## Obtain a `universalmotif_df` object
motifs <- to_df(MotifDb)
head(motifs)
```

Some tidy manipulation:

```{r}
library(dplyr)

motifs <- motifs %>%
  mutate(bkg = case_when(
    organism == "Athaliana" ~ list(c(A = 0.32, C = 0.18, G = 0.18, T = 0.32)),
    TRUE ~ list(c(A = 0.25, C = 0.25, G = 0.25, T = 0.25))
  ))
head(filter(motifs, organism == "Athaliana"))
```

Feel free to add columns as well. You can add 1d vectors which will be added to the `extrainfo` slots of motifs. (Note that they will be coerced to character vectors!)

```{r}
motifs <- motifs %>%
  mutate(MotifIndex = 1:n())
head(motifs)

to_list(motifs)[[1]]
```

If during the course of your manipulation you've generated temporary columns which you wish to drop, you can set `extrainfo = FALSE` to discard all extra columns. Be careful though, this will discard any previously existing `extrainfo` data as well.

```{r}
to_list(motifs, extrainfo = FALSE)[[1]]
```

# Miscellaneous motif utilities

A number of convenience functions are included for manipulating motifs.

## DNA/RNA/AA consensus functions

For DNA, RNA and AA motifs, the `universalmotif` will automatically generate a `consensus` string slot. Furthermore, `create_motif()` can generate motifs from consensus strings. The internal functions for these have been made available:

* `consensus_to_ppm()`
* `consensus_to_ppmAA()`
* `get_consensus()`
* `get_consensusAA()`

```{r}
library(universalmotif)

get_consensus(c(A = 0.7, C = 0.1, G = 0.1, T = 0.1))

consensus_to_ppm("G")
```

## Filter through lists of motifs

Filter a list of motifs, using the `universalmotif` slots with `filter_motifs()`.

```{r}
library(universalmotif)
library(MotifDb)

## Let us extract all of the Arabidopsis and C. elegans motifs (note that
## conversion from the MotifDb format is final)

motifs <- filter_motifs(MotifDb, organism = c("Athaliana", "Celegans"))

## Only keeping motifs with sufficient information content and length:

motifs <- filter_motifs(motifs, icscore = 10, width = 10)

head(summarise_motifs(motifs))
```

## Generate random motif matches

Get a random set of sequences which are created using the probabilities of the motif matrix, in effect generating motif sites, with `sample_sites()`.

```{r}
library(universalmotif)
data(examplemotif)

sample_sites(examplemotif)
```

## Motif shuffling

Shuffle a set of motifs with `shuffle_motifs()`. The original shuffling implementation is taken from the `linear` shuffling method of `shuffle_sequences()`, described in the [sequences](SequenceSearches.pdf) vignette.

```{r}
library(universalmotif)
library(MotifDb)

motifs <- convert_motifs(MotifDb[1:50])
head(summarise_motifs(motifs))

motifs.shuffled <- shuffle_motifs(motifs, k = 3)
head(summarise_motifs(motifs.shuffled))
```

## Scoring and match functions

Motif matches in a set of sequences are typically obtained using logodds scores. Several functions are exposed to reveal some of the internal work that goes on.

* `get_matches()`: show all possible sequence matches above a certain score
* `get_scores()`: obtain all possible scores from all possible sequence matches
* `motif_score()`: translate score thresholds to logodds scores
* `prob_match()`: return probabilities for sequence matches
* `score_match()`: return logodds scores for sequence matches

```{r}
library(universalmotif)
data(examplemotif)
examplemotif

## Get the min and max possible scores:
motif_score(examplemotif)

## Show matches above a score of 10:
get_matches(examplemotif, 10)

## Get the probability of a match:
prob_match(examplemotif, "TTTTTTT", allow.zero = FALSE)

## Score a specific sequence:
score_match(examplemotif, "TTTTTTT")

## Take a look at the distribution of scores:
plot(density(get_scores(examplemotif), bw = 5))
```

## Type conversion functions

While `convert_type()` will take care of switching the current type for `universalmotif` objects, the individual type conversion functions are also available for personal use. These are:

* `icm_to_ppm()`
* `pcm_to_ppm()`
* `ppm_to_icm()`
* `ppm_to_pcm()`
* `ppm_to_pwm()`
* `pwm_to_ppm()`

These functions take a one dimensional vector. To use these for matrices:

```{r}
library(universalmotif)

m <- create_motif(type = "PCM")["motif"]
m

apply(m, 2, pcm_to_ppm)
```

Additionally, the `position_icscore()` can be used to get the total information content per position:

```{r}
library(universalmotif)

position_icscore(c(0.7, 0.1, 0.1, 0.1))
```

# Session info {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```

# References {.unnumbered}