---
title: "The UPDhmm's User Guide"
author:
- name: Marta Sevilla Porras
  affiliation:
  - Universitat Pompeu Fabra (UPF)
  - Centro de Investigación Biomédica en Red (CIBERER)
  email: marta.sevilla@upf.edu
- name: Carlos Ruiz Arenas
  affiliation:
  - Universidad de Navarra (UNAV)
  email: cruizarenas@unav.es
date: "`r Sys.Date()`"
package: "`r BiocStyle::pkg_ver('UPDhmm')`"
abstract: > 
  This vignette provides an introductory guide to the UPDhmm package, 
  developed for detecting uniparental disomies (UPDs) events on genetic 
  data from trio experiments (i.e. involving a father, mother, and proband). 
  UPDs are a genetic condition in which an individual inherits both copies 
  from the same parent. UPDs can consist on a whole chromosome or a segment.
  These events can result in disease by leading to homozygosity or specific 
  alleles, potentially manifesting recessive traits or contributing to genetic
  disorders through errors associated with genetic imprinting. The key topics 
  covered in this document include: 
  
  (1) Package installation 
  (2) Data loading and pre-processing 
  (3) Identification of UPDs
output: 
  BiocStyle::html_document:
  number_sections: true
  toc: yes
  fig_caption: yes
  toc_float: true
vignette: >
  %\VignetteIndexEntry{Detection of UPDs in HTS data}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
    comment = "",
    warning = FALSE,
    message = FALSE,
    cache = FALSE
)
```

# Introduction

## Background

Uniparental disomy (UPD) is a rare genetic condition where an individual
inherits both copies of a chromosome from one parent, as opposed to the typical
inheritance of one copy from each parent. The extent of its contribution as a 
causative factor in rare genetic diseases remains largely unexplored. UPDs can 
lead to disease either by inheriting a carrier pathogenic mutation as homozygous
from a carrier parent or by causing errors in genetic imprinting. Nevertheless,
there are currently no standardized methods available for the detection and 
characterization of these events.

We have developed `UPDhmm` R/Bioconductor package. The package provides
a tool method to detect, classify and stablish the location
of uniparental disomy events.


## Method overview

"`UPDhmm` relies on a Hidden Markov Model (HMM) to identify regions with UPD.  
In our HMM model, observations are the combination of the genotypes from the 
father, mother and proband for every genomic variant in the input data. The 
hidden states of the model represent five inheritance patterns: normal 
(mendelian inheritance), maternal isodisomy, paternal isodisomy, maternal 
heterodisomy and paternal heterodisomy. Emission probabilities were defined 
based on the inheritance patterns.
Viterbi algorithm was employed to infer the most likely combination of hidden 
states underlying a sequence of observations, thus defining the most likely 
inheritance pattern for each genetic variant. `UPDhmm` reports segments in the 
genome having an inheritance pattern different than normal, and thus, likely 
harbouring a UPD."


# Set-up the packages

```{r installation, eval=FALSE, include=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("UPDhmm")
BiocManager::install("regioneR")
BiocManager::install("karyoploteR")
```

## Loading libraries

```{r loading libraries, message = FALSE}
library(UPDhmm)
library(dplyr)
```
## Quick start

The input of the package is a multisample vcf file with the following 
requirements:

- Only biallelic sites are only allowed
- GT format allowed are: 0/1, 1/0, 0/1 and 1/1 or 0|1, 1|0, 0|1
and 1|1


# Datasets {#datasets}

The UPDhmm package includes one example dataset, adapted from
[GIB](https://www.nist.gov/programs-projects/genome-bottle)
This dataset serves as a practical illustration and can be utilized for testing
and familiarizing users with the functionality of the package.

# Preprocessing

After reading the VCF file, the `vcfCheck()` function is employed for 
preprocessing the input data. This function facilitates reading the VCF in the
suitable format for the UPDhmm package. 


```{r read Vcf,echo=TRUE}
library(UPDhmm)
library(VariantAnnotation)
file <- system.file(package = "UPDhmm", "extdata", "test_het_mat.vcf.gz")
vcf <- readVcf(file)
processedVcf <- vcfCheck(
    vcf,
    proband = "NA19675", mother = "NA19678", father = "NA19679"
)
```

# Uniparental disomy detection


The principal function of `UPDhmm` package, `calculateEvents()`, is the central
function for identifying Uniparental Disomy (UPD) events. It takes the output
from the previous `vcfCheck()` function and systematically analyzes genomic
data, splitting the VCF into chromosomes and applying the Viterbi algorithm.


`calculateEvents()` function encompasses a serie of subprocesses for
identifying Uniparental disomies (UPDs):

(1) Split vcf into chromosomes
(2) Apply Viterbi algorithm to every chromosome
(3) Transform the inferred hidden states to coordinates `data.frame`
(4) Create blocks of contiguous variants with the same state
(5) Calculate the statistics (log-likelihood and p-value).



```{r calculateEvents function, echo=TRUE}
updEvents <- calculateEvents(largeCollapsedVcf = processedVcf)
head(updEvents)
```

## Results description

The `calculateEvents` function returns a `data.frame` containing all 
detected events in the provided trio. If no events are found, the function 
will return an empty data.frame.

| Column name   | Description          |
|---------------------|----------------------------------------------------|
| `seqnames`    | Chromosome           | 
|  `start`    | Start position of the block      |
|  `end`    | End position of the block        |
| `n_snps`    | Number of variants within the event    |
| `group`     | Predicted state          |
| `log_likelihood`    | log likelihood ratio         |
| `p_value`     | p-value            |
| `n_mendelian_error`   | Count of Mendelian errors (genotypic combinations that are inconsistent with Mendelian inheritance principles) |


# Results Visualization
To visualize the results, the `karyoploteR` package can be used. Here, 
a custom function is provided to facilitate easy implementation with the 
output format results. This function enables the visualization of the detected 
blocks by the `calculateEvents` function.

In this example, we can observe how the package detects the simulated event on 
chromosome 3, as well as the specific type of event. In the plot, the autosomes
chromosomes are represented, and we can see that the entire q arm of chromosome
3 is colored. With the legend, we can identify that the event is a maternal 
heterodisomy.


```{r echo=TRUE}
library(karyoploteR)
library(regioneR)
plotUPDKp <- function(updEvents) {
    updEvents$seqnames <- paste0("chr", updEvents$seqnames)
    het_fat <- toGRanges(subset(
        updEvents,
        group == "het_fat"
    )[, c("seqnames", "start", "end")])
    het_mat <- toGRanges(subset(
        updEvents,
        group == "het_mat"
    )[, c("seqnames", "start", "end")])
    iso_fat <- toGRanges(subset(
        updEvents,
        group == "iso_fat"
    )[, c("seqnames", "start", "end")])
    iso_mat <- toGRanges(subset(
        updEvents,
        group == "iso_mat"
    )[, c("seqnames", "start", "end")])

    kp <- plotKaryotype(genome = "hg19")
    kpPlotRegions(kp, het_fat, col = "#AAF593")
    kpPlotRegions(kp, het_mat, col = "#FFB6C1")
    kpPlotRegions(kp, iso_fat, col = "#A6E5FC")
    kpPlotRegions(kp, iso_mat, col = "#E9B864")

    colors <- c("#AAF593", "#FFB6C1", "#A6E5FC", "#E9B864")
    legend("topright",
        legend = c("Het_Fat", "Het_Mat", "Iso_Fat", "Iso_Mat"),
        fill = colors
    )
}

plotUPDKp(updEvents)
```

# Session Info

```{r}
sessionInfo()
```

# References