---
title: "Calculating Number of Iterations Required to Reach Steady-State"
author: "Selcen Ari"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
   toc: true
vignette: >
  %\VignetteIndexEntry{A Suggestion: How to Find the Appropriate Iteration for Simulation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# 1. Introduction

In the `ceRNAnetsim` package, regulations of miRNA:target pairs are observed via direct or indirect interactions of elements in network. In this approach, change in expression level of single gene or miRNA can affect the whole network via "ripple effect". So, when the change is applied the system, it affects to primary neighborhood firstly, and then propagates to further neighborhoods.

In the simple interaction network like *minsamp*, the ripple effect could be observed when expression level of *Gene4* changes and subsequently effecting other genes. In the non-complex networks like *minsamp*, the steady-state condition can be provided easily, after network disturbed.

In this vignette, first, we demonstrate a suggestion to determine simulation iteration of existing dataset for gaining steady state after perturbing the network. Additionally, new approach which is useful for defining significant of nodes in terms of perturbation of network is elucidated.

```{r message= FALSE, warning=FALSE}
library(ceRNAnetsim)
library(png)
```

# 2. Installation

```{r, message=FALSE, warning= FALSE, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ceRNAnetsim")
```

# 3. Comparison of gaining steady-state durations of `middle` and `minimal` datasets

```{r, fig.height=5, fig.width=6, warning=FALSE, message=FALSE, fig.show='hide'}
data("minsamp")

minsamp %>%
  priming_graph(competing_count = Competing_expression, 
                miRNA_count = miRNA_expression) %>%
  update_how("Gene4",2) %>%
  simulate_vis(title = "Minsamp: Common element as trigger", cycle = 15)


minsamp %>%
  priming_graph(competing_count = Competing_expression, 
                miRNA_count = miRNA_expression) %>%
  update_how("Gene4",2) %>%
  simulate(cycle = 5)
```

![Minsamp common target perturbation](gifs/con_iter_chunk3.gif#center)

For example, in *minsamp* dataset, the steady-state is occurred at iteration-14 (as seen above: after iteration-13, it is observed that there are only orange (miRNAs) and green (competing genes) nodes in network. In this case, genes have new regulated (steady) expression values while expression values of microRNAs are same in comparison with initial case.).

However, when network is larger and interactions are more complex, the number of iterations required to reach steady-state may increase. While at cycle 14 *minsamp* dataset has reached steady-state, the *midsamp* (middle sized sample) dataset has not reached steady-state after 15 cycles. In the example below, in *midsamp* data, *Gene17* is upregulated 2 fold as a trigger and simulation is run for 15 cycles.

```{r}
data("midsamp")

midsamp
```

```{r, fig.height=5, fig.width=6, warning=FALSE, message=FALSE, fig.show='hide'}
midsamp %>%
  priming_graph(Gene_expression, miRNA_expression) %>%
  update_how("Gene17",2) %>%
  simulate_vis(title = "Midsamp: Gene with higher degree as trigger", 15)
```

![Midsamp Gene17 perturbation followed with 15 iterations](gifs/con_iter_chunk4.gif)

## 3.1. Suggestion for simulation iteration

Guessing or performing trial and error for large networks is not practical, thus we developed a function which calculates optimal iteration in a network after trigger and simulation steps. `find_iteration()` function analyses the simulated graph and suggests the iteration at which maximum number of nodes are affected. An important argument is `limit` which sets the threshold below which is considered "no change", in other words, any node should have level of change greater than the threshold to be considered "changed". Please be aware that small threshold values will cause excessively long calculation time especially in large networks.

In the example below, *Gene2* is upregulated 2-fold and then iteration number at which maximum number of nodes affected will be calculated. The search for iteration number will go up to 50. Also, since we are searching for maximal propagation, limit is set to zero.

```{r}
midsamp %>%
  priming_graph(Gene_expression, miRNA_expression) %>%
  update_how("Gene2",2) %>%
  simulate(50) %>% 
  find_iteration(limit=0)

```

NOTE: You can edit the dataset manually. You can change Gene2 expression value as 20000 and save that as a new dataset (midsamp_new_counts).

You can use the dataset that includes new expression values of miRNAs and genes.

```{r}
data("midsamp_new_counts")

midsamp %>%
  priming_graph(Gene_expression, miRNA_expression) %>%
  update_variables(current_counts = midsamp_new_counts) %>%
  simulate(50) %>% 
  find_iteration(limit=0)
```

`find_iteration()` function will return a single number: the iteration number at which maximum propagation is achieved. If `plot=TRUE` argument is used then the function will return a plot which calculates percent of perturbed nodes for each iteration number. The latter can be used for picking appropriate number of cycles for `simulate()` function.

```{r, fig.align='center', fig.width=5, fig.height=3, dpi= 120}
midsamp %>%
  priming_graph(Gene_expression, miRNA_expression) %>%
  update_how("Gene17",2) %>%
  simulate(50) %>% 
  find_iteration(limit=0)

```

## 3.2. Find appropriate iteration number with `find_iteration` and then simulate accordingly

As shown in plot above, if "Gene17" is upregulated 2-fold, the network will need around 22 iterations to reach the steady-state. Since we have an idea about appropriate iteration number, let's use `simulate()` function and iterate for 25 cycles using same trigger (Gene17 2-fold):

```{r, fig.height=5, fig.width=6, warning=FALSE, message=FALSE, fig.show='hide'}
midsamp %>%
  priming_graph(Gene_expression, miRNA_expression) %>%
  update_how("Gene17", 2) %>%
  simulate_vis(title = "Midsamp: Gene17 2 fold increase, 25 cycles", 25)
```

![Midsamp Gene17 perturbation with 25 iteration](gifs/con_iter_chunk7.gif#center)

Note: If you ignore decimal change in gene expression, `threshold` argument can be used. With this method, system reaches steady-state early.

```{r, fig.align='center', fig.height=5, fig.width=6, warning=FALSE, message=FALSE, fig.show='hide'}

midsamp %>%
  priming_graph(Gene_expression, miRNA_expression) %>%
  update_how("Gene17", 2) %>%
  simulate_vis(title = "Midsamp: Gene17 2 fold increase, 6 cycles", 6, threshold = 1)

```

![Midsamp Gene17 perturbation with 6 iteration with threshold](gifs/con_iter_chunk8.gif#center)

The workflow that is aforementioned in this vignette should be considered as suggestion. Because the `cycle` is a critical argument that is used with `simulate()` function and affects results of analysis. In light of this vignette and functions, the approach can be developed according to dataset.


# 4. What is perturbation efficiency?

The perturbation efficiency means that the disturbance and propagation efficiency of an element in the network. In a given network not all nodes have same or similar perturbation efficiency. Changes in some nodes might propagate to whole network and for some nodes the effect might be limited to small subgraph of the network. Not only topology but also miRNA:target interaction dynamics determine perturbation efficiency.

- Expression level and type of trigger element plays a crucial role. The trigger element can be an miRNA or competing target. The perturbation efficiency is affected from ratio of miRNA amount to sum of expression levels of its targets. Also, amount of competing element among whole competing elements is important since it determines distribution of miRNA. For example, if trigger is an miRNA with expression level of 1500 and if sum of expression levels of its targets is 1000000, then this miRNA will not perturb its neighborhood efficiently. So, miRNA:target ratio is important for regulation of interaction network.

- In a biological system, the miRNA:target interactions does not depend solely on stoichiometry, unfortunately. miRNAs affect the targets via degradation or inhibition after the binding. The experimental studies have shown that the features of miRNA:target interactions determine the binding and degradation efficiency. For example, binding energy between miRNA and target and seed structure of miRNA determine the binding efficiency of complex. In addition, the binding region on the target affects the degradation of target. 

- On the other hand, interaction factors affecting binding and degradation of miRNA to its targets also have impact one efficiency of perturbation of the change. For instance, if a node has very low binding affinity to targeting miRNA, change in expression level of that node will cause weak or no perturbation.

Thus, we developed functions which can calculate perturbation efficiency of a given node or all nodes. `calc_perturbation()` function calculates perturbation efficiency for given trigger (*e.g.* Gene17 2-fold). `find_node_perturbation()` function screens the whole network and calculate perturbation efficiency of all nodes.

## 4.1. How does the `calc_perturbation()` work?

This function works for a given node from network. It calculates and returns two values:

* **perturbation efficiency** : mean of percentage of expression changes of all elements except trigger in the network 
* **perturbed count** : count of disturbed nodes for given iteration number and limit (the minimum change in expression level needed to be considered "disturbed" or "changed"). 

In the example below, "Gene17" is up-regulated 3-fold in *midsamp* dataset where `Energy` and `seeds` columns are used for calculating affinity effect and `targeting_region` columns is used for calculating degradation effect. The network will be iterated over 30 times and number of disturbed nodes (as taking into account nodes that have changed more than the value of the threshold (0.1 percentage in terms of the change)) will be counted.


```{r}
midsamp %>%
  priming_graph(competing_count = Gene_expression, 
                miRNA_count = miRNA_expression,
                aff_factor = c(Energy,seeds), 
                deg_factor = targeting_region) %>%
  calc_perturbation("Gene17", 3, cycle = 30, limit = 0.1)
```

If you are interested in testing various fold change values of a given node, then we can use `map` (actually parallelized version `future_map`) to run function for set of input values.

First, let's keep the primed version of graph in an object

```{r}
primed_mid <- midsamp %>%
  priming_graph(competing_count = Gene_expression, 
                miRNA_count = miRNA_expression,
                aff_factor = c(Energy,seeds), 
                deg_factor = targeting_region)
```

Now, let's calculate perturbation efficiency caused by 2-fold to 10-fold increase in Gene17

```{r, eval=FALSE}
# for parallel processing
# future::plan(multiprocess) 

seq(2,10) %>%
  rlang::set_names() %>% 
  furrr::future_map_dfr(~ primed_mid %>% calc_perturbation("Gene17", .x, cycle = 30, limit = 0.1), .id="fold_change")
```

If you're interested in screening nodes instead of fold changes then you don't have to write a complicated `map` command, there's already a function available for that purpose.

## 4.2. A Short-cut: Finding perturbation efficiencies for whole nodes of network

The `find_node_perturbation()` function calculates the perturbation efficiency and perturbed node count of each node in network. 

In the example below, each node is increased 2-fold and tested for perturbation efficiency for 4 cycles with threshold of 0.1

```{r, warning=FALSE}
midsamp %>%
  priming_graph(competing_count = Gene_expression,
                miRNA_count = miRNA_expression,
                aff_factor = c(Energy,seeds),
                deg_factor = targeting_region) %>%
  find_node_perturbation(how = 3, cycle = 4, limit = 0.1)%>%
  select(name, perturbation_efficiency, perturbed_count)
```

On the other hand, some of nodes in network might not be affected from perturbation because of low expression or weak interaction factors. In this case, `fast` argument can be used. Argument `fast` calculate affected expression percent of the targets and perturbation calculation is not ran for these elements in network, if that percentage value is smaller than given `fast` value.

```{r}
midsamp %>%
  priming_graph(competing_count = Gene_expression,
                miRNA_count = miRNA_expression,
                aff_factor = c(Energy,seeds),
                deg_factor = targeting_region) %>%
  find_node_perturbation(how = 3, cycle = 4, limit = 0.1, fast=5)%>%
  select(name, perturbation_efficiency, perturbed_count)
```

The results of the `find_node_perturbation()` will list effectiveness or importance of nodes in the network. This function can help selecting nodes which will have effective perturbation in network.

# 5. Session Info
```{r sessioninfo}
sessionInfo()
```