%\VignetteIndexEntry{Main vignette:Playing with networks using CNORode}
%\VignetteKeywords{Training Signalling Pathway Maps to
%	Biochemical Data with Logic-Based Ordinary Differential Equations
%}

%\VignettePackage{CNORode}

\documentclass{article}
\usepackage{Sweave,fullpage}
\usepackage{url, color}

%\usepackage{cmap}

\usepackage{authblk}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

\usepackage{hyperref}

\hypersetup{
  colorlinks, linkcolor=blue
}

\RequirePackage[T1]{fontenc}
\RequirePackage{graphicx,ae,fancyvrb,color}
\IfFileExists{upquote.sty}{\RequirePackage{upquote}}{}
\setkeys{Gin}{width=0.8\textwidth}


\definecolor{gris90}{gray}{0.90}
\definecolor{gris10}{gray}{0.1}
\definecolor{green}{rgb}{0.6, 0.9,0.6}


\DefineVerbatimEnvironment{Sinput}{Verbatim}{%
    fontshape=sl,
    frame=single,
    xleftmargin=2em,
    fillcolor=\color{gris90},
%    numbers=left % prevent copy/paste entire code
    }
\DefineVerbatimEnvironment{Soutput}{Verbatim}{
    fontshape=sl,
    frame=single,
    xleftmargin=2em,
    fillcolor=\color{green}
}


\DefineVerbatimEnvironment{shell}{Verbatim}{formatcom=\color{blue}}

\title{Training Signalling Pathway Maps to
	Biochemical Data with Logic-Based Ordinary Differential Equations
}
\author[1,2]{David Henriques}
\author[1]{Thomas Cokelaer}
\author[3]{Attila Gabor}
\author[3,4]{Enio Gjerga \thanks{@enio.gjerga@gmail.com}}

\affil[1]{European Bioinformatics Institute, Saez-Rodriguez group, Cambridge, United Kingdom}
\affil[2]{Instituto de Investigaciones Marinas-CSIC, Vigo, Spain}
\affil[3]{Heidelberg University, Faculty of Medicine, Institute for Computational Biomedicine, Bioquant}
\affil[4]{RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine (JRC-COMBINE)}

\begin{document}
\maketitle

\tableofcontents

% gg
<<preliminaries, eval=True, results=hide, include=False, echo=FALSE>>=
options(width=70, useFancyQuotes="UTF-8", prompt=" ", continue="  ")
@

\section{Introduction}

Mathematical models are used to understand protein signalling networks so as to
provide an integrative view of pharmacological and toxicological processes at
molecular level. \emph{CellNOptR}~\cite{CellNOptR} is an existing
package (see
\url{http://bioconductor.org/packages/release/bioc/html/CellNOptR.html}) that
provides functionalities to combine
prior knowledge network (about protein signalling networks) and perturbation
data to infer functional characteristics (of the signalling network).
While \emph{CellNOptR} has demonstrated its ability to infer new functional
characteristics, it is based on a boolean formalism where protein species are
characterised as being fully active or inactive. In contrast, logic-based ordinary
differential equations allow a quantitive description of a given Boolean model.

The method used here was first published by Wittmann et al. \cite{wittman} by
the name of \emph{odefy}.
For a detailed description of the methodology the user is adressed to \cite{wittman} and for
a published application example to \cite{wittmanJS}.

This package implements the Odefy method and focus mainly extending the \emph{CellNOptR}
capabilities in order to simulate and calibrate logic-based ordinary differential
equation model. We provide direct and easy to use interface to optimization methods
available in R such as \emph{eSSR} \cite{eSSR1} (enhanced Scatter Search Metaheuristic for R) and an R genetic
algorithm implementation by the name of \emph{genalg} in order to perform parameter estimation.
Additionally we were specially careful in tackling the main computanional bottlenecks
by implementing CNORode simulation engine in the C language using the
\emph{CVODES} library \cite{cvodes}.

This brief tutorial shows how to use \emph{CNORode} using as a starting point a Boolean model
and a dataset consisting in a time-series of several proteins.

\section{Installation}

\emph{CNORode} depends on \emph{CellNOptR} and \emph{genalg}, which are 2 bioconductor
packages. Therefore, in order to install \emph{CNORode}, open a R
session and type:

<<installCNOR, eval=FALSE, pgf=TRUE, echo=TRUE>>=
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("CNORode")
@

It may take a few minutes to install all dependencies if you start from scratch
(i.e, none of the R packages are installed on your system). Note also that under
Linux system, some of these packages necessitate the R-devel package to be
installed (e.g., under Fedora type \emph{sudo yum install R-devel}).

Additionally, for parameter estimation we recommend the use of \emph{eSSR}. This
algorithm  is part of the MEIGOR toolbox which is available on BioConductor and
it can be downloaded from
\url{https://www.bioconductor.org/packages/release/bioc/html/MEIGOR.html}. 
MEIGOR can be installed by typing

<<installMEIGOR, eval=FALSE>>=
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MEIGOR")
@

Finally, once \emph{CNORode} is installed you can load it by typing:

<<installCNORode2, eval=TRUE>>=
library(CNORode)
@

\label{sec:quickstart}
\section{Quick Start}

In this section, we provide a quick example on how to use \emph{CNORode} to
find the set of continuous parameters which minimize the squared difference
between a model simulation and the experimental data.

Since here we will not be modifying the model structure as opposed to \emph{CellNOptR}
we will use a model that already contains AND type gates. Such model can be for instance
the result of calibrating a \textit{prior knowledge network}(PKN) with \emph{CellNOptR}.
Please note that a PKN can also be used as Boolean model which will contain only OR type gates.

Detailed information about the model used here (ToyModelMMB\_FeedbackAnd) and additional models can be found at:
\url{https://saezlab.github.io/CellNOptR/5_Models%20and%20Documentation/}


\begin{center}
\begin{figure}[ht]
\includegraphics[height=7cm, width=7cm]{ToyModelMMB_FeedbackAnd}
\includegraphics[height=7cm, width=7cm]{data_ToyModelMMB_FeddbackAnd}
\caption{The used model(left panel). A plot from the data, resulting from
the \emph{plotCNOlist} function (right panel).
\label{fig:}}
\end{figure}
\end{center}

The example used here is shipped with the CNORode. In order to load the data and model you
should type the following commands:

% show data and model loading
<<quickstart, eval=TRUE, results=hide>>=
	library(CNORode)
	model=readSIF(system.file("extdata", "ToyModelMMB_FeedbackAnd.sif",
		package="CNORode"));
	cno_data=readMIDAS(system.file("extdata", "ToyModelMMB_FeedbackAnd.csv",
		package="CNORode"));
	cnolist=makeCNOlist(cno_data,subfield=FALSE);
@

The structure from the CNOlist and the Model object is exactly the same as used in the
\emph{CellNOptR} and therefore for a detailed explanation about these structure we direct
the reader to the \emph{CellNOptR} manual.

In order to simulate the model and perform parameter estimation we first need to create
a list with the ODE parameters associated with each dynamic state as described in \cite{wittman}.
Each dynamic state will have a \(\tau\) parameter, as many \(n\) and \(k\) parameters as inputs.
Although the default is to use the normalized Hill function it also possible to use the
standard Hill or even not to use any transfer function at all. To illustrate the shape of the equations
associated to each dynamic state and the meaning of each parameter, let us show the differential
of \textit{Mek}:


\[ \dot{Mek}=\Bigg[  \Bigg( 1 - \frac{Akt^{n_1}/({k_1}^{n_1} + Akt^{n_1})}{1/({k_1}^{n_1} + 1)}\Bigg)
	\cdot \Bigg(\frac{Raf^{n_2}/({k_2}^{n_2} + Raf^{n_2})}{1/({k_2}^{n_2} + 1)}\Bigg) - Mek \Bigg] \cdotp \tau_{Mek}
\]

To create a list of ODE parameters we will typically use the \emph{createLBodeContPars} function:

<<>>=
	ode_parameters=createLBodeContPars(model, LB_n = 1, LB_k = 0.1,
		LB_tau = 0.01, UB_n = 5, UB_k = 0.9, UB_tau = 10, default_n = 3,
		default_k = 0.5, default_tau = 1, opt_n = TRUE, opt_k = TRUE,
		opt_tau = TRUE, random = FALSE)
@

This function creates a general structure where the ODE parameters are ordered according
to the model. Some tweaks have been added in order to ease tasks we have found
to be common, nevertheless you can edit several attributes manually. If you print the
\emph{ode\_parameters}  list you will see the following attributes.

\clearpage
<<>>=
	print(ode_parameters)
@
\clearpage

Typically before running an optimization run you will want to choose which type of parameters
you want to optimize. The field \emph{index\_opt\_pars} defines which parameters are meant to be optimized.
In the \emph{createLBodeContPars}, if you choose \(opt\_tau \) as \(TRUE\)  all \( \tau \) parameters will be
added to the index\_opt\_pars array, the same idea is valid for \( n \) and \( k \) parameters.

It is also possible to choose default values for lower and upper bounds for the parameters of a given type,
e.g. \( \tau \) ( \emph{LB\_tau} and \emph{UB\_tau}), as well as a default initial value for such parameters.

Once we have the ODE parameters structure we are ready to run a simulation or optimization process.
To run a simulation we can use the \emph{getLBodeModel} or \emph{getLBodeDataSim}, depending on
if we want to simulate only the signals present in the CNOlist object or all the species in the model. Additionally
\emph{plotLBodeDataSim} or \emph{plotLBodeModelSim} will also return the values of a model simulation while
plotting the same values. In figure \ref{fig:plotModelSimFig}, we use plotLBodeModelSim to plot all the experiments
sampled in 5 different instants (\emph{timeSignals}).

\begin{figure}[ht]
\begin{center}
<<label=plotModelSim,include=TRUE,fig=TRUE>>=
modelSim=plotLBodeModelSim(cnolist, model, ode_parameters,
 	timeSignals=seq(0,2,0.5));
@
\caption{A model simulation plotted with \emph{plotLBodeModelSim} ..
\label{fig:plotModelSimFig}}
\end{center}
\end{figure}

\clearpage

As previously mentioned, we provide two optimization algorithms that allow parameter estimation
Both of these algorithms have specific parameters that can be tunned on each specific problem
(please check CNORode manual for detailed information). For instance, in order to run the
genetic algorithm for 10 iterations and a population of size of 10, we can use the following code:

<<eval=TRUE, results=hide>>=
initial_pars=createLBodeContPars(model, LB_n = 1, LB_k = 0.1,
	LB_tau = 0.01, UB_n = 5, UB_k = 0.9, UB_tau = 10, random = TRUE)
#Visualize initial solution
simulatedData=plotLBodeFitness(cnolist, model,initial_pars)
paramsGA = defaultParametersGA()
paramsGA$maxStepSize = 1
paramsGA$popSize = 50
paramsGA$iter = 100
paramsGA$transfer_function = 2

opt_pars=parEstimationLBode(cnolist,model,ode_parameters=initial_pars,
	paramsGA=paramsGA)
#Visualize fitted solution
simulatedData=plotLBodeFitness(cnolist, model,ode_parameters=opt_pars)
@

Model optimisation using eSS:
<<eval=TRUE, results=hide>>=

requireNamespace("MEIGOR")


initial_pars=createLBodeContPars(model,
	LB_n = 1, LB_k = 0.1, LB_tau = 0.01, UB_n = 5,
	UB_k = 0.9, UB_tau = 10, random = TRUE)
#Visualize initial solution

fit_result_ess = 
	parEstimationLBodeSSm(cnolist = cnolist,
						  model = model,
						  ode_parameters = initial_pars,
						  maxeval = 1e5,
						  maxtime = 20,
						  local_solver = "DHC",
						  transfer_function = 3
	)
#Visualize fitted solution
# simulatedData=plotLBodeFitness(cnolist, model,ode_parameters=fit_result_ess)
@


\begin{center}
\begin{figure}[ht]

<<label=plotInit,include=TRUE,fig=TRUE>>=
	simulatedData=plotLBodeFitness(cnolist, model,
								   initial_pars,
								   transfer_function = 3)
@
\caption{The initial solution before optimization. Each row corresponds to an experiment with a particular combination of stimuli and inhibitors.
The columns correspond to the measured values (triangles) and the simulated values (dashed blue lines) from a given signal. The background color
gives an indication of squared difference where red means high error and white low error.}
\label{fig:plotInitFit}

\end{figure}
\end{center}


\begin{figure}[ht]
\begin{center}
<<label=plotFinalFit_fit,include=TRUE,fig=TRUE>>=
	simulatedData=plotLBodeFitness(cnolist, model,
								   ode_parameters=fit_result_ess,
								   transfer_function = 3)
@
\end{center}
\caption{A solution obtained by optimization with a genetic algorithm. Each row corresponds to an experiment with a particular combination of stimuli and inhibitors.
The columns correspond to the measured values (triangles) and the simulated values (dashed blue lines) from a given signal. The background color
gives an indication of squared difference where red means high error and white low error.}
\label{fig:plotFinalFit}
\end{figure}


\clearpage

In addition to eSSR and genalg its is fairly easy to use any other continuous optimization algorithm.
In the following example we show how to generate and use an the objective function in order to use
 it with a variant of eSSR(part of MEIGOR package) that uses multiple cpus:

<<eval=FALSE>>=
library(MEIGOR)
f_hepato<-getLBodeContObjFunction(cnolist, model, initial_pars, indices=NULL,
 time = 1, verbose = 0, transfer_function = 2, reltol = 1e-05, atol = 1e-03,
maxStepSize = Inf, maxNumSteps = 1e4, maxErrTestsFails = 50, nan_fac = 1)
n_pars=length(initial_pars$LB);

problem<-list(f=f_hepato, x_L=initial_pars$LB[initial_pars$index_opt_pars],
	x_U=initial_pars$UB[initial_pars$index_opt_pars]);

#Source a function containing the options used in the CeSSR publication
 source(system.file("benchmarks","get_paper_settings.R",package="MEIGOR"))
#Set max time as 20 seconds per iteration
opts<-get_paper_settings(20);
Results<-CeSSR(problem,opts,Inf,Inf,3,TRUE,global_save_list=c('cnolist','model',
'initial_pars'))
@

\section{Crossvalidation}

CNORode offers the possibility to perform a k-fold cross-validation for logic-ode models in order to assess the predictive performance of our models. In k-iterations a  fraction of the data is eliminated from the CNOlist. The model is trained on the  remaining data and then the model predicts the held-out data. Then the prediction accuracy is reported for each iteration. Three different re-sampling strategies about how we can split the training and the test set: \emph{1)}Re-sampling of the data-points, \emph{2)}Re-sampling of the experimental conditions and \emph{3)}Resampling of the observable nodes.

In the example below, we show an example about how we can apply the cross-validation analysis over a small toy case-study from Macnamara et al. 2012.

<<eval=FALSE>>=
library(CellNOptR)
library(CNORode)
library(MEIGOR)

# MacNamara et al. 2012 case study:
data(PKN_ToyPB, package="CNORode")
data(CNOlist_ToyPB, package="CNORode")

# original and preprocessed network 
plotModel(pknmodel, cnodata)
model = preprocessing(data = cnodata, model = pknmodel,
                      compression = T, expansion = T)
plotModel(model, cnodata)
plotCNOlist(CNOlist = cnodata)

# set initial parameters 
ode_parameters=createLBodeContPars(model, LB_n = 1, LB_k = 0,
                                   LB_tau = 0, UB_n = 4, UB_k = 1, 
                                   UB_tau = 1, default_n = 3, default_k = 0.5, 
                                   default_tau = 0.01, opt_n = FALSE, opt_k = TRUE,
                                   opt_tau = TRUE, random = TRUE)

## Parameter Optimization
# essm
paramsSSm=defaultParametersSSm()
paramsSSm$local_solver = "DHC"
paramsSSm$maxtime = 600;
paramsSSm$maxeval = Inf;
paramsSSm$atol=1e-6;
paramsSSm$reltol=1e-6;
paramsSSm$nan_fac=0;
paramsSSm$dim_refset=30;
paramsSSm$n_diverse=1000;
paramsSSm$maxStepSize=Inf;
paramsSSm$maxNumSteps=10000;
transferFun=4;
paramsSSm$transfer_function = transferFun;

paramsSSm$lambda_tau=0
paramsSSm$lambda_k=0
paramsSSm$bootstrap=F
paramsSSm$SSpenalty_fac=0
paramsSSm$SScontrolPenalty_fac=0

# run the optimisation algorithm
opt_pars=parEstimationLBode(cnodata,model, method="essm", 
                            ode_parameters=ode_parameters, paramsSSm=paramsSSm)
plotLBodeFitness(cnolist = cnodata, model = model, 
                 ode_parameters = opt_pars, transfer_function = 4)

# 10-fold crossvalidation using T1 data
# We use only T1 data for crossvalidation, because data 
# in the T0 matrix is not independent.
# All rows of data in T0 describes the basal condition.

# Crossvalidation produce some text in the command window:  
library(doParallel)
registerDoParallel(cores=3)
R=crossvalidateODE(CNOlist = cnodata, model = model, 
                   type="datapoint", nfolds=3, parallel = TRUE, 
                   ode_parameters = ode_parameters, paramsSSm = paramsSSm)
@

For more, please information about the \emph{crossvalidateODE} function, please check its documentation.

\begin{thebibliography}{}

\bibitem{CellNOptR}
C.~Terfve.
\newblock CellNOptR: R version of CellNOpt, boolean features only.
\newblock {R package version 1.2.0, (2012)
http://www.bioconductor.org/packages/release/bioc/html/CellNOptR.html}

\bibitem{alexopoulos_networks_2010}
L.G.~Alexopoulos, J.~Saez-Rodriguez, B.D.~Cosgrove, D.A.~Lauffenburger,
P.K~Sorger.: Networks inferred from biochemical data reveal profound
  differences in toll-like receptor and inflammatory signaling between normal
  and transformed hepatocytes.
\newblock Molecular \& Cellular Proteomics: {MCP} \textbf{9}(9), 1849--1865
  (2010).

\bibitem{MMB}
M.K.~Morris, I.~Melas, J.~Saez-Rodriguez.
\newblock Construction of cell type-specific logic models of signalling networks using CellNetOptimizer.
\newblock {\em Methods in Molecular Biology: Computational Toxicology}, Ed. B. Reisfeld and A. Mayeno, Humana Press.

\bibitem{fuzzy2011}
M.K.~Morris, J.~Saez-Rodriguez, D.C.~Clarke, P.K.~Sorger, D.A.~Lauffenburger.
\newblock Training Signaling Pathway Maps to Biochemical Data with Constrain
ed Fuzzy Logic: Quantitative Analysis of Liver Cell Responses to Inflammatory
Stimuli.
\newblock{\em PLoS Comput Biol.} 7(3) (2011) : e1001099.

\bibitem{julio2009}
J.~Saez-Rodriguez, L.~Alexopoulos, J.~Epperlein, R.~Samaga, D.~Lauffenburger, S.~Klamt and P.K.~Sorger.
\newblock Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction.
\newblock {\em Molecular Systems Biology}, 5:331, 2009.

\bibitem{wittmanJS}
Dominik Wittmann, Jan Krumsiek, Julio S. Rodriguez, Douglas
Lauffenburger, Steffen Klamt, and Fabian Theis.
\newblock Transforming boolean models to continuous models: methodology and application to t-cell
receptor signaling.
\newblock BMC Systems Biology, 3(1):98+, September 2009.

\bibitem{eSSR1}
Egea, J.A., Maria, R., Banga, J.R. (2010)
\newblock An evolutionary method for complex-process optimization.
\newblock Computers \& Operations Research 37(2):315\-324.

\bibitem{eSSR2}
Egea, J.A., Balsa-Canto, E., Garcia, M.S.G., Banga, J.R. (2009)
\newblock Dynamic optimization of nonlinear processes with an enhanced scatter search
method.
\newblock Industrial \& Engineering Chemistry Research 49(9): 4388\-4401.

\bibitem{wittman}
Jan Krumsiek, Sebastian Polsterl, Dominik Wittmann, and Fabian Theis.
\newblock Odefy - from discrete to continuous models.
\newblock BMC Bioinformatics, 11(1):233+, 2010.

\bibitem{cvodes}
R. Serban and A. C. Hindmarsh
\newblock"CVODES: the Sensitivity\-Enabled ODE Solver in SUNDIALS,"
\newblock Proceedings of IDETC/CIE 2005, Sept. 2005, Long Beach, CA. Also available as LLNL
technical report UCRL\-JP\-200039.

\bibitem{macnamara_2012}
A. MacNamara, C. Terfve, D. Henriques, B.P. Bernabe, and J. Saez-Rodriguez.
\newblock State-time spectrum of signal transduction logic models.
\newblock Phys Biol., 9(4):045003, 2012.

\end{thebibliography}


\end{document}