%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
%\VignetteIndexEntry{bioDist Introduction}
%\VignetteKeywords{Distances}
%\VignettePackage{bioDist}
\documentclass[12pt]{article}

\usepackage{amsmath}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}

\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}

\bibliographystyle{plainnat}

\begin{document}

\title{bioDist Introduction}
\maketitle

\section*{Introduction}

The \Rpackage{bioDist} package contains some distance functions that
have been shown to be useful in a number of different biological or
bioinformatic problems. The return values are typically instances of
the S3 class \Rclass{dist}. 

\section{Data}

We will use the \Robject{sample.ExpressionSet} object from the
\Rpackage{Biobase} package as our data. The \Rpackage{bioDist}
functions are in some ways extensions of the distance functions
available via the \Rfunction{dist} function in R, and hence they
compute pairwise distances between the rows of the input. For an
expression matrix, this will correspond to the genes or features
on the array. Since we are generally more interested in distances
between samples, we will transpose the data in this demonstration.

<<loadLib>>=
library("bioDist")
data(sample.ExpressionSet)
exData = t(exprs(sample.ExpressionSet))
@

\section{Distance Measures}

The two most used distance measures in the \Rpackage{bioDist} package
are MI and KLD. These measures focus on very different distributional
aspects of the data. MI is large when the joint distribution is quite
different from the product of the marginals, while KLD measures how
much the shape of one distribution resembles that of the other.

MI can be considered as a multivariate measure of association, and if
the transformation
\begin{equation}
\label{eq:distJOE}
\delta^* = [ 1 - \exp(-2 MI)]^{1/2}
\end{equation}
is used, then $\delta^*$ takes values in the interval $[0,1]$ and can
be interpreted as a a generalization of the correlation. We will make
the further transformation to $1 - \delta^*$ so that this measure has
the same interpretation as other correlation-based distance measures.

There are two functions for computing mutual information distance
measures: \Rfunction{mutualInfo} that computes the distance from
independence and \Rfunction{MIdist} that computes the transformation
in Equation~(\ref{eq:distJOE}). We note that the computations are not
terribly fast, and computing these distances on very large data sets
is time consuming.

<<R>>=
s1 = MIdist(exData)
s2 = as.matrix(s1)
dim(s2)
r1 = mutualInfo(exData)
@

For KL distances, there is one implementation that uses binning,
\Rfunction{KLdist.matrix}, and one that uses density estimation
followed by numerical integration, \Rfunction{KLD.matrix}.

<<KLD>>=
kl1 = KLdist.matrix(exData)
kl2 = KLD.matrix(exData, method="density", supp=range(exData))
@ 

The \Rpackage{bioDist} package also provides implementations of
distances based on two other measures of correlation: Kendall's tau
and Pearson's rho. In the examples below we will measure distance
between genes, not between samples as was done in the first few examples.
We will also restrict our analysis to the last 100 genes in the sample
data in order to keep computing times low.

<<Tau>>=
eS = sample.ExpressionSet[401:500,]
tauD = tau.dist(eS, sample=FALSE)
sp = spearman.dist(eS, sample=FALSE)
@ 

To find a specified number of nearest neighbors, we will use a simple helper
function called \Rfunction{closest.top}.

<<closest>>=
f1 = featureNames(eS)[1]
closest.top(f1, sp, 3)
@ 

\section{Session Information}

The version number of R and packages loaded for generating the vignette were:

\begin{verbatim}
<<echo=FALSE,results=tex>>=
sessionInfo()
@

\end{verbatim}

\end{document}