% \VignetteEngine{knitr::knitr}
% \VignetteIndexEntry{ccrepe}

\documentclass{article}

%% -------------------------
%% Pre-amble:
%% -------------------------

\usepackage[nonamebreak,square]{natbib}

<<style, eval=TRUE, echo=FALSE, results="asis">>=
BiocStyle::latex(width=78, use.unsrturl=FALSE)
@

<<echo=FALSE>>=
library(ccrepe)
@

\title{CCREPE: Compositionality Corrected by Permutation and Renormalization}
\author{Emma Schwager, George Weingart, Craig Bielski, Curtis Huttenhower}

%% -------------------------
%% End pre-amble
%% -------------------------

\begin{document}

\maketitle

\tableofcontents

\section{Introduction}

\Biocpkg{ccrepe} is a package for analysis of sparse compositional data.  Specifically, it determines the significance of association between features in a composition, using any similarity measure (e.g. Pearson correlation, Spearman correlation, etc.)  The CCREPE methodology stands for Compositionality Corrected by Renormalization and Permutation, as detailed below.  The package also provides a novel similarity measure, the N-dimensional checkerboard score (NC-score), particularly appropriate to compositions derived from microbial community sequencing data.  This results in p-values and false discovery rate q-values corrected for the effects of compositionality.  The package contains two functions \Rfunction{ccrepe} and \Rfunction{nc.score} and is maintained by the Huttenhower lab (\email{ccrepe-users@googlegroups.com}).

%---------------------------------------------------------
\section{ccrepe} 
%---------------------------------------------------------

\Rfunction{ccrepe} is the main package function.  It calculates compositionality-corrected p-values and q-values for a user-selected similarity measure, operating on either one or two input matrices.  If given one matrix, all features (columns) in the matrix are compared to each other using the selected similarity measure.  If given two matrices, each feature in the first are compared against all features in the second.

\subsection{General functionality}

Compositional data induces spurious correlations between features due to the nonindependence of values that must sum to a fixed total.  CCREPE abrogates this when determining the significance of a similarity measure for each feature pair using two main steps, permutation/renormalization and bootstrapping.  First, given two features to compare, CCREPE generates a null distribution of the similarity expected just due to compositionality by iteratively permuting one feature, renormalizing all samples in the composition to their previous sum, and computing the resulting similarity measures.  Second, CCREPE bootstraps over sample subsets in order to assess confidence in the "true" similarity measure.  Finally, the two resulting distributions are compared using a pooled-variance Z-test to give a compositionality-corrected p-value.  False discovery rate q-values are additionally calculated using the Benjamin-Hochberg-Yekutieli procedure.  For greater detail, see \citet{faust2012microbial} and \citet{schwager2012unpublished}.

\subsection{Arguments}

\begin{description}
\setlength{\itemsep}{1em}

\item[\Rcode{x}]
First \Rclass{dataframe} or \Rclass{matrix} containing relative abundances.  Columns are features, rows are samples.  Rows should therefore sum to a constant.  Row names are used for identification if present.

\item[\Rcode{y}]
Second \Rclass{dataframe} or \Rclass{matrix} (optional) containing relative abundances.  Columns are features, rows are samples.  Rows should therefore sum to a constant.  If both \Rcode{x} and \Rcode{y} are specified, they will be merged by row names.  If no row names are specified for either or both datasets,   the default is to merge by row number.

\item[\Rcode{sim.score}]
Similarity measure, such as \Rfunction{cor} or \Rfunction{nc.score}.  This can be either an existing R function or user-defined.  If the latter, certain properties should be satisfied as detailed below (also see examples).  The default similarity measure is Spearman correlation.

A user-defined similarity measure should mimic the interface of \Rfunction{cor}:
\begin{enumerate}
\item Take either two \Rclass{vector} inputs one \Rclass{matrix} or \Rclass{dataframe} input.

\item In the case of two inputs, return a single number.

\item In the case of one input, return a matrix in which the (\Rcode{i},\Rcode{j})th entry is the similarity score for column \Rcode{i} and column \Rcode{j} in the original matrix.

\item The resulting matrix (in the case of one input) must be symmetric.

\item The inputs must be named \Rcode{x} and \Rcode{y}.
\end{enumerate}

\item[\Rcode{sim.score.args}]
An optional list of arguments for the measurement function.  When given, they are passed to the \Rcode{sim.score} function directly.  For example, in the case of cor, the following would be acceptable:

<<eval=FALSE>>=
sim.score.args = list(method="spearman", use="complete.obs")
@

Note that this example corresponds to the default behavior.

\item[\Rcode{min.subj}]
Minimum number (count) of samples that must be nonzero in order to apply the similarity measure.  This is to ensure that there are sufficient samples to perform a bootstrap (default: 20).

\item[\Rcode{iterations}]
The number of iterations for both bootstrap and permutation calculations (default: 1000).

\item[\Rcode{subset.cols.x}]
Subset of columns from \Rcode{x} to work on.  The default (\Rcode{NULL}) uses all columns.  Note that all features are used for normalization, but calculations are performed only with the requested subset.

\item[\Rcode{subset.cols.y}]
Subset of columns from \Rcode{y} to work on.  The default (\Rcode{NULL}) uses all columns.  Note that all features are used for normalization, but calculations are performed only with the requested subset.

\item[\Rcode{errthresh1}]
Maximum allowable probability of getting all zeros in a given bootstrapped column for the first dataset \Rcode{x}.  If a feature has a number of zeros that makes the probability of obtaining all zeros when sampling with replacement greater than this value, that feature will be excluded from the subsequent analysis.  This is to ensure that the standard deviation of the bootstrap sample is non-zero. (default: 0.0001).

\item[\Rcode{verbose}]
If \Rcode{TRUE}, print periodic progress of the algorithm through the dataset(s), as well as including more detailed debugging output.  (default: \Rcode{FALSE}).

\item[\Rcode{iterations.gap}]
If \Rcode{verbose=TRUE}, the number of iterations between issuing status messages (default: 100).

\item[\Rcode{distributions}]
Optional output file for detailed log (if given) of all intermediate permutation and renormalization distributions.

\item[\Rcode{compare.within.x}]
  A boolean value indicating whether to do comparisons given by taking all subsets of size 2 from \Rcode{subset.cols.x} or to do comparisons given by taking all possible combinations of \Rcode{subset.cols.x} or \Rcode{subset.cols.y}.  If \Rcode{TRUE} but \Rcode{subset.cols.y=NA}, returns all comparisons involving any features in \Rcode{subset.cols.x}.
  
\item[\Rcode{concurrent.output}]
    Optional output file to which each comparison will be written as it is calculated.
    
\item[\Rcode{make.output.table}]
      A boolean value indicating whether to include table-formatted output.

\end{description}

\subsection{Output}

\Rcode{ccrepe} returns a \Rclass{list} containing both the calculation results and the parameters used:

\begin{description}
\setlength{\itemsep}{1em}

\item[\Rcode{sim.score}]
\Rclass{matrix} of simliarity scores for all requested comparisons.  The (\Rcode{i},\Rcode{j})th element corresponds to the similarity score of column \Rcode{i} (or the \Rcode{i}th column of \Rcode{subset.cols.1}) and column \Rcode{j} (or the \Rcode{j}th column of \Rcode{subset.cols.1}) in one dataset, or to the similarity score of column \Rcode{i} (or the \Rcode{i}th column of \Rcode{subset.cols.1}) in dataset \Rcode{x} and column \Rcode{j} (or the \Rcode{j}th column of \Rcode{subset.cols.2}) in dataset \Rcode{y} in the case of two datasets.

\item[\Rcode{p.values}]
\Rclass{matrix} of the corrected p-values for all requested comparisons.  The (\Rcode{i},\Rcode{j})th element corresponds to the p-value of the (\Rcode{i},\Rcode{j})th element of \Rcode{sim.score}.

\item[\Rcode{q.values}]
\Rclass{matrix} of the Benjamini-Hochberg-Yekutieli corrected p-values.  The (\Rcode{i},\Rcode{j})th element corresponds to the p-value of the (\Rcode{i},\Rcode{j})th element of \Rcode{sim.score}.

\item[\Rcode{z.stat}]
  \Rclass{matrix} of the z-statistics used in generating the p-values for all requested comparisons.  The (\Rcode{i},\Rcode{j})th element corresponds to the z-statistic generating the (\Rcode{i},\Rcode{j})th element of \Rcode{p.values}.

\end{description}

\subsection{Usage}	

<<eval=FALSE>>=
ccrepe(
 x = NA,
 y = NA,
 sim.score = cor,
 sim.score.args = list(),
 min.subj = 20,
 iterations = 1000,
 subset.cols.x = NULL,
 subset.cols.y = NULL,
 errthresh1 = 1e-04,
 verbose = FALSE,
 iterations.gap = 100,
 distributions = NA,
 compare.within.x = TRUE,
 concurrent.output = NA,
 make.output.table = FALSE)
@

\subsection{Example 1}

An example of how to use \Rfunction{ccrepe} with one dataset.

<<>>=
data <- matrix(rlnorm(40,meanlog=0,sdlog=1),nrow=10,ncol=4)
data[,1] = 2*data[,2] + rnorm(10,0,1)
data.rowsum <- apply(data,1,sum)
data.norm <- data/data.rowsum
apply(data.norm,1,sum)  # The rows sum to 1, so the data are normalized
test.input <- data.norm

dimnames(test.input) <- list(c(
 "Sample 1", "Sample 2","Sample 3","Sample 4","Sample 5",
 "Sample 6","Sample 7","Sample 8","Sample 9","Sample 10"),
 c("Feature 1", "Feature 2", "Feature 3","Feature 4"))

test.output <- ccrepe(x=test.input, iterations=20, min.subj=10)
@
<<fig.cap="Non-normalized and normalized associations between feature 1 and feature 2.  In this case we would expect feature 1 and feature 2 to be associated. In the output we see this by the positive sim.score value in the [1,2] element of test.output\\$sim.score and the small q-value in the [1,2] element of test.output\\$q.values.", fig.width=7, fig.height=3.5>>=
par(mfrow=c(1,2))
plot(data[,1],data[,2],xlab="Feature 1",ylab="Feature 2",main="Non-normalized")
plot(data.norm[,1],data.norm[,2],xlab="Feature 1",ylab="Feature 2",
     main="Normalized")
test.output
@

\subsection{Example 2}

An example of how to use \Rfunction{ccrepe} with two datasets.

<<>>=
data <- matrix(rlnorm(40,meanlog=0,sdlog=1),nrow=10,ncol=4)
data[,1] = 2*data[,2] + rnorm(10,0,1)
data.rowsum <- apply(data,1,sum)
data.norm <- data/data.rowsum
apply(data.norm,1,sum)  # The rows sum to 1, so the data are normalized
test.input <- data.norm


data2 <- matrix(rlnorm(105,meanlog=0,sdlog=1),nrow=15,ncol=7)
aligned.rows <- c(seq(1,4),seq(6,9),11,12)  # The datasets dont need 
                                            # to have subjects line up exactly
data2[aligned.rows,1] <-  2*data[,3] + rnorm(10,0,1)
data2.rowsum <- apply(data2,1,sum)
data2.norm <- data2/data2.rowsum
apply(data2.norm,1,sum)  # The rows sum to 1, so the data are normalized
test.input.2 <- data2.norm

dimnames(test.input) <- list(paste("Sample",seq(1,10)),paste("Feature",seq(1,4)))
dimnames(test.input.2) <- list(paste("Sample",c(seq(1,4),11,seq(5,8),12,9,10,13,14,15)),paste("Feature",seq(1,7)))

test.output.two.datasets <- ccrepe(x=test.input, y=test.input.2, iterations=20, min.subj=10)
@
<<fig.cap="Non-normalized and normalized associations between feature 1 and feature 2.  In this case we would expect feature 1 and feature 2 to be associated. In the output we see this by the positive sim.score value in the [1,2] element of test.output\\$sim.score and the small q-value in the [1,2] element of test.output\\$q.values.", fig.width=7, fig.height=3.5, fig.pos="h">>=
par(mfrow=c(1,2))
plot(data2[aligned.rows,1],data[,3],xlab="dataset 2: Feature 1",ylab="dataset 1: Feature 3",main="Non-normalized")
plot(data2.norm[aligned.rows,1],data.norm[,3],xlab="dataset 2: Feature 1",ylab="dataset 1: Feature 3",
     main="Normalized")
test.output.two.datasets
@

\subsection{Example 3}

An example of how to use \Rfunction{ccrepe} with \Rfunction{nc.score} as the similarity score.

<<>>=
data <- matrix(rlnorm(40,meanlog=0,sdlog=1),nrow=10,ncol=4)
data[,1] = 2*data[,2] + rnorm(10,0,1)
data.rowsum <- apply(data,1,sum)
data.norm <- data/data.rowsum
apply(data.norm,1,sum)  # The rows sum to 1, so the data are normalized
test.input <- data.norm

dimnames(test.input) <- list(paste("Sample",seq(1,10)),paste("Feature",seq(1,4)))

test.output.nc.score     <- ccrepe(x=test.input, sim.score=nc.score, iterations=20, min.subj=10)
@
<<fig.cap="Non-normalized and normalized associations between feature 1 and feature 2.  In this case we would expect feature 1 and feature 2 to be associated. In the output we see this by the positive sim.score value in the [1,2] element of test.output\\$sim.score and the small q-value in the [1,2] element of test.output\\$q.values. In this case, however, the sim.score represents the NC-Score between two features rather than the Spearman correlation.", fig.width=7, fig.height=3.5, fig.pos="h">>=
par(mfrow=c(1,2))
plot(data[,1],data[,2],xlab="Feature 1",ylab="Feature 2",main="Non-normalized")
plot(data.norm[,1],data.norm[,2],xlab="Feature 1",ylab="Feature 2",
     main="Normalized")
test.output.nc.score
@

\subsection{Example 4}

An example of how to use \Rfunction{ccrepe} with a user-defined \Rfunction{sim.score} function.

<<>>=
data <- matrix(rlnorm(40,meanlog=0,sdlog=1),nrow=10,ncol=4)
data[,1] = 2*data[,2] + rnorm(10,0,1)
data.rowsum <- apply(data,1,sum)
data.norm <- data/data.rowsum
apply(data.norm,1,sum)  # The rows sum to 1, so the data are normalized
test.input <- data.norm

dimnames(test.input) <- list(paste("Sample",seq(1,10)),paste("Feature",seq(1,4)))

my.test.sim.score <- function(x,y=NA,constant=0.5){
       if(is.vector(x) && is.vector(y)) return(constant)
          if(is.matrix(x) && is.na(y)) return(matrix(rep(constant,ncol(x)^2),ncol=ncol(x)))
          if(is.data.frame(x) && is.na(y)) return(matrix(rep(constant,ncol(x)^2),ncol=ncol(x)))
          else stop('ERROR')
   }

test.output.sim.score    <- ccrepe(x=test.input, sim.score=my.test.sim.score, iterations=20, min.subj=10, sim.score.args = list(constant = 0.6))
@
<<fig.cap="Non-normalized and normalized associations between feature 1 and feature 2.  In this case we would expect feature 1 and feature 2 to be associated. Note that the values of sim.score are all 0.6 and none of the p-values are very small because of the arbitrary definition of the similarity score.", fig.width=7, fig.height=3.5, fig.pos="h">>=
par(mfrow=c(1,2))
plot(data[,1],data[,2],xlab="Feature 1",ylab="Feature 2",main="Non-normalized")
plot(data.norm[,1],data.norm[,2],xlab="Feature 1",ylab="Feature 2",
     main="Normalized")
test.output.sim.score
@

\subsection{Example 5}

An example of how to use \Rfunction{ccrepe} when specifying column subsets.

<<<>>=
data <- matrix(rlnorm(40,meanlog=0,sdlog=1),nrow=10,ncol=4)
data.rowsum <- apply(data,1,sum)
data.norm <- data/data.rowsum
apply(data.norm,1,sum)  # The rows sum to 1, so the data are normalized
test.input <- data.norm

dimnames(test.input) <- list(paste("Sample",seq(1,10)),paste("Feature",seq(1,4)))

test.output.1.3     <- ccrepe(x=test.input, iterations=20, min.subj=10, subset.cols.x=c(1,3))
test.output.1       <- ccrepe(x=test.input, iterations=20, min.subj=10, subset.cols.x=c(1), compare.within.x=FALSE)
test.output.12.3    <- ccrepe(x=test.input, iterations=20, min.subj=10, subset.cols.x=c(1,2),subset.cols.y=c(3), compare.within.x=FALSE)
test.output.1.3$sim.score
test.output.1$sim.score
test.output.12.3$sim.score
@ 

%---------------------------------------------------------
\section{nc.score} 
%---------------------------------------------------------

The \Rfunction{nc.score} similarity measure is an N-dimensional extension of the checkerboard score particularly suited to similarity score calculations between compositions derived from ecological relative abundance measurements.  In such cases, features typically represent species abundances, and the NC-score discretizes these continuous values into one of N bins before computing a normalized similarity of co-occurrence or co-exclusion.  This can be used as a standalone function or with \Rfunction{ccrepe} as above to obtain compositionality-corrected p-values.

\subsection{General Functionality}

The NC-score is an extension to Diamond's checkerboard score (see \citet{cody1975ecology}) to ordinal data.  The generalization of the checkerboard score is through defining general co-variation and co-exclusion patterns in ordinal data.  With ordinal data, the checkerboard considers the $2\times 2$ submatrices which are of the form: 
\[
\left[ \begin{array}{cc}
    a & b\\
    c & d\\
    \end{array} \right]
\]
In the cases that $a < b$ and $c < d$ (or $a > b$ and $c > d$), the submatrix is considered a co-variation pattern; conversely, if $a > b$ and $c < d$ (or $a < b$ and $c > d$) then the submatrix is considered a co-exclusion pattern. The number of co-exclusion patterns for $n$ bins is adjusted based on the expected ratio of co-variaion to co-exclusion patterns in any given sample.  The adjustment factor is given by:
\[
1.5 \ast \frac{n(n-1)}{n^2 - n + 1},
\]
with the complete derivation being left to \citet{bielski2013thesis}.

\noindent
The function as implemented here first performs basic quality control filtering of input relative abundance data.  It then transforms relative abundances to ordinal values based on user-provided or default bin thresholds.  It then computes a raw NC-score:

\[
C_{ij} - 1.5 \ast \frac{n(n-1)}{n^2 - n + 1} \ast D_{ij}
\]
where $C_{ij}$ and $D_{ij}$ are the total number of co-variation and co-exclusion patterns, respectively, between species $i$ and $j$ and where $n$ is the number of bins.

\noindent
The raw NC-score is then normalized to between -1 and 1 for $n$ bins and $s$ samples by dividing by the maximum possible NC-score, which is based on the maximum number of co-variation patterns possible. The normalized NC-score is then analogous to Pearsons $\rho$, where positive values indicate more co-variation than co-exclusion patterns and the magnitude of the score indicates the strength of the association between the species. The derivation of this normalization factor can be found in \citet{bielski2013thesis}. The normalization factor is given by:
\[
\left(\begin{array}{c} 
  s\\ 
  2\\ 
\end{array} \right) - \left[ s \bmod n \ast 
\left(\begin{array}{c} 
  \left \lfloor{\frac{s}{n}} \right \rfloor + 1 \\ 
  2 \\
\end{array} \right) + \big(n - s \bmod n \big) \ast \left( \begin{array}{c} 
  \left \lfloor{\frac{s}{n}} \right \rfloor \\ 
  2 \\
\end{array} \right)\right].
\]


\subsection{Arguments}

\begin{description}
\setlength{\itemsep}{1em}

\item[\Rcode{x}]
First numerical \Rclass{vector}, or single \Rclass{dataframe} or \Rclass{matrix}, containing relative abundances.  If the latter, columns are features, rows are samples.  Rows should therefore sum to a constant.

\item[\Rcode{y}]
If provided, second numerical \Rclass{vector} containing relative abundances.  If given, \Rcode{x} must be a \Rclass{vector} as well.

\item[\Rcode{bins}]
Either the number of bins to use or a \Rclass{vector} specifying bin edges.  If a single number is given, this is used as the number of bins with the \Rfunction{discretize} function of the package \CRANpkg{infotheo}.  If a \Rclass{vector} is specified, the function \Rfunction{findInterval} is used to discretize the data.  The default behavior is to use the defaults for the \Rfunction{discretize} function.

\item[\Rcode{verbose}]
Request verbose output.

\item[\Rcode{min.abundance}]
Minimum abundance threshold for quality control filtering.  For a feature to be included, it must take a value of at least \Rcode{min.abundance} in at least \Rcode{min.samples} percent of samples.

\item[\Rcode{min.samples}]
Minimum sample threshold for quality control filtering.  For a feature to be included, it must take a value of at least \Rcode{min.abundance} in at least \Rcode{min.samples} percent of samples.

\end{description}

\subsection{Output}

\Rfunction{nc.score} returns either a single number (if called with two vectors) or a \Rclass{matrix} of all pairwise scores (if called with a \Rclass{matrix}) of normalized scores.

\subsection{Usage}

<<eval=FALSE>>=
nc.score(
 x = NA,
 y = NA, 
 bins = NA, 
 verbose = FALSE,			 
 min.abundance = 1e-04,
 min.samples = 0.1)
@

\subsection{Example 1}

An example of using \Rfunction{nc.score} to get a single similarity score or a matrix.

<<>>=
data <- matrix(rlnorm(40,meanlog=0,sdlog=1),nrow=10,ncol=4)
data.rowsum <- apply(data,1,sum)
data[,1] = 2*data[,2] + rnorm(10,0,1)
data.norm <- data/data.rowsum
apply(data.norm,1,sum)  # The rows sum to 1, so the data are normalized
test.input <- data.norm

dimnames(test.input) <- list(paste("Sample",seq(1,10)),paste("Feature",seq(1,4)))

test.output.matrix <- nc.score(x=test.input)
test.output.num    <- nc.score(x=test.input[,1],y=test.input[,2])
@
<<fig.cap="Non-normalized and normalized associations between feature 1 and feature 2 of the second example.  Again, we expect to observe a positive association between feature 1 and feature 2.  In terms of generalized checkerboard scores, we would expect to see more co-variation patterns than co-exclusion patterns.  This is shown by the positive and relatively high value of the [1,2] element of test.output.matrix (which is identical to test.output.num)", fig.height=3, fig.pos="h">>=
par(mfrow=c(1, 2))
plot(data[,1],data[,2],xlab="Feature 1",ylab="Feature 2",main="Non-normalized")
plot(data.norm[,1],data.norm[,2],xlab="Feature 1",ylab="Feature 2",
     main="Normalized")
test.output.matrix
test.output.num
@

\subsection{Example 2}

An example of using \Rfunction{nc.score} with an aribitrary bin number.

<<>>=
data <- matrix(rlnorm(40,meanlog=0,sdlog=1),nrow=10,ncol=4)
data.rowsum <- apply(data,1,sum)
data[,1] = 2*data[,2] + rnorm(10,0,1)
data.norm <- data/data.rowsum
apply(data.norm,1,sum)  # The rows sum to 1, so the data are normalized
test.input <- data.norm

dimnames(test.input) <- list(paste("Sample",seq(1,10)),paste("Feature",seq(1,4)))

test.output <- nc.score(x=test.input,bins=2)
@
<<fig.cap="Non-normalized and normalized associations between feature 1 and feature 2 of the second example.  Again, we expect to observe a positive association between feature 1 and feature 2.  In terms of generalized checkerboard scores, we would expect to see more co-variation patterns than co-exclusion patterns.  This is shown by the positive and relatively high value in the [1,2] element of test.output.  In this case, the smaller bin number yields a smaller NC-score because of the coarser partitioning of the data.", fig.height=3, fig.pos="h">>=
par(mfrow=c(1, 2))
plot(data[,1],data[,2],xlab="Feature 1",ylab="Feature 2",main="Non-normalized")
plot(data.norm[,1],data.norm[,2],xlab="Feature 1",ylab="Feature 2",
     main="Normalized")
test.output
@

\subsection{Example 3}

An example of using \Rfunction{nc.score} with user-defined bin edges.

<<>>=
data <- matrix(rlnorm(40,meanlog=0,sdlog=1),nrow=10,ncol=4)
data.rowsum <- apply(data,1,sum)
data[,1] = 2*data[,2] + rnorm(10,0,1)
data.norm <- data/data.rowsum
apply(data.norm,1,sum)  # The rows sum to 1, so the data are normalized
test.input <- data.norm

dimnames(test.input) <- list(paste("Sample",seq(1,10)),paste("Feature",seq(1,4)))

test.output <- nc.score(x=test.input,bins = c(0.001,0.1,0.25,0.6))
@
<<fig.cap="Non-normalized and normalized associations between feature 1 and feature 2 of the second example.  Again, we expect to observe a positive association between feature 1 and feature 2.  In terms of generalized checkerboard scores, we would expect to see more co-variation patterns than co-exclusion patterns.  This is shown by the positive and relatively high value in the [1,2] element of test.output.  The bin edges specified here represent almost absent ([  0,0.001)), low abundance ([0.001,0.1)), medium abundance ([0.1,0.25)), and high abundance ([0.6,1)).", fig.height=3, fig.pos="h">>=
par(mfrow=c(1, 2))
plot(data[,1],data[,2],xlab="Feature 1",ylab="Feature 2",main="Non-normalized")
plot(data.norm[,1],data.norm[,2],xlab="Feature 1",ylab="Feature 2",
     main="Normalized")
test.output
@

%---------------------------------------------------------
\section{References} 
%---------------------------------------------------------

\bibliographystyle{plainnat}
\bibliography{ccrepe}

\end{document}