%\VignetteIndexEntry{flowClean}
%\VignetteDepends{flowCore}
%\VignetteDepends{flowClean}
%\VignetteDepends{bit}
%\VignetteDepends{changepoint}
%\VignetteDepends{sfsmisc}
%\VignettePackage{flowClean}
\documentclass[12pt]{article}
<<echo=FALSE>>=
options(width=70)
@ 
\SweaveOpts{eps=FALSE,echo=TRUE,png=TRUE,pdf=FALSE,figs.only=TRUE,keep.source=TRUE}
\usepackage{fullpage}
\usepackage{times}
\usepackage[colorlinks=TRUE,urlcolor=blue,citecolor=blue]{hyperref}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}
\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}

\title{flowClean}
\author{Christopher Fletez-Brant, Pratip Chattopadhyay}
\date{Modified: April 1, 2014.  Compiled: \today}
\begin{document}
\setlength{\parskip}{0.2\baselineskip}
\setlength{\parindent}{0pt}
\setkeys{Gin}{width=\textwidth}
\maketitle

\section*{Introduction}

This package contains the flowCore
method for performing quality control on flow cytomery datasets.  This method is described in
\cite{flowCleanpaper}.

\begin{small}
<<load,results=hide>>=
library(flowClean)
library(flowViz)
library(grid)
library(gridExtra)
@ 
\end{small}

\section*{Data}

Example data is a real FCS file in which we intentionally perturbed the fluorescent intensity (FI)
of a subset of cells along the V705 channel (`<V705-A>`).

\begin{small}
<<data1>>=
data(synPerturbed)
synPerturbed
@
\end{small}

\section*{Quality Control}

The full details are available in \cite{flowCleanpaper}.  The motivating idea for this methodology
is that populations in a flow experiment should be collected nearly uniformly with respect to time
of collection.  The primary actor in flowClean is the \Rfunction{clean}, which tests for deviations
from uniformity of collection.  Specifically, the collection time is discretized into $l$ periods,
each of which can be considered a $N$-part composition 
\begin{displaymath}
  D_{j = 1..l} = \left[P_1, P_2, \dots, P_N\right]
\end{displaymath}
with each $P_i$ the frequency of a population defined as +/- with respect to some threshold; 
the default is the median FI of a flow parameter.  By default $l = 100$.

Each $D_j$ then udnergoes the centered log ratio (CLR) transformation \cite{comppaper}:
\begin{displaymath}
  CLR(D_j) = \left[ln\frac{P_1}{g(D_j)}; \ldots ; ln\frac{P_N}{g(D_j)}\right]
\end{displaymath}
where
\begin{displaymath}
  g(D_j) = \sqrt[N]{P_1P_2...P_N}
\end{displaymath}

To avoid \Robject{-Inf} values, substitution of zeroes is performed using the 'modified Aitchison' of 
\cite{zerosub}.

The $L_p$ norm of the subset $CLR(D_j) > 0$, denoted $L_p = \|CLR(D_j)\|^+$, where 
$p = |CLR(D_j) > 0|$, is then calculated for each $D_j$ and changepoint analysis 
is performed on the set of all $\|CLR(D_j)\|^+$.  If there are no changes then the FCS is assumed 
to contain no errors.  Otherwise, the means of the periods are compared relative to the mean of the longest period
between changepoints and thresholded according to some $k$, which empirically works well with $k = 1.3$.

Actually calling \Rfunction{clean} requires only specifying a flowFrame, which markers are to be analyzed
(generally without the 'scatter' parameters), the name to be given to the output (directory structure can 
be included) and the file extension:

\begin{small}
<<cleancall>>=
synPerturbed.c <- clean(synPerturbed, vectMarkers=c(5:16), 
                   filePrefixWithDir="sample_out", ext="fcs", diagnostic=TRUE)
synPerturbed.c
@ 
\end{small}

The result is an FCS file identical to the input file with a new parameter, 'GoodVsBad', in which 'Good' cells
all are given $FI < 10000$ and 'Bad' cells are given $FI \geq 10000$, which allows for easy
programmatic gating out of 'Bad' cells from multiple FCS files.  This parameter can also be used in plots
as any other flow parameter as well.

\begin{small}
<<label=fig1plot, include=FALSE, fig=FALSE>>=
lgcl <- estimateLogicle(synPerturbed.c, unname(parameters(synPerturbed.c)$name[5:16]))
synPerturbed.cl <- transform(synPerturbed.c, lgcl)
p1 <- xyplot(`<V705-A>` ~ `Time`, data=synPerturbed.cl, 
             abs=TRUE, smooth=FALSE, alpha=0.5, xlim=c(0, 100))
p2 <- xyplot(`GoodVsBad` ~ `Time`, data=synPerturbed.cl, 
             abs=TRUE, smooth=FALSE, alpha=0.5, xlim=c(0, 100), ylim=c(0, 20000))
rg <- rectangleGate(filterId="gvb", list("GoodVsBad"=c(0, 9999)))
idx <- filter(synPerturbed.cl, rg)
synPerturbed.clean <- Subset(synPerturbed.cl, idx)
p3 <- xyplot(`<V705-A>` ~ `Time`, data=synPerturbed.clean,
             abs=TRUE, smooth=FALSE, alpha=0.5, xlim=c(0, 100))
grid.arrange(p1, p2, p3, ncol=3)
@ 
\end{small}
\begin{figure}
<<label=fig1,fig=TRUE, echo=FALSE, width=8, height=4>>=
<<fig1plot>>
@
\caption{Left) FCS before flowClean.  Center) New 'GoodVsBad' parameter.
   Right) FCS after flowClean and filtering.}
\label{fig:one}
\end{figure}

\section*{SessionInfo}

<<sessionInfo,results=tex,eval=TRUE,echo=FALSE>>=
toLatex(sessionInfo())
@ 

\begin{thebibliography}{1}
  \bibitem{flowCleanpaper} Fletez-Brant C, Spidlen J, Brinkman R, Roederer M, Chattopadhyay P.  Quailty Control of flow cytometry data through compositional data analysis.  In preparation.
  \bibitem{comppaper} Aitchison J. A concise guide to compositional data analysis.  Compositional Data Analysis Workshop; Girona, Italy.  
  \bibitem{zerosub} Fry J, Fry T, McLaren K.  Compositional data analysis and zeros in micro data.  CoPS/IMPACT Working Paper Number G-120.
\end{thebibliography}
\end{document}

% Local Variables:
% LocalWords: LocalWords flow cytoemtry, compositional data analysis
% LocalWords: clean, flowClean
% End: