% NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. % %\VignetteIndexEntry{Removal of contaminants from FACS data} %\VignetteDepends{prada} %\VignetteKeywords{} %\VignettePackage{none} \documentclass[11pt]{article} \usepackage{times} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rvariable}[1]{{\textit{#1}}} \begin{document} \title{Fitting a bivariate normal distribution to a 2D scatterplot} \author{Florian Hahne} \maketitle \section{Overview} Using FACS (fluorescence-activated cell sorter) one can measure certain properties of each individual cell in a population of cells. Examples for these properties: \begin{itemize} \item Forward light scatter (FSC): this measures a cell's size \item Sideward light scatter (SSC): this measures a cell's granularity \item Several fluorescence channels (typically 3 to 4) that measure the abundance of fluorophores, which may be bound to specific antibodies for surface or intracellular markers, or be encoded by a GFP-tagged transcript. \end{itemize} First, we load example data from a FACS analysis that was performed by Mamatha Sauermann at the German Cancer Research Center in Heidelberg. % <>= par(mfrow=c(1,1)) @ <>= library(prada) sampdat <- readFCS(system.file("extdata", "fas-Bcl2-plate323-04-04.A01", package="prada")) fdat <- exprs(sampdat) @ % The scatterplot of FSC vs SSC is often used for quality control. It is shown in Fig.~\ref{fig:plotsbefore}. <>= plot(fdat[,"FSC-H"], fdat[,"SSC-H"], pch=20, col="#303030", xlab="FSC", ylab="SSC", main="Scatter plot FSC vs SSC") @ % \begin{figure}[htbp] \centering \includegraphics[width=0.5\textwidth]{norm2-plotsbefore} \caption{\label{fig:plotsbefore}% Scatter plot FACS data: FSC vs SSC.} \end{figure} @ % The cell population is often contaminated by cell debris or conjugates. These can be identified by their size: they are either much smaller or much larger than the main population, or they have an unusual degree of granularity. Segmentation is often performed manually by looking at the FSC-SCC scatterplot. Here we describe an automated algorithm for this task. \section{Fitting} The package \Rpackage{prada} provides the functions \Rfunction{fitNorm2} and \Rfunction{plotNorm2}. We assume that the shape of the main population in the FSC vs SSC plot can be approximated by a normal distribution. The function \Rfunction{fitNorm2} fits a bivariate normal distribution into the data (by robust estimation of its covariance matrix). Contours of equal probability of a bivariate normal are ellipses. We select those cells as being part of the main population that lie within such an ellipse. Its size is controled by the parameter \Rvariable{scalefac}. The function returns a list. % <>= nfit <- fitNorm2(fdat[,"FSC-H"], fdat[,"SSC-H"], scalefac=2) @ % We can plot this with the function \Rfunction{plotNorm2} (see Fig~\ref{fig:plotNorm2}). It shows the ellipse, and the set of discarded points is marked by a red dot. Also the center of the normal distribution is marked by the red cross. % <>= plotNorm2(nfit, selection=TRUE, ellipse=TRUE) @ \begin{figure}[htp] \centering \includegraphics[width=0.48\textwidth]{norm2-plotNorm2-1} \includegraphics[width=0.48\textwidth]{norm2-plotNorm2-2} \caption{\label{fig:plotNorm2}% Selection of the main population, using two different values of the parameter \Robject{scalefac}.} \end{figure} % <>= nfit3 <- fitNorm2(fdat[,"FSC-H"], fdat[,"SSC-H"], scalefac=3) plotNorm2(nfit3, selection=TRUE, ellipse=TRUE) @ % To select the cells from within the ellipse, the list item \Robject{nfit\$sel} is a logical vector with the same length as the number of data points. <>= cleanfdat <- fdat[nfit$sel,] @ Fig.~\ref{fig:plotafter} shows again a scatter plot of the two fluorescense channels FL1 and FL4 this time using the 'clean' data set \Robject{cleanfdat}. <>= par(mfrow=c(1,2)) xlim <- range(fdat[,"FL1-H"]) ylim <- range(fdat[,"FL4-H"]) plot(fdat[,"FL1-H"], fdat[,"FL4-H"], pch=20, col="#303030", xlab="FL1", ylab="FL4", main="all data", xlim=xlim, ylim=ylim) plot(cleanfdat[,"FL1-H"], cleanfdat[,"FL4-H"], pch=20, col="#303030", xlab="FL1", ylab="FL4", main="clean data only", xlim=xlim, ylim=ylim) @ \begin{figure}[htp] \centering \includegraphics{norm2-plotafter} \caption{\label{fig:plotafter}% Scatter plots of FL1 vs FL4.} \end{figure} \section{Scatterplots} If you think that scatterplots with thousands of points are hard to read and annoying to view in a PDF viewer, have a look at the function \Rfunction{smoothScatter} (see Fig.~\ref{fig:smoothscatter}): <>= smoothScatter(fdat[,c("FSC-H", "SSC-H")], nrpoints=50) @ % \begin{figure}[htp] \centering \includegraphics[width=\textwidth]{norm2-smoothscatter} \caption{\label{fig:smoothscatter}% Smooth scatter plots.} \end{figure} % \end{document}