%\VignetteIndexEntry{immunoClust package} %\VignetteKeywords{} %\VignetteEncoding{UTF-8} %\VignettePackage{BiocStyle} %\VignetteEngine{utils::Sweave} \documentclass{article} <>= BiocStyle::latex() @ \usepackage{cite, hyperref} \usepackage[utf8]{inputenx} \bioctitle[immunoClust]{immunoClust - Automated Pipeline for Population Detection in Flow Cytometry} \author{Till Sörensen\footnote{till-antoni.soerensen@charite.de}} \begin{document} \setkeys{Gin}{width=1.0\textwidth, height=1.1\textwidth} \maketitle \textnormal{\normalfont} \tableofcontents \newpage \section{Licensing} Under the Artistic License, you are free to use and redistribute this software. However, we ask you to cite the following paper if you use this software for publication. \begin{itemize} \item[] Sörensen, T., Baumgart, S., Durek, P., Grützkau, A. and Häupl, T. immunoClust - an automated analysis pipeline for the identification of immunophenotypic signatures in high-dimensional cytometric datasets. \emph{Cytometry A} (accepted). \end{itemize} \section{Overview} \Rpackage{immunoClust} presents an automated analysis pipeline for uncompensated fluorescence and mass cytometry data and consists of two parts. First, cell events of each sample are grouped into individual clusters (cell-clustering). Subsequently, a classification algorithm assorts these cell event clusters into populations comparable between different samples (meta-clustering). The clustering of cell events is designed for datasets with large event counts in high dimensions as a global unsupervised method, sensitive to identify rare cell types even when next to large populations. Both parts use model-based clustering with an iterative Expectation Maximization (EM) algorithm and the Integrated Classification Likelihood (ICL) to obtain the clusters. The cell-clustering process fits a mixture model with $t$-distributions. Within the clustering process a optimisation of the $asinh$-transformation for the fluorescence parameters is included. The meta-clustering fits a Gaussian mixture model for the meta-clusters, where adjusted Bhattacharyya-Coefficients give the probability measures between cell- and meta-clusters. Several plotting routines are available visualising the results of the cell- and meta-clustering process. Additional helper-routines to extract population features are provided. \section{Getting started} The installation on \Rpackage{immunoClust} is normally done within the Bioconductor. The core functions of \Rpackage{immunoClust} are implemented in C/C++ for optimal utilization of system resources and depend on the GNU Scientific Library (GSL) and Basic Linear Subprogram (BLAS). When installing \Rpackage{immunoClust} form source using Rtools be aware to adjust the GSL library and include pathes in src/Makevars.in or src/Makevars.win (on Windows systems) repectively to the correct installation directory of the GSL-library on the system. \Rpackage{immunoClust} relies on the \Rclass{flowFrame} structure imported from the \Biocpkg{flowCore}-package for accessing the measured cell events from a flow cytometer device. \section{Example Illustrating the immunoClust Pipeline} The functionality of the immunoClust pipeline is demonstrated on a dataset of blood cell samples of defined composition that were depleted of particular cell subsets by magnetic cell sorting. Whole blood leukocytes taken from three healthy individuals, which were experimentally modified by the depletion of one particular cell type per sample, including granulocytes (using CD15-MACS-beads), monocytes (using CD14-MACS-beads), T lymphocytes (CD3-MACS-beads), T helper lymphocytes (using CD4-MACS-beads) and B lymphocytes (using CD19-MACS-beads). The example datasets contain reduced (10.000 cell-events) of the first Flow Cytometry (FC) sample in \Robject{dat.fcs} and the \Rpackage{immunoClust} cell-clustering results of all 5 reduced FC samples for the first donor in \Robject{dat.exp}. The full sized dataset is published and available under http://flowrepository.org/id/FR-FCM-ZZWB. \subsection{Cell Event Clustering} \label{cell-clustering} <>= library(immunoClust) @ The cell-clustering is performed by the \Rfunction{cell.process} function for each FC sample separately. Its major input are the measured cell-events in a \Rclass{flowFrame}-object imported from the \Biocpkg{flowCore}-package. <>= data(dat.fcs) dat.fcs @ In the \Rcode{parameters} argument the parameters (named as observables in the \Robject{flowFrame}) used for cell-clustering are specified. When omitted all determined parameters are used. <>= pars=c("FSC-A","SSC-A","FITC-A","PE-A","APC-A","APC-Cy7-A","Pacific Blue-A") res.fcs <- cell.process(dat.fcs, parameters=pars) @ The \Rfunction{summary} method for an \Rpackage{immunoClust}-object gives an overview of the clustering results. <>= summary(res.fcs) @ With the \Rcode{bias} argument of the \Rfunction{cell.process} function the number of clusters in the final model is controlled. <>= res2 <- cell.process(dat.fcs, bias=0.25) summary(res2) @ An ICL-bias of 0.3 is reasonable for fluorescence cytometry data based on our experiences, whereas the number of clusters increase dramatically when a \Rcode{bias} below 0.2 is applied. A principal strategy for the ICL-bias in the whole pipeline is the use of a moderately small \Rcode{bias} (0.2 - 0.3) for cell-clustering and to optimise the \Rcode{bias} on meta-clustering level to retrieve the common populations across all samples. For plotting the clustering results on cell event level, the optimised $asinh$-transformation has to be applied to the raw FC data first. <>= dat.transformed <- trans.ApplyToData(res.fcs, dat.fcs) @ A scatter plot matrix of all used parameters for clustering is obtained by the \Rfunction{splom} method. <>= splom(res.fcs, dat.transformed, N=1000) @ For a scatter plot of 2 particular parameters the \Rfunction{plot} method can be used, where parameters of interest are specified in the \Rcode{subset} argument. <>= plot(res.fcs, data=dat.transformed, subset=c(1,2)) @ \subsection{Meta Clustering} \label{meta-clustering} For meta-clustering the cell-clustering results of all FC samples obtained by the \Rfunction{cell.process} function are collected in a \Robject{vector} of \Rpackage{immunoClust}-objects and processed by the \Rfunction{meta.process} function. <>= data(dat.exp) meta<-meta.process(dat.exp, meta.bias=0.3) @ The obtained \Robject{immunoMeta}-object contains the meta-clustering result in \Rcode{\$res.clusters}, and the used cell-clusters information in \Rcode{\$dat.clusters}. Additionally, the clusters can be structures manually in a hierarchical mannner using methods of the \Robject{immunoMeta}-object. A scatter plot matrix of the meta-clustering is obtained by the \Rfunction{plot} method. <>= plot(meta, c()) @ In these scatter plots each cell-cluster is marked by a point of its centre. With the default \Rcode{plot.ellipse=TRUE} argument the meta-clusters are outlined by ellipses of the 90\% quantile. \subsection{Meta Annotation} \label{meta-annotation} We take a look on the event numbers of all meta-clusters in each sample <>= cls <- clusters(meta,c()) events(meta,cls) @ and pick the meta-clusters of the five commonly found population, with respect to the technical depletion to collect them in a first annotation level <>= addLevel(meta,c(1),"leucocytes") <- c(1,2,6,7,10,14,18) @ In the plot of this level the five major population are seen easily <>= plot(meta, c(1)) @ and we identify the clusters for the particular populations successivley by their expression levels. <>= cls <- clusters(meta,c(1)) sort(mu(meta,cls,7)) ## CD3 expression inc <- mu(meta,cls,7) > 5 ## CD3+ clusters cls[inc] mu(meta,cls[inc],6) ## CD4 expression addLevel(meta,c(1,1), "CD3+CD4+") <- 10 addLevel(meta,c(1,2), "CD3+CD4-") <- 2 cls <- unclassified(meta,c(1)) sort(mu(meta,cls,5)) ## CD15 expression inc <- mu(meta,cls,5) > 3 addLevel(meta,c(1,3), "CD15+") <- cls[inc] cls <- unclassified(meta,c(1)) sort(mu(meta,cls,3)) ## CD14 expression inc <- mu(meta,cls,3) > 5 addLevel(meta,c(1,4), "CD14+") <- cls[inc] cls <- unclassified(meta,c(1)) sort(mu(meta,cls,4)) ## CD19 expression inc <- mu(meta,cls,4) > 3 addLevel(meta,c(1,5), "CD19+") <- cls[inc] @ The whole analysis is performed on uncompensated FC data, thus the high CD19 values on the CD14-population is explained by spillover of FITC into PE. The event numbers of each meta-cluster and each sample are extracted in a numeric matrix by the \Rfunction{meta.numEvents} function. <>= tbl <- meta.numEvents(meta, out.all=FALSE) tbl[,1:5] @ Each row denotes an annotated hierarchical level or/and meta-cluster and each column a data sample used in meta-clustering. The row names give the annotated population name, the meta-cluster index and the default color used in the plot routines for each meta-cluster. In the last columns additionally the meta-cluster centre values in each parameter are given, which helps to identify the meta-clusters. Further export functions retrieve relative cell event frequencies and sample meta-cluster centre values in a particular parameter. We see here, that for sample \texttt{12546} where the CD15-cells are depleted, the CD14-population is missing. Anyway, this missing cluster could be in the so far unclassified clusters. <>= move(meta,c(1,4)) <- 13 @ <>= plot(meta, c(1)) @ We see the CD14 population of sample \texttt{12546} shifted in FSC and CD3 expression levels, probably due to technical variation in the measurement of the CD15-depleted sample, where the granulocytes are missing which constitute about 60\% - 70\% of the events in the other samples. \section{Session Info} The documentation and example output was compiled and obtained on the system: <>= toLatex(sessionInfo()) @ \end{document}