--- title: "flowAI" author: "Gianni Monaco" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Automatic and GUI methods to do quality control on Flow cytometry Data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(dpi = 300) knitr::opts_chunk$set(cache=FALSE) ``` ## Introduction to flowAI The flowAI package allows to perform quality control on flow cytometry data in order to warrant superior results for both manual and automated downstream analysis. The package is built on the functions: 1) `flow_auto_qc`, for the automatic analysis and 2) `flow_iQC()`, for the interactive analysis. The principle behind these two methods is complementary and takes into account three different properties, i.e. 1) flow rate, 2) signal acquisition and 3) dynamic range. The evaluation of these properties makes it possible to remove the technical variability derived from surges or deviations in the flow rate, defective laser-detection system, data range limitations and other technical issues. #### Load the package After the installation of the package in your local system you can load the package. ```{r, message=FALSE} require(flowAI) ``` ## Calling functions and loading data #### Data For this documentation or for other testing purposes we use a small built-in dataset. The dataset was manually created extracting a subsample of cells and channels from three FCS files that were part of an aging study of a Singaporean cohort. The dataset is stored as a flowSet object. ```{r, collapse = TRUE} data(Bcells) Bcells ``` To select FCS files from your working directory you can create a character vector of the files you want to analyze calling the function `dir` with "*fcs$" as regular expression for the pattern argument. ```{r, eval=FALSE} setwd(...) fcsfiles <- dir(".", pattern="*fcs$") ``` #### Automatic method The automatic method is implemented in the function `flow_auto_qc`. The following calls show how to perform the quality control with default settings on the FCS files in your folder and in the toy dataset that comes with the FlowAI packages. The flowAI package depends on the flowCore package for the handling of the FCS files in the R environment. The flowCore package provides two main classes, `flowFrame` and `flowSet`. The `Bcells` object is an instance of the `flowSet` class and contains a set of three FCS files that taken singuarly are instances of the `flowFrame` object. The `flow_auto_qc` function can be called on either one of the flowCore objects, `flowSet` and `flowFrame`, and on a character vector of the fcs files: ```{r, eval=FALSE} resQC <- flow_auto_qc(Bcells) # using a flowSet resQC <- flow_auto_qc(Bcells[[1]]) # using a flowFrame resQC <- flow_auto_qc(fcsfiles) # using a character vector ``` When a character vector is used to call the `flow_auto_qc` function, a `flowSet` object is automatically generated since the creation of the histogram for the cell number comparison depends on it. Therefore, to avoid memory saturation, we suggest to split large datasets in batches that are compatible with the hardware specifications of your computer system. For example, if you want batches of maximum 2 gigabytes you can use: ```{r, eval=FALSE} GbLimit <- 2 # decide the limit in gigabyte for your batches of FCS files size_fcs <- file.size(fcsfiles)/1024/1024/1024 # it calculates the size in gigabytes for each FCS file groups <- ceiling(sum(size_fcs)/GbLimit) cums <- cumsum(size_fcs) batches <- cut(cums, groups) ``` Then you can run your analysis on the batches using a for-loop: ```{r, eval=FALSE} for(i in 1:groups){ flow_auto_qc(fcsfiles[which(batches == levels(batches)[i])], output = 0) } ``` When setting the *output* argument to 0 (or any other value apart 1 and 2), no R objects are returned. #### Interactive method The interactive method is implemented as a Shiny app and is executed through the `flow_iQC()` command on the R environment. For performance and clearness reasons, it allows to analyze one file at a time only. Once you open the Shiny app on your web browser, you can upload the FCS file from the top part of the left hand side panel. ## Quality control The full pipeline of our quality control procedure includes the removal of events having anomalous values when looking at three aspect of a flow cytometry analysis: 1. flow rate 2. signal acquisition 3. dynamic range However, it is possible to perform partial quality control on only one or two of the above mentioned properties. For the automatic method, this can be set with the argument `remove_from`. After the quality control, the automatic method generates by default a new FCS file containing an additional parameter where the low quality events have a value higher than 10,000, similarly to the flowClean flagging method. Alternatively, a new FCS containing only the high quality events can be generated. Moreover, flowAI can be implemented in automatic pipelines of analysis through the returned objects of the `flowFrame` or `flowSet` class ## Result evaluation The function `flow_auto_qc` generates a report for each FCS file, in both a graphic and tabular format, to evaluate the performance of the algorithms in the detection of the anomalies. We suggest to run the automatic method first with default settings. If the results are not satisfying you can either modify the settings or use the interactive method `flow_iQC`. ## Case study: B cells from elderly individuals Here, we give an example of the results obtained after performing the quality control on the first FCS file of the Bcells dataset. #### File description and quality control summary The summary information of the FCS file analyzed is reported in the first section of the automatically generated report or on the left hand side panel of the `flow_iQC` Shiny app. The summary information contains the name of the file, the number of events and the total percentage of anomalies detected and removed. The following information were obtained from the automatically generated report of our example: > Input File Name: Bcells1 > Number of Events: 64562 > The anomalies were removed from: Flow Rate, Flow Signal and Flow Margin > Anomalies Detected in total: 23% > Number of high quality events: 49535 ##### Comparison of the number of events among the FCS files of the dataset If the dataset has more than three FCS files, the automatic method will produce a histogram containing the number of events for each file. The bar in blue correspond to the FCS file whose quality control analysis is described in the remaining part of the report. ![hist_set](hist_set.png) #### Flow rate check The flow rate is reconstructed using the keyword $TIMESTEP contained in FCS files with version equal or greater to 3. By default the analysis is performed using a timestep of 1/10 of a second. `flow_auto_qc` uses an anomaly detection algorithm to detect and remove the data acquired during flow rate surges and shift from the median value. The algorithm is based on the Generalized ESD outlier detection method optimized to work on time series data. The anomalies automatically detected are circled in green. ![flowrateAUTO](autoflowrate.png) `flow_iQC` allows to manually select the most stable region of the flow rate. ![flowrateMANUAL](iQCrate.png) #### Signal acquisition check For each channel, the median of the signal of equally-sized bins of events is reported as a Levy-Jennings-type graph. The mean and standard deviation of the median should remain constant over the course of the analysis. `flow_auto_qc` uses a changepoint detection method to verify the stability of the signal. Precisely, a shift in the median or the variance is detected by the Binary Segmentation algorithm of the `changepoint` package. In the resulting plot, the region that passed the quality control is highlighted in yellow. ![signal](autosignal.png) As for the flow rate checking, `flow_iQC` allows to manually choose the most stable region. #### Dynamic range check Events from the upper and lower limits of the dynamic range are checked in the last step. For the upper limit, the maximum value of the dynamic range is removed since the instrument is unable to record values exceeding a maximum pre-set by the manufacturer. For the lower limit, the quality control removes all the values below zero for the scatter channels and all the outliers in the negative range for the immunofluorescence channels. The plot shows the frequency of events removed over the course of the analysis; the scaling of the x-axis is complementary to the one of the signal acquisition check. For this step, both `flow_auto_qc` and `flow_iQC` use the same detection principle to scout for anomalies. When using the automatic method to refine the lower limit of the dynamic range, with the `neg_valueFM` argument you can decide to truncate the negative values to the cut-off suggested in the FCS file instead of the removing the negative outliers. ![margins](margins.png)