The common data files (FCS files for the practical workshop) are located in /data/OpenCyto
. The following will move the gating template csv
file to your home directory under ~/data
.
Please do not touch or modify the shared gating template. Work rather on your own copy.
# Common files
FCSPATH="/data/OpenCyto/data/FCS/"
WORKSPACE="/data/OpenCyto/data/workspace/080 batch 0882.xml"
OUTPATH="~"
MANUAL<-file.path(OUTPATH,"data","manual")
AUTO<-file.path(OUTPATH,"data","auto")
TEMPLATE="/data/OpenCyto/data/template/gt_080.csv"
ANNOTATIONS="/data/OpenCyto/data/workspace/pd_submit.csv"
# Move the template into your home directory
if(!file.exists("~/data")){
dir.create("~/data")
}
file.copy(TEMPLATE,file.path("~/data",basename(TEMPLATE)),overwrite = TRUE)
## [1] TRUE
file.copy(ANNOTATIONS,file.path("~/data",basename(ANNOTATIONS)),overwrite = TRUE)
## [1] TRUE
TEMPLATE<-"~/data/gt_080.csv"
ANNOTATIONS<-"~/data/pd_submit.csv"
require(openCyto)
## Loading required package: openCyto
## Loading required package: flowWorkspace
## Loading required package: flowCore
## Loading required package: flowViz
## Loading required package: lattice
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
## Loading required package: ncdfFlow
## Loading required package: gridExtra
## Loading required package: grid
require(ggplot2)
## Loading required package: ggplot2
require(plyr)
## Loading required package: plyr
require(reshape2)
## Loading required package: reshape2
require(data.table)
## Loading required package: data.table
The first thing to do is open the workspace file.
ws<-openWorkspace(WORKSPACE)
show(ws)
## FlowJo Workspace Version 2.0
## File location: /data/OpenCyto/data/workspace
## File name: 080 batch 0882.xml
## Workspace is open.
##
## Groups in Workspace
## Name Num.Samples
## 1 All Samples 158
## 2 0882-L-080 157
## 3 Comps 13
## 4 0882 Samples 76
We see different sample groups in the workspace. The data we’re concerned with is the 0882 Samples
group. The data are already compensated and transformed.
The following code snippet parses the contents of the workspace, loads the FCS data, compensates and transforms it according to the instructions in the FlowJo workspace file, and stores the results in a gating_set.
This can take some time depending on the bandwidth of the attached storage device (local vs network disk).
Some arguments to parseWorkspace
The name
argument specifies which group
to import.
The path
argument tells flowWorkspace
where to look for the FCS files.
The subset
argument allows you to parse a subset of samples. It is a numeric index. The isNcdf
argument allows you specify whether or not to use HDF5 / NetCDF for disk-based data access (TRUE), or whether to store the data in RAM (FALSE). Large data sets can’t practically be analyzed in RAM, set this to TRUE most of the time.
# Parse the worspace only if we haven't already done so, and save it.
if(!file.exists(MANUAL)){
G<-parseWorkspace(ws,isNcdf=TRUE,name="0882 Samples",path=FCSPATH)
save_gs(G,MANUAL)
}else{
G<-load_gs(MANUAL)
}
G is now a GatingSet
object. This is a set of GatingHierarchy
objects, each of which represents an FCS file in the workspace / group that you imported.
show(G)
## A GatingSet with 76 samples
We see the object G
has 76 samples. The usual R operators can be used to subset the data. - Subset by index (not preferred) - Extract sample names - Subset by sample names (preferred)
G[1:10] # Subset by numeric index.. not the preferred way of doing things.
## A GatingSet with 10 samples
sampleNames(G) # Get the sample names
## [1] "769097.fcs" "769098.fcs" "769099.fcs" "769101.fcs" "769103.fcs"
## [6] "769105.fcs" "769107.fcs" "769109.fcs" "769111.fcs" "769119.fcs"
## [11] "769121.fcs" "769122.fcs" "769124.fcs" "769125.fcs" "769127.fcs"
## [16] "769128.fcs" "769130.fcs" "769131.fcs" "769133.fcs" "769134.fcs"
## [21] "769136.fcs" "769137.fcs" "769145.fcs" "769147.fcs" "769149.fcs"
## [26] "769151.fcs" "769153.fcs" "769155.fcs" "769161.fcs" "769163.fcs"
## [31] "769165.fcs" "769167.fcs" "769169.fcs" "769171.fcs" "769177.fcs"
## [36] "769179.fcs" "769181.fcs" "769183.fcs" "769185.fcs" "769187.fcs"
## [41] "769193.fcs" "769195.fcs" "769197.fcs" "769199.fcs" "769201.fcs"
## [46] "769203.fcs" "769209.fcs" "769211.fcs" "769213.fcs" "769215.fcs"
## [51] "769217.fcs" "769219.fcs" "769225.fcs" "769227.fcs" "769229.fcs"
## [56] "769231.fcs" "769233.fcs" "769235.fcs" "769241.fcs" "769243.fcs"
## [61] "769245.fcs" "769247.fcs" "769249.fcs" "769251.fcs" "769257.fcs"
## [66] "769259.fcs" "769261.fcs" "769263.fcs" "769265.fcs" "769267.fcs"
## [71] "769273.fcs" "769275.fcs" "769277.fcs" "769279.fcs" "769281.fcs"
## [76] "769283.fcs"
G[sampleNames(G)[1:10]] # Subset by sample name.. preferred.
## A GatingSet with 10 samples
There is one data transformation per transformed channel. In this case there are no differences between the transformations on the different channels. The FlowJo transformation is a biexponential but it maps input data into “channel space” from 0 to 4095. The output data are effectively binned. flowWorkspace interpolate the transformation function to give continuous values between 0 and 4095.
raw<-seq(1,2.5e5,by=1000) #Input data range on the raw scale
transformed<-lapply(getTransformations(G[[1]]),function(x)matrix(x(raw))) # transform with each transformation
transformed<-do.call(cbind,transformed) # group output
colnames(transformed)<-gsub("^ ","",names(getTransformations(G[[1]])))
transformed<-melt(transformed)
transformed<-(cbind(transformed,raw))
setnames(transformed,c("index","parameter","transformed","raw"))
ggplot(data.frame(raw,transformed))+geom_line(aes(x=raw,y=transformed))+facet_wrap(~parameter)+theme_minimal()
We can extract the compensation matrices from the GatingSet
as well, and visualize them.
getCompensationMatrices(G[[1]]) #Compensation for the first GatingHierarchy
## Compensation object 'defaultCompensation':
## FITC-A PerCP Cy55 Blue-A Pacific Blue-A Am Cyan-A
## FITC-A 1.000e+00 0.117360 1.375e-03 9.008e-03
## PerCP Cy55 Blue-A 5.425e-06 1.000000 1.140e-04 3.689e-05
## Pacific Blue-A 5.084e-04 0.004963 1.000e+00 2.455e-02
## Am Cyan-A 4.314e-02 0.064777 4.833e-01 1.000e+00
## APC-A 2.848e-04 0.035541 2.395e-04 1.022e-04
## Alexa 680-A 3.616e-04 0.049498 2.752e-04 1.155e-04
## APC Cy7-A -7.741e-04 -0.002257 -5.087e-04 -2.697e-04
## PE Green laser-A 2.025e-04 0.144410 8.921e-06 9.473e-06
## PE Tx RD-A 5.456e-04 0.915790 1.920e-04 4.865e-05
## PE Cy7-A 4.590e-04 0.048309 5.165e-05 2.110e-05
## APC-A Alexa 680-A APC Cy7-A PE Green laser-A
## FITC-A 0.0012730 0.001554 4.906e-04 0.0289250
## PerCP Cy55 Blue-A 0.0092128 0.390170 2.259e-02 0.0004728
## Pacific Blue-A 0.0012361 0.006692 1.086e-03 0.0014149
## Am Cyan-A 0.0197610 0.032842 2.797e-03 0.0350400
## APC-A 1.0000000 3.122700 1.409e-01 0.0010204
## Alexa 680-A 0.0020035 1.000000 7.197e-02 0.0007192
## APC Cy7-A 0.0150370 0.047991 1.000e+00 -0.0015137
## PE Green laser-A 0.0006007 0.001064 4.363e-05 1.0000000
## PE Tx RD-A 0.0039865 0.008877 4.263e-04 0.2067800
## PE Cy7-A 0.0003989 0.003547 3.853e-02 0.0505200
## PE Tx RD-A PE Cy7-A
## FITC-A 0.0089330 0.0005243
## PerCP Cy55 Blue-A 0.0003445 0.0510200
## Pacific Blue-A 0.0011251 0.0011551
## Am Cyan-A 0.0274790 0.0058747
## APC-A 0.0008323 0.0153530
## Alexa 680-A 0.0008401 0.0152670
## APC Cy7-A -0.0017165 0.1681300
## PE Green laser-A 0.2480800 0.0059821
## PE Tx RD-A 1.0000000 0.0522140
## PE Cy7-A 0.0141130 1.0000000
ggplot(melt(getCompensationMatrices(G[[1]])@spillover,value.name = "Coefficient"))+geom_tile(aes(x=Var1,y=Var2,fill=Coefficient))+scale_fill_continuous(guide="colourbar")+theme(axis.text.x=element_text(angle=45,hjust=1))
The getNodes
function will extract the node names (population names) of all populations defined in a GatingSet or a GatingHierarchy.
getNodes(G)
## [1] "root"
## [2] "/S"
## [3] "/S/Lv"
## [4] "/S/Lv/L"
## [5] "/S/Lv/L/3+"
## [6] "/S/Lv/L/3+/4+"
## [7] "/S/Lv/L/3+/4+/57+"
## [8] "/S/Lv/L/3+/4+/57-"
## [9] "/S/Lv/L/3+/4+/Perforin+"
## [10] "/S/Lv/L/3+/4+/Granzyme B+"
## [11] "/S/Lv/L/3+/4+/Granzyme B-"
## [12] "/S/Lv/L/3+/4+/GZB+IFN+IL2+TNF+"
## [13] "/S/Lv/L/3+/4+/GZB+IFN+IL2+TNF-"
## [14] "/S/Lv/L/3+/4+/GZB+IFN+IL2-TNF+"
## [15] "/S/Lv/L/3+/4+/GZB+IFN+IL2-TNF-"
## [16] "/S/Lv/L/3+/4+/GZB+IFN-IL2+TNF+"
## [17] "/S/Lv/L/3+/4+/GZB+IFN-IL2+TNF-"
## [18] "/S/Lv/L/3+/4+/GZB+IFN-IL2-TNF+"
## [19] "/S/Lv/L/3+/4+/GZB+IFN-IL2-TNF-"
## [20] "/S/Lv/L/3+/4+/GZB-IFN+IL2+TNF+"
## [21] "/S/Lv/L/3+/4+/GZB-IFN+IL2+TNF-"
## [22] "/S/Lv/L/3+/4+/GZB-IFN+IL2-TNF+"
## [23] "/S/Lv/L/3+/4+/GZB-IFN+IL2-TNF-"
## [24] "/S/Lv/L/3+/4+/GZB-IFN-IL2+TNF+"
## [25] "/S/Lv/L/3+/4+/GZB-IFN-IL2+TNF-"
## [26] "/S/Lv/L/3+/4+/GZB-IFN-IL2-TNF+"
## [27] "/S/Lv/L/3+/4+/GZB-IFN-IL2-TNF-"
## [28] "/S/Lv/L/3+/4+/IFN+IL2+"
## [29] "/S/Lv/L/3+/4+/IFN+IL2+TNF+"
## [30] "/S/Lv/L/3+/4+/IFN+IL2+TNF-"
## [31] "/S/Lv/L/3+/4+/IFN+IL2-"
## [32] "/S/Lv/L/3+/4+/IFN+IL2-TNF+"
## [33] "/S/Lv/L/3+/4+/IFN+IL2-TNF-"
## [34] "/S/Lv/L/3+/4+/IFN-IL2+"
## [35] "/S/Lv/L/3+/4+/IFN-IL2+TNF+"
## [36] "/S/Lv/L/3+/4+/IFN-IL2+TNF-"
## [37] "/S/Lv/L/3+/4+/IFN-IL2-"
## [38] "/S/Lv/L/3+/4+/IFN-IL2-TNF+"
## [39] "/S/Lv/L/3+/4+/IFN-IL2-TNF-"
## [40] "/S/Lv/L/3+/4+/IFNg+"
## [41] "/S/Lv/L/3+/4+/IFNg+IL2+TNFa+GzB+57+"
## [42] "/S/Lv/L/3+/4+/IFNg+IL2+TNFa+GzB+57-"
## [43] "/S/Lv/L/3+/4+/IFNg+IL2+TNFa+GzB-57+"
## [44] "/S/Lv/L/3+/4+/IFNg+IL2+TNFa+GzB-57-"
## [45] "/S/Lv/L/3+/4+/IFNg+IL2+TNFa-GzB+57+"
## [46] "/S/Lv/L/3+/4+/IFNg+IL2+TNFa-GzB+57-"
## [47] "/S/Lv/L/3+/4+/IFNg+IL2+TNFa-GzB-57+"
## [48] "/S/Lv/L/3+/4+/IFNg+IL2+TNFa-GzB-57-"
## [49] "/S/Lv/L/3+/4+/IFNg+IL2-TNFa+GzB+57+"
## [50] "/S/Lv/L/3+/4+/IFNg+IL2-TNFa+GzB+57-"
## [51] "/S/Lv/L/3+/4+/IFNg+IL2-TNFa+GzB-57+"
## [52] "/S/Lv/L/3+/4+/IFNg+IL2-TNFa+GzB-57-"
## [53] "/S/Lv/L/3+/4+/IFNg+IL2-TNFa-GzB+57+"
## [54] "/S/Lv/L/3+/4+/IFNg+IL2-TNFa-GzB+57-"
## [55] "/S/Lv/L/3+/4+/IFNg+IL2-TNFa-GzB-57+"
## [56] "/S/Lv/L/3+/4+/IFNg+IL2-TNFa-GzB-57-"
## [57] "/S/Lv/L/3+/4+/IFNg-IL2+TNFa+GzB+57+"
## [58] "/S/Lv/L/3+/4+/IFNg-IL2+TNFa+GzB+57-"
## [59] "/S/Lv/L/3+/4+/IFNg-IL2+TNFa+GzB-57+"
## [60] "/S/Lv/L/3+/4+/IFNg-IL2+TNFa+GzB-57-"
## [61] "/S/Lv/L/3+/4+/IFNg-IL2+TNFa-GzB+57+"
## [62] "/S/Lv/L/3+/4+/IFNg-IL2+TNFa-GzB+57-"
## [63] "/S/Lv/L/3+/4+/IFNg-IL2+TNFa-GzB-57+"
## [64] "/S/Lv/L/3+/4+/IFNg-IL2+TNFa-GzB-57-"
## [65] "/S/Lv/L/3+/4+/IFNg-IL2-TNFa+GzB+57+"
## [66] "/S/Lv/L/3+/4+/IFNg-IL2-TNFa+GzB+57-"
## [67] "/S/Lv/L/3+/4+/IFNg-IL2-TNFa+GzB-57+"
## [68] "/S/Lv/L/3+/4+/IFNg-IL2-TNFa+GzB-57-"
## [69] "/S/Lv/L/3+/4+/IFNg-IL2-TNFa-GzB+57+"
## [70] "/S/Lv/L/3+/4+/IFNg-IL2-TNFa-GzB+57-"
## [71] "/S/Lv/L/3+/4+/IFNg-IL2-TNFa-GzB-57+"
## [72] "/S/Lv/L/3+/4+/IFNg-IL2-TNFa-GzB-57-"
## [73] "/S/Lv/L/3+/4+/IFN\\IL2"
## [74] "/S/Lv/L/3+/4+/IL2+"
## [75] "/S/Lv/L/3+/4+/TNFa+"
## [76] "/S/Lv/L/3+/8+"
## [77] "/S/Lv/L/3+/8+/57+"
## [78] "/S/Lv/L/3+/8+/57-"
## [79] "/S/Lv/L/3+/8+/Granzyme B+"
## [80] "/S/Lv/L/3+/8+/Granzyme B-"
## [81] "/S/Lv/L/3+/8+/GZB+IFN+IL2+TNF+"
## [82] "/S/Lv/L/3+/8+/GZB+IFN+IL2+TNF-"
## [83] "/S/Lv/L/3+/8+/GZB+IFN+IL2-TNF+"
## [84] "/S/Lv/L/3+/8+/GZB+IFN+IL2-TNF-"
## [85] "/S/Lv/L/3+/8+/GZB+IFN-IL2+TNF+"
## [86] "/S/Lv/L/3+/8+/GZB+IFN-IL2+TNF-"
## [87] "/S/Lv/L/3+/8+/GZB+IFN-IL2-TNF+"
## [88] "/S/Lv/L/3+/8+/GZB+IFN-IL2-TNF-"
## [89] "/S/Lv/L/3+/8+/GZB-IFN+IL2+TNF+"
## [90] "/S/Lv/L/3+/8+/GZB-IFN+IL2+TNF-"
## [91] "/S/Lv/L/3+/8+/GZB-IFN+IL2-TNF+"
## [92] "/S/Lv/L/3+/8+/GZB-IFN+IL2-TNF-"
## [93] "/S/Lv/L/3+/8+/GZB-IFN-IL2+TNF+"
## [94] "/S/Lv/L/3+/8+/GZB-IFN-IL2+TNF-"
## [95] "/S/Lv/L/3+/8+/GZB-IFN-IL2-TNF+"
## [96] "/S/Lv/L/3+/8+/GZB-IFN-IL2-TNF-"
## [97] "/S/Lv/L/3+/8+/IFN+IL2+"
## [98] "/S/Lv/L/3+/8+/IFN+IL2+TNF+"
## [99] "/S/Lv/L/3+/8+/IFN+IL2+TNF-"
## [100] "/S/Lv/L/3+/8+/IFN+IL2-"
## [101] "/S/Lv/L/3+/8+/IFN+IL2-TNF+"
## [102] "/S/Lv/L/3+/8+/IFN+IL2-TNF-"
## [103] "/S/Lv/L/3+/8+/IFN-IL2+"
## [104] "/S/Lv/L/3+/8+/IFN-IL2+TNF+"
## [105] "/S/Lv/L/3+/8+/IFN-IL2+TNF-"
## [106] "/S/Lv/L/3+/8+/IFN-IL2-"
## [107] "/S/Lv/L/3+/8+/IFN-IL2-TNF+"
## [108] "/S/Lv/L/3+/8+/IFN-IL2-TNF-"
## [109] "/S/Lv/L/3+/8+/IFNg+"
## [110] "/S/Lv/L/3+/8+/IFNg+IL2+TNFa+GzB+57+"
## [111] "/S/Lv/L/3+/8+/IFNg+IL2+TNFa+GzB+57-"
## [112] "/S/Lv/L/3+/8+/IFNg+IL2+TNFa+GzB-57+"
## [113] "/S/Lv/L/3+/8+/IFNg+IL2+TNFa+GzB-57-"
## [114] "/S/Lv/L/3+/8+/IFNg+IL2+TNFa-GzB+57+"
## [115] "/S/Lv/L/3+/8+/IFNg+IL2+TNFa-GzB+57-"
## [116] "/S/Lv/L/3+/8+/IFNg+IL2+TNFa-GzB-57+"
## [117] "/S/Lv/L/3+/8+/IFNg+IL2+TNFa-GzB-57-"
## [118] "/S/Lv/L/3+/8+/IFNg+IL2-TNFa+GzB+57+"
## [119] "/S/Lv/L/3+/8+/IFNg+IL2-TNFa+GzB+57-"
## [120] "/S/Lv/L/3+/8+/IFNg+IL2-TNFa+GzB-57+"
## [121] "/S/Lv/L/3+/8+/IFNg+IL2-TNFa+GzB-57-"
## [122] "/S/Lv/L/3+/8+/IFNg+IL2-TNFa-GzB+57+"
## [123] "/S/Lv/L/3+/8+/IFNg+IL2-TNFa-GzB+57-"
## [124] "/S/Lv/L/3+/8+/IFNg+IL2-TNFa-GzB-57+"
## [125] "/S/Lv/L/3+/8+/IFNg+IL2-TNFa-GzB-57-"
## [126] "/S/Lv/L/3+/8+/IFNg-IL2+TNFa+GzB+57+"
## [127] "/S/Lv/L/3+/8+/IFNg-IL2+TNFa+GzB+57-"
## [128] "/S/Lv/L/3+/8+/IFNg-IL2+TNFa+GzB-57+"
## [129] "/S/Lv/L/3+/8+/IFNg-IL2+TNFa+GzB-57-"
## [130] "/S/Lv/L/3+/8+/IFNg-IL2+TNFa-GzB+57+"
## [131] "/S/Lv/L/3+/8+/IFNg-IL2+TNFa-GzB+57-"
## [132] "/S/Lv/L/3+/8+/IFNg-IL2+TNFa-GzB-57+"
## [133] "/S/Lv/L/3+/8+/IFNg-IL2+TNFa-GzB-57-"
## [134] "/S/Lv/L/3+/8+/IFNg-IL2-TNFa+GzB+57+"
## [135] "/S/Lv/L/3+/8+/IFNg-IL2-TNFa+GzB+57-"
## [136] "/S/Lv/L/3+/8+/IFNg-IL2-TNFa+GzB-57+"
## [137] "/S/Lv/L/3+/8+/IFNg-IL2-TNFa+GzB-57-"
## [138] "/S/Lv/L/3+/8+/IFNg-IL2-TNFa-GzB+57+"
## [139] "/S/Lv/L/3+/8+/IFNg-IL2-TNFa-GzB+57-"
## [140] "/S/Lv/L/3+/8+/IFNg-IL2-TNFa-GzB-57+"
## [141] "/S/Lv/L/3+/8+/IFNg-IL2-TNFa-GzB-57-"
## [142] "/S/Lv/L/3+/8+/IFN\\IL2"
## [143] "/S/Lv/L/3+/8+/IL2+"
## [144] "/S/Lv/L/3+/8+/Perforin+"
## [145] "/S/Lv/L/3+/8+/TNFa+"
## [146] "/S/Lv/L/Not 4+"
## [147] "/S/Lv/L/Not 4+/IFNg+"
This manually gated data has 147 distinct populations.
getNodes
and getPopStats
The population names can be unwieldy and long. One particularly useful argument is the path=
argument. You can set path="auto"
, which will extract the node name as the shortest unique path name for each cell population.
getNodes(G,path="auto")
## [1] "root" "S"
## [3] "Lv" "L"
## [5] "3+" "4+"
## [7] "4+/57+" "4+/57-"
## [9] "4+/Perforin+" "4+/Granzyme B+"
## [11] "4+/Granzyme B-" "4+/GZB+IFN+IL2+TNF+"
## [13] "4+/GZB+IFN+IL2+TNF-" "4+/GZB+IFN+IL2-TNF+"
## [15] "4+/GZB+IFN+IL2-TNF-" "4+/GZB+IFN-IL2+TNF+"
## [17] "4+/GZB+IFN-IL2+TNF-" "4+/GZB+IFN-IL2-TNF+"
## [19] "4+/GZB+IFN-IL2-TNF-" "4+/GZB-IFN+IL2+TNF+"
## [21] "4+/GZB-IFN+IL2+TNF-" "4+/GZB-IFN+IL2-TNF+"
## [23] "4+/GZB-IFN+IL2-TNF-" "4+/GZB-IFN-IL2+TNF+"
## [25] "4+/GZB-IFN-IL2+TNF-" "4+/GZB-IFN-IL2-TNF+"
## [27] "4+/GZB-IFN-IL2-TNF-" "4+/IFN+IL2+"
## [29] "4+/IFN+IL2+TNF+" "4+/IFN+IL2+TNF-"
## [31] "4+/IFN+IL2-" "4+/IFN+IL2-TNF+"
## [33] "4+/IFN+IL2-TNF-" "4+/IFN-IL2+"
## [35] "4+/IFN-IL2+TNF+" "4+/IFN-IL2+TNF-"
## [37] "4+/IFN-IL2-" "4+/IFN-IL2-TNF+"
## [39] "4+/IFN-IL2-TNF-" "4+/IFNg+"
## [41] "4+/IFNg+IL2+TNFa+GzB+57+" "4+/IFNg+IL2+TNFa+GzB+57-"
## [43] "4+/IFNg+IL2+TNFa+GzB-57+" "4+/IFNg+IL2+TNFa+GzB-57-"
## [45] "4+/IFNg+IL2+TNFa-GzB+57+" "4+/IFNg+IL2+TNFa-GzB+57-"
## [47] "4+/IFNg+IL2+TNFa-GzB-57+" "4+/IFNg+IL2+TNFa-GzB-57-"
## [49] "4+/IFNg+IL2-TNFa+GzB+57+" "4+/IFNg+IL2-TNFa+GzB+57-"
## [51] "4+/IFNg+IL2-TNFa+GzB-57+" "4+/IFNg+IL2-TNFa+GzB-57-"
## [53] "4+/IFNg+IL2-TNFa-GzB+57+" "4+/IFNg+IL2-TNFa-GzB+57-"
## [55] "4+/IFNg+IL2-TNFa-GzB-57+" "4+/IFNg+IL2-TNFa-GzB-57-"
## [57] "4+/IFNg-IL2+TNFa+GzB+57+" "4+/IFNg-IL2+TNFa+GzB+57-"
## [59] "4+/IFNg-IL2+TNFa+GzB-57+" "4+/IFNg-IL2+TNFa+GzB-57-"
## [61] "4+/IFNg-IL2+TNFa-GzB+57+" "4+/IFNg-IL2+TNFa-GzB+57-"
## [63] "4+/IFNg-IL2+TNFa-GzB-57+" "4+/IFNg-IL2+TNFa-GzB-57-"
## [65] "4+/IFNg-IL2-TNFa+GzB+57+" "4+/IFNg-IL2-TNFa+GzB+57-"
## [67] "4+/IFNg-IL2-TNFa+GzB-57+" "4+/IFNg-IL2-TNFa+GzB-57-"
## [69] "4+/IFNg-IL2-TNFa-GzB+57+" "4+/IFNg-IL2-TNFa-GzB+57-"
## [71] "4+/IFNg-IL2-TNFa-GzB-57+" "4+/IFNg-IL2-TNFa-GzB-57-"
## [73] "4+/IFN\\IL2" "4+/IL2+"
## [75] "4+/TNFa+" "8+"
## [77] "8+/57+" "8+/57-"
## [79] "8+/Granzyme B+" "8+/Granzyme B-"
## [81] "8+/GZB+IFN+IL2+TNF+" "8+/GZB+IFN+IL2+TNF-"
## [83] "8+/GZB+IFN+IL2-TNF+" "8+/GZB+IFN+IL2-TNF-"
## [85] "8+/GZB+IFN-IL2+TNF+" "8+/GZB+IFN-IL2+TNF-"
## [87] "8+/GZB+IFN-IL2-TNF+" "8+/GZB+IFN-IL2-TNF-"
## [89] "8+/GZB-IFN+IL2+TNF+" "8+/GZB-IFN+IL2+TNF-"
## [91] "8+/GZB-IFN+IL2-TNF+" "8+/GZB-IFN+IL2-TNF-"
## [93] "8+/GZB-IFN-IL2+TNF+" "8+/GZB-IFN-IL2+TNF-"
## [95] "8+/GZB-IFN-IL2-TNF+" "8+/GZB-IFN-IL2-TNF-"
## [97] "8+/IFN+IL2+" "8+/IFN+IL2+TNF+"
## [99] "8+/IFN+IL2+TNF-" "8+/IFN+IL2-"
## [101] "8+/IFN+IL2-TNF+" "8+/IFN+IL2-TNF-"
## [103] "8+/IFN-IL2+" "8+/IFN-IL2+TNF+"
## [105] "8+/IFN-IL2+TNF-" "8+/IFN-IL2-"
## [107] "8+/IFN-IL2-TNF+" "8+/IFN-IL2-TNF-"
## [109] "8+/IFNg+" "8+/IFNg+IL2+TNFa+GzB+57+"
## [111] "8+/IFNg+IL2+TNFa+GzB+57-" "8+/IFNg+IL2+TNFa+GzB-57+"
## [113] "8+/IFNg+IL2+TNFa+GzB-57-" "8+/IFNg+IL2+TNFa-GzB+57+"
## [115] "8+/IFNg+IL2+TNFa-GzB+57-" "8+/IFNg+IL2+TNFa-GzB-57+"
## [117] "8+/IFNg+IL2+TNFa-GzB-57-" "8+/IFNg+IL2-TNFa+GzB+57+"
## [119] "8+/IFNg+IL2-TNFa+GzB+57-" "8+/IFNg+IL2-TNFa+GzB-57+"
## [121] "8+/IFNg+IL2-TNFa+GzB-57-" "8+/IFNg+IL2-TNFa-GzB+57+"
## [123] "8+/IFNg+IL2-TNFa-GzB+57-" "8+/IFNg+IL2-TNFa-GzB-57+"
## [125] "8+/IFNg+IL2-TNFa-GzB-57-" "8+/IFNg-IL2+TNFa+GzB+57+"
## [127] "8+/IFNg-IL2+TNFa+GzB+57-" "8+/IFNg-IL2+TNFa+GzB-57+"
## [129] "8+/IFNg-IL2+TNFa+GzB-57-" "8+/IFNg-IL2+TNFa-GzB+57+"
## [131] "8+/IFNg-IL2+TNFa-GzB+57-" "8+/IFNg-IL2+TNFa-GzB-57+"
## [133] "8+/IFNg-IL2+TNFa-GzB-57-" "8+/IFNg-IL2-TNFa+GzB+57+"
## [135] "8+/IFNg-IL2-TNFa+GzB+57-" "8+/IFNg-IL2-TNFa+GzB-57+"
## [137] "8+/IFNg-IL2-TNFa+GzB-57-" "8+/IFNg-IL2-TNFa-GzB+57+"
## [139] "8+/IFNg-IL2-TNFa-GzB+57-" "8+/IFNg-IL2-TNFa-GzB-57+"
## [141] "8+/IFNg-IL2-TNFa-GzB-57-" "8+/IFN\\IL2"
## [143] "8+/IL2+" "8+/Perforin+"
## [145] "8+/TNFa+" "Not 4+"
## [147] "Not 4+/IFNg+"
Operations that utilize node names can match a unique path to a node, so this gives an easy way to extract the shortest unique name for a cell population.
getPopStats
will extract cell population statistics, for a GatingSet
or a GatingHierarchy
. You can specify whether you want the cell counts or the cell frequencies.
freq_gs<-getPopStats(G[1:5],statistic="freq")
head(freq_gs)
## 769097.fcs 769098.fcs 769099.fcs 769101.fcs 769103.fcs
## 3+ 0.79054 0.79106 0.8159666 7.163e-01 7.793e-01
## 4+ 0.63934 0.63238 0.6581814 6.150e-01 6.393e-01
## 4+/57+ 0.05689 0.05853 0.0489381 2.319e-02 1.078e-02
## 4+/57- 0.94311 0.94147 0.9510619 9.768e-01 9.892e-01
## 4+/GZB+IFN+IL2+TNF+ 0.00000 0.00000 0.0004208 1.506e-03 3.705e-04
## 4+/GZB+IFN+IL2+TNF- 0.00000 0.00000 0.0000000 8.425e-05 4.005e-05
counts_gh<-getPopStats(G[1:5],statistic="count")
head(counts_gh)
## 769097.fcs 769098.fcs 769099.fcs 769101.fcs 769103.fcs
## 3+ 116492 111962 129990 154397 156216
## 4+ 74478 70802 85557 94950 99869
## 4+/57+ 4237 4144 4187 2202 1077
## 4+/57- 70241 66658 81370 92748 98792
## 4+/GZB+IFN+IL2+TNF+ 0 0 36 143 37
## 4+/GZB+IFN+IL2+TNF- 0 0 0 8 4
The output of getPopStats
is a matrix wtih each entry representing the cell counts or cell frequencies (relative to the parent population) for a cell population and a sample.
We can plot the gating tree for a sample or a set of samples using plot
. To plot a gate use plotGate
.
# The gating tree for this gating set
plot(G[[1]],boolean=FALSE)
There is a lone IFNg+ gate floating, unattached to other nodes. If we look more closely at the node names, we see that its parent is “Not 4+”, which is hidden. We can set attributes of a node using setNode
.
Plotting gates is done using plotGate
. The infrastructre will extract the event-level data and plot it directly.
#plot one specific gate from one sample
p1<-plotGate(G[[2]],"4+/TNFa+",arrange=FALSE)
#plot one gate from two samples - automatic faceting
p2<-plotGate(G[2:3],"4+/TNFa+",arrange=FALSE)
#oops! population names must be unique
try(plotGate(G[1:2],"TNFa+"))
#binning using xbin (or not)
p4<-plotGate(G[[2]], "4+/TNFa+",xbin=128,arrange=FALSE) #smaller bins (default is 32), more detail
# use gridExtra for layout
require(gridExtra)
grid.arrange(p1[[1]],p2,p4[[1]]) # Single plot is a list with element 1 being the plot object. Multiple plots are trellis objects directly.
#plot an entire layout (boolean gates are skipped automatically)
p3<-plotGate(G[[2]]) # a little slower since we read all the event-level data. Arrange is TRUE by default
## skipping boolean gates!
The plotting functions use package::lattice
, see the help on ?plotGate
for further options.
Another way to extract event-level data from GatingSets and GatingHierarchy objects is via the getData
API.
This API extracts the selected events (e.g. events belonging to a certain population) into an ncdfFlowSet
, ncdfFlowFrame
, flowSet
, or flowFrame
. These are the standard flowCore
objects for representing collections of events from samples, with each sample representing one or a subset of FCS files.
cd3 <- getData(G[[1]],"3+")
cd3
## flowFrame object '9e2f2b56-b28e-4f1a-8ff2-cbd45407257e'
## with 116492 cells and 16 observables:
## name desc range minRange maxRange
## $P1 Time <NA> 2987 0.000 2987
## $P2 FSC-A <NA> 262143 0.000 262143
## $P3 FSC-H <NA> 262143 0.000 262143
## $P4 SSC-A <NA> 262143 0.000 262143
## $P5 <FITC-A> TNFa 4098 -1.437 4097
## $P6 <PerCP Cy55 Blue-A> CD8 4098 -1.437 4097
## $P7 <Pacific Blue-A> CD57 4098 -1.437 4097
## $P8 <Am Cyan-A> AViD 4098 -1.437 4097
## $P9 <APC-A> Perforin 4098 -1.437 4097
## $P10 <Alexa 680-A> Granzyme B 4098 -1.437 4097
## $P11 <APC Cy7-A> CD4 4098 -1.437 4097
## $P12 <PE Green laser-A> IL2 4098 -1.437 4097
## $P13 <PE Tx RD-A> CD3 4098 -1.437 4097
## $P14 PE Cy5-A <NA> 4098 -1.437 4097
## $P15 PE Cy55-A <NA> 4098 -1.437 4097
## $P16 <PE Cy7-A> IFNg 4098 -1.437 4097
## 241 keywords are stored in the 'description' slot
The above extracts the events that are CD3+ from the first sample of the gating set. The returned object is a flowFrame
. See ?flowFrame
for more information.
To grab multiple samples, pass a GatingSet
cd3_set<-getData(G[1:10],"3+")
cd3_set
## An ncdfFlowSet with 10 samples.
## flowSetId :
## NCDF file : /home/greg/data/manual/file7e0047d2a0b0.nc
## An object of class 'AnnotatedDataFrame'
## rowNames: 769097.fcs 769098.fcs ... 769119.fcs (10 total)
## varLabels: name
## varMetadata: labelDescription
##
## column names:
## Time, FSC-A, FSC-H, SSC-A, <FITC-A>, <PerCP Cy55 Blue-A>, <Pacific Blue-A>, <Am Cyan-A>, <APC-A>, <Alexa 680-A>, <APC Cy7-A>, <PE Green laser-A>, <PE Tx RD-A>, PE Cy5-A, PE Cy55-A, <PE Cy7-A>
You’ll note the above is a different object, an ncdfFlowSet
in this case, which is a set of flowFrames backed by a NetCDF file.
The samples contained in a netCDF flowSet are tracked by a set of pointers or sample indices, so the same NetCDF file can be associated with multiple R objects, and each can contain subsets of the whole data set.
For this reason, you should be careful about modifying the data on disk. It can have unintended consequences.
cd3_set@file #where the NetCDF file lives
## [1] "/home/greg/data/manual/file7e0047d2a0b0.nc"
flowData(G) #Return the entire ncdfFlowSet associated with a GatingSet
## An ncdfFlowSet with 76 samples.
## flowSetId :
## NCDF file : /home/greg/data/manual/file7e0047d2a0b0.nc
## An object of class 'AnnotatedDataFrame'
## rowNames: 769097.fcs 769098.fcs ... 769283.fcs (76 total)
## varLabels: name
## varMetadata: labelDescription
##
## column names:
## Time, FSC-A, FSC-H, SSC-A, <FITC-A>, <PerCP Cy55 Blue-A>, <Pacific Blue-A>, <Am Cyan-A>, <APC-A>, <Alexa 680-A>, <APC Cy7-A>, <PE Green laser-A>, <PE Tx RD-A>, PE Cy5-A, PE Cy55-A, <PE Cy7-A>
flowData(G[1:5]) #Subsettig works, and still points to the same file.
## An ncdfFlowSet with 5 samples.
## flowSetId :
## NCDF file : /home/greg/data/manual/file7e0047d2a0b0.nc
## An object of class 'AnnotatedDataFrame'
## rowNames: 769097.fcs 769098.fcs ... 769103.fcs (5 total)
## varLabels: name
## varMetadata: labelDescription
##
## column names:
## Time, FSC-A, FSC-H, SSC-A, <FITC-A>, <PerCP Cy55 Blue-A>, <Pacific Blue-A>, <Am Cyan-A>, <APC-A>, <Alexa 680-A>, <APC Cy7-A>, <PE Green laser-A>, <PE Tx RD-A>, PE Cy5-A, PE Cy55-A, <PE Cy7-A>
We may want to grab the single-cell data for CD8+ T-cells that express IL2 or IFNg.
This is relatively common, so there’s an API for that:
# Get cells that express IL2 or IFNg and are CD8+.
# map tells us how the node names map to markers on channels.
sce<-getSingleCellExpression(G[1:5],c("8+/IL2+","8+/IFNg+"),map=list("8+/IL2+"="IL2","8+/IFNg+"="IFNg"))
## 769097.fcs
## 769098.fcs
## 769099.fcs
## 769101.fcs
## 769103.fcs
str(sce) # list of 5 n x 2 matrices.
## List of 5
## $ 769097.fcs: num[0 , 1:2]
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "IL2" "IFNg"
## $ 769098.fcs: num [1:2, 1:2] 1899 1878 0 0
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "IL2" "IFNg"
## $ 769099.fcs: num [1:246, 1:2] 0 0 0 2788 0 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "IL2" "IFNg"
## $ 769101.fcs: num [1:6605, 1:2] 0 0 0 1883 2368 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "IL2" "IFNg"
## $ 769103.fcs: num [1:7051, 1:2] 2981 3097 1924 2547 2833 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "IL2" "IFNg"
Much of the annotation information can be found in the keywords of the FCS files and the FlowJo workspace. We can extract these and annotate a GatingSet. It contains a phenoData
slot that can be accessed by pData
much like other BioConductor core objects.
Importantly, these annotation can be used to group files for gating, for example by subject, or stimulation and control, batch and so forth.
Other annotations may come from sites like flowrepository.org, and can also be imported from text files.
require(flowIncubator) # the `getKeywords` function is in the `flowIncubator` package at http://www.github.com/RGLab/flowIncubator
## Loading required package: flowIncubator
## Loading required package: QUALIFIER
## Loading required package: reshape
##
## Attaching package: 'reshape'
##
## The following objects are masked from 'package:reshape2':
##
## colsplit, melt, recast
##
## The following objects are masked from 'package:plyr':
##
## rename, round_any
##
##
## Attaching package: 'flowIncubator'
##
## The following object is masked from 'package:flowWorkspace':
##
## getKeywords
names(keyword(G[[1]])) # valid keywords
## [1] "FCSversion" "$BEGINANALYSIS"
## [3] "$ENDANALYSIS" "$BEGINSTEXT"
## [5] "$ENDSTEXT" "$BEGINDATA"
## [7] "$ENDDATA" "$FIL"
## [9] "$SYS" "$TOT"
## [11] "$PAR" "$MODE"
## [13] "$BYTEORD" "$DATATYPE"
## [15] "$NEXTDATA" "CREATOR"
## [17] "TUBE NAME" "$SRC"
## [19] "EXPERIMENT NAME" "GUID"
## [21] "$DATE" "$BTIM"
## [23] "$ETIM" "WINDOW EXTENSION"
## [25] "EXPORT USER NAME" "EXPORT TIME"
## [27] "FSC ASF" "AUTOBS"
## [29] "$INST" "PLATE NAME"
## [31] "WELL ID" "PLATE ID"
## [33] "$TIMESTEP" "SPILL"
## [35] "APPLY COMPENSATION" "THRESHOLD"
## [37] "$P1N" "$P1R"
## [39] "$P1B" "$P1E"
## [41] "$P1G" "P1BS"
## [43] "P1MS" "$P2N"
## [45] "$P2R" "$P2B"
## [47] "$P2E" "$P2V"
## [49] "$P2G" "P2DISPLAY"
## [51] "P2BS" "P2MS"
## [53] "$P3N" "$P3R"
## [55] "$P3B" "$P3E"
## [57] "$P3V" "$P3G"
## [59] "P3DISPLAY" "P3BS"
## [61] "P3MS" "$P4N"
## [63] "$P4R" "$P4B"
## [65] "$P4E" "$P4V"
## [67] "$P4G" "P4DISPLAY"
## [69] "P4BS" "P4MS"
## [71] "$P5N" "$P5S"
## [73] "$P5R" "$P5B"
## [75] "$P5E" "$P5V"
## [77] "$P5G" "P5DISPLAY"
## [79] "P5BS" "P5MS"
## [81] "$P6N" "$P6S"
## [83] "$P6R" "$P6B"
## [85] "$P6E" "$P6V"
## [87] "$P6G" "P6DISPLAY"
## [89] "P6BS" "P6MS"
## [91] "$P7N" "$P7S"
## [93] "$P7R" "$P7B"
## [95] "$P7E" "$P7V"
## [97] "$P7G" "P7DISPLAY"
## [99] "P7BS" "P7MS"
## [101] "$P8N" "$P8S"
## [103] "$P8R" "$P8B"
## [105] "$P8E" "$P8V"
## [107] "$P8G" "P8DISPLAY"
## [109] "P8BS" "P8MS"
## [111] "$P9N" "$P9S"
## [113] "$P9R" "$P9B"
## [115] "$P9E" "$P9V"
## [117] "$P9G" "P9DISPLAY"
## [119] "P9BS" "P9MS"
## [121] "$P10N" "$P10S"
## [123] "$P10R" "$P10B"
## [125] "$P10E" "$P10V"
## [127] "$P10G" "P10DISPLAY"
## [129] "P10BS" "P10MS"
## [131] "$P11N" "$P11S"
## [133] "$P11R" "$P11B"
## [135] "$P11E" "$P11V"
## [137] "$P11G" "P11DISPLAY"
## [139] "P11BS" "P11MS"
## [141] "$P12N" "$P12S"
## [143] "$P12R" "$P12B"
## [145] "$P12E" "$P12V"
## [147] "$P12G" "P12DISPLAY"
## [149] "P12BS" "P12MS"
## [151] "$P13N" "$P13S"
## [153] "$P13R" "$P13B"
## [155] "$P13E" "$P13V"
## [157] "$P13G" "P13DISPLAY"
## [159] "P13BS" "P13MS"
## [161] "$P14N" "$P14R"
## [163] "$P14B" "$P14E"
## [165] "$P14V" "$P14G"
## [167] "P14DISPLAY" "P14BS"
## [169] "P14MS" "$P15N"
## [171] "$P15R" "$P15B"
## [173] "$P15E" "$P15V"
## [175] "$P15G" "P15DISPLAY"
## [177] "P15BS" "P15MS"
## [179] "$P16N" "$P16S"
## [181] "$P16R" "$P16B"
## [183] "$P16E" "$P16V"
## [185] "$P16G" "P16DISPLAY"
## [187] "P16BS" "P16MS"
## [189] "CST SETUP STATUS" "CST BEADS LOT ID"
## [191] "CYTOMETER CONFIG NAME" "CYTOMETER CONFIG CREATE DATE"
## [193] "CST SETUP DATE" "CST BASELINE DATE"
## [195] "Plate" "Sample"
## [197] "Stim" "Replicate"
## [199] "Sample Order" "Type"
## [201] "Comp" "ANALYSIS_PLAN_ID"
## [203] "EXP_ASSAY_ID" "STDY_DESC"
## [205] "WELLROLE" "FH control"
## [207] "FILENAME" "transformation"
## [209] "flowCore_$P1Rmax" "flowCore_$P1Rmin"
## [211] "flowCore_$P2Rmax" "flowCore_$P2Rmin"
## [213] "flowCore_$P3Rmax" "flowCore_$P3Rmin"
## [215] "flowCore_$P4Rmax" "flowCore_$P4Rmin"
## [217] "flowCore_$P5Rmax" "flowCore_$P5Rmin"
## [219] "flowCore_$P6Rmax" "flowCore_$P6Rmin"
## [221] "flowCore_$P7Rmax" "flowCore_$P7Rmin"
## [223] "flowCore_$P8Rmax" "flowCore_$P8Rmin"
## [225] "flowCore_$P9Rmax" "flowCore_$P9Rmin"
## [227] "flowCore_$P10Rmax" "flowCore_$P10Rmin"
## [229] "flowCore_$P11Rmax" "flowCore_$P11Rmin"
## [231] "flowCore_$P12Rmax" "flowCore_$P12Rmin"
## [233] "flowCore_$P13Rmax" "flowCore_$P13Rmin"
## [235] "flowCore_$P14Rmax" "flowCore_$P14Rmin"
## [237] "flowCore_$P15Rmax" "flowCore_$P15Rmin"
## [239] "flowCore_$P16Rmax" "flowCore_$P16Rmin"
## [241] "ORIGINALGUID"
keyword_vars<-c("$FIL","Stim","Sample Order","EXPERIMENT NAME") #relevant keywords
pd<-data.table(getKeywords(G,keyword_vars)) #extract relevant keywords to a data table
annotations<-data.table:::fread(ANNOTATIONS) # read the annotations from flowrepository
setnames(annotations,"File Name","name")
setkey(annotations,"name")
setkey(pd,"name")
head(pd)
## name Stim Sample Order EXPERIMENT NAME
## 1: 769097.fcs negctrl 9 0882-L-080
## 2: 769098.fcs negctrl 9 0882-L-080
## 3: 769099.fcs CMV 9 0882-L-080
## 4: 769101.fcs sebctrl 1 0882-L-080
## 5: 769103.fcs sebctrl 2 0882-L-080
## 6: 769105.fcs sebctrl 3 0882-L-080
head(annotations)
## name Condition Timepoint Individual Sample Description
## 1: 431345.fcs negctrl 2 080-1 PBMCs from healthy subjects
## 2: 431346.fcs negctrl 2 080-1 PBMCs from healthy subjects
## 3: 431348.fcs negctrl 5 080-1 PBMCs from healthy subjects
## 4: 431349.fcs negctrl 5 080-1 PBMCs from healthy subjects
## 5: 431351.fcs negctrl 2 080-2 PBMCs from healthy subjects
## 6: 431352.fcs negctrl 2 080-2 PBMCs from healthy subjects
## Sample Source Description Sample Source
## 1: [biological] Human peripheral blood [biological]
## 2: [biological] Human peripheral blood [biological]
## 3: [biological] Human peripheral blood [biological]
## 4: [biological] Human peripheral blood [biological]
## 5: [biological] Human peripheral blood [biological]
## 6: [biological] Human peripheral blood [biological]
# We match on the name column, which matches the sampleNames(G).
pd<-data.frame(annotations[pd]) #data.table style merge
setnames(pd,c("Timepoint","Individual"),c("VISITNO","PTID")) #Rename Timepoint and Indivudual to VISITNO and PTID which are used in the template files
pData(G)<-pd #annotate
Now that you are more familiar with the basic objects in OpenCyto, we’ll proceed to do some automated gating of the data, and we’ll explore the consequences of modifying different arugments in the gating template.
We need to clone the existing gating set containing the manually gated data if we want to perform OpenCyto automated gating and save our results.
automated<-clone(G)
Next we need to remove the existing gates. root
is the first node.. don’t remove that. Remove the first population below it, which are singlets in this case “S”.
Rm("S",automated) #Rm is the API to remove a node and all nodes beneath it.
automated
is now a GatingSet
containing only compensated and transformed data, with no gates.
Next we will load the gating template.
gating_template<-gatingTemplate(TEMPLATE)
## Warning: Bumped column 10 to type character on data row 7, field contains
## 'K=2'. Coercing previously read values in this column from integer or
## numeric back to character which may not be lossless; e.g., if '00' and
## '000' occurred before they will now be just '0', and there may be
## inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in
## this column before the bump). If this matters please rerun and set
## 'colClasses' to 'character' for this column. Please note that column type
## detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so
## hopefully this message should be very rare. If reporting to
## datatable-help, please rerun and include the output from verbose=TRUE.
## expanding pop: cd8+/-
##
## expanding pop: cd57+/-
##
## expanding pop: cd57+/-
##
## Adding population:boundary
## Adding population:singlet
## Adding population:viable
## Adding population:nonNeutro
## Adding population:DebrisGate
## Adding population:nonDebris
## Adding population:lymph
## Adding population:cd3
## Adding population:cd8+
## Adding population:cd8-
## Adding population:cd4pos
## Adding population:cd4neg
## Adding population:cd4
## Adding population:cd8
## Adding population:TNFa
## Adding population:TNFa
## Adding population:IFNg
## Adding population:IFNg
## Adding population:IL2
## Adding population:IL2
## Adding population:cd57+
## Adding population:cd57-
## Adding population:cd57+
## Adding population:cd57-
## Adding population:prf_gate
## Adding population:prf_gate
## Adding population:Prf
## Adding population:Prf
## Adding population:gzb_gate
## Adding population:gzb_gate
## Adding population:GzB
## Adding population:GzB
Don’t worry about the warning at the end. Reading the template processes it and performs some expansion of populations like cd8+/-
into cd8+
and cd8-
.
We can view the gating template with plot
plot(gating_template)
##
## Attaching package: 'graph'
##
## The following object is masked from 'package:plyr':
##
## join
Next we’ll look at the first row of the template, the entry for the boundary
filter.
options("warn"=-1)
head(fread(TEMPLATE)[1,])
## alias pop parent dims gating_method gating_args
## 1: boundary boundary root FSC-A,SSC-A boundary max=c(2.5e5,2.5e5)
## collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1:
The gating_method
is listed as boundary
, if we look at the list of available gating methods we see it is one of the methods listed:
listgtMethods()
## Gating Functions:
## === quadrantGate
## === quantileGate
## === rangeGate
## === flowClust.2d
## === mindensity
## === cytokine
## === flowClust.1d
## === boundary
## === singletGate
## === tailgate
## === quadGate.tmix
## === quadGate.seq
## Preprocessing Functions:
## === prior_flowClust
## === warpSet
## === standardize_flowset
Other parameters are an alias
, which is how the population is named in the gating tree, and pop
, which tells openCyto
whether to keep the events within the gate or outside the gate. In this case, it keeps the events within the gate by default.
parent
column specifies the name of the parent population, which is “root”, since this is the first population we are defining.dims
column defines the channels on which this gate is applied. Here, it is “FSC-A,SSC-A”, which are the forward and side scatter dimensions. We can verify this by looking at the flowFrame
from a sample.getData(automated[[1]],use.exprs=FALSE) #use.exprs=FALSE don't retrieve events.. faster.
## flowFrame object '9e2f2b56-b28e-4f1a-8ff2-cbd45407257e'
## with 0 cells and 0 observables:
## name desc range minRange maxRange
## $P1 Time <NA> 2987 0.000 2987
## $P2 FSC-A <NA> 262143 0.000 262143
## $P3 FSC-H <NA> 262143 0.000 262143
## $P4 SSC-A <NA> 262143 0.000 262143
## $P5 <FITC-A> TNFa 4098 -1.437 4097
## $P6 <PerCP Cy55 Blue-A> CD8 4098 -1.437 4097
## $P7 <Pacific Blue-A> CD57 4098 -1.437 4097
## $P8 <Am Cyan-A> AViD 4098 -1.437 4097
## $P9 <APC-A> Perforin 4098 -1.437 4097
## $P10 <Alexa 680-A> Granzyme B 4098 -1.437 4097
## $P11 <APC Cy7-A> CD4 4098 -1.437 4097
## $P12 <PE Green laser-A> IL2 4098 -1.437 4097
## $P13 <PE Tx RD-A> CD3 4098 -1.437 4097
## $P14 PE Cy5-A <NA> 4098 -1.437 4097
## $P15 PE Cy55-A <NA> 4098 -1.437 4097
## $P16 <PE Cy7-A> IFNg 4098 -1.437 4097
## 241 keywords are stored in the 'description' slot
gating_args
column are additional arguments to the gating_method
(boundary
). Here we tell the gating method that we want to include events up to a maximum of 2.5e5. Anything outside that range probably corresponds to cellular aggregates.We need to clean up the data because subsequent gating steps, which do some modeling, perform better if the data distribution is well behaved, i.e. smooth and doesn’t have large spikes.
Let’s take a look at the gating template object
gating_template@nodeData@data[["/boundary"]] # node named "/boundary"
## $pop
## An object of class "gtPopulation"
## Slot "id":
## [1] "/boundary"
##
## Slot "name":
## [1] "boundary+"
##
## Slot "alias":
## [1] "boundary"
gating_template@edgeData@data[["root|/boundary"]] #edge from root to the boundary node
## $gtMethod
## boundary(xChannel=FSC-A,yChannel=SSC-A)
##
## $isReference
## [1] FALSE
str(gating_template@edgeData@data[["root|/boundary"]]$gtMethod)
## Formal class 'gtMethod' [package "openCyto"] with 5 slots
## ..@ name : chr "boundary"
## ..@ dims : chr "FSC-A,SSC-A"
## ..@ args :List of 1
## .. ..$ max: language c(250000, 250000)
## ..@ groupBy : chr ""
## ..@ collapse: logi FALSE
We see the node has an “id”, a “name”, and an “alias”, these are taken from the csv template. The edge information has an entry “gtMethod”, which contains a gtMethod
object that has all the info to run the gating method on a sample. It contains the dimensions to gate, the name of the population, the arguments to the method, and groupBy
describing whether the data should be grouped by some phenoData
variable, and collapse
defining whether the data should be collapsed across multiple samples for gating.
The gating template plot shows us that we have a population named “boundary” gated afte the “root” node using the “boundary” gating method (green arrow).
Reference gates are very useful and important to understand. They provide a way to define a gate and re-use that gate threshold later in the gating process. We’ll show how they’re used here to gate out debris.
Let’s look at the path from “viable” to “nonDebris”. It goes through two additional populations, “nonNeutro”, and “DebrisGate”.
head(fread(TEMPLATE)[3:6,]) # Thre four entries in the csv template.
## alias pop parent dims gating_method
## 1: viable viable- singlet AViD mindensity
## 2: nonNeutro nonNeutro- viable SSC-A mindensity
## 3: DebrisGate DebrisGate+ nonNeutro FSC-A mindensity
## 4: nonDebris nonDebris+ viable FSC-A refGate
## gating_args collapseDataForGating groupBy
## 1: gate_range=c(500,1000)
## 2: gate_range=c(5e4,1.5e5)
## 3: gate_range=c(0,1e+05)
## 4: DebrisGate
## preprocessing_method preprocessing_args
## 1:
## 2:
## 3:
## 4:
These consist of a one-dimensional mindensity
gate to exclude dead cells (the gate on the “AViD” parameter which looks for a minimum density cutpoint in the data distribution between 500 and 1000). Then, two minimum density cutpoints on side-scatter and forward scatter, which remove high side-scatter events, and debris, respectively. The gate_range
argument to mindensity
allows you to specify a range of the data where you expect the separation between the populations to occur. This is why it is important to have well-standardized experiments. If you have lots of instrument variation, the data will have very different distributions from sample to sample, and automated gating approaches will have a very difficult time with such data.
The final gate is a refGate
, or “referenceGate”. It is applied to the viable
population on the forward scatter channel, and uses the threshold derived from the DebrisGate
population. Why not just apply a mindensity gate on forward scatter directly to the live cells? The reason is that high side-scatter events sometimes get in the way of identifying a good cutpoint for debris in the forward scatter channel, so these are removed first, then the resulting cutpoint is used directly on the live cells.
OpenCyto can use multidimensional mixture modeling tools such as flowClust
which was developed for flow cytometry automated gating. We use this most frequently to gate on lymphocytes. The definition of the lymphocyte population, named “lymph” is shown below in the csv.
head(fread(TEMPLATE)[7,]) # The lymph population in the csv template.
## alias pop parent dims gating_method gating_args
## 1: lymph lymph nonDebris FSC-A,SSC-A flowClust K=2,quantile=0.99
## collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1: prior_flowClust K=2
dims
column lists two dimensions for gating.pop
column lists one population with no + or - sign. This indicates to OpenCyto that this is a multidimensional gate.gating_method
is “flowClust”. A 1d and 2d version of this can be found in the listgtMethods()
output, but openCyto chooses the correct one based on context.gating_args
include standard arguments to flowClust
. In this case we specify K=2
that we want to identify two populations, and quantile=0.99
that we want to include 99% of cells in the gate.preprocessing_method
uses prior_flowClust
which looks at the data before gating it and estimates a data-driven prior for the two populations. This used to speed up convergence when fitting the model to individual samples.The output is a single population named “lymph”. By default, openCyto will select the population with highest proportion. However, if that’s not desired, then you can pass a target=c(x,y)
argument, where “x” and “y” are the approximate x and y location of the cell population of interest (usually it is fairly consistent across samples in an experiment). OpenCyto will look at the distance to this target of each cell population identified in each sample and return the one that is closest.
head(fread(TEMPLATE)[9,]) # The four entries in the csv template.
## alias pop parent dims gating_method gating_args
## 1: * cd8+/- cd3 cd8 mindensity gate_range=c(1500,2800),min=0
## collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1:
Above, you see the definition of the CD8+/- cell subset. Some things you should note that are different from what we’ve seen so far:
gate_range
here covers about 31.9489 percent of the data range.pop
argument “cd8+/-” to define two populations: “cd8+” and “cd8-”, named automatically, thus the empty alias
field.We can see the expanded populations in the gating template graph.
# Two nodes for the single input row "cd8+/-"
gating_template@nodeData@data[["/boundary/singlet/viable/nonDebris/lymph/cd3/cd8+"]]
## $pop
## An object of class "gtPopulation"
## Slot "id":
## [1] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8+"
##
## Slot "name":
## [1] "cd8+"
##
## Slot "alias":
## [1] "cd8+"
gating_template@nodeData@data[["/boundary/singlet/viable/nonDebris/lymph/cd3/cd8-"]]
## $pop
## An object of class "gtPopulation"
## Slot "id":
## [1] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8-"
##
## Slot "name":
## [1] "cd8-"
##
## Slot "alias":
## [1] "cd8-"
Things appear to get complex as we get to the CD4 and CD8 populations. But they’re actually not so bad.
Let’s start by actually running the gating on the first few samples.
#gate the first 5 samples
nodes(gating_template)
## [1] "root"
## [2] "/boundary"
## [3] "/boundary/singlet"
## [4] "/boundary/singlet/viable"
## [5] "/boundary/singlet/viable/nonNeutro"
## [6] "/boundary/singlet/viable/nonNeutro/DebrisGate"
## [7] "/boundary/singlet/viable/nonDebris"
## [8] "/boundary/singlet/viable/nonDebris/lymph"
## [9] "/boundary/singlet/viable/nonDebris/lymph/cd3"
## [10] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8+"
## [11] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8-"
## [12] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8-/cd4pos"
## [13] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8+/cd4neg"
## [14] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4"
## [15] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8"
## [16] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4/TNFa"
## [17] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8/TNFa"
## [18] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4/IFNg"
## [19] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8/IFNg"
## [20] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4/IL2"
## [21] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8/IL2"
## [22] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4/cd57+"
## [23] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4/cd57-"
## [24] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8/cd57+"
## [25] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8/cd57-"
## [26] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4/cd57-/prf_gate"
## [27] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8/cd57-/prf_gate"
## [28] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4/Prf"
## [29] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8/Prf"
## [30] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4/cd57-/gzb_gate"
## [31] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8/cd57-/gzb_gate"
## [32] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd4/GzB"
## [33] "/boundary/singlet/viable/nonDebris/lymph/cd3/cd8/GzB"
gating(gating_template,automated[1:10])
## Loading required package: parallel
## Gating for 'boundary'
## done.
## Gating for 'singlet'
## done.
## Gating for 'viable'
## done.
## Gating for 'nonNeutro'
## done.
## Gating for 'DebrisGate'
## done.
## Population 'nonDebris'
## done.
## Preprocessing for 'lymph'
## Gating for 'lymph'
## Using the parallel (multicore) version of flowClust with 1 cores
## Using the parallel (multicore) version of flowClust with 1 cores
## Using the parallel (multicore) version of flowClust with 1 cores
## Using the parallel (multicore) version of flowClust with 1 cores
## Using the parallel (multicore) version of flowClust with 1 cores
## Using the parallel (multicore) version of flowClust with 1 cores
## Using the parallel (multicore) version of flowClust with 1 cores
## Using the parallel (multicore) version of flowClust with 1 cores
## Using the parallel (multicore) version of flowClust with 1 cores
## Using the parallel (multicore) version of flowClust with 1 cores
## done.
## Gating for 'cd3'
## done.
## Gating for 'cd8+'
## done.
## Gating for 'cd4neg'
## done.
## Population 'cd8'
## done.
## Gating for 'cd57+'
## done.
## Population 'cd57-'
## done.
## Preprocessing for 'gzb_gate'
## Gating for 'gzb_gate'
## done.
## Population 'GzB'
## done.
## Preprocessing for 'prf_gate'
## Gating for 'prf_gate'
## done.
## Population 'Prf'
## done.
## Preprocessing for 'IL2'
## Gating for 'IL2'
## done.
## Preprocessing for 'IFNg'
## Gating for 'IFNg'
## done.
## Preprocessing for 'TNFa'
## Gating for 'TNFa'
## done.
## Population 'cd8-'
## done.
## Gating for 'cd4pos'
## done.
## Population 'cd4'
## done.
## Gating for 'cd57+'
## done.
## Population 'cd57-'
## done.
## Preprocessing for 'gzb_gate'
## Gating for 'gzb_gate'
## done.
## Population 'GzB'
## done.
## Preprocessing for 'prf_gate'
## Gating for 'prf_gate'
## done.
## Population 'Prf'
## done.
## Preprocessing for 'IL2'
## Gating for 'IL2'
## done.
## Preprocessing for 'IFNg'
## Gating for 'IFNg'
## done.
## Preprocessing for 'TNFa'
## Gating for 'TNFa'
## done.
## finished.
Now let’s plot the cd8-, cd4pos, and cd4 gates.
plotGate(automated[[1]],c("cd8-","cd4pos","cd4"),gpar=list(nrow=1)) #gpar adjusts graphical parameters so we can layout by row.
Now we see more clearly what is happening. The “cd8-” gate defines the set of CD8- cells (71% in the samples, first panel), which are passed into the “cd4pos” gate, so CD4 is gated conditional on CD8 (second panel). The thresholds from these two gates are composed via the “reference gate” to define CD4+/CD8- T-cells, whose parent is CD3 (third panel).
Similar approaches using “refGate” are used to gate Perforin, and Grazyme B in this data set.
We can use the annotations in pData
to help OpenCyto group samples for gating. In this data set, which consists of antigen-stimulated T-cells and subject-matched non-stimulated controls, it makes sense to gate data by PTID (subject), and even by VISIT (subject visit).
To group the data for gating by certain variables in the pData
, add those variables to the groupBy
column, using standard R notation for combining levels of factors. In this case, for the “TNFa” gate, we see that the groupBy
column has the “PTID:VISIT” variables listed. So samples with unique levels of PTID and VISIT will be combined and passed to the gating routine.
fread(TEMPLATE)[14,]
## alias pop parent dims gating_method gating_args
## 1: TNFa TNFa+ cd4 TNFa cytokine adjust=2,tol=2e-3
## collapseDataForGating groupBy preprocessing_method
## 1: PTID:VISITNO standardize_flowset
## preprocessing_args
## 1:
The cytokines in this data set are gated using a routine called cyokine
aka tailgate
. These are interchangeable and alias the same routine.
There are two parameters of particular importance:
adjust
specifies the amount of smoothing when estimating the density.tol
is the tolerance within which the first derivative of the density is allowed to appraoch zero (i.e. how flat) before a threshold is drawn.The cytokine / tailgate is meant for use in a situation where there is one major cell population (perhaps a major negative peak), and very few positive cells, such that they may not register as a peak on a density estimate. The routine calculates the first derivative of the density and looks at the value on the (left or right) side of the density (side="left"
or side="right"
parameter). When the value is within tol
of zero, it places a gate.
In general the routine works better than a fixed quantile for discriminating rare cell populations, and allows for variable levels of background.
The cytokine
or tailgate
is pared with a preprocessing routine named standardize_flowset
. This routine will take all the samples within a group, estimate a set of data transformations that will standarize the samples to each other, collapse the transformed data and pass it along to the gating routine, along with the transformation functions, and the original data.
The gating routine, if it receives the preprocessing results from standardize_flowset, will gate the collapsed data, then back-transform that gate location with the appropriate sample-specific transformation, and use that to gate the original sample-specific data.
The above enables the grouped data to have a common gate threshold on the transformed scale, and on that transformed scale, the data are standardized, thus comparable across samples.
If the preprocessing argument is omitted, then gating proceeds immediately using tailgate
or cytokine
gate. In order to pass grouped and collapsed samples on to the gating routine, you need to specify both the groupBy
and collapseDataForGating
columns in the template. Then a single collapsed flowSet is passed on to tailgate (without standardization), and each sample will share a common gate threshold on the untransformed scale.
In general groupBy
specifies how the data are grouped before being passed to preprocessing if preprocessing is performed. For example, for flowClust, the groups could specify sets of samples that should be used to estimate a common prior.
The collapseDataForGating
columns specifies whether the grouped data should be collapsed into a single flowFrame before being passed on to the gating routine. If this is the case, then a common threshold is estimated for the set of samples.
An example of this is the CD57+/- set of populations.
fread(TEMPLATE)[20,]
## alias pop parent dims gating_method gating_args
## 1: * cd57+/- cd4 cd57 mindensity gate_range=c(1000,3000),min=0
## collapseDataForGating groupBy preprocessing_method
## 1: TRUE PTID:VISITNO
## preprocessing_args
## 1:
The gate above uses mindensity to find a threshold for the CD57+ vs CD57- populations. It finds a common threshold for unique levels of PTID and VISIT by setting the groupBy
column with collapseDataForGating=TRUE
. It does no preprocessing.
standardize_flowset
is the only exception to the above. In order to get a common threshold for a group of samples on the standarized scale, you must specify a value for groupBy
but set collapseDataForGating
to FALSE
, otherwise an error will be thrown.
pop
columnPlus (+) and minus (-) signs are reserved characters. Your channel and marker names must not include + or - signs as these are interpreted as population modifiers that tell OpenCyto whether to keep the positive or negative part of a population (events in our out of the gate).
If your channel names have special characters like these, use marker names in the pop
column instead.
You can plug in your own routines for gating_method
and preprocessing_method
by using the registerPlugins
API.
This looks like the following:
args(registerPlugins)
## function (fun = NA, methodName, dep = NA, ...)
## NULL
?registerPlugins
The registration routine requires a function fun
, a name methodName
, and a dep
argument that specifies any library dependencies (for example if the gating routine requires functionality from a different package). Finally there are additional arguments that specify whether the routine is a “gating” or “preprocessing” routine.
The fun
argument is a function that wraps around a gating routine and gives it a common interface. A gating method should have the following formal arguments:
fr
a flowFramepp_res
the output of preprocessingxChannel
a character (optional)yChannel
a character (required)filter_id
a character that identifies the resulting filter....
additional parameters for the method.A preprocessing method should have the following arguments:
fs
a flow set that stores the flow data, possibly output by groupBy
gs
a gating setgm
a gtMethod
object that stores information about the gating methodxChannel
yChannel
...
additional arguments for the preprocessing method.Within the wrapper, you can extract the information you need to pass on to your gating or preprocessing routine. As defined by the formal arguments, you have access to the data in fr
, the output of any preprocessing in pp_res
, the channels via xChannel
and yChannel
(yChannel
is always specified, and xChannel
may be NULL
), and the filter_id
which you can use to name the gate that you return to OpenCyto.
Within the preprocessing you have access to the flowSet of data, the GatingSet which allows you to access other annotations, the gating method object via gm
, and the channels to use for gating.
This allows the plugin-writer a considerable amount of flexibility and creativity to process the data as they please. For example, we can envision a plugin that would use FMO controls to set the gate thresholds. The preprocessing could match the xChannel
and yChannel
with FMO controls from the gating set in gs
, and gate those controls and pass the thresholds as part of pp_res
on to the gating method.
The gating method would construct the actual gates, or use those thresholds to refine some more complicated gating scheme.
The gating function
must return a fiter
object, defined in flowCore
, or may instantiate its own type of filter
. Most commonly, we return a polygonGate
.
Finally, wrapper functions must begin with a dot (.) - So if you have a gating method you call “myMethod” in the methodName
argument, the function passed to the fun
argument should be named .myMethod
If registration succeeds, registerPlugins
returns TRUE
.
The following plugin will collapse negative controls and stimulated samples per subject based on variables in the groupBy
template column.
It will then use a control
column in the phenoData
of the flowSet
to identify which events belong to the control sample and which belong to the stimualted sample.
The gating routine will subset the control events, define a gate threshold and apply it to all samples in the group, including the stimulated samples.
# The gate method will subset the flowFrame on the "control" events and gate those, returning a "negative control" gate.
# This could work equally well with FMOs.
.negGate <- function(fr, pp_res, xChannel=NA, yChannel=NA, filterId="ppgate", ...){
my_gate <- tailgate(fr[pp_res,],channel=yChannel, filter_id=filterId, ...)
return(my_gate)
}
registerPlugins(fun=.negGate,methodName='negGate',dep=NA)
## Registered negGate
## [1] TRUE
# Identifies which events belong to the negative control sample vs the treatment sample.
# groupBy includes PTID
# Treatment information comes from the pData of the flowSet
# passes a vector of 0/1 for treamtent / control events
.ppnegGate <- function(fs, gs, gm, xChannel, yChannel, groupBy, isCollapse, ...) {
d <- c()
for(i in c(1:length(fs))) {
d = c(d,rep.int(pData(fs[i])$control,nrow(exprs(fs[[i]]))))
}
return(as.logical(d))
}
registerPlugins(fun=.ppnegGate, methodName='ppnegGate', dep=NA, "preprocessing")
## Registered ppnegGate
## [1] TRUE