1 Installation

To install this package, start R and enter the following commands (uncommented):

# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# 
# BiocManager::install("CytoMDS")

Loading dependent packages for the present vignette…

library(CytoPipeline)
library(CytoMDS)
library(ggplot2)

2 Introduction

The CytoMDS package implements a low dimensional visualization of a set of cytometry samples, in order to visually assess the ‘distances’ between them. This, in turn, can greatly help the user to identify quality issues like batch effects or outlier samples, and/or check the presence of potential sample clusters that might align with the experimental design.

The CytoMDS algorithm combines, on the one hand, the concept of Earth Mover’s Distance (EMD) (Orlova et al. 2016), a.k.a. Wasserstein metric and, on the other hand, the metric Multi Dimensional Scaling (MDS) algorithm for the low dimensional projection (Leeuw and Mair 2009).

The package also provides diagnostic tools for checking the quality of the MDS projection, as well as tools to help with the interpretation of the projection axes (see the sections below).

3 Illustrative datasets

Illustrative data sets that will be used throughout this vignette are derived from a reference public dataset accompanying the OMIP-021 (Optimized Multicolor Immunofluorescence Panel 021) article (Gherardin et al. 2014).

A sub-sample of this public dataset is built into the CytoPipeline package (Hauchamps and Gatto 2023) as the OMIP021 dataset.

In the CytoMDS package, as in the current vignette, matrices of flow cytometry event intensities are stored as flowCore::flowFrame (Ellis et al. 2023) objects.

Note that the OMIP021 dataset only contains two samples, from two human donors. Therefore, in order to be able to meaningfully illustrate CytoMDS use cases, we will be building data sets with more samples, simulated by combining events, sampled from the two original OMIP021 samples.

The first step consists of scale transforming the two raw flow frames, using estimated scale transformations. Indeed, distances between samples make more sense with scale-transformed signal, in which distributional differences are much more obvious. In order to transform the signal of the different channels, here we use the estimateScaleTransforms() function from the CytoPipeline package. However, this can be done using any standard package for flow cytometry data.

data(OMIP021Samples)

#outputDir <- base::tempdir()

transList <- estimateScaleTransforms(
    ff = OMIP021Samples[[1]],
    fluoMethod = "estimateLogicle",
    scatterMethod = "linearQuantile",
    scatterRefMarker = "BV785 - CD3")

OMIP021Trans <- CytoPipeline::applyScaleTransforms(
    OMIP021Samples[,c(1:16,18,20:22)], # removing 'EMPTY' channels
    transList)

We now create two simulated data sets, of 20 samples each, by combining events from the two samples of the OMIP021 original data set.

We also attach to each data set a ‘phenoData’ data frame, describing each sample with a few chosen variables.

The first simulated data set, which is called here OMIP021Sim1, is built by sub-sampling 10 times from each of the two samples (alternately), which gives the 20 simulated samples.

nSample <- 20
ffList <- list()

# to allow for reproducibility of sample()
set.seed(0)

for (i in seq_len(nSample)) {
    ffList[[i]] <- CytoPipeline::subsample(
                OMIP021Trans[[(i+1)%%2+1]],
                nEvents = 1000,
                seed = i)
}
OMIP021Sim1 <- as(ffList, "flowSet")

# samples alternate between original sample 1 and original sample 2
pData1 <- data.frame(name = paste0("S", seq_len(nSample)),
                     original_sample = factor(rep(c(1,2), nSample/2)))
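The index expression `(i+1)%%2+1` used in the loop above alternates between the two original samples. A quick base R check, independent of the CytoPipeline objects, shows which original sample each simulated sample is drawn from:

```r
# Which original sample (1 or 2) feeds each of the 20 simulated samples
nSample <- 20
originalIdx <- (seq_len(nSample) + 1) %% 2 + 1

originalIdx          # alternates 1, 2, 1, 2, ...
table(originalIdx)   # each original sample is used the same number of times
```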

The second one, OMIP021Sim2, is built by mixing events from sample 1 and sample 2, with the proportion of sample 1 events decreasing from 1 to 0 (and the proportion of sample 2 events increasing from 0 to 1 accordingly).

nSample <- 20
ffList <- list()
S1prop <- rep(0., nSample)

# to allow for reproducibility of sample()
set.seed(0)

for (i in seq_len(nSample)) {
    if (i==1) {
        ffList[[i]] <- OMIP021Trans[[1]]
        S1prop[i] <- 1
    } else if (i==nSample) {
        ffList[[i]] <- OMIP021Trans[[2]]
        S1prop[i] <- 0
    } else {
        ff1 <- CytoPipeline::subsample(
            OMIP021Trans[[1]],
            nEvents = 1000 * (nSample-i)/(nSample-1),
            seed = i)
        ff2 <- CytoPipeline::subsample(
            OMIP021Trans[[2]],
            nEvents = 1000 * (i-1)/(nSample-1),
            seed = i)
        S1prop[i] <- (nSample-i)/(nSample-1)
        
        ffList[[i]] <- CytoPipeline::aggregateAndSample(
            flowCore::flowSet(ff1, ff2),
            nTotalEvents = 1000000 # big number to have a simple aggregation
        )[,1:20]
    }
}
OMIP021Sim2 <- as(ffList, "flowSet")

pData2 <- data.frame(name = paste0("S", seq_len(nSample)),
                     origin = factor(c("Raw", rep("Sim", nSample-2), "Raw")),
                     rawLabel = c("D1", rep("", nSample-2), "D2"),
                     S1prop = S1prop)
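To make the mixing scheme concrete, the nominal proportions and event counts requested in the loop above can be tabulated in base R. This is only an illustration of the arithmetic: the requested counts are generally not whole numbers, and how `subsample()` handles that is not shown here.

```r
nSample <- 20
i <- seq_len(nSample)

# Proportion of events taken from original sample 1 (same formula as S1prop)
propS1 <- (nSample - i) / (nSample - 1)

# Nominal event counts requested from each original sample.
# Note: in the loop above, samples 1 and nSample are the full original
# frames, so the counts on the first and last rows do not apply to them.
mixing <- data.frame(
    sample = paste0("S", i),
    propS1 = round(propS1, 3),
    nominalEventsS1 = 1000 * propS1,
    nominalEventsS2 = 1000 * (1 - propS1))

head(mixing, 3)
```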

4 Calculating distances between samples

We can now calculate pairwise Earth Mover’s Distances (EMD) between all samples of our simulated data sets.

This is done by calling the pairwiseEMDDist() function.
The latter function takes here a flowCore::flowSet as input parameter, which is a collection of all samples as flowCore::flowFrame objects. Note that, for heavy data sets that contain a lot of samples, this can create memory issues. In that case, there are other ways to call the pairwiseEMDDist() function (see the ‘Handling heavy datasets’ section).

Using the channels argument, it is possible to restrict the EMD calculation to some of the channels. However, by default, all signal channels will be incorporated. Here, ‘signal channels’ means all scatter and fluorescence channels. Channels that are known to be irrelevant for multivariate distribution distances, like time and other usual house-keeping channels, are automatically excluded from the calculation.

pwDist1 <- pairwiseEMDDist(
    OMIP021Sim1, channels = NULL)

pwDist2 <- pairwiseEMDDist(
    OMIP021Sim2, channels = NULL)
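For intuition about what an Earth Mover's Distance measures, here is a minimal base R sketch for a single channel, using a shared equal-width binning. The helper `emd1d()` is hypothetical and written for this illustration only; it is not a description of what pairwiseEMDDist() does internally.

```r
# 1D Earth Mover's Distance between two samples, using a shared binning.
# For equal-width bins, EMD = bin width * sum(|cumulative difference|).
emd1d <- function(x, y, breaks) {
    p <- hist(x, breaks = breaks, plot = FALSE)$counts / length(x)
    q <- hist(y, breaks = breaks, plot = FALSE)$counts / length(y)
    binWidth <- diff(breaks)[1]
    binWidth * sum(abs(cumsum(p - q)))
}

# All the mass moves by two bins of width 1, so the distance is 2
emd1d(c(0.5, 0.5), c(2.5, 2.5), breaks = 0:4)

# Two simulated 'channels' whose distributions differ by a shift of about 1
set.seed(0)
x <- rnorm(1000)
y <- rnorm(1000, mean = 1)
emd1d(x, y, breaks = seq(-10, 10, by = 0.05))
```

The distance grows with how far the mass of one distribution has to be moved to match the other, which is why a pure shift of a channel translates into a distance close to the size of the shift.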

The calculated distance is a symmetric square matrix, with as many rows (columns) as input samples (extract shown here below for OMIP021Sim1 simulated data set).

round(pwDist1[1:10, 1:10], 2)
##       1    2    3    4    5    6    7    8    9   10
## 1  0.00 1.77 0.56 1.63 0.66 1.68 0.50 1.77 0.42 1.86
## 2  1.77 0.00 1.90 0.51 2.18 0.58 1.63 0.44 1.96 0.44
## 3  0.56 1.90 0.00 1.78 0.67 1.93 0.68 1.84 0.57 2.03
## 4  1.63 0.51 1.78 0.00 1.95 0.60 1.55 0.55 1.76 0.60
## 5  0.66 2.18 0.67 1.95 0.00 2.06 0.97 2.21 0.46 2.26
## 6  1.68 0.58 1.93 0.60 2.06 0.00 1.60 0.74 1.87 0.63
## 7  0.50 1.63 0.68 1.55 0.97 1.60 0.00 1.59 0.70 1.72
## 8  1.77 0.44 1.84 0.55 2.21 0.74 1.59 0.00 1.96 0.50
## 9  0.42 1.96 0.57 1.76 0.46 1.87 0.70 1.96 0.00 2.04
## 10 1.86 0.44 2.03 0.60 2.26 0.63 1.72 0.50 2.04 0.00

One relevant way to visualize this distance matrix is to draw the histogram of pairwise distances, as shown in the plot below for the OMIP021Sim1 simulated data set. Notice here the bi-modal distribution of the distances, due to the way the data set was generated: half of the samples originate from sub-sampling the first original sample, and the other half from sub-sampling the second original sample.

distVec1 <- pwDist1[upper.tri(pwDist1)]
distVecDF1 <- data.frame(dist = distVec1)
pHist1 <- ggplot(distVecDF1, mapping = aes(x=dist)) + 
    geom_histogram(fill = "darkgrey", col = "black", bins = 15) + 
    theme_bw() + ggtitle("EMD distances for data set 1")
pHist1
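The `upper.tri()` indexing used above keeps each unordered sample pair exactly once. The following base R sketch, on a toy distance matrix rather than on the vignette data, illustrates the properties being relied upon:

```r
# Toy symmetric distance matrix for 4 samples located on a line
pts <- c(0, 1, 3, 6)
toyDist <- as.matrix(dist(pts))

isSymmetric(toyDist)            # distance matrices are symmetric
all(diag(toyDist) == 0)         # with a zero diagonal

# upper.tri() keeps each unordered pair exactly once: N * (N - 1) / 2 values
pairDists <- toyDist[upper.tri(toyDist)]
length(pairDists) == choose(nrow(toyDist), 2)
```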

The same type of pairwise distance histogram, but this time for the OMIP021Sim2 simulated data set, looks quite different: here the distribution is unimodal and more regular, as one can expect from the way this second simulated data set was generated.

distVec2 <- pwDist2[upper.tri(pwDist2)]
distVecDF2 <- data.frame(dist = distVec2)
ggplot(distVecDF2, mapping = aes(x=dist)) + 
    geom_histogram(bins = 15, fill = "darkgrey", col = "black") + 
    theme_bw() + ggtitle("EMD distances for data set 2")

5 Metric Multidimensional scaling

5.1 Calculating the MDS projection

Once the pairwise distance matrix has been calculated, computing the Multi Dimensional Scaling (MDS) projection is done by calling the computeMetricMDS() function. In its simplest form, only the distance matrix needs to be passed to the function. In that case, the number of dimensions to use in the MDS is automatically set in order to reach a specific value for a projection quality indicator, i.e. a target pseudo R square, which is set by default to 0.95 (see the ‘Quality of projection - diagnostic tools’ section).

Note that the Smacof algorithm (Leeuw and Mair 2009), used to compute the MDS projection, is stochastic, so it is sensitive to the ‘seed’ used. Therefore, in cases where reproducible results from one run to another are required, it is advised to set the seed argument to a specific value.

mdsObj1 <- CytoMDS::computeMetricMDS(pwDist1, seed = 0)
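To illustrate what a metric MDS projection does with a distance matrix, here is a small base R sketch. It uses classical MDS via `stats::cmdscale()` as a deterministic stand-in, which is an assumption made for this illustration: it is not the Smacof algorithm, and its output will generally differ from that of computeMetricMDS().

```r
# Toy data: 6 'samples' lying in a 2D plane, so a 2D projection is exact
coords <- cbind(x = c(0, 1, 2, 0, 1, 2),
                y = c(0, 0, 0, 1, 1, 1))
toyDist <- dist(coords)

# Classical (Torgerson) MDS, used here only as a stand-in for Smacof
proj <- stats::cmdscale(toyDist, k = 2)

# Projected distances reproduce the input distances (up to rounding error)
max(abs(as.vector(dist(proj)) - as.vector(toyDist)))
```

The projected coordinates are only defined up to rotation and reflection: what the projection aims to preserve is the set of pairwise distances, not the original axes.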

5.2 Plotting the MDS projection

Plotting the obtained MDS projection is done using ggplotSampleMDS(). If no phenoData is used, then, by default, numbers are used as labels, and the samples are represented as black dots.

ggplotSampleMDS(mdsObj1)

However, by providing a ‘phenoData’ dataframe to the ggplotSampleMDS() function, the corresponding variable can be used for highlighting sample points with different colours and/or shapes. Here below, the previous plot is enhanced with red and blue colours, dot and triangle shapes, distinguishing samples based on the value of the original_sample variable. Also, we have here added explicit labels to each data point, using the corresponding value of the name variable for each sample.

ggplotSampleMDS(mdsObj1, 
                pData = pData1, 
                pDataForColour = "original_sample",
                pDataForShape = "original_sample",
                pDataForLabel = "name")

5.3 Quality of projection - diagnostic tools

In order to be able to trust the projected distances obtained on the CytoMDS plots, a couple of projection quality indicators need to be taken into account:

- the pseudo RSquare indicator shows what percentage of the variability contained in the pairwise distance matrix is actually shown in the projection. It is analogous to the statistical RSquare for a linear regression model: the closer to one the pseudo RSquare is, the better. Note that the latter refers to the variability contained in ALL dimensions of the MDS projection, not only the two plotted axes.
- nDim is the number of dimensions of the projection that was needed to obtain the corresponding pseudo RSquare.
- the percentage of variation that is captured along each axis (coordinates) is to be interpreted with respect to the total variability that is captured by the MDS projection, not the total variability contained in the pairwise distance matrix.

For example, in the plot above, using 2 dimensions, the MDS projection is able to capture 97.01% (pseudo RSquare) of the initial variability contained in the calculated pairwise distance matrix. Of these 97.01%, 90.96% is in turn captured by axis 1, and 9.04% is captured by axis 2.
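The indicators described above can be sketched in base R on toy data. The exact formulas used by CytoMDS are not given in this vignette, so both definitions below (a regression-style pseudo R square, and a per-axis share based on the variance of the projected coordinates) are assumptions made for illustration, as is the use of `stats::cmdscale()` as a stand-in projection.

```r
# Toy example: 10 'samples' in 3D, projected on 2 dimensions
set.seed(0)
highDim <- matrix(rnorm(30), nrow = 10)
origDist <- dist(highDim)
proj <- stats::cmdscale(origDist, k = 2)   # stand-in for the MDS projection
projDist <- dist(proj)

# Assumed pseudo R square: 1 - (residual sum of squares / total sum of squares)
d <- as.vector(origDist)
dHat <- as.vector(projDist)
pseudoRSq <- 1 - sum((d - dHat)^2) / sum((d - mean(d))^2)

# Assumed per-axis share: variance along each axis / total projected variance
axisVar <- apply(proj, 2, var)
axisPct <- 100 * axisVar / sum(axisVar)

pseudoRSq
axisPct   # sums to 100 by construction
```

With this kind of definition, the per-axis percentages always add up to 100%, whatever the pseudo R square is, which matches the way the two figures are to be read together.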

Another useful projection quality diagnostic tool is provided by the Shepard diagram. On this plot, each dot represents the distance between one pair of samples, with the original (high dimensional) distance between the two samples as x coordinate, and the projected (low dimensional) distance between these two samples, as obtained by the MDS projection algorithm, as y coordinate. In the Shepard diagram, an ideal situation corresponds to all points being located on the straight line passing through the (0,0) and (1,1) points.

ggplotSampleMDSShepard(mdsObj1)
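The data behind such a diagram is simply one row per sample pair. A minimal base R sketch on toy data follows, again assuming `stats::cmdscale()` as a stand-in for the actual MDS projection:

```r
set.seed(0)
highDim <- matrix(rnorm(40), nrow = 10)       # 10 toy samples in 4D
origDist <- dist(highDim)
proj <- stats::cmdscale(origDist, k = 2)      # stand-in for the MDS projection

shepard <- data.frame(
    original = as.vector(origDist),           # x coordinate of each dot
    projected = as.vector(dist(proj)))        # y coordinate of each dot

# One dot per sample pair; dots on the line y = x are perfectly preserved
nrow(shepard)
summary(shepard$projected - shepard$original)
```

Large deviations from the line y = x point to sample pairs whose distance is poorly represented in the projection.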

5.4 Additional options

In order to show some of the additional options available to the user of computeMetricMDS() and ggplotSampleMDS(), let us project the pairwise sample distances of the OMIP021Sim2 data set.

mdsObj2 <- CytoMDS::computeMetricMDS(pwDist2)
ggplotSampleMDS(mdsObj2,
                pData = pData2, 
                pDataForLabel = "rawLabel",
                pDataForShape = "origin",
                pDataForColour = "S1prop")

In the subtitle of this latter plot, it is mentioned that the R square of 95.41% was obtained using 3 dimensions in the Multi Dimensional Scaling. Therefore, one can visualize the MDS projection using any combination of two axes, for example axes 2 and 3, as below:

ggplotSampleMDS(mdsObj2,
                projectionAxes = c(2, 3),
                pData = pData2, 
                pDataForLabel = "rawLabel",
                pDataForShape = "origin",
                pDataForColour = "S1prop")

It is also possible to impose the number of dimensions used in the MDS projection explicitly, for example to 2, as shown below:

mdsObj2_2 <- CytoMDS::computeMetricMDS(pwDist2, nDim = 2)
ggplotSampleMDS(mdsObj2_2,
                pData = pData2, 
                pDataForLabel = "rawLabel",
                pDataForShape = "origin",
                pDataForColour = "S1prop")

Note that the obtained projection on 2 axes, although similar, is not exactly the same as the one obtained when visualizing the first two axes of the MDS projection computed before, on 3 dimensions. Actually, this is a feature of the Metric MDS projection, although it might appear a bit counter-intuitive at first.

Finally, it is also possible to adjust the number of dimensions indirectly, by setting an explicit pseudo RSquare target. In that case, the algorithm will increase the number of dimensions until the required quality target is reached. The example below shows how to obtain a pseudo RSquare of at least 0.99. Here the obtained number of dimensions is 6, instead of 3.

mdsObj2_3 <- CytoMDS::computeMetricMDS(pwDist2, targetPseudoRSq = 0.99)
ggplotSampleMDS(mdsObj2_3,
                pData = pData2, 
                pDataForLabel = "rawLabel",
                pDataForShape = "origin",
                pDataForColour = "S1prop")

The corresponding Shepard diagram is obtained as below:

ggplotSampleMDSShepard(mdsObj2_3)
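The idea of increasing the number of dimensions until a quality target is reached can be sketched in base R as follows. The stand-in projection (`stats::cmdscale()`), the pseudo R square formula and the helper name `pseudoRSq()` are all assumptions made for this illustration, not the CytoMDS implementation.

```r
set.seed(0)
highDim <- matrix(rnorm(20 * 6), nrow = 20)   # 20 toy samples in 6D
origDist <- dist(highDim)
d <- as.vector(origDist)

# Assumed quality indicator for a projection on k dimensions
pseudoRSq <- function(k) {
    dHat <- as.vector(dist(stats::cmdscale(origDist, k = k)))
    1 - sum((d - dHat)^2) / sum((d - mean(d))^2)
}

# Increase the number of dimensions until the target is reached
target <- 0.99
nDim <- 1
while (pseudoRSq(nDim) < target && nDim < 6) {
    nDim <- nDim + 1
}
c(nDim = nDim, pseudoRSq = pseudoRSq(nDim))
```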

5.5 Aid to interpreting projection axes

With MDS projections, it is possible to (try to) associate some axis directions to specific characteristics of the samples. The idea is to calculate the correlation of well chosen sample statistics w.r.t. the axes of projection, so that these correlations can be represented on a correlation circle, which is in turn overlaid on the projection plot. This plot set-up is called a ‘bi-plot’.

In order to take advantage of this functionality, the user first needs to calculate some statistics of interest for which they want to assess the association with the axis directions. Typically, one chooses channel specific statistics, e.g. the mean, the standard deviation, or any quantile that might be of interest. However, any statistic that can be calculated for each sample can be used (number of events, …).

Here below, we provide an example where the user overlays the median of the different channels, on a bi-plot for the MDS projection obtained for the first data set.

On the bi-plot, each arrow - here representing a channel median - is located at coordinates equal to its Pearson correlation with the respective axis.

Here, one can identify that the x axis has a strong positive correlation with the median of markers ‘Viability’, ‘gdTCR’, ‘TCR Va7’, ‘CD45-RA’, ‘CD8a’ and ‘CD27’, and a strong negative correlation with the median of channels ‘FSC-A’, ‘SSC-A’ and marker ‘CD161’. The y axis has a strong negative correlation with the medians of markers ‘CD28’, and ‘CD3’.

medians <- channelSummaryStats(OMIP021Sim1, statFUNs = median)
ggplotSampleMDS(mdsObj1, 
                pData = pData1, 
                pDataForColour = "original_sample",
                pDataForShape = "original_sample",
                displayPointLabels = FALSE,
                displayArrowLabels = TRUE,
                repelArrowLabels = TRUE,
                biplot = TRUE,
                extVariables = medians)

Note that, on the bi-plots, only the arrows of length greater than or equal to a specific threshold (by default set at 0.8) are represented, in order not to overwhelm the plot with arrows, especially when the data set contains a high dimensional panel.

It is however possible to adjust this threshold by explicitly setting the arrowThreshold argument. For example, in the plot below, this threshold is set to 0.9:

ggplotSampleMDS(mdsObj1, 
                pData = pData1, 
                pDataForColour = "original_sample",
                pDataForShape = "original_sample",
                displayPointLabels = FALSE,
                displayArrowLabels = TRUE,
                repelArrowLabels = TRUE,
                biplot = TRUE,
                extVariables = medians,
                arrowThreshold = 0.9) 
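The arrow coordinates and the length threshold described above can be sketched in base R on toy data. The variable names are hypothetical, and taking the arrow length as the Euclidean norm of the two correlations is an assumption made for this illustration.

```r
set.seed(0)
nSample <- 20
# Toy 2D projection coordinates (stand-in for the two plotted MDS axes)
proj <- cbind(axis1 = rnorm(nSample), axis2 = rnorm(nSample))

# Toy per-sample statistics: one tracks axis 1, one tracks axis 2, one is noise
extVars <- cbind(
    statA = proj[, "axis1"] + rnorm(nSample, sd = 0.1),
    statB = -proj[, "axis2"] + rnorm(nSample, sd = 0.1),
    statC = rnorm(nSample))

# Arrow tip = Pearson correlation of each statistic with each axis
arrowCoords <- stats::cor(extVars, proj)
arrowLength <- sqrt(rowSums(arrowCoords^2))

arrowThreshold <- 0.8
round(arrowCoords, 2)
names(arrowLength)[arrowLength >= arrowThreshold]   # arrows that would be drawn
```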

Instead of having one bi-plot related to a specific type of statistics, for example channel medians, one can try to associate the axes to different types of channel statistics at once. In the next plot, we represent such bi-plots for channel medians, 25% and 75% quantiles, and standard deviations.

The ‘faceting-alike’ plot is obtained thanks to the ggplotSampleMDSWrapBiplots() function, which internally calls the ggplotSampleMDS() function several times, and arranges the obtained outputs on a single plot.

statFUNs <- c("median" = stats::median,
             "Q25" = function(x, na.rm) {
                 stats::quantile(x, probs = 0.25)
             },
             "Q75" = function(x, na.rm) {
                 stats::quantile(x, probs = 0.75)
             },
             "standard deviation" = stats::sd)
chStats <- channelSummaryStats(OMIP021Sim1, statFUNs = statFUNs)
ggplotSampleMDSWrapBiplots(
    mdsObj1, 
    extVariableList = chStats,
    ncol = 2,
    pData = pData1,
    pDataForColour = "original_sample",
    pDataForShape = "original_sample",
    displayPointLabels = FALSE,
    displayArrowLabels = TRUE,
    repelArrowLabels = TRUE,
    arrowThreshold = 0.9,
    displayLegend = FALSE) 
## Warning: ggrepel: 12 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Note that the last plot, with arrows corresponding to channel standard deviations, generates a warning indicating that some arrow labels could not be displayed. This is due to the fact that too many channel standard deviations are strongly negatively correlated with the x axis. When encountering such visual issues, it is advised to generate a series of bi-plots, with subsets of channel statistics, in order to better identify the strongly correlated ones. One example is provided below:

stdDevs <- list()
stdDevs[["std dev of channels 1 to 6"]] <- 
    chStats[["standard deviation"]][,1:6]
stdDevs[["std dev of channels 7 to 12"]] <- 
            chStats[["standard deviation"]][,7:12]
stdDevs[["std dev of channels 13 to 18"]] <- 
            chStats[["standard deviation"]][,13:18]
ggplotSampleMDSWrapBiplots(
    mdsObj1, 
    ncol = 1,
    extVariableList = stdDevs,
    pData = pData1, 
    pDataForColour = "original_sample",
    pDataForShape = "original_sample",
    displayPointLabels = FALSE,
    displayArrowLabels = TRUE,
    repelArrowLabels = TRUE,
    arrowThreshold = 0.9) 
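The column indexing used above suggests that each element of the statistics list is a samples-by-channels matrix. Assuming that layout, the following base R sketch builds the same kind of structure from toy events-by-channels matrices; it does not use channelSummaryStats(), and the object names are hypothetical.

```r
set.seed(0)
# Toy data: 3 samples, each an events x channels matrix
channels <- c("FSC-A", "SSC-A", "CD3")
samples <- lapply(1:3, function(i) {
    matrix(rnorm(300, mean = i), ncol = 3, dimnames = list(NULL, channels))
})
names(samples) <- paste0("S", 1:3)

statFUNs <- list(median = stats::median, `standard deviation` = stats::sd)

# One samples x channels matrix per statistic
toyStats <- lapply(statFUNs, function(f) {
    t(sapply(samples, function(m) apply(m, 2, f)))
})

dim(toyStats[["median"]])
round(toyStats[["median"]], 2)

# Subsetting channels, as done above for the standard deviations
toyStats[["standard deviation"]][, 1:2]
```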

6 Handling heavy datasets

Computing Earth Mover’s Distances between all sample pairs of large data sets (e.g. with hundreds of samples) is a heavy computational task.

First, loading the whole data set as a flowCore::flowSet() in RAM at once might not be possible due to its size. Second, calculating a matrix of pairwise distances has a computational complexity of O(N^2), where N is the number of samples, which can lead to very long computation times for large data sets.

Therefore, the CytoMDS package provides several mechanisms to mitigate these issues.
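The quadratic growth is easy to quantify: N samples give N(N-1)/2 distinct pairs. A short base R illustration (the helper name `nPairs()` is introduced here for convenience only):

```r
# Number of distinct sample pairs for a data set of n samples
nPairs <- function(n) n * (n - 1) / 2

nSamples <- c(20, 100, 500, 1000)
data.frame(nSamples = nSamples, nPairs = nPairs(nSamples))
```

For the 20 simulated samples used in this vignette, this gives 190 pairwise distances, whereas 500 samples already require 124750 of them.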

6.1 Loading flow frames dynamically during distance matrix computation

In order to be able to handle data sets larger than the available computer RAM, the pairwiseEMDDist() function allows for a different input mode, where:

- the input samples are NOT provided directly via a flowCore::flowSet, but
- the user provides the number of samples, a user-written flow frame loading function that will be called to dynamically load the ith sample upon request, and optionally additional arguments.

Typically, the flow frame loading function provided by the user shall describe how to read the ith sample from disk.

Below, an example using data set OMIP021Sim1 is provided. This is for illustrative purposes only, as this data set is light enough to reside fully in memory. In such a workflow, it is advised, as a preliminary step, to store all scale transformed samples on disk. Here we do this in a temporary directory. Note that this scale transformation could also be done on the fly, as part of the flow frame loading function. However, this would then require scale transforming the channel intensities a number of times during distance matrix calculation, which would be inefficient.

storageLocation <- suppressMessages(base::tempdir())

nSample <- length(OMIP021Sim1)
fileNames <- file.path(
    storageLocation,
    paste0("OMIP021Sim1_Sample", seq_len(nSample), ".rds"))

for (i in seq_len(nSample)) {
    saveRDS(OMIP021Sim1[[i]], 
            file = fileNames[i])
}

We then call the pairwiseEMDDist() function, specifying a loading function.

pwDist1Again <- pairwiseEMDDist(
    x = nSample,
    loadFlowFrameFUN = function(ffIndex, theFiles){
        readRDS(file = theFiles[ffIndex])
    },
    loadFlowFrameFUNArgs = list(theFiles = fileNames),
    verbose = TRUE
)
## Pre-calculating all histograms...
## Loading file 1...
## Calculating histogram for file 1...
## Loading file 2...
## Calculating histogram for file 2...
## Loading file 3...
## Calculating histogram for file 3...
## Loading file 4...
## Calculating histogram for file 4...
## Loading file 5...
## Calculating histogram for file 5...
## Loading file 6...
## Calculating histogram for file 6...
## Loading file 7...
## Calculating histogram for file 7...
## Loading file 8...
## Calculating histogram for file 8...
## Loading file 9...
## Calculating histogram for file 9...
## Loading file 10...
## Calculating histogram for file 10...
## Loading file 11...
## Calculating histogram for file 11...
## Loading file 12...
## Calculating histogram for file 12...
## Loading file 13...
## Calculating histogram for file 13...
## Loading file 14...
## Calculating histogram for file 14...
## Loading file 15...
## Calculating histogram for file 15...
## Loading file 16...
## Calculating histogram for file 16...
## Loading file 17...
## Calculating histogram for file 17...
## Loading file 18...
## Calculating histogram for file 18...
## Loading file 19...
## Calculating histogram for file 19...
## Loading file 20...
## Calculating histogram for file 20...
## Calculating pairwise distances between histograms...
## i = 1; j = 2; dist = 1.76925
## i = 1; j = 3; dist = 0.5556
## i = 1; j = 4; dist = 1.62995
## i = 1; j = 5; dist = 0.6623
## i = 1; j = 6; dist = 1.67935
## i = 1; j = 7; dist = 0.5003
## i = 1; j = 8; dist = 1.76675
## i = 1; j = 9; dist = 0.4188
## i = 1; j = 10; dist = 1.85695
## i = 1; j = 11; dist = 0.59835
## i = 1; j = 12; dist = 1.84495
## i = 1; j = 13; dist = 0.37195
## i = 1; j = 14; dist = 1.90415
## i = 1; j = 15; dist = 0.5332
## i = 1; j = 16; dist = 1.5768
## i = 1; j = 17; dist = 0.46625
## i = 1; j = 18; dist = 1.9113
## i = 1; j = 19; dist = 0.39265
## i = 1; j = 20; dist = 1.73065
## i = 2; j = 3; dist = 1.89565
## i = 2; j = 4; dist = 0.5081
## i = 2; j = 5; dist = 2.17985
## i = 2; j = 6; dist = 0.5827
## i = 2; j = 7; dist = 1.63315
## i = 2; j = 8; dist = 0.4436
## i = 2; j = 9; dist = 1.96395
## i = 2; j = 10; dist = 0.4391
## i = 2; j = 11; dist = 1.9695
## i = 2; j = 12; dist = 0.4181
## i = 2; j = 13; dist = 1.7841
## i = 2; j = 14; dist = 0.4271
## i = 2; j = 15; dist = 2.02115
## i = 2; j = 16; dist = 0.50355
## i = 2; j = 17; dist = 1.8465
## i = 2; j = 18; dist = 0.51935
## i = 2; j = 19; dist = 1.6643
## i = 2; j = 20; dist = 0.4097
## i = 3; j = 4; dist = 1.78135
## i = 3; j = 5; dist = 0.6705
## i = 3; j = 6; dist = 1.92735
## i = 3; j = 7; dist = 0.6789
## i = 3; j = 8; dist = 1.83945
## i = 3; j = 9; dist = 0.5709
## i = 3; j = 10; dist = 2.02975
## i = 3; j = 11; dist = 0.53525
## i = 3; j = 12; dist = 1.97165
## i = 3; j = 13; dist = 0.57005
## i = 3; j = 14; dist = 2.00675
## i = 3; j = 15; dist = 0.6302
## i = 3; j = 16; dist = 1.7297
## i = 3; j = 17; dist = 0.40925
## i = 3; j = 18; dist = 1.947
## i = 3; j = 19; dist = 0.54205
## i = 3; j = 20; dist = 1.93255
## i = 4; j = 5; dist = 1.95135
## i = 4; j = 6; dist = 0.6006
## i = 4; j = 7; dist = 1.55475
## i = 4; j = 8; dist = 0.5472
## i = 4; j = 9; dist = 1.75765
## i = 4; j = 10; dist = 0.5975
## i = 4; j = 11; dist = 1.8457
## i = 4; j = 12; dist = 0.4716
## i = 4; j = 13; dist = 1.6647
## i = 4; j = 14; dist = 0.6292
## i = 4; j = 15; dist = 1.91555
## i = 4; j = 16; dist = 0.42345
## i = 4; j = 17; dist = 1.7772
## i = 4; j = 18; dist = 0.67705
## i = 4; j = 19; dist = 1.5781
## i = 4; j = 20; dist = 0.4644
## i = 5; j = 6; dist = 2.06245
## i = 5; j = 7; dist = 0.9733
## i = 5; j = 8; dist = 2.20845
## i = 5; j = 9; dist = 0.4595
## i = 5; j = 10; dist = 2.26375
## i = 5; j = 11; dist = 0.58575
## i = 5; j = 12; dist = 2.19265
## i = 5; j = 13; dist = 0.63315
## i = 5; j = 14; dist = 2.33165
## i = 5; j = 15; dist = 0.5165
## i = 5; j = 16; dist = 1.9022
## i = 5; j = 17; dist = 0.76855
## i = 5; j = 18; dist = 2.3527
## i = 5; j = 19; dist = 0.81135
## i = 5; j = 20; dist = 2.10035
## i = 6; j = 7; dist = 1.60175
## i = 6; j = 8; dist = 0.7385
## i = 6; j = 9; dist = 1.87115
## i = 6; j = 10; dist = 0.6313
## i = 6; j = 11; dist = 1.8147
## i = 6; j = 12; dist = 0.6742
## i = 6; j = 13; dist = 1.7136
## i = 6; j = 14; dist = 0.7753
## i = 6; j = 15; dist = 1.91265
## i = 6; j = 16; dist = 0.55255
## i = 6; j = 17; dist = 1.8616
## i = 6; j = 18; dist = 0.86815
## i = 6; j = 19; dist = 1.6282
## i = 6; j = 20; dist = 0.4977
## i = 7; j = 8; dist = 1.58885
## i = 7; j = 9; dist = 0.6984
## i = 7; j = 10; dist = 1.72295
## i = 7; j = 11; dist = 0.77395
## i = 7; j = 12; dist = 1.73335
## i = 7; j = 13; dist = 0.53125
## i = 7; j = 14; dist = 1.73185
## i = 7; j = 15; dist = 0.7266
## i = 7; j = 16; dist = 1.465
## i = 7; j = 17; dist = 0.51825
## i = 7; j = 18; dist = 1.7288
## i = 7; j = 19; dist = 0.44405
## i = 7; j = 20; dist = 1.64745
## i = 8; j = 9; dist = 1.95965
## i = 8; j = 10; dist = 0.4972
## i = 8; j = 11; dist = 2.0653
## i = 8; j = 12; dist = 0.4714
## i = 8; j = 13; dist = 1.8184
## i = 8; j = 14; dist = 0.4517
## i = 8; j = 15; dist = 2.07105
## i = 8; j = 16; dist = 0.57195
## i = 8; j = 17; dist = 1.8139
## i = 8; j = 18; dist = 0.39635
## i = 8; j = 19; dist = 1.6594
## i = 8; j = 20; dist = 0.5088
## i = 9; j = 10; dist = 2.04465
## i = 9; j = 11; dist = 0.54995
## i = 9; j = 12; dist = 1.98555
## i = 9; j = 13; dist = 0.45605
## i = 9; j = 14; dist = 2.10725
## i = 9; j = 15; dist = 0.4727
## i = 9; j = 16; dist = 1.7225
## i = 9; j = 17; dist = 0.56655
## i = 9; j = 18; dist = 2.1004
## i = 9; j = 19; dist = 0.54895
## i = 9; j = 20; dist = 1.89905
## i = 10; j = 11; dist = 2.0588
## i = 10; j = 12; dist = 0.456
## i = 10; j = 13; dist = 1.8845
## i = 10; j = 14; dist = 0.3702
## i = 10; j = 15; dist = 2.11985
## i = 10; j = 16; dist = 0.60155
## i = 10; j = 17; dist = 1.9871
## i = 10; j = 18; dist = 0.50695
## i = 10; j = 19; dist = 1.777
## i = 10; j = 20; dist = 0.4208
## i = 11; j = 12; dist = 2.048
## i = 11; j = 13; dist = 0.5603
## i = 11; j = 14; dist = 2.1403
## i = 11; j = 15; dist = 0.45555
## i = 11; j = 16; dist = 1.76255
## i = 11; j = 17; dist = 0.5428
## i = 11; j = 18; dist = 2.21025
## i = 11; j = 19; dist = 0.6201
## i = 11; j = 20; dist = 1.9024
## i = 12; j = 13; dist = 1.8701
## i = 12; j = 14; dist = 0.4895
## i = 12; j = 15; dist = 2.10775
## i = 12; j = 16; dist = 0.45175
## i = 12; j = 17; dist = 1.9607
## i = 12; j = 18; dist = 0.50495
## i = 12; j = 19; dist = 1.7726
## i = 12; j = 20; dist = 0.4146
## i = 13; j = 14; dist = 1.9214
## i = 13; j = 15; dist = 0.50905
## i = 13; j = 16; dist = 1.58085
## i = 13; j = 17; dist = 0.4638
## i = 13; j = 18; dist = 1.96335
## i = 13; j = 19; dist = 0.4102
## i = 13; j = 20; dist = 1.7461
## i = 14; j = 15; dist = 2.20185
## i = 14; j = 16; dist = 0.64085
## i = 14; j = 17; dist = 1.956
## i = 14; j = 18; dist = 0.41505
## i = 14; j = 19; dist = 1.7949
## i = 14; j = 20; dist = 0.4988
## i = 15; j = 16; dist = 1.8063
## i = 15; j = 17; dist = 0.58645
## i = 15; j = 18; dist = 2.2122
## i = 15; j = 19; dist = 0.59735
## i = 15; j = 20; dist = 1.98965
## i = 16; j = 17; dist = 1.71405
## i = 16; j = 18; dist = 0.7033
## i = 16; j = 19; dist = 1.51255
## i = 16; j = 20; dist = 0.44395
## i = 17; j = 18; dist = 1.94515
## i = 17; j = 19; dist = 0.4321
## i = 17; j = 20; dist = 1.8906
## i = 18; j = 19; dist = 1.80025
## i = 18; j = 20; dist = 0.61825
## i = 19; j = 20; dist = 1.6692
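The log above suggests a two-phase approach: a histogram is first calculated for each sample, loaded one at a time, and the pairwise distances are then calculated between histograms. A minimal base R sketch of this pattern, on toy one-channel 'samples', could look as follows. The loader `loadSample()`, the file names and the distance formula are hypothetical and are not the CytoMDS implementation.

```r
# Store toy 'samples' (numeric vectors) on disk, one file per sample
storageDir <- tempfile("lazyLoadDemo")
dir.create(storageDir)
set.seed(0)
nSample <- 4
files <- file.path(storageDir, paste0("sample", seq_len(nSample), ".rds"))
for (i in seq_len(nSample)) {
    saveRDS(rnorm(500, mean = i), file = files[i])
}

# Hypothetical loader, with the same shape as the loading function used above
loadSample <- function(index, theFiles) readRDS(theFiles[index])

# Phase 1: load one sample at a time and keep only its normalised histogram
binWidth <- 0.1
breaks <- seq(-10, 15, by = binWidth)
histos <- lapply(seq_len(nSample), function(i) {
    x <- loadSample(i, files)
    hist(x, breaks = breaks, plot = FALSE)$counts / length(x)
})

# Phase 2: pairwise distances computed from the histograms only
distMat <- matrix(0, nSample, nSample)
for (i in seq_len(nSample - 1)) {
    for (j in (i + 1):nSample) {
        d <- binWidth * sum(abs(cumsum(histos[[i]] - histos[[j]])))
        distMat[i, j] <- distMat[j, i] <- d
    }
}
round(distMat, 2)
```

With this design, only one full sample needs to be held in memory at any time, the histograms being much smaller than the raw events.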

6.2 Using BiocParallel to parallelize distance matrix computation

Finally, the CytoMDS pairwise distance calculation supports parallelization of the distance matrix computation, through the use of the BiocParallel package.

When parallelization is used, the calculation engine will automatically create worker tasks corresponding to specific blocks of the distance matrix to be calculated.
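As a rough illustration of the idea of worker tasks, the base R sketch below splits the list of sample pairs into a few blocks and processes each block independently. The chunking scheme, the number of blocks and the placeholder distance function are assumptions made for this illustration; the way CytoMDS actually partitions the distance matrix is not described here.

```r
nSample <- 20
pairs <- t(utils::combn(nSample, 2))        # all (i, j) pairs with i < j
colnames(pairs) <- c("i", "j")

# Assumed chunking: split the list of pairs into roughly equal blocks
nBlocks <- 4
blockId <- cut(seq_len(nrow(pairs)), breaks = nBlocks, labels = FALSE)
blocks <- split(as.data.frame(pairs), blockId)

# Each block could be handed to one worker, e.g. with BiocParallel::bplapply();
# plain lapply() is used here so that the sketch runs without extra packages
toyDistance <- function(i, j) abs(i - j)    # placeholder for a real distance
results <- lapply(blocks, function(b) {
    cbind(b, dist = mapply(toyDistance, b$i, b$j))
})
allPairs <- do.call(rbind, results)
nrow(allPairs)
```

Since the blocks do not depend on each other, they can be computed in any order and on different workers, and the results simply need to be reassembled at the end.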

Here below is an example, using the BiocParallel::SnowParam() backend.

bp <- BiocParallel::SnowParam(
    stop.on.error = FALSE,
    progressbar = TRUE)
pwDist1Last <- suppressWarnings(pairwiseEMDDist(
    x = nSample,
    loadFlowFrameFUN = function(ffIndex, theFiles){
        readRDS(file = theFiles[ffIndex])
    },
    loadFlowFrameFUNArgs = list(theFiles = fileNames),
    useBiocParallel = TRUE,
    BPPARAM = bp))
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=====================                                                 |  31%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |========================================================              |  81%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |===============================================================       |  91%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================|  99%
  |                                                                            
  |======================================================================| 100%

The distances obtained, shown in the histogram below, are exactly the same as before.

# keep each pairwise distance once (upper triangle of the symmetric matrix)
distVec1Last <- pwDist1Last[upper.tri(pwDist1Last)]
distVecDF1Last <- data.frame(dist = distVec1Last)
# histogram of the pairwise EMD distances
pHist1Last <- ggplot(distVecDF1Last, mapping = aes(x = dist)) + 
    geom_histogram(fill = "darkgrey", col = "black", bins = 15) + 
    theme_bw() + ggtitle("EMD distances for data set 1 - parallel computation")
pHist1Last
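
A numerical check can complement the visual comparison of histograms. The following is only a minimal sketch and not part of the original analysis: it uses two small toy matrices, `distSeq` and `distPar` (both hypothetical names), as stand-ins for the sequentially computed and the parallel computed distance matrices, and compares their upper triangles with base R's `all.equal()`.

```r
# Toy stand-ins for two pairwise distance matrices (hypothetical names);
# in practice these would be the sequential and the parallel results.
set.seed(0)
pts <- matrix(rnorm(20), ncol = 2)   # 10 points in 2 dimensions
distSeq <- as.matrix(dist(pts))
distPar <- as.matrix(dist(pts))

# Extract each pairwise distance once, as done for the histogram
vecSeq <- distSeq[upper.tri(distSeq)]
vecPar <- distPar[upper.tri(distPar)]

# TRUE when both vectors agree up to numerical tolerance
sameDistances <- isTRUE(all.equal(vecSeq, vecPar))
sameDistances
```

With the actual results, `pwDist1Last` would take the place of `distPar`, and the distance matrix computed earlier without parallelization would take the place of `distSeq`.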

Session information

## R Under development (unstable) (2024-01-16 r85808)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.4.4      CytoMDS_0.99.8     CytoPipeline_1.3.3 BiocStyle_2.31.0  
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3   rstudioapi_0.15.0    jsonlite_1.8.8      
##   [4] shape_1.4.6          magrittr_2.0.3       jomo_2.7-6          
##   [7] magick_2.8.2         farver_2.1.1         nloptr_2.0.3        
##  [10] rmarkdown_2.25       zlibbioc_1.49.0      vctrs_0.6.5         
##  [13] minqa_1.2.6          heplots_1.6.0        base64enc_0.1-3     
##  [16] htmltools_0.5.7      polynom_1.4-1        plotrix_3.8-4       
##  [19] weights_1.0.4        broom_1.0.5          Formula_1.2-5       
##  [22] mitml_0.4-5          sass_0.4.8           pracma_2.4.4        
##  [25] bslib_0.6.1          htmlwidgets_1.6.4    plyr_1.8.9          
##  [28] cachem_1.0.8         lifecycle_1.0.4      iterators_1.0.14    
##  [31] pkgconfig_2.0.3      Matrix_1.6-5         R6_2.5.1            
##  [34] fastmap_1.1.1        digest_0.6.34        colorspace_2.1-0    
##  [37] patchwork_1.2.0      S4Vectors_0.41.3     Hmisc_5.1-1         
##  [40] ellipse_0.5.0        labeling_0.4.3       cytolib_2.15.2      
##  [43] fansi_1.0.6          nnls_1.5             gdata_3.0.0         
##  [46] polyclip_1.10-6      abind_1.4-5          compiler_4.4.0      
##  [49] proxy_0.4-27         withr_3.0.0          doParallel_1.0.17   
##  [52] htmlTable_2.4.2      backports_1.4.1      BiocParallel_1.37.0 
##  [55] carData_3.0-5        hexbin_1.28.3        highr_0.10          
##  [58] ggforce_0.4.1        Rttf2pt1_1.3.12      pan_1.9             
##  [61] MASS_7.3-60.2        gtools_3.9.5         tools_4.4.0         
##  [64] foreign_0.8-86       extrafontdb_1.0      nnet_7.3-19         
##  [67] glue_1.7.0           nlme_3.1-164         grid_4.4.0          
##  [70] checkmate_2.3.1      cluster_2.1.6        snow_0.4-4          
##  [73] generics_0.1.3       gtable_0.3.4         class_7.3-22        
##  [76] tidyr_1.3.1          data.table_1.15.0    car_3.1-2           
##  [79] utf8_1.2.4           BiocGenerics_0.49.1  ggrepel_0.9.5       
##  [82] foreach_1.5.2        pillar_1.9.0         stringr_1.5.1       
##  [85] splines_4.4.0        flowCore_2.15.2      tweenr_2.0.2        
##  [88] dplyr_1.1.4          smacof_2.1-5         lattice_0.22-5      
##  [91] survival_3.5-7       RProtoBufLib_2.15.0  tidyselect_1.2.0    
##  [94] ggcyto_1.31.1        transport_0.14-6     knitr_1.45          
##  [97] gridExtra_2.3        bookdown_0.37        flowWorkspace_4.15.4
## [100] stats4_4.4.0         xfun_0.41            Biobase_2.63.0      
## [103] matrixStats_1.2.0    stringi_1.8.3        ncdfFlow_2.49.0     
## [106] yaml_2.3.8           boot_1.3-28.1        evaluate_0.23       
## [109] codetools_0.2-19     wordcloud_2.6        extrafont_0.19      
## [112] tibble_3.2.1         Rgraphviz_2.47.0     BiocManager_1.30.22 
## [115] graph_1.81.0         cli_3.6.2            rpart_4.1.23        
## [118] munsell_0.5.0        jquerylib_0.1.4      candisc_0.8-6       
## [121] Rcpp_1.0.12          XML_3.99-0.16.1      parallel_4.4.0      
## [124] rgl_1.2.8            lme4_1.1-35.1        glmnet_4.1-8        
## [127] scales_1.3.0         e1071_1.7-14         purrr_1.0.2         
## [130] rlang_1.1.3          mice_3.16.0

References

Ellis, B, Perry Haaland, Florian Hahne, Nolwenn Le Meur, Nishant Gopalakrishnan, Josef Spidlen, Mike Jiang, and Greg Finak. 2023. flowCore: Basic Structures for Flow Cytometry Data. https://doi.org/10.18129/B9.bioc.flowCore.

Gherardin, Nicholas A, David S Ritchie, Dale I Godfrey, and Paul J Neeson. 2014. “OMIP-021: Simultaneous Quantification of Human Conventional and Innate-Like T-Cell Subsets.” Cytometry A 85 (7): 573–75.

Hauchamps, Philippe, and Laurent Gatto. 2023. CytoPipeline: Automation and Visualization of Flow Cytometry Data Analysis Pipelines. https://uclouvain-cbio.github.io/CytoPipeline.

Leeuw, Jan de, and Patrick Mair. 2009. “Multidimensional Scaling Using Majorization: SMACOF in R.” J. Stat. Softw. 31 (August): 1–30.

Orlova, Darya Y, Noah Zimmerman, Stephen Meehan, Connor Meehan, Jeffrey Waters, Eliver E B Ghosn, Alexander Filatenkov, et al. 2016. “Earth Mover’s Distance (EMD): A True Metric for Comparing Biomarker Expression Levels in Cell Populations.” PLOS ONE.