---
title: "Quick start of CytoTree"
author: "Yuting Dai"
date: "`r Sys.Date()`"
output:
  prettydoc::html_pretty:
    highlight: github
    theme: cayman
    toc: yes
  pdf_document:
    toc: yes
  html_document:
    df_print: paged
    toc: yes
package: CytoTree
vignette: |
  %\VignetteIndexEntry{Quick_start}
  \usepackage[utf8]{inputenc}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---
```{r echo = TRUE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE, eval = TRUE,
                      warning = TRUE, message = TRUE,
                      fig.width = 6, fig.height = 5)
```
## Introduction
Although multidimensional single-cell-based flow and mass cytometry have been increasingly applied to microenvironmental composition and stem-cell research, integrated analysis workflows to facilitate the interpretation of experimental cytometry data remain underdeveloped. We present CytoTree, a comprehensive R package designed for the analysis and interpretation of flow and mass cytometry data. We applied CytoTree to mass cytometry and time-course flow cytometry data to demonstrate the usage and practical utility of its computational modules. CytoTree is a reliable tool for multidimensional cytometry data workflows and produces compelling results for trajectory construction and pseudotime estimation.

## Overview of CytoTree workflow

The CytoTree package covers most of the standard analysis and visualization workflow for FCS data. In the CytoTree workflow, an S4 object is built to implement the statistical and computational approaches, and all computational modules are integrated into a single pipeline that requires only one specified input data format.

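
To illustrate the "one object, many modules" pattern described above, here is a minimal sketch of an S4 container that is passed through an analysis step and returned with one more slot filled. The class `ToyCyt`, its slots, and the function `toyCluster` are hypothetical names invented for this example; they are not part of the CytoTree API, and k-means is used only as a stand-in for a clustering module.

```r
library(methods)

# Hypothetical container class; the slot names are illustrative only
setClass("ToyCyt",
         slots = c(raw.data = "matrix",
                   markers = "character",
                   cluster.id = "integer"),
         validity = function(object) {
           if (!all(object@markers %in% colnames(object@raw.data)))
             return("all markers must be columns of raw.data")
           TRUE
         })

# Hypothetical analysis step: takes the object, fills one slot, returns the object
toyCluster <- function(object, k = 2) {
  km <- kmeans(object@raw.data[, object@markers, drop = FALSE], centers = k)
  object@cluster.id <- as.integer(km$cluster)
  object
}

# Toy expression matrix with two well-separated groups of cells
mat <- cbind(CD34 = c(0.1, 0.2, 5.1, 5.3),
             CD43 = c(0.0, 0.3, 4.9, 5.2))
obj <- new("ToyCyt", raw.data = mat, markers = c("CD34", "CD43"),
           cluster.id = integer(0))

set.seed(1)
obj <- toyCluster(obj)   # same calling pattern as cyt <- runCluster(cyt)
obj@cluster.id
```

Because every step returns the updated object, the steps can be chained in any order that respects their dependencies, and the validity check rejects input that does not match the expected format.
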
`CytoTree` supports four main types of analysis:

- **Clustering**. `CytoTree` can help you to discover and identify subtypes of cells.
- **Dimensionality Reduction**. Several dimensionality reduction methods are provided in the `CytoTree` package, such as Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (tSNE), Diffusion Maps, and Uniform Manifold Approximation and Projection (UMAP). CytoTree provides both cell-based and cluster-based dimensionality reduction.
- **Trajectory Inference**. `CytoTree` can help you to construct the cellular differentiation trajectory based on the minimum spanning tree (MST) algorithm.
- **Pseudotime and Intermediate States Definition**. The root cells must be defined by the user. The trajectory value is calculated from the shortest paths between root cells and leaf cells using the R `igraph` package. The FCS data set can then be subset in `CytoTree` to find the key intermediate cell states based on the trajectory value.

**Fig. 1 Workflow of CytoTree**

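
The trajectory and pseudotime ideas listed above can be sketched directly with `igraph`. This is only an illustration of a minimum spanning tree and of shortest-path lengths from a root; it is not CytoTree's internal code. The cluster centres, the choice of `cluster1` as root, and the rescaling to the range 0 to 1 are assumptions made for the example.

```r
library(igraph)

# Hypothetical cluster centres in a 2-D embedding (five clusters along a zigzag)
centers <- cbind(dim1 = c(0, 1, 2, 3, 4),
                 dim2 = c(0, 0.2, 0, 0.2, 0))
rownames(centers) <- paste0("cluster", 1:5)

# Complete weighted graph: edge weight = Euclidean distance between centres
d <- as.matrix(dist(centers))
g <- graph_from_adjacency_matrix(d, mode = "undirected", weighted = TRUE)

# Minimum spanning tree: connects all clusters with n - 1 edges of minimal total weight
tree <- mst(g)

# Weighted shortest-path length along the tree from the assumed root cluster
root <- "cluster1"
path.len <- distances(tree, v = root)[1, ]

# Rescale to [0, 1] to obtain a simple pseudotime-like ordering (an assumption)
pseudo <- (path.len - min(path.len)) / (max(path.len) - min(path.len))
pseudo
```

Clusters that lie further from the root along the tree receive larger values, which is the intuition behind ordering cells from root to leaf.
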
## Quick start
``` {r eval = TRUE}
# Load packages
suppressMessages({
  library(ggplot2)
  library(CytoTree)
  library(flowCore)
  library(stringr)
})

# Read the FCS files shipped with the package
fcs.path <- system.file("extdata", package = "CytoTree")
fcs.files <- list.files(fcs.path, pattern = "\\.FCS$", full.names = TRUE)

# Merge all files into one expression matrix
fcs.data <- runExprsMerge(fcs.files, comp = FALSE, transformMethod = "none")

# Rename the channel columns of the FCS data to marker names
recol <- c(`FITC-A` = "CD43", `APC-A` = "CD34",
           `BV421-A` = "CD90", `BV510-A` = "CD45RA",
           `BV605-A` = "CD31", `BV650-A` = "CD49f",
           `BV 735-A` = "CD73", `BV786-A` = "CD45",
           `PE-A` = "FLK1", `PE-Cy7-A` = "CD38")
colnames(fcs.data)[match(names(recol), colnames(fcs.data))] <- recol
fcs.data <- fcs.data[, recol]

# Build the cell-level metadata; the stage is parsed from the row names
day.list <- c("D0", "D2", "D4", "D6", "D8", "D10")
meta.data <- data.frame(cell = rownames(fcs.data),
                        stage = str_replace(rownames(fcs.data), regex("\\.FCS.+"), ""))
meta.data$stage <- factor(as.character(meta.data$stage), levels = day.list)

markers <- c("CD43", "CD34", "CD90", "CD45RA", "CD31",
             "CD49f", "CD73", "CD45", "FLK1", "CD38")

# Build the CYT object
cyt <- createCYT(raw.data = fcs.data, markers = markers,
                 meta.data = meta.data,
                 normalization.method = "log",
                 verbose = TRUE)

# Show a summary of the object
cyt
```
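
The column renaming and stage parsing in the chunk above can be hard to follow on real data, so here is a small base R sketch of the same two steps on a toy matrix. The row-name format `<stage>.FCS_<index>` and the extra `Time` column are assumptions made for the illustration; check `rownames(fcs.data)` and `colnames(fcs.data)` on your own data.

```r
# Toy expression matrix; the row-name format "<stage>.FCS_<index>" is an assumption
toy <- matrix(1:12, nrow = 4,
              dimnames = list(c("D0.FCS_1", "D0.FCS_2", "D2.FCS_1", "D10.FCS_1"),
                              c("FITC-A", "APC-A", "Time")))

# Named vector: names are channel names, values are marker names
recol <- c(`FITC-A` = "CD43", `APC-A` = "CD34")

# match() finds the position of each channel, so the rename does not depend on column order
colnames(toy)[match(names(recol), colnames(toy))] <- recol

# Keep only the renamed marker columns (drops "Time")
toy <- toy[, recol]

# Strip everything from ".FCS" onwards to recover the stage label
stage <- sub("\\.FCS.+$", "", rownames(toy))
stage <- factor(stage, levels = c("D0", "D2", "D10"))

colnames(toy)
table(stage)
```

Setting the factor levels explicitly keeps the stages in chronological order in later plots instead of alphabetical order, where `D10` would sort before `D2`.
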
``` {r eval = TRUE}
# Cluster cells by SOM algorithm
# Set random seed to make results reproducible
set.seed(1)
cyt <- runCluster(cyt, cluster.method = "som")
# Do not perform downsampling
set.seed(1)
cyt <- processingCluster(cyt)
# run Principal Component Analysis (PCA)
cyt <- runFastPCA(cyt)
# run t-Distributed Stochastic Neighbor Embedding (tSNE)
cyt <- runTSNE(cyt)
# run Diffusion map
cyt <- runDiffusionMap(cyt)
# run Uniform Manifold Approximation and Projection (UMAP)
cyt <- runUMAP(cyt)
# Build the minimum spanning tree (MST) on the first two tSNE dimensions
cyt <- buildTree(cyt, dim.type = "tsne", dim.use = 1:2)
# Differentially expressed markers between branches
diff.list <- runDiff(cyt)
# Define root cells
cyt <- defRootCells(cyt, root.cells = c(28,26))
# Calculate pseudotime
cyt <- runPseudotime(cyt, verbose = TRUE, dim.type = "raw")
# Define leaf cells
cyt <- defLeafCells(cyt, leaf.cells = c(27, 13), verbose = TRUE)
# Run the walk between root cells and leaf cells
cyt <- runWalk(cyt, verbose = TRUE)
# Save the object (not run)
if (FALSE) {
  save(cyt, file = "Path to your output directory")
}
######################## Visualization
# 2D tSNE plot, cells colored by cluster id
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "cluster.id",
       alpha = 1, main = "tSNE", category = "categorical", show.cluser.id = TRUE)
# 2D UMAP plot, cells colored by cluster id
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), color.by = "cluster.id",
       alpha = 1, main = "UMAP", category = "categorical", show.cluser.id = TRUE)
# 2D tSNE plot, cells colored by branch id
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "branch.id",
       alpha = 1, main = "tSNE", category = "categorical", show.cluser.id = TRUE)
# 2D UMAP plot, cells colored by branch id
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), color.by = "branch.id",
       alpha = 1, main = "UMAP", category = "categorical", show.cluser.id = TRUE)
# 2D tSNE plot, cells colored by stage
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "stage",
       alpha = 1, main = "tSNE", category = "categorical") +
  scale_color_manual(values = c("#00599F","#009900","#FF9933",
                                "#FF99FF","#7A06A0","#FF3222"))
# 2D UMAP plot, cells colored by stage
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), color.by = "stage",
       alpha = 1, main = "UMAP", category = "categorical") +
  scale_color_manual(values = c("#00599F","#009900","#FF9933",
                                "#FF99FF","#7A06A0","#FF3222"))
# Tree plots, colored by the percentage of D0 cells and by CD43 expression
plotTree(cyt, color.by = "D0.percent", show.node.name = TRUE, cex.size = 1) +
  scale_colour_gradientn(colors = c("#00599F", "#EEEEEE", "#FF3222"))
plotTree(cyt, color.by = "CD43", show.node.name = TRUE, cex.size = 1) +
  scale_colour_gradientn(colors = c("#00599F", "#EEEEEE", "#FF3222"))
# Cluster plot colored by CD45RA expression
plotCluster(cyt, item.use = c("tSNE_1", "tSNE_2"), category = "numeric",
            size = 100, color.by = "CD45RA") +
  scale_colour_gradientn(colors = c("#00599F", "#EEEEEE", "#FF3222"))
# Pie tree plot
plotPieTree(cyt, cex.size = 3, size.by.cell.number = TRUE) +
  scale_fill_manual(values = c("#00599F","#FF3222","#009900",
                               "#FF9933","#FF99FF","#7A06A0"))
# Pie cluster plot
plotPieCluster(cyt, item.use = c("tSNE_1", "tSNE_2"), cex.size = 40) +
  scale_fill_manual(values = c("#00599F","#FF3222","#009900",
                               "#FF9933","#FF99FF","#7A06A0"))
# Heatmaps of clusters and branches
plotClusterHeatmap(cyt)
plotBranchHeatmap(cyt)
# Violin plots of CD45RA by cluster and by branch
plotViolin(cyt, color.by = "cluster.id", marker = "CD45RA", text.angle = 90)
plotViolin(cyt, color.by = "branch.id", marker = "CD45RA", text.angle = 90)
# UMAP plot colored by pseudotime
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), category = "numeric",
       size = 1, color.by = "pseudotime") +
  scale_colour_gradientn(colors = c("#F4D31D", "#FF3222","#7A06A0"))
# tSNE plot colored by pseudotime
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), category = "numeric",
       size = 1, color.by = "pseudotime") +
  scale_colour_gradientn(colors = c("#F4D31D", "#FF3222","#7A06A0"))
# Pseudotime density plot by stage
plotPseudotimeDensity(cyt, adjust = 1) +
  scale_color_manual(values = c("#00599F","#009900","#FF9933",
                                "#FF99FF","#7A06A0","#FF3222"))
# Tree plot colored by pseudotime
plotTree(cyt, color.by = "pseudotime", cex.size = 1.5) +
  scale_colour_gradientn(colors = c("#F4D31D", "#FF3222","#7A06A0"))
# Violin plot of CD49f with clusters ordered by pseudotime
plotViolin(cyt, color.by = "cluster.id", order.by = "pseudotime",
           marker = "CD49f", text.angle = 90)
# Trajectory value
plotPseudotimeTraj(cyt, var.cols = TRUE) +
  scale_colour_gradientn(colors = c("#F4D31D", "#FF3222","#7A06A0"))
# Heatmap
plotHeatmap(cyt, downsize = 1000, cluster_rows = TRUE, clustering_method = "ward.D",
            color = colorRampPalette(c("#00599F","#EEEEEE","#FF3222"))(100))
# Cluster plot colored by the log trajectory value
plotCluster(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "traj.value.log",
            size = 10, show.cluser.id = TRUE, category = "numeric") +
  scale_colour_gradientn(colors = c("#EEEEEE", "#FF3222", "#CC0000", "#CC0000"))
```

## Announcement

`CytoTree` was previously released as `flowSpy` **[link to GitHub](https://github.com/JhuangLab/CytoTree) and [link to Bioconductor](https://bioconductor.org/packages/flowSpy/)**. To make the package easier to identify and to avoid awkward duplication of names in some situations, we renamed `flowSpy` to `CytoTree`, a name that better fits the functional orientation of the software.

We apologize for the inconvenience.