DecontPro assesses and decontaminates single-cell protein expression data, such as those generated by CITE-seq or Total-seq. The count matrix is decomposed into three matrices (native, ambient, and background), which represent the contributions from true protein expression on cells, ambient material, and other non-specific background contamination, respectively.
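To make the additive decomposition concrete, here is a toy sketch in base R. It is not decontPro's model: the three matrices are simulated, and the names `native`, `ambient`, `background` and the Poisson rates are assumptions chosen only for illustration.

```r
# Toy illustration only: simulated matrices, not decontPro output.
set.seed(1)
n_adt <- 3
n_droplet <- 5
dims <- list(paste0("ADT", seq_len(n_adt)), paste0("cell", seq_len(n_droplet)))

native     <- matrix(rpois(n_adt * n_droplet, lambda = 50), nrow = n_adt, dimnames = dims)
ambient    <- matrix(rpois(n_adt * n_droplet, lambda = 5),  nrow = n_adt, dimnames = dims)
background <- matrix(rpois(n_adt * n_droplet, lambda = 2),  nrow = n_adt, dimnames = dims)

# The observed count matrix is the element-wise sum of the three components.
observed <- native + ambient + background

# Share of each ADT's total counts that comes from contamination.
contamination_fraction <- rowSums(ambient + background) / rowSums(observed)
round(contamination_fraction, 3)
```

In a real analysis only `observed` is known; the three components are what the model tries to infer.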
The decontX package can be installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("decontX")
Then the package can be loaded in R using the following command:
library(decontX)
To see the latest updates and releases or to post a bug, see our GitHub page at
Here we use an example dataset from the SingleCellMultiModal package.
library(SingleCellMultiModal)
dat <- CITEseq("cord_blood", dry.run = FALSE)
counts <- experiments(dat)$scADT
For this tutorial, we sample only 1000 droplets from the dataset to demonstrate the use of the functions. When analyzing your own dataset, sub-sampling should be done with caution, because decontPro approximates the contamination profile from the dataset itself. Biased sampling may therefore bias the estimated contamination profile.
sample_id <- sample(dim(counts)[2], 1000, replace = FALSE)
counts_sample <- counts[, sample_id]
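One rough way to check whether a subsample is representative is to compare each ADT's share of total counts in the full data and in the subsample. The sketch below is self-contained and uses a simulated matrix. The names `toy_counts`, `toy_sample`, `profile_full` and `profile_sample` are illustrative, and this per-ADT profile is only a proxy chosen here, not the contamination profile that decontPro estimates.

```r
# Simulated ADT matrix (rows = ADTs, columns = droplets); not the tutorial data.
set.seed(42)
toy_counts <- matrix(
  rpois(4 * 5000, lambda = rep(c(5, 20, 50, 100), times = 5000)),
  nrow = 4,
  dimnames = list(paste0("ADT", 1:4), NULL)
)

# Subsample 1000 droplets without replacement, as in the tutorial.
toy_id <- sample(ncol(toy_counts), 1000, replace = FALSE)
toy_sample <- toy_counts[, toy_id]

# Per-ADT share of all counts, in the full data and in the subsample.
profile_full   <- rowSums(toy_counts) / sum(toy_counts)
profile_sample <- rowSums(toy_sample) / sum(toy_sample)

# Large differences would suggest the subsample is not representative.
round(cbind(full = profile_full,
            sample = profile_sample,
            abs_diff = abs(profile_full - profile_sample)), 4)
```

The same comparison can be applied to `counts` and `counts_sample` after converting them to ordinary matrices.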
decontPro requires a vector indicating the cell type of each droplet. Here we use Seurat for clustering.
library(Seurat)
library(dplyr)  # provides the %>% pipe used below
adt_seurat <- CreateSeuratObject(counts_sample, assay = "ADT")
adt_seurat <- NormalizeData(adt_seurat, normalization.method = "CLR", margin = 2) %>%
  ScaleData(assay = "ADT") %>%
  RunPCA(assay = "ADT", features = rownames(adt_seurat), npcs = 10, reduction.name = "pca_adt") %>%
  FindNeighbors(dims = 1:10, assay = "ADT", reduction = "pca_adt") %>%
  FindClusters(resolution = 0.5)
#> Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
#> Number of nodes: 1000
#> Number of edges: 32498
#> Running Louvain algorithm...
#> Maximum modularity in 10 random starts: 0.8567
#> Number of communities: 9
#> Elapsed time: 0 seconds
adt_seurat <- RunUMAP(adt_seurat,
                      dims = 1:10,
                      assay = "ADT",
                      reduction = "pca_adt",
                      reduction.name = "adtUMAP",
                      verbose = FALSE)
DimPlot(adt_seurat, reduction = "adtUMAP", label = TRUE)
FeaturePlot(adt_seurat,
            features = c("CD3", "CD4", "CD8", "CD19", "CD14", "CD16", "CD56"))
clusters <- as.integer(Idents(adt_seurat))
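Before passing the cluster vector on, it can help to confirm that it has one label per droplet and to look at the cluster sizes. The helper below, `check_clusters`, is hypothetical and not part of decontX. It assumes whole-number labels such as those produced by `as.integer(Idents(...))`, and is demonstrated on simulated data.

```r
# Hypothetical helper, not part of decontX: basic checks on a cluster vector.
check_clusters <- function(counts, clusters) {
  stopifnot(
    length(clusters) == ncol(counts),      # one label per droplet
    !anyNA(clusters),                      # no missing labels
    all(clusters == as.integer(clusters))  # whole-number labels
  )
  table(clusters)  # cluster sizes, useful for spotting very small clusters
}

# Toy usage with simulated data (3 ADTs, 10 droplets, 2 clusters).
set.seed(1)
toy_counts <- matrix(rpois(3 * 10, lambda = 20), nrow = 3)
toy_clusters <- rep(1:2, each = 5)
check_clusters(toy_counts, toy_clusters)
```

In the tutorial this would be called as `check_clusters(counts_sample, clusters)`.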
You can run DecontPro simply with:
counts <- as.matrix(counts_sample)
out <- decontPro(counts,
                 clusters)
The priors (delta_sd and background_sd) may need tuning, guided by plots of the decontaminated results. The two priors encode the belief about whether a more spread-out count should be considered contamination or native signal. We tested the default values on datasets ranging from 5k to 10k droplets and from 10 to 30 ADTs, and the results were reasonable. A more contaminated or smaller dataset may need larger priors. In this tutorial, since we sampled only 1k droplets from the original 10k droplets, we use slightly larger priors:
counts <- as.matrix(counts_sample)
out <- decontPro(counts,
                 clusters,
                 delta_sd = 2e-4,
                 background_sd = 2e-5)
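When tuning the priors, one option is to fit the model over a small grid of candidate values and compare the resulting plots. The sketch below only shows the bookkeeping. The helper `fit_over_grid` is hypothetical, the grid values are illustrative and not recommendations, and a stand-in function replaces the model fit so the example runs without decontX.

```r
# Candidate prior settings; the values are illustrative, not recommendations.
prior_grid <- expand.grid(delta_sd = c(2e-5, 2e-4),
                          background_sd = c(2e-6, 2e-5))

# Hypothetical helper: apply a fitting function to every row of the grid.
fit_over_grid <- function(grid, fit_fun) {
  lapply(seq_len(nrow(grid)), function(i) {
    fit_fun(grid$delta_sd[i], grid$background_sd[i])
  })
}

# With decontX loaded, fit_fun would wrap the decontPro call, for example:
# fits <- fit_over_grid(prior_grid, function(d, b) {
#   decontPro(counts, clusters, delta_sd = d, background_sd = b)
# })

# Stand-in fit function so the sketch runs without fitting a model.
dummy_fits <- fit_over_grid(prior_grid, function(d, b) {
  list(delta_sd = d, background_sd = b)
})
length(dummy_fits)
```

Each real fit can take a while, so a small grid around the values used in this tutorial is a sensible starting point.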
The output contains three matrices and the model parameters after inference.
decontaminated_counts represents the true protein expression on cells.
decontaminated_counts <- out$decontaminated_counts
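A quick numeric summary to accompany the plots is the fraction of each ADT's total counts that was removed. The sketch below uses simulated stand-ins (`toy_raw`, `toy_decont`) rather than real decontPro output, and the uniform 20% removal is an assumption made only so the arithmetic is easy to follow.

```r
# Simulated stand-ins for the raw and decontaminated matrices.
set.seed(1)
toy_raw <- matrix(rpois(3 * 8, lambda = 30) + 1,
                  nrow = 3,
                  dimnames = list(c("A", "B", "C"), NULL))
toy_decont <- toy_raw * 0.8  # pretend 20% of every count was contamination

# Fraction of each ADT's total counts removed by decontamination.
removed_fraction <- 1 - rowSums(toy_decont) / rowSums(toy_raw)
round(removed_fraction, 2)
```

With real data the same expression would use `counts` and `decontaminated_counts`, and the fraction would differ between ADTs.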
Plot the ADT density before and after decontamination. For bimodal ADTs, the background peak should be removed. Note that CD4 is tri-modal, with the intermediate mode corresponding to monocytes. This can serve as a QC check for decontamination, as only the lowest mode should be removed.
plotDensity(counts,
            decontaminated_counts,
            c("CD3", "CD4", "CD8", "CD14", "CD16", "CD19"))
We can also visualize the decontamination by each cell cluster.
plotBoxByCluster(counts,
                 decontaminated_counts,
                 clusters,
                 c("CD3", "CD4", "CD8", "CD16"))
