---
title: "A quick tour of RCSL"
author: "Qinglin Mei"
date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{RCSL package manual}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r knitr-options, echo=FALSE, message=FALSE, warning=FALSE, include = FALSE}
library(knitr)
opts_chunk$set(
  collapse = TRUE,
  fig.align = 'center',
  fig.width = 6,
  fig.height = 5,
  dev = 'png',
  comment = "#>"
)
```

# Introduction

`RCSL` is an R toolkit for single-cell clustering and trajectory analysis using single-cell RNA-seq data.

# Installation

### Install RCSL package and other requirements

`RCSL` can be installed directly from GitHub with `devtools`.

```{r, eval=FALSE}
library(devtools)
devtools::install_github("QinglinMei/RCSL")
```

Now we can load `RCSL`. We also load the `SingleCellExperiment`, `ggplot2`, `igraph` and `umap` packages.

```{r, results="hide"}
library(RCSL)
library(SingleCellExperiment)
library(ggplot2)
library(igraph)
library(umap)
```

# Run RCSL

## Load dataset (yan)

We illustrate the usage of `RCSL` on a dataset of human preimplantation embryos and embryonic stem cells (*Yan et al., 2013*). The `yan` data is distributed together with the `RCSL` package and contains 90 cells and 20,214 genes:

```{r}
head(ann)              # cell annotations
yan[1:3, 1:3]          # a corner of the expression matrix
origData <- yan
label <- ann$cell_type1   # reference cell-type labels
```

## 1. Pre-processing

In practice, we have found it consistently beneficial to pre-process single-cell RNA-seq datasets with the following two steps:

1. Log transformation.
2. Gene filter.

```{r, cache=TRUE}
# Log-transform the expression values, using a pseudocount of 1
data <- log2(as.matrix(origData) + 1)
# Filter genes
gfData <- GenesFilter(data)
```

## 2. Calculate the initial similarity matrix S

```{r, cache=TRUE}
resSimS <- SimS(gfData)
```

## 3. Estimate the number of clusters C

```{r, cache=TRUE}
Estimated_C <- EstClusters(resSimS$drData, resSimS$S)
```

## 4. Calculate the block diagonal matrix B

```{r, cache=TRUE}
resBDSM <- BDSM(resSimS$S, Estimated_C)
```

# Calculate accuracy of the clustering

We compare the cluster assignments in `resBDSM$y` with the reference labels using the adjusted Rand index (ARI).

```{r, cache=TRUE}
ARI_RCSL <- igraph::compare(resBDSM$y, label, method = "adjusted.rand")
```

# Trajectory analysis of time-series datasets

```{r, cache=TRUE}
DataName <- "Yan"
res_TrajecAnalysis <- TrajectoryAnalysis(gfData, resSimS$drData, resSimS$S,
                                         clustRes = resBDSM$y, TrueLabel = label,
                                         startPoint = 1, dataName = DataName)
```

# Display the constructed MST

```{r, cache=TRUE}
res_TrajecAnalysis$MSTPlot
```

# Display the plot of the pseudo-temporal ordering

```{r, cache=TRUE}
res_TrajecAnalysis$PseudoTimePlot
```

# Display the plot of the inferred developmental trajectory

```{r, cache=TRUE}
res_TrajecAnalysis$TrajectoryPlot
```
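
# Interpreting the adjusted Rand index

To help read the `ARI_RCSL` value computed in the accuracy step, the following minimal sketch applies `igraph::compare` to small, hypothetical label vectors. The vectors `truth`, `relabelled` and `mixed` are invented for illustration and are not taken from the yan data. The sketch shows that the ARI depends only on how cells are grouped, not on the cluster ids themselves.

```{r}
library(igraph)

# Hypothetical reference labels for six cells (illustration only).
truth    <- c("zygote", "zygote", "2cell", "2cell", "4cell", "4cell")
truth_id <- as.integer(factor(truth))   # integer cluster ids per cell

# Same grouping of cells as `truth`, but with different cluster ids.
relabelled <- c(1L, 1L, 2L, 2L, 3L, 3L)

# A grouping that splits two of the reference clusters.
mixed <- c(1L, 2L, 1L, 2L, 3L, 3L)

ari_relabelled <- igraph::compare(relabelled, truth_id, method = "adjusted.rand")
ari_mixed      <- igraph::compare(mixed, truth_id, method = "adjusted.rand")

ari_relabelled   # identical partitions give an ARI of 1
ari_mixed        # partial agreement gives an ARI below 1
```

An ARI of 1 therefore means that the clustering reproduces the reference grouping exactly, while values near 0 indicate agreement no better than expected by chance.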