---
title: "InterCellar User Guide"
author:
- name: Marta Interlandi
  affiliation:
  - Institute of Medical Informatics, University of Muenster, Muenster (DE)
  email: marta.interlandi@uni-muenster.de
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('InterCellar')`"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{InterCellar User Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: InterCellar.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  error = FALSE,
  warning = FALSE,
  message = FALSE,
  crop = NULL
)
```

```{r, echo=FALSE, out.width='50%', fig.align='center'}
knitr::include_graphics(path = system.file("app", "www", "logo_lowres.png", package="InterCellar", mustWork=TRUE))
```

# Introduction

`InterCellar` is a [Bioconductor](http://bioconductor.org) package that provides an interactive Shiny application for the analysis of cell-cell communication from single-cell RNA sequencing (scRNA-seq) data. Every step of the analysis can be performed interactively, so no programming skills are required. Moreover, `InterCellar` runs on your local machine, avoiding issues related to data privacy.

## Installation

`InterCellar` is distributed as a [Bioconductor](https://www.bioconductor.org/) package and requires R (version 4.1) and Bioconductor (version 3.14).

To install the `InterCellar` package, enter:

```{r eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("InterCellar")
```

## Launching the app

Once `InterCellar` is successfully installed, it can be loaded as follows:

```{r setup}
library(InterCellar)
```

To start the app, run the following command:

```{r demostart, eval=FALSE}
InterCellar::run_app( reproducible = TRUE )
```

`InterCellar` should open in a browser.
If this does not happen automatically, please open a browser and navigate to the address shown (for example, `Listening on http://127.0.0.1:6134`). The flag `reproducible = TRUE` ensures that your results will be reproducible across R sessions.

# Data upload

The first step of the workflow requires the upload of pre-computed results generated by an external tool capable of predicting cell-cell communication mediated by ligand-receptor interactions. `InterCellar` supports both published tools, such as [*CellPhoneDBv2*](https://www.cellphonedb.org/) [@efremova2020cellphonedb], [*CellChat*](https://github.com/sqjin/CellChat) [@jin2021inference], [*ICELLNET*](https://github.com/soumelis-lab/ICELLNET) [@noel2021dissection], and `r Biocpkg("SingleCellSignalR")` [@cabello2020singlecellsignalr], and custom results produced by *ad hoc* methods, which must contain the necessary information as described in the panel *From custom analysis*.

For this user guide, we will use *CellPhoneDB* (CPDB) results computed on a scRNA-seq dataset from Chua et al. [@chua2020covid]. This dataset comprises data from COVID-19 patients, divided into critical and moderate cases, as well as healthy controls. The cell-cell interaction (CCI) data produced by CPDB for each condition can be found at [InterCellar-reproducibility](https://github.com/martaint/InterCellar-reproducibility).

By navigating to **1. Data** and **Upload**, we can import our three CCI datasets from the *Supported tools* panel. We specify an existing local folder where `InterCellar` will create output folders to save the figures and tables produced by the analysis. To upload a CCI dataset, we must specify an ID and an output folder tag. Next, we can select the folder containing the *CellPhoneDB* results from our local drive. `InterCellar` will read and pre-process the data and show the resulting table in **Table view**.
The pre-processing step consists of:

* Mapping interaction pairs to the corresponding genes (when necessary)
* Annotating genes to their molecular function: L (ligand) or R (receptor)
* Re-ordering the interaction pairs listed as R-L to L-R

Finally, we can switch the active CCI dataset in the left menu, to easily analyze multiple datasets in parallel.

```{r, echo=FALSE, out.width='120%',fig.align='center'}
knitr::include_graphics(path = "screenshots/upload.png")
```

# Data exploration: Universes

Once the input data has been uploaded, `InterCellar` takes us to the exploration of three **Universes**. Each universe focuses on a different biological domain: cell clusters, genes and functions. Specific filtering options can be applied, and multiple visualization choices are available to enable a deep exploration of the cellular communication.

## Cluster-verse

This universe focuses on the clusters of cells participating in the communication. The filtering options allow the user to subset the dataset by:

* excluding entire clusters from the data. All interactions related to the excluded clusters will also be excluded from further steps of the analysis;
* setting a minimum interaction score;
* (when available) changing the p-value threshold for significant interactions (defaults to 0.05).

The analyst can see the effect of these filtering steps by looking at the box showing the total number of interactions. Warning: these filters have a global influence on the analysis, since they subset the input data!

Three tabs are part of the Cluster-verse: **Network**, **Barplot** and **Table**.

The **Network** of clusters shows the overall cellular communication. Nodes represent clusters, while edges show the number of paracrine interactions occurring between two clusters (either the total count or the count weighted by interaction score). Edges that loop back onto the same cluster represent autocrine interactions.
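
The filtering and edge-counting logic described for the Cluster-verse can be sketched in a few lines of base R. This is only an illustration on a toy table, not the code used by `InterCellar`: the column names (`clustA`, `clustB`, `int_pair`, `score`, `pvalue`) and all values are hypothetical, and ordered cluster pairs are kept as separate edges for simplicity.

```{r, eval=FALSE}
# Toy CCI table; column names and values are made up for illustration
cci <- data.frame(
  clustA   = c("T", "T", "B", "B", "NK", "NK"),
  clustB   = c("B", "T", "T", "NK", "NK", "T"),
  int_pair = c("L1 & R1", "L2 & R2", "L1 & R1", "L3 & R3", "L2 & R2", "L3 & R3"),
  score    = c(1.2, 0.4, 2.0, 0.8, 1.5, 0.1),
  pvalue   = c(0.01, 0.03, 0.20, 0.04, 0.001, 0.02),
  stringsAsFactors = FALSE
)

# Global filters: excluded clusters, minimum score, p-value threshold
excluded  <- c()      # e.g. c("NK") would drop every interaction involving NK
min_score <- 0.3
max_pval  <- 0.05

keep <- !(cci$clustA %in% excluded | cci$clustB %in% excluded) &
  cci$score >= min_score &
  cci$pvalue <= max_pval
cci_filt <- cci[keep, ]

# Autocrine: both partners belong to the same cluster; paracrine otherwise
cci_filt$type <- ifelse(cci_filt$clustA == cci_filt$clustB,
                        "autocrine", "paracrine")

# Edge weights per cluster pair: number of interactions and summed score
edges_n <- aggregate(int_pair ~ clustA + clustB, data = cci_filt, FUN = length)
names(edges_n)[3] <- "n_int"
edges_w <- aggregate(score ~ clustA + clustB, data = cci_filt, FUN = sum)
edges   <- merge(edges_n, edges_w, by = c("clustA", "clustB"))
```

In this sketch, `n_int` corresponds to the unweighted edge value and `score` to the score-weighted one. Rows of `edges` where `clustA` equals `clustB` are the self-loops, i.e. autocrine interactions.
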

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/cl_verse_net.png")
```

**Barplot** offers two different barplots representing: (1) the total number of interactions per cluster, divided into paracrine and autocrine interactions; and (2) the relative number of interactions for a given cell type.

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/cl_verse_bar.png")
```

In the **Table** panel, the analyst can restrict the data exploration to a specific focus, by subsetting the data to one cluster of interest, called *viewpoint*, and one *flow* of communication among:

* Directed, outgoing interactions (L-R): the *viewpoint* cluster sends (expresses) the ligand to the other clusters, which in turn express the corresponding receptor;
* Directed, incoming interactions (R-L): the *viewpoint* cluster expresses the receptor that binds to the corresponding ligand sent by other clusters;
* Undirected interactions (L-L and R-R): both elements of an interaction pair are either ligands or receptors.

## Gene-verse

The second universe of `InterCellar` focuses on the genes. Filtering options to exclude interaction pairs (int-pairs) are available and are specific to the input tool chosen by the user.

The **Table** shows all distinct int-pairs enriched in our data, regardless of the clusters in which they are found. This table includes the [Ensembl](https://www.ensembl.org) and [UniProt](https://www.uniprot.org/) IDs of each gene, with hyperlinks to the respective web pages to facilitate the investigation of unfamiliar genes.

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/g_verse_table.png")
```

Upon selection of one or multiple int-pairs from the previous **Table**, a dot plot is generated and shown in the **Dot Plot** panel.
The analyst can select a subset of clusters for the visualization and choose different colors for high and low int-pair scores.

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/g_verse_dot.png")
```

The **Network** panel visualizes the selected int-pairs in a cluster network.

## Function-verse

In the **Function-verse**, the analyst is required to perform a functional annotation before proceeding to the next steps of the analysis. For this purpose, `InterCellar` offers multiple sources of functional annotation in terms of [Gene Ontology](http://geneontology.org) (queried from [Ensembl](http://www.ensembl.org), via the package [biomaRt](https://bioconductor.org/packages/biomaRt/)) and pre-downloaded pathway databases (from the package [graphite](https://bioconductor.org/packages/graphite/)).

After suitable sources have been selected, the annotation can be performed, and a **Table** showing all functional terms annotated to each int-pair is displayed. Note that a functional term is annotated to an int-pair only when the term is enriched in both genes (or gene complexes) that are partners of the interaction.

The **Barplot** panel summarizes the number of functional terms annotated for each source.

In the following panel, **Ranking**, functional terms are listed individually, along with information on their *occurrence* (i.e. how many int-pairs have been annotated to this term).

By selecting one row of the **Ranking** table, we can explore the term of interest in the **Sunburst** plot. This visualization connects functions to int-pairs and clusters. On the left side of the panel, a table lists all int-pairs annotated to the term. The user can choose to visualize the number of interactions or the weighted number (by score).
On the right side, the sunburst plot is composed as follows:

* the selected functional term is shown in the inner circle of the plot;
* the inner ring shows all "first partner" clusters, enriched by the relevant int-pairs. Specifically, clusters on the inner ring express the first gene of each int-pair;
* the outer ring displays all "second partner" clusters, expressing the second gene of each int-pair;
* the width of each section represents the fraction of int-pairs found in that section (also shown when hovering over the inner ring sections);
* hovering over the outer ring sections shows the individual int-pairs enriched in the cluster pairs.

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/f_verse_sunburst.png")
```

# Data-driven analysis

## Int-Pair Modules

This step of `InterCellar`'s workflow allows the analyst to define and analyze **Int-Pair Modules**, i.e. groups of int-pairs that share a similar functional profile. To this aim, the choice of a *viewpoint* cluster and a communication *flow* is required. `InterCellar` will subset the input data accordingly. This analysis can be repeated for each viewpoint and flow of interest.

To define the number of int-pair modules in the data subset, four visualizations are provided. On the left-hand side, the optimal number of modules is calculated by `InterCellar` using (1) the elbow method on the total within-cluster sum of squares (which should be minimized) and (2) the average silhouette width (which should be maximized). Both methods are standard practice in cluster analysis and are meant to guide the choice of the optimal number of modules. However, the user is free to choose the most suitable number of modules for each case. In general, a high (low) number of modules corresponds to a high (low) specificity of each module. For this purpose, two visualizations offer yet another way to investigate the optimal number of modules.
A **dendrogram** of int-pairs shows the results of a hierarchical clustering obtained on the first two components of the **UMAP** underneath. Each point of the UMAP represents one int-pair (shown by hovering), and the color-coding is consistent between UMAP and dendrogram, showing the number of modules chosen. Moreover, dendrogram and UMAP are initialized with the optimal number of modules chosen by the elbow method (which usually gives a higher resolution than the average silhouette).

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/ipM_analysis.png")
```

Once the int-pair modules have been defined, `InterCellar` offers the possibility to visualize the int-pairs belonging to each module and the respective clusters in a **Circle plot**. Directed interactions are represented here by arrows originating from ligands (double segment) and pointing towards receptors (single segment). The **Table** panel summarizes the same information in a tabular format.

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/ipM_circle.png")
```

The last step of the int-pair modules analysis concerns functional terms. `InterCellar` performs a permutation test to calculate empirical p-values assessing the significance of the functional terms annotated to the int-pairs of each module. A **Table** displays the functional terms that are found significant (p-value <= 0.05 by default) for the chosen int-pair module. The significant functional terms listed in these tables can help the user to "manually" select terms that are of biological interest and can be used to annotate the UMAP, as we did in our manuscript.

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/ipM_function.png")
```

## Multiple conditions

The final step of `InterCellar`'s analysis allows the comparison of cell-cell communication across different conditions (up to 3).
The user can choose which conditions to compare (we recommend having the same, or a very similar, composition in terms of cell clusters). The analysis is then structured as follows:

* **Cluster-based**: the comparison focuses on the number of interactions per cluster. Panel **Back-to-Back Barplot** considers the first two conditions and plots a barplot comparing the total number of interactions per cluster. Panel **Radar Plot** compares the relative numbers of interactions from a given viewpoint cell cluster, for two or three conditions. Here we compare COVID-19 critical cases versus moderate ones. For the radar plots, we also consider control cases.

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/MC_clust_bar.png")
```

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/MC_clust_radar.png")
```

* **Gene-based**: `InterCellar` computes *int-pair/cluster-pair couplets* that are unique to each condition and displays them in the **Table**. Upon selection of one or multiple unique couplets, a **Dot Plot** is generated, showing the occurrence of each couplet in the respective condition. A **Pie Chart** summarizes the contribution of the selected couplets to each condition.

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/MC_gene_dot.png")
```

* **Function-based**: here only int-pairs that are uniquely found in each condition are considered. These are shown in **Table-UniqueIntPairs** along with the condition and cluster-pairs. Based on the functional annotation performed in the *function-verse*, `InterCellar` implements a permutation test to calculate an empirical p-value of significance for the functional terms that were annotated to these unique int-pairs. **Table-FuncTerms** shows all functional terms that are significantly enriched by int-pairs unique to each condition. Finally, by selecting a term of interest, the user can visualize it in a **Sunburst Plot**.
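
To give an intuition of what an empirical p-value from a permutation test looks like, the following base R sketch tests whether one functional term is over-represented in a set of int-pairs. It is a generic scheme under assumed inputs, not necessarily the exact null model implemented in `InterCellar`: the vector `has_term`, the indices in `unique_idx` and the number of permutations are all hypothetical.

```{r, eval=FALSE}
set.seed(1)  # make the random draws repeatable

# Hypothetical annotation: 200 int-pairs, 30 of them annotated to the term
n_pairs  <- 200
has_term <- rep(c(TRUE, FALSE), times = c(30, n_pairs - 30))

# Hypothetical set of 20 int-pairs unique to one condition,
# 10 of which carry the term
unique_idx <- c(1:10, 101:110)
observed   <- sum(has_term[unique_idx])

# Null distribution: draw the same number of int-pairs at random
# and count how many of them carry the term
n_perm <- 1000
null_counts <- replicate(
  n_perm,
  sum(has_term[sample.int(n_pairs, length(unique_idx))])
)

# Empirical p-value, with the usual +1 correction so that p is never 0
p_emp <- (1 + sum(null_counts >= observed)) / (1 + n_perm)
```

A term would then be reported as significant when `p_emp` falls below the chosen threshold (0.05 by default in the tables described above).
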

```{r, echo=FALSE, fig.align='center'}
knitr::include_graphics(path = "screenshots/MC_func_sun.png")
```

# References