---
title: "Using scTHI"
author: 
- name:    "Michele Ceccarelli" 
- name:    "Francesca Pia Caruso"
output:    BiocStyle::html_document 
vignette: >
    %\VignetteIndexEntry{Using scTHI}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}  
    %\VignetteDepends{BiocStyle} 
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```



# Introduction
The detection of tumor-host interaction is a major challange in cancer 
understanding. More recent advances in single cell next generation 
sequencing have provided new insights in unveiling tumor microenvironment 
(TME) cells comunications. The scTHI package provides some useful functions 
to identify and visualize the cell types making up the TME, and a rank-based 
approach to discover interesting ligand-receptor interactions establishing 
among malignant and non tumor cells

# Installation

```{r echo = FALSE, message=FALSE,warning=FALSE, eval=FALSE, echo=TRUE}
if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scTHI")
```

# Examples 
## Analysis of H3K27M glioma from Fibin et al. Science 2018

### Input Data
The scTHI package requires a matrix of count data (or normalized counts) 
from single cell RNA-seq experiments, where rows are genes presented with 
Hugo Symbols and columns are cells. For a practical demonstration, we 
will use the scRNA-seq data available on the [Broad Institute Single-Cell 
Portal](https://portals.broadinstitute.org/single_cell/study/single-cell-
analysis-in-pediatric-midline-gliomas-withhistone-h3k27m-mutation) [1]. 
The dataset provides 3,321 scRNA-seq profiles from six primary Glioma 
(H3K27M-glioma), and includes both malignant cells and several tumor 
microenvironment cell types, as immune cells and oligodendrocytes. 
Our goal is to identify the significant ligand-receptor interactions 
that are established between cancer and immune cells present in the tumor
microenvironment. For this reason, after downloading H3K27M-glioma data 
we preprocessed them, selecting only one sample of interest (i.e. BCH836), 
removing not-expresse genes, transforming data in log TPM , and applying 
quantile normalization. Processed are included in the package and can be 
loaded as follows:

```{r,warning=FALSE}
library(scTHI)
library(scTHI.data)
set.seed(1234)
data("H3K27")
data("H3K27.meta")
```


H3K27.meta includes the annotation of each cell, so as to identify 
the tumor cells and the immune cells in BCH836 patient:

```{r}
table(H3K27.meta$Type)
```

We will use the cells annotated as Immune and a subset of Malignant.

```{r}
Malignant <-
rownames(H3K27.meta)[H3K27.meta$Type == "Malignant"][1:100]
Immune <- rownames(H3K27.meta)[H3K27.meta$Type == "Immune cell"]
```

### Run scTHI.score
The function `scTHI.score` can be used  to detect significant 
ligand-receptor interactions occurring between cancer and immune cells. 
The function uses a dataset of 1044 ligand-receptor pairs to identify 
those which are significantly enriched between pairs clusters of your 
dataset. A score is computed for each interaction pair. This score does 
not simply reflect the average expression of the two partners in the 
interaction pair, it is rather based on the percentage of cells that 
express the interaction pair at the top of the ordered gene list. It 
considers a gene expressed in a cell, only if it is in the top 10% of 
the ranked gene list, as set in `topRank` argument.  
Interactions are not symmetric, so partnerA expression is considered 
on the first cluster (in the example the Malignant cells), and partnerB 
expression is considered on the second cluster (i.e. Immune cells). 
A p-value for each pair is computed, based on a null model from random 
permutations of data, and  and only significant 
interaction pairs are reported.

```{r, warning=FALSE}
H3K27.result <-
scTHI_score(
expMat = H3K27,
cellCusterA = Malignant,
cellCusterB = Immune,
cellCusterAName = "Malignant",
cellCusterBName = "Immune",
topRank = 10,
PValue = TRUE,
pvalueCutoff = 0.05,
nPermu = 10,
ncore = 1
)
```
For which partner A and partner B are expressed in at least 50% 
of the cells of the respective clusters, as set in `filterCutoff` 
argument. P-value computation is optional and is estimated by 
permuting the mates of interaction pairs. Result also includes 
two columns with the average expression values of each partner
in the respective clusters.

```{r}
head(H3K27.result$result)
```

The function `scTHI.plotResult` with argument `plotType` as 
"score" displays the score of each significant  pair through 
barplots. The number of asterisks indicates significance. 
Otherwise, setting the argument `plotType` as "pair", for each 
pair are shown bar plots displaying the percentage of cells in 
each cluster expressing partner A and partner B, respectively.

```{r}
scTHI_plotResult(scTHIresult = H3K27.result,
                cexNames = 0.7,
                plotType = "score")
```


```{r}
scTHI_plotResult(scTHIresult = H3K27.result,
                cexNames = 0.7,
                plotType = "pair")
```

Another useful way to explore results is the visualization of the 
entire data set through the t-SNE plot. scTHI allows the user to 
compute the non-linear dimensionality reduction of data using the 
scTHI.runTsne function based on the Rtsne R package [2].
```{r}
H3K27.result <- scTHI_runTsne(scTHIresult = H3K27.result)
```

Users can decide to display on the tsne the two input clusters, 
i.e. Maglignant and Immune cells, or to observe the intensity of 
the expression of an intresting interaction pair in the whole 
dataset. The scTHI.plotCluster function differently labels cells 
on t-SNE if they belong to ClusterA or ClusterB.

```{r}
scTHI_plotCluster(scTHIresult = H3K27.result,
                cexPoint = 0.8,
                legendPos = "bottomleft")
```


On the other hand the scTHI.plotPairs function colors the cells on 
the t-SNE based on the expression levels of a user-defined 
interaction pair. For example, let’s try plotting the expression 
of the interaction pair consisting of THY1, which should be expressed 
by the cancer cells and the ITGAX-ITGB2 complex, instead expressed 
by the immune cells.

```{r}
scTHI_plotPairs(
scTHIresult = H3K27.result,
cexPoint = 0.8,
interactionToplot = "THY1_ITGAX:ITGB2"
)
```

The THY1 gene appears uniformly expressed in tumor cells, and 
in according to results table, it was expressed in 861% of 
malignant cells at the top 10% of the ranked gene list of 
each cell. On the other hand the ITGAX-ITGB2 complex is highly 
specific of the Immune cluster, where it is expressed by 92% of
the cells at the top 10% of the ranked gene list of each cell.

### Tumor microenvironment classification
Another important problem in study tumor-host interaction is 
the classification of all cell types that make up the tumor 
microenvironment. Generally, tools for single cell data analysis 
are based on the use of marker genes to classify different cell 
types identified by clustering approaches. This strategy is not 
always the most accurate, our approach uses a different method 
for the classification of TME based on gene signature enrichment 
analysis. The `TME.classification` implements the 
Mann-Whitney-Wilcoxon Gene Set Test (MWW-GST) algorithm [3] 
and tests for each cell the enrichment of a collection of 
signatures of different cell types, with greater interest for 
the Immune System and the Central Nervous System cells. Thus, 
each cell is associated with a specific phenotype based on the 
more significant enriched gene set. 
Let's try to classify the TME cells of the BCH836 patient, 
excluding malignant cells from the dataset.

```{r}
Tme.cells <-
rownames(H3K27.meta)[which(H3K27.meta$Type != "Malignant")]
H3K27.tme <- H3K27[, Tme.cells]
```
Note that `TME.classification` function can take a 
long time if the number of cells to test is large.



```{r, message=FALSE}
Class <- TME_classification(expMat = H3K27.tme, minLenGeneSet = 10)
```
The H3K27M dataset annotation table suggests that non-tumor cells 
from BCH836 patient are mainly composed by Oligodendrocyte and Immune 
cells, respectively. The TME.classification function identifies a 
similar number of Oligodendrocytes and Immune cells, distinguishing 
the latter mainly in Microglia, Monocytes and Macrophages and fewer 
other Immune cells.

```{r}
table(Class$Class[Tme.cells], H3K27.meta[Tme.cells, "Type"])
```
Finally, using the previously calculated t-SNE or a set of 
user-defined coordinates, you can view the new classification 
of the TME.
```{r}
TME_plot(tsneData = H3K27.result$tsneData, Class, cexPoint = 0.8)
```


# Session Info
```{r}
sessionInfo()
```

# References
1. Filbin, M. G., Tirosh, I., Hovestadt, V., Shaw, M. L., Escalante, 
L. E., Mathewson, N. D., ... & Haberler, C. (2018). Developmental 
and oncogenic programs in H3K27M gliomas dissected by single-cell 
RNA-seq. Science, 360(6386), 331-335.


2. Maaten, L. V. D., & Hinton, G. (2008). Visualizing data 
using t-SNE. Journal of machine learning research, 9(Nov), 2579-2605.

3. Frattini, V., Pagnotta, S. M., Fan, J. J., Russo, M. V., Lee, 
S. B., Garofano, L., ... & Frederick, V. (2018). A metabolic 
function of FGFR3-TACC3 gene fusions in cancer. Nature, 553(7687), 222.