---
title: "GmicR_vignette"
output: rmarkdown::html_vignette 
vignette: >
  %\VignetteIndexEntry{GmicR_vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Abstract
In this example, we will generate a gene module-immune cell signature
network using the GmicR pacakge. This package uses WGCNA to compresses 
high-dimensional expression data into module eigenegenes, which are used with
bayesian learning and xCell cell signatures to infer causal relationships 
between gene modules and cell signatures. Expression data must be normalized
(RPKM/FPKM/TPM/RSEM) and annotated with official gene symbols.


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  fig.align= "center",
  comment = "#>"
)
```

## Installation of Bioconductor packages
```{r Bioconductor installation, echo=TRUE, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install('GmicR')
```
For macosx users experiencing WGCNA installation errors,
try downloading a compiled version from: 

  * https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/#manualInstall

For macosx users experiencing installation errors,
try downloading OS X binaries from:

  * https://CRAN.R-project.org/package=gRain
  * https://CRAN.R-project.org/package=gRbase

# Step 1 for GMIC building: Accessing Expression data 
For this example, we are downloading microarray expression data provided by
the xCell web portal. This dataset contains the expression profiles of 
twelve different types of human leukocytes from peripheral blood 
and bone marrow, before and after different treatments.

Detailed information about this dataset is available:

  * https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22886
  
## Downloading expression data
### NOTE: GmicR requires official gene symbols

```{r downloading data, echo=TRUE}
url <- "http://xcell.ucsf.edu/iris_u133a_expr.txt"

dat_download <- data.frame(read.delim(url),
row.names = 1, stringsAsFactors = FALSE, check.rows = FALSE)

# data are transposed for processing
datExpr0<-data.frame(t(dat_download))
```

## QC of expression data
WGCNA is used to for quality control of genes via the goodSamplesGenes function
```{r checking genes, message=FALSE, warning=FALSE}
library(WGCNA)
gsg = goodSamplesGenes(datExpr0, verbose = 3) # columns must be genes
gsg$allOK
```

A sampleTree can be used to check for outlier samples. For this example 
all samples are kept.
```{r checking samples}
sampleTree = hclust(dist(datExpr0), method = "average");

par(cex = 0.6);
par(mar = c(0,4,2,0))
plot(sampleTree, main = "Sample Filtering", 
labels = FALSE)

# final expression set ----------------------------------------------------
datExpr = datExpr0

```

## Exporting expression data for xCell signature analysis 
For cell signature detection using xCell, the expression data can be
written to a csv file. The file can be uploaded at: http://xcell.ucsf.edu
```{r saving expression data, echo=TRUE, eval = FALSE}
Exps_for_xCell_analysis<-data.frame(t(datExpr), check.names = FALSE)

write.csv(Exps_for_xCell_analysis, file = "Exps_for_xCell_analysis.csv")
```


### The xCell results will be emailed to you.  
Once you have the xCell data processed by http://xcell.ucsf.edu, you will 
receive an email linking to three text files. Download these files.
For GmicR, the "xCell results" file is required for Step 3.
```{r xCell email screen shot, echo=FALSE, out.width = '80%'}
xCell_email_dir<-system.file("extdata", "xCell_email.png", 
package = "GmicR", mustWork = TRUE)
knitr::include_graphics(xCell_email_dir)
```


```{r clearing environment and loading data, include=FALSE}
remove(list = ls())
library(GmicR)
sample_dat_dir<-system.file("extdata", "sample_dat.Rdata", 
                            package = "GmicR", mustWork = TRUE)
load(sample_dat_dir)
```


# Step 2 for GMIC building: gene module detection and annotation 
For simplicity, we will carryout WGCNA using 1000 randomly selected genes 
from 50 randomly selected samples


## WGCNA module detection
Auto_WGCNA is a wrapper for WGCNA. Not all options are avaible. For more
advanced features please use WGCNA.
```{r module detection, echo=TRUE, results=FALSE}
library(GmicR)

GMIC_Builder<-Auto_WGCNA(sample_dat, 
  mergeCutHeight = 0.35, minModuleSize = 10,
  deepSplit = 4, networkType = "signed hybrid", TOMType = "unsigned",
  corFnc = "bicor",  sft_RsquaredCut = 0.85,
  reassignThreshold = 1e-06, maxBlockSize = 25000)

```                         

Viewing input parameters
```{r modules, echo=TRUE, fig.height=5, fig.width=5}
GMIC_Builder$Input_Parameters
```

Soft threshold plot
```{r plot1,, echo=TRUE, fig.height=5, fig.width=5}
GMIC_Builder$Output_plots$soft_threshold_plot
```


```{r loading processed data, include=FALSE}

GMIC_Builder_dir<-system.file("extdata", "GMIC_Builder.Rdata", 
                            package = "GmicR", mustWork = TRUE)
load(GMIC_Builder_dir)

```

module clustering
```{r plot2, , echo=TRUE, fig.height=5, fig.width=5}
GMIC_Builder$Output_plots$module_clustering
```

dendrogram
```{r plot3, , echo=TRUE, fig.height=5, fig.width=5}
GMIC_Builder$Output_plots$net_dendrogram
```

## Module annotation
WGCNA functions intramodularConnectivity and chooseOneHubInEachModule are
used to build a dataframe with gene module information.
```{r GO module annotations, echo=TRUE}
# Module hubs and Gene influence
GMIC_Builder<-Query_Prep(GMIC_Builder,  
  calculate_intramodularConnectivity= TRUE,
  Find_hubs = TRUE)

head(GMIC_Builder$Query)
```


This function constructs a library for gene ontology enrichment, which will
be used for module naming with the GO_Module_NameR function. 

```{r GO enrichment, echo=TRUE}

GMIC_Builder<-GSEAGO_Builder(GMIC_Builder,
  species = "Homo sapiens", ontology = "BP", no_cores = 1)

```

GO_Module_NameR will assign a name to each module based on ontology size. A
smaller cut off size will generate a more specific term.
```{r GO module names, echo=TRUE}
GMIC_Builder<-GO_Module_NameR(GMIC_Builder)
```

This table provides a summary of detected modules. "Freq" indicates the total 
genes within each module
```{r GO_table}
head(GMIC_Builder$GO_table, n = 4)
```

A searchable dataframe is also generated
```{r GO_Query}
head(GMIC_Builder$GO_Query, n = 4)
```


# Step 3: Preparing module eigengenes and cell signatures for BN learning
## Specify the "xCell results" file directory
For this example, we are using cell signatures provided by the GmicR package,
which were generated using the xCell web portal.
```{r cell signatures, echo=TRUE}
file_dir<-system.file("extdata", "IRIS_xCell_sig.txt", 
                      package = "GmicR", mustWork = TRUE)
```

## Discretization
This function merges module eigengenes with xCell signatures prior to 
discretization. Only xCell signatures are supported. Discretization is carried 
out with bnlearning using "hartemink" method. For detailed information 
discretization see: 
http://www.bnlearn.com/documentation/man/preprocessing.html
```{r discretizing data}
GMIC_Builder_disc<-Data_Prep(GMIC_Builder,  
  xCell_Signatures = file_dir, 
ibreaks=10, Remove_ME0 = TRUE)

head(GMIC_Builder_disc$disc_data[sample(seq(1,64),4)])

```


```{r loading processed network, message=FALSE, warning=FALSE, include=FALSE}

GMIC_net_dir<-system.file("extdata", "GMIC_net.Rdata", 
                            package = "GmicR", mustWork = TRUE)
load(GMIC_net_dir)

```

# Step 4: BN learning
## Bayesian network learing with bootstrapping.
Although the default score for this function is Bayesian Dirichlet 
equivalent score (bde), for this example we will use the Bayesian Dirichlet 
sparse score (bds). For sparse data, such as the data used in this example,
the bds score is better suited: https://arxiv.org/abs/1605.03884.
```{r bnlearning, eval=FALSE, echo=TRUE}

no_cores<-1 # multicore support
cl<-parallel::makeCluster(1)


GMIC_net<-bn_tabu_gen(GMIC_Builder_disc, 
  cluster = cl, debug = FALSE, 
  bootstraps_replicates = 50, score = "bds")

parallel::stopCluster(cl) # stop cluster
```

## Detecting arcs for inversly related nodes
For hypothesis generation, it may be helpful to distiguish positive 
relationships from negative. The InverseARCs function from GmicR identifies 
these relationships from probability distributions generated from mutilated 
network queries. A correlation matrix is generated and a threshold is applied
to specify a slope cut off for inverse relationships. By default the threhold is
set to -0.3.
```{r detecting inverse relationships, echo=TRUE}


GMIC_Final<-InverseARCs(GMIC_net, threshold = -0.3)

```


## GmicR shiny app
Once complete, the GMIC network can be viewed using the Gmic_viz shiny app. 
```{r Visualizing network, echo=TRUE}
GMIC_Final_dir<-system.file("extdata", "GMIC_Final.Rdata", 
                          package = "GmicR", mustWork = TRUE)
load(GMIC_Final_dir)

if(interactive()){
Gmic_viz(GMIC_Final)
}
```

### GMIC_network_Query
You can view the entire network or just a subset of nodes. Inverse relationships
can be highlighted based on color and/or edge pattern. 

Not all nodes are represented:

  * Module must have at least one connection
  * xCell signature must have a relationship with a module
  
```{r screen shot1, echo=FALSE, out.width = '100%'}
example_shiny_dir<-system.file("extdata", "example_shiny1.png", 
package = "GmicR", mustWork = TRUE)
knitr::include_graphics(example_shiny_dir)
```


### Module_names_Query
You can search for your favorite gene or module of interest. 
```{r screen shot2, echo=FALSE, out.width = '100%'}
example_shiny_dir<-system.file("extdata", "example_shiny2.png", 
package = "GmicR", mustWork = TRUE)
knitr::include_graphics(example_shiny_dir)
```

### Module_names_BP_table
Or view a module summary table
```{r screen shot3, echo=FALSE, out.width = '100%'}
example_shiny_dir<-system.file("extdata", "example_shiny3.png", 
package = "GmicR", mustWork = TRUE)
knitr::include_graphics(example_shiny_dir)
```