--- title: "GmicR_vignette" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{GmicR_vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Abstract In this example, we will generate a gene module-immune cell signature network using the GmicR pacakge. This package uses WGCNA to compresses high-dimensional expression data into module eigenegenes, which are used with bayesian learning and xCell cell signatures to infer causal relationships between gene modules and cell signatures. Expression data must be normalized (RPKM/FPKM/TPM/RSEM) and annotated with official gene symbols. ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, fig.align= "center", comment = "#>" ) ``` ## Installation of Bioconductor packages ```{r Bioconductor installation, echo=TRUE, eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install('GmicR') ``` For macosx users experiencing WGCNA installation errors, try downloading a compiled version from: * https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/#manualInstall For macosx users experiencing installation errors, try downloading OS X binaries from: * https://CRAN.R-project.org/package=gRain * https://CRAN.R-project.org/package=gRbase # Step 1 for GMIC building: Accessing Expression data For this example, we are downloading microarray expression data provided by the xCell web portal. This dataset contains the expression profiles of twelve different types of human leukocytes from peripheral blood and bone marrow, before and after different treatments. Detailed information about this dataset is available: * https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22886 ## Downloading expression data ### NOTE: GmicR requires official gene symbols ```{r downloading data, echo=TRUE} url <- "http://xcell.ucsf.edu/iris_u133a_expr.txt" dat_download <- data.frame(read.delim(url), row.names = 1, stringsAsFactors = FALSE, check.rows = FALSE) # data are transposed for processing datExpr0<-data.frame(t(dat_download)) ``` ## QC of expression data WGCNA is used to for quality control of genes via the goodSamplesGenes function ```{r checking genes, message=FALSE, warning=FALSE} library(WGCNA) gsg = goodSamplesGenes(datExpr0, verbose = 3) # columns must be genes gsg$allOK ``` A sampleTree can be used to check for outlier samples. For this example all samples are kept. ```{r checking samples} sampleTree = hclust(dist(datExpr0), method = "average"); par(cex = 0.6); par(mar = c(0,4,2,0)) plot(sampleTree, main = "Sample Filtering", labels = FALSE) # final expression set ---------------------------------------------------- datExpr = datExpr0 ``` ## Exporting expression data for xCell signature analysis For cell signature detection using xCell, the expression data can be written to a csv file. The file can be uploaded at: http://xcell.ucsf.edu ```{r saving expression data, echo=TRUE, eval = FALSE} Exps_for_xCell_analysis<-data.frame(t(datExpr), check.names = FALSE) write.csv(Exps_for_xCell_analysis, file = "Exps_for_xCell_analysis.csv") ``` ### The xCell results will be emailed to you. Once you have the xCell data processed by http://xcell.ucsf.edu, you will receive an email linking to three text files. Download these files. For GmicR, the "xCell results" file is required for Step 3. ```{r xCell email screen shot, echo=FALSE, out.width = '80%'} xCell_email_dir<-system.file("extdata", "xCell_email.png", package = "GmicR", mustWork = TRUE) knitr::include_graphics(xCell_email_dir) ``` ```{r clearing environment and loading data, include=FALSE} remove(list = ls()) library(GmicR) sample_dat_dir<-system.file("extdata", "sample_dat.Rdata", package = "GmicR", mustWork = TRUE) load(sample_dat_dir) ``` # Step 2 for GMIC building: gene module detection and annotation For simplicity, we will carryout WGCNA using 1000 randomly selected genes from 50 randomly selected samples ## WGCNA module detection Auto_WGCNA is a wrapper for WGCNA. Not all options are avaible. For more advanced features please use WGCNA. ```{r module detection, echo=TRUE, results=FALSE} library(GmicR) GMIC_Builder<-Auto_WGCNA(sample_dat, mergeCutHeight = 0.35, minModuleSize = 10, deepSplit = 4, networkType = "signed hybrid", TOMType = "unsigned", corFnc = "bicor", sft_RsquaredCut = 0.85, reassignThreshold = 1e-06, maxBlockSize = 25000) ``` Viewing input parameters ```{r modules, echo=TRUE, fig.height=5, fig.width=5} GMIC_Builder$Input_Parameters ``` Soft threshold plot ```{r plot1,, echo=TRUE, fig.height=5, fig.width=5} GMIC_Builder$Output_plots$soft_threshold_plot ``` ```{r loading processed data, include=FALSE} GMIC_Builder_dir<-system.file("extdata", "GMIC_Builder.Rdata", package = "GmicR", mustWork = TRUE) load(GMIC_Builder_dir) ``` module clustering ```{r plot2, , echo=TRUE, fig.height=5, fig.width=5} GMIC_Builder$Output_plots$module_clustering ``` dendrogram ```{r plot3, , echo=TRUE, fig.height=5, fig.width=5} GMIC_Builder$Output_plots$net_dendrogram ``` ## Module annotation WGCNA functions intramodularConnectivity and chooseOneHubInEachModule are used to build a dataframe with gene module information. ```{r GO module annotations, echo=TRUE} # Module hubs and Gene influence GMIC_Builder<-Query_Prep(GMIC_Builder, calculate_intramodularConnectivity= TRUE, Find_hubs = TRUE) head(GMIC_Builder$Query) ``` This function constructs a library for gene ontology enrichment, which will be used for module naming with the GO_Module_NameR function. ```{r GO enrichment, echo=TRUE} GMIC_Builder<-GSEAGO_Builder(GMIC_Builder, species = "Homo sapiens", ontology = "BP", no_cores = 1) ``` GO_Module_NameR will assign a name to each module based on ontology size. A smaller cut off size will generate a more specific term. ```{r GO module names, echo=TRUE} GMIC_Builder<-GO_Module_NameR(GMIC_Builder) ``` This table provides a summary of detected modules. "Freq" indicates the total genes within each module ```{r GO_table} head(GMIC_Builder$GO_table, n = 4) ``` A searchable dataframe is also generated ```{r GO_Query} head(GMIC_Builder$GO_Query, n = 4) ``` # Step 3: Preparing module eigengenes and cell signatures for BN learning ## Specify the "xCell results" file directory For this example, we are using cell signatures provided by the GmicR package, which were generated using the xCell web portal. ```{r cell signatures, echo=TRUE} file_dir<-system.file("extdata", "IRIS_xCell_sig.txt", package = "GmicR", mustWork = TRUE) ``` ## Discretization This function merges module eigengenes with xCell signatures prior to discretization. Only xCell signatures are supported. Discretization is carried out with bnlearning using "hartemink" method. For detailed information discretization see: http://www.bnlearn.com/documentation/man/preprocessing.html ```{r discretizing data} GMIC_Builder_disc<-Data_Prep(GMIC_Builder, xCell_Signatures = file_dir, ibreaks=10, Remove_ME0 = TRUE) head(GMIC_Builder_disc$disc_data[sample(seq(1,64),4)]) ``` ```{r loading processed network, message=FALSE, warning=FALSE, include=FALSE} GMIC_net_dir<-system.file("extdata", "GMIC_net.Rdata", package = "GmicR", mustWork = TRUE) load(GMIC_net_dir) ``` # Step 4: BN learning ## Bayesian network learing with bootstrapping. Although the default score for this function is Bayesian Dirichlet equivalent score (bde), for this example we will use the Bayesian Dirichlet sparse score (bds). For sparse data, such as the data used in this example, the bds score is better suited: https://arxiv.org/abs/1605.03884. ```{r bnlearning, eval=FALSE, echo=TRUE} no_cores<-1 # multicore support cl<-parallel::makeCluster(1) GMIC_net<-bn_tabu_gen(GMIC_Builder_disc, cluster = cl, debug = FALSE, bootstraps_replicates = 50, score = "bds") parallel::stopCluster(cl) # stop cluster ``` ## Detecting arcs for inversly related nodes For hypothesis generation, it may be helpful to distiguish positive relationships from negative. The InverseARCs function from GmicR identifies these relationships from probability distributions generated from mutilated network queries. A correlation matrix is generated and a threshold is applied to specify a slope cut off for inverse relationships. By default the threhold is set to -0.3. ```{r detecting inverse relationships, echo=TRUE} GMIC_Final<-InverseARCs(GMIC_net, threshold = -0.3) ``` ## GmicR shiny app Once complete, the GMIC network can be viewed using the Gmic_viz shiny app. ```{r Visualizing network, echo=TRUE} GMIC_Final_dir<-system.file("extdata", "GMIC_Final.Rdata", package = "GmicR", mustWork = TRUE) load(GMIC_Final_dir) if(interactive()){ Gmic_viz(GMIC_Final) } ``` ### GMIC_network_Query You can view the entire network or just a subset of nodes. Inverse relationships can be highlighted based on color and/or edge pattern. Not all nodes are represented: * Module must have at least one connection * xCell signature must have a relationship with a module ```{r screen shot1, echo=FALSE, out.width = '100%'} example_shiny_dir<-system.file("extdata", "example_shiny1.png", package = "GmicR", mustWork = TRUE) knitr::include_graphics(example_shiny_dir) ``` ### Module_names_Query You can search for your favorite gene or module of interest. ```{r screen shot2, echo=FALSE, out.width = '100%'} example_shiny_dir<-system.file("extdata", "example_shiny2.png", package = "GmicR", mustWork = TRUE) knitr::include_graphics(example_shiny_dir) ``` ### Module_names_BP_table Or view a module summary table ```{r screen shot3, echo=FALSE, out.width = '100%'} example_shiny_dir<-system.file("extdata", "example_shiny3.png", package = "GmicR", mustWork = TRUE) knitr::include_graphics(example_shiny_dir) ```