--- title: "Zenith gene set testing after dream analysis" author: - name: "[Gabriel Hoffman](http://gabrielhoffman.github.io)" affiliation: | Icahn School of Medicine at Mount Sinai, New York vignette: > %\VignetteIndexEntry{Example usage of zenith on RNA-seq} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\usepackage[utf8]{inputenc} output: html_document: toc: true toc_float: true --- ```{r knitr.setup, echo=FALSE} library(knitr) knitr::opts_chunk$set( echo = TRUE, warning=FALSE, message=FALSE, error = FALSE, tidy = FALSE, cache = FALSE) ``` `zenith` performs gene set analysis on the result of differential expression using linear (mixed) modeling with [dream](https://doi.org/10.1093/bioinformatics/btaa687) by considering the correlation between gene expression traits. This package implements the [camera](https://www.rdocumentation.org/packages/limma/versions/3.28.14/topics/camera) method from the [limma](https://bioconductor.org/packages/limma/) package proposed by [Wu and Smyth (2012)](https://doi.org/10.1093/nar/gks461). `zenith()` is a simple extension of `camera()` to be compatible with linear (mixed) models implemented in `dream()`. # Standard workflow ```{r standard} # Load packages library(zenith) library(edgeR) library(variancePartition) library(tweeDEseqCountData) library(kableExtra) # Load RNA-seq data from LCL's data(pickrell) geneCounts = exprs(pickrell.eset) df_metadata = pData(pickrell.eset) # Filter genes # Note this is low coverage data, so just use as code example dsgn = model.matrix(~ gender, df_metadata) keep = filterByExpr(geneCounts, dsgn, min.count=5) # Compute library size normalization dge = DGEList(counts = geneCounts[keep,]) dge = calcNormFactors(dge) # Estimate precision weights using voom vobj = voomWithDreamWeights(dge, ~ gender, df_metadata) # Apply dream analysis fit = dream(vobj, ~ gender, df_metadata) fit = eBayes(fit) # Load get_MSigDB database, Hallmark genes # use gene 'SYMBOL', or 'ENSEMBL' id # use get_GeneOntology() to load Gene Ontology msdb.gs = get_MSigDB("H", to="ENSEMBL") # Run zenith analysis, and specific which coefficient to evaluate res.gsa = zenith_gsa(fit, msdb.gs, 'gendermale', progressbar=FALSE ) # Show top gene sets: head(res.gsa) kable_styling(kable(head(res.gsa), row.names=FALSE)) ``` ```{r plots} # for each cell type select 3 genesets with largest t-statistic # and 1 geneset with the lowest # Grey boxes indicate the gene set could not be evaluted because # to few genes were represented plotZenithResults(res.gsa) ``` # Session Info ```{r session, echo=FALSE} sessionInfo() ```