--- title: "The VennDetail package" author: - name: Kai Guo, Brett McGregor, James Porter, and Junguk Hur affiliation: - Biomedical Sciences, University of North Dakota date: "`r Sys.Date()`" output: html_document: df_print: paged word_document: toc: yes toc_depth: '6' rmarkdown::html_vignette: default pdf_document: toc: yes toc_depth: 6 vignette: | \usepackage[utf8]{inputenc} %\VignetteIndexEntry{VennDetail} %\VignetteEngine{knitr::knitr} --- __VennDetail__ An R package for visualizing and extracting details of multi-sets intersection ## 1. Introduction Visualizing and extracting unique (disjoint) or overlapping subsets of multiple gene datasets are a frequently performed task for bioinformatics. Although various packages and web applications are available, no R package offering functions to extract and combine details of these subsets with user datasets in data frame is available. Moreover, graphical visualization is usually limited to six or less gene datasets and a novel method is required to properly show the subset details.We have developed __VennDetail__, an R package to generate high-quality Venn-Pie charts and to allow extraction of subset details from input datasets. ## 2. Software Usage ### 2.1 Installation The package can be installed as ``` {r install,eval = FALSE} if (!requireNamespace("BiocManager")) install.packages("BiocManager") `BiocManager::install("VennDetail") ``` ### 2.2 Data Input ```{r load, results = 'hide', message = FALSE} library(VennDetail) data(T2DM) ``` T2DM data include three sets of differentially expressed genes (DEGs) from the publication by _Hinder et al_ [1]. The three DEG datasets were obtained in three different tissues, kidney Cortex, kidney glomerula, and sciatic nerve, by comparing db/db diabetic mice and db/db mice with pioglitazone treatment. Differential expression was determined by using Cuffdiff with a false discovery rate (FDR) < 0.05. ### 2.3 Quick Tour ``` {r quick} ven <- venndetail(list(Cortex = T2DM$Cortex$Entrez, SCN = T2DM$SCN$Entrez, Glom = T2DM$Glom$Entrez)) ``` _VennDetail_ supports three different types of Venn diagram display formats ``` {r fig1, fig.width = 6, fig.height = 5, fig.align = "center"} ##traditional venn diagram plot(ven) ``` ``` {r fig2, fig.width = 6, fig.height = 5, fig.align = "center"} ##Venn-Pie format plot(ven, type = "vennpie") ``` ``` {r fig3, fig.width = 6, fig.height = 5, fig.align = "center"} ##Upset format plot(ven, type = "upset") ``` ### 2.4 Main Functions -- _venndetail_ uses a list of vectors as input to construct the shared or disjoint subsets _Venn_ object. _venndetail_ accepts a list of vector as input and returns a _Venn_ object for the following analysis. Users can also use _merge_ function to merge two _Venn_ objects together to save time. -- _plot_ generates figures with different layouts with _type_ parameter. _plot_ function also provides lots of parameters for users to modify the figures. -- _getSet_ function provides a way to extract subsets from the main result along with any available annotations. The parameter _subset_ asks the users to give the subset names to extract. It accepts a vector of subset names. Here, we will show how the DEGs shared by all three tissues as well as those that are only included by SCN tissue can be extracted. ```{r get} ## List the subsets name detail(ven) head(getSet(ven, subset = c("Shared", "SCN")), 10) ``` -- _result_ function can be used to extract and export all of the subsets for further processing. We currently support two different formats of result (long and wide formats). ```{r result} ## long format: the first column lists the subsets name, and the second column ## shows the genes included in the subsets head(result(ven)) ## wide format: the first column lists all the genes, the following columns ## display the groups name (three tissues) and the last column is the total ## number of the gene shared by groups. head(result(ven, wide = TRUE)) ``` -- _vennpie_ creates a Venn-pie diagram with unique or common subsets in multiple ways such as highlighting unique or shared subsets. The following example illustrates how to show the unique subsets on the venn-pie plots. ```{r fig4, fig.width = 6, fig.height = 5, fig.align = "center"} vennpie(ven, any = 1, revcolor = "lightgrey") ``` The parameters _any_ and _group_ provide two different ways to highlight the subsets. _any_ determines the subsets to show up in the number of groups (1: those included in just one group; 2: those shared by any two groups). _group_ asks users to specify the subsets to be highlighted. Users may check the sets name by using _detail_ function. Since the example datasets used in this vignette include only a small number of shared genes all across three sets (n=8), it may be a little hard to see the shared subset (grey), particularly in the Cortex group (the inner-most circle). . ```{r fig5, fig.width = 6, fig.height = 5, fig.align = "center" } vennpie(ven, log = TRUE) ``` When we have five datasets, we can use vennpie to show the sets include elements from at least four datasets. Below show the reults with five datasets as input. ```{r fig6, fig.width = 6, fig.height = 5, fig.align = "center" } set.seed(123) A <- sample(1:1000, 400, replace = FALSE) B <- sample(1:1000, 600, replace = FALSE) C <- sample(1:1000, 350, replace = FALSE) D <- sample(1:1000, 550, replace = FALSE) E <- sample(1:1000, 450, replace = FALSE) venn <- venndetail(list(A = A, B = B, C= C, D = D, E = E)) vennpie(venn, min = 4) ``` -- _getFeature_ allows users to combine the details of any or all subsets from the main result with users’ other datasets, containing a list of data frames, and to export the combined data as a data frame. In the following example, we will demonstrate how to add other available annotation in the input data (T2DM) such as log2FC and FDR values for the shared genes among these three tissues. ```{r getfeature} head(getFeature(ven, subset = "Shared", rlist = T2DM)) ``` -- _dplot_ shows the details of these subsets with bar-plot. ```{r fig7, fig.width = 6, fig.height = 5, fig.align = "center"} dplot(ven, order = TRUE, textsize = 4) ``` ### 2.5 Shiny web app A shiny web application is here: [VennDetail](http://hurlab.med.und.edu/VennDetail/) Note: Only support five input datasets now ## 3 Contact information For any questions please contact guokai8@gmail.com ## 4 Reference [1] Hinder LM, Park M, Rumora AE, Hur J, Eichinger F, Pennathur S, Kretzler M, Brosius FC 3rd, Feldman EL.Comparative RNA-Seq transcriptome analyses reveal distinct metabolic pathways in diabetic nerve and kidney disease. _J Cell Mol Med._ 2017 Sep;21(9):2140-2152. doi: 10.1111/jcmm.13136. Epub 2017 Mar 8.