--- title: 'Creating data metrics object' author: - name: Lindsay Rutter date: '`r Sys.Date()`' package: bigPint bibliography: bigPint.bib output: BiocStyle::html_document: toc_float: true tidy: TRUE vignette: > \usepackage[utf8]{inputenc} %\VignetteIndexEntry{"Creating data metrics object"} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignettePackage{bigPint} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo=TRUE) ``` ## Data metrics object Information about the `dataMetrics` object can be found in the article [Data metrics object ](https://lindsayrutter.github.io/bigPint/articles/dataMetrics.html). If a researcher is using the `bigPint` package to plot RNA-seq data, then many will create the `dataMetrics` object using popular RNA-seq packages such as [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html) [@robinson2010edger], [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) [@love2014moderated], and [limma](https://bioconductor.org/packages/release/bioc/html/limma.html) [@ritchie2015limma]. These R packages will output several interesting quantitative variables for each gene in the dataset, such as log fold change, p-value, and FDR. These quantitative variables can be incorporated into the `dataMetrics` object and determine by thresholds what subset of genes will be superimposed onto plots. ____________________________________________________________________________________ ## Example: two treatments The following example shows how to create the `dataMetrics` object called `soybean_ir_sub_metrics`, which was shown in the article [Data metrics object](https://lindsayrutter.github.io/bigPint/articles/dataMetrics.html) [@soybeanIR]. The dataset from which it is created (`soybean_ir_sub`) contains only two treatment groups, N and P. In this case, the [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html) [@robinson2010edger] package was primarily followed. ```{r, eval=TRUE, include=TRUE, message=FALSE, warning=FALSE} library(bigPint) library(edgeR) library(data.table) data(soybean_ir_sub) data = soybean_ir_sub rownames(data) = data[,1] y = DGEList(counts=data[,-1]) group = c(1,1,1,2,2,2) y = DGEList(counts=y, group=group) Group = factor(c(rep("N",3), rep("P",3))) design <- model.matrix(~0+Group, data=y$samples) colnames(design) <- levels(Group) y <- estimateDisp(y, design) fit <- glmFit(y, design) soybean_ir_sub_metrics <- list() for (i in 1:(ncol(fit)-1)){ for (j in (i+1):ncol(fit)){ contrast=rep(0,ncol(fit)) contrast[i]=1 contrast[j]=-1 lrt <- glmLRT(fit, contrast=contrast) lrt <- topTags(lrt, n = nrow(y[[1]]))[[1]] setDT(lrt, keep.rownames = TRUE)[] colnames(lrt)[1] = "ID" lrt <- as.data.frame(lrt) soybean_ir_sub_metrics[[paste0(colnames(fit)[i], "_", colnames(fit)[j])]] <- lrt } } ``` We can indeed examine the generated `soybean_ir_sub_metrics` object as follows: ```{r, eval=TRUE, include=TRUE} str(soybean_ir_sub_metrics, strict.width = "wrap") ``` And verify that it contains one list element: ```{r, eval=TRUE, include=TRUE} names(soybean_ir_sub_metrics) ``` ____________________________________________________________________________________ ## Example: three treatments The following example shows how to create the `dataMetrics` object called `soybean_cn_sub_metrics`, which was shown in the article [Data metrics object](https://lindsayrutter.github.io/bigPint/articles/dataMetrics.html)). The dataset from which it is created (`soybean_cn_sub`) contains three treatment groups, S1, S2, and S3 [@brown2015developmental]. In this case, the [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html) [@robinson2010edger] package was primarily followed. ```{r, eval=TRUE, include=TRUE} library(edgeR) library(data.table) data(soybean_cn_sub) data = soybean_cn_sub rownames(data) = data[,1] y = DGEList(counts=data[,-1]) group = c(1,1,1,2,2,2,3,3,3) y = DGEList(counts=y, group=group) Group = factor(c(rep("S1",3), rep("S2",3), rep("S3",3))) design <- model.matrix(~0+Group, data=y$samples) colnames(design) <- levels(Group) y <- estimateDisp(y, design) fit <- glmFit(y, design) soybean_cn_sub_metrics <- list() for (i in 1:(ncol(fit)-1)){ for (j in (i+1):ncol(fit)){ contrast=rep(0,ncol(fit)) contrast[i]=1 contrast[j]=-1 lrt <- glmLRT(fit, contrast=contrast) lrt <- topTags(lrt, n = nrow(y[[1]]))[[1]] setDT(lrt, keep.rownames = TRUE)[] colnames(lrt)[1] = "ID" lrt <- as.data.frame(lrt) soybean_cn_sub_metrics[[paste0(colnames(fit)[i], "_", colnames(fit)[j])]] <- lrt } } ``` We can indeed examine the generated `soybean_cn_sub_metrics` object as follows: ```{r, eval=TRUE, include=TRUE} str(soybean_cn_sub_metrics, strict.width = "wrap") ``` And verify that it contains three list element: ```{r, eval=TRUE, include=TRUE} names(soybean_cn_sub_metrics) ``` ____________________________________________________________________________________ ## References