---
title: 'OMICsPCAdata: Supporting data for package OMICsPCA'
author: "Subhadeep Das"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: tibble
    fig_height: 7
    fig_width: 9
    highlight: pygments
    keep_md: yes
    number_sections: yes
    theme: lumen
    toc: yes
    toc_depth: 6
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{OMICsPCAdata}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r include = TRUE, echo = FALSE, message = FALSE, warning = FALSE}
library(OMICsPCAdata)
library(kableExtra)
library(rmarkdown)
library(knitr)
library(MultiAssayExperiment)
```

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#",
  eval = TRUE
)
```

```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.pos = 'H')
```

```{r echo = FALSE}
# Load every dataset shipped with the package, then extract the assay tables
datalist <- data(package = "OMICsPCAdata")
datanames <- datalist$results[,3]
data(list = datanames)
assays <- assays(multi_assay)
```

# Overview

This package contains supporting data for the package OMICsPCA. The datasets included here are used to run the examples in the vignette and the man pages of the OMICsPCA functions.

# Datasets

OMICsPCAdata contains 4 datasets: 1 expression dataset (CAGE) and 3 ChIP-seq datasets of different histone modifications. Each dataset contains values for 28770 GENCODE TSS groups (rows). TSS groups are obtained by merging neighboring TSSs. Each column holds the values from a specific cell line: ChIP-seq peak intensities (for histone modifications), length of DHS (for DNaseI hypersensitivity), or number of reads (tpm) of CAGE-defined TSSs (CTSS).
The datasets are described below:

```{r echo =FALSE}
data_summary = data.frame(
  Name = c("CAGE", "H2az", "H3k9ac", "H3k4me1"),
  Type_of_Assay = c(
    "Expression of Transcription Start Sites (TSS)",
    "Location of histone modification peaks",
    "Location of histone modification peaks",
    "Location of histone modification peaks"
  ),
  No_of_Cell_lines = c(
    ncol(assays$CAGE),
    ncol(assays$H2az),
    ncol(assays$H3k9ac),
    ncol(assays$H3k4me1)
  ),
  # Column names of each assay are the cell line names
  Name_of_Cell_lines = c(
    paste(names(assays$CAGE), collapse = ", "),
    paste(names(assays$H2az), collapse = ", "),
    paste(names(assays$H3k9ac), collapse = ", "),
    paste(names(assays$H3k4me1), collapse = ", ")
  ),
  Type_of_data = c(
    "Cap Analysis\nof\nGene Expression",
    "ChIP-seq",
    "ChIP-seq",
    "ChIP-seq"
  )
)

names(data_summary) <- c("Assay", "Assay\nType", "Number of\nCell lines",
                         "Name of\ncell lines", "Experiment")
```

\

```{r,eval = TRUE, echo=FALSE, results='asis'}
#for html
knitr::kable(data_summary, caption = "Summary of data sets", align = 'c') %>%
  kable_styling("bordered", full_width = FALSE, position = "center") %>%
  column_spec(1, bold = FALSE, border_right = FALSE, border_left = FALSE, width = "5em") %>%
  column_spec(2, bold = FALSE, border_right = FALSE, border_left = FALSE, width = "5em") %>%
  column_spec(3, border_right = FALSE, width = "4em") %>%
  column_spec(4, border_right = FALSE, width = "23em") %>%
  column_spec(5, border_right = FALSE, width = "5em")
```

\

# Example of datasets

Each dataset contains values for 28770 GENCODE TSS groups across multiple cell lines. Here is an example:

## CAGE data

```{r}
# The CAGE dataset contains normalized CAGE data for 28770 GENCODE
# TSS groups from 31 cell lines
dim(assays$CAGE)

# Let's look at the first five rows and columns of this dataset
head(assays$CAGE[1:5, 1:5])
```
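
Since every table has TSS groups as rows and cell lines as columns, a per-cell-line summary is a column-wise operation. The sketch below is only an illustration in base R: `fraction_nonzero()` is a hypothetical helper, not a function of OMICsPCAdata or OMICsPCA, and `toy_assay` is a made-up stand-in with the same layout as the real tables. It assumes that a value of zero means no signal for that TSS group, and it does not handle missing values.

```r
# Hypothetical helper (not part of OMICsPCAdata): for each cell line (column),
# compute the fraction of TSS groups (rows) with a non-zero value.
# Assumption: zero means "no signal"; NA values are not handled here.
fraction_nonzero <- function(assay_table) {
  values <- as.matrix(assay_table)
  colMeans(values != 0)
}

# Toy stand-in for an assay table: 4 TSS groups (rows) x 3 cell lines (columns).
# Row names, column names and values are invented for illustration.
toy_assay <- data.frame(
  cell_line_A = c(0, 2.5, 0, 1.0),
  cell_line_B = c(3.1, 0, 0, 0),
  cell_line_C = c(1.2, 0.4, 0.8, 2.0),
  row.names = paste0("TSS_group_", 1:4)
)

# Rows are TSS groups, columns are cell lines
dim(toy_assay)

# One value per cell line, named after the column
toy_fraction <- fraction_nonzero(toy_assay)
toy_fraction
```

Under the same assumptions, the helper could be applied to one of the real tables, for example `fraction_nonzero(assays$CAGE)`, which would return one value per cell line.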