---
title: "Overview of the tidySummarizedExperiment package"
author: "Stefano Mangiola"
date: "`r Sys.Date()`"
package: tidySummarizedExperiment
output:
  BiocStyle::html_document:
    toc_float: true
bibliography: tidySummarizedExperiment.bib
vignette: >
  %\VignetteEngine{knitr::knitr}
  %\VignetteIndexEntry{Overview of the tidySummarizedExperiment package}
  %\usepackage[UTF-8]{inputenc}
---

[![Lifecycle:maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)

**Brings SummarizedExperiment to the tidyverse!**

website: [stemangiola.github.io/tidySummarizedExperiment/](https://stemangiola.github.io/tidySummarizedExperiment/)

Please also have a look at

- [tidySingleCellExperiment](https://stemangiola.github.io/tidySingleCellExperiment/) for tidy manipulation of SingleCellExperiment objects
- [tidyseurat](https://stemangiola.github.io/tidyseurat/) for tidy manipulation of Seurat objects
- [tidybulk](https://stemangiola.github.io/tidybulk/) for tidy analysis of RNA sequencing data
- [nanny](https://github.com/stemangiola/nanny) for tidy high-level data analysis and manipulation
- [tidygate](https://github.com/stemangiola/tidygate) for adding custom gate information to your tibble
- [tidyHeatmap](https://stemangiola.github.io/tidyHeatmap/) for heatmaps produced with tidy principles

```{r, echo=FALSE, include=FALSE}
library(knitr)
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
```

# Introduction

tidySummarizedExperiment provides a bridge between Bioconductor [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html) [@morgan2020summarized] and the tidyverse [@wickham2019welcome]. It creates an invisible layer that enables viewing the Bioconductor *SummarizedExperiment* object as a tidyverse tibble, and provides SummarizedExperiment-compatible *dplyr*, *tidyr*, *ggplot* and *plotly* functions. This allows users to get the best of both Bioconductor and tidyverse worlds.

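As a minimal sketch of the "invisible layer" described above, the chunk below builds a tiny toy SummarizedExperiment and converts it to a tibble. The object names (`toy_counts`, `toy_se`, `toy_tbl`), the gene and sample labels, and the count values are all invented for illustration and are not part of the package. `tibble::as_tibble` is called explicitly on the assumption that the tibble package is installed alongside tidySummarizedExperiment.

```{r}
library(SummarizedExperiment)
library(tidySummarizedExperiment)

# Hypothetical toy data: 3 genes x 2 samples (values made up for illustration)
toy_counts <- matrix(
    c(10L, 0L, 5L, 20L, 3L, 7L),
    nrow=3,
    dimnames=list(c("gene1", "gene2", "gene3"), c("s1", "s2"))
)

# Standard Bioconductor constructor; colData row names match the assay columns
toy_se <- SummarizedExperiment(
    assays=list(counts=toy_counts),
    colData=DataFrame(
        condition=c("untreated", "treated"),
        row.names=c("s1", "s2")
    )
)

# Still a SummarizedExperiment: the usual accessors work
dim(toy_se)

# Tibble view: one row per feature/sample pair, with the assay values
# and the sample annotation as columns
toy_tbl <- tibble::as_tibble(toy_se)
toy_tbl
```

The same object therefore answers both to Bioconductor accessors such as `dim` and `assays` and to tibble-style conversion, which is the behaviour the rest of this vignette relies on.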
## Functions/utilities available

SummarizedExperiment-compatible Functions | Description
------------ | -------------
`all` | All SummarizedExperiment functions work, because a `tidySummarizedExperiment` object is still a SummarizedExperiment object

tidyverse Packages | Description
------------ | -------------
`dplyr` | Almost all `dplyr` APIs, as for any tibble
`tidyr` | Almost all `tidyr` APIs, as for any tibble
`ggplot2` | `ggplot`, as for any tibble
`plotly` | `plot_ly`, as for any tibble

Utilities | Description
------------ | -------------
`as_tibble` | Convert the object to a `tbl_df`

## Installation

From Bioconductor (release)

```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("tidySummarizedExperiment")
```

From GitHub (development version)

```{r, eval=FALSE}
devtools::install_github("stemangiola/tidySummarizedExperiment")
```

Load the libraries used in the examples.

```{r}
library(ggplot2)
library(tidySummarizedExperiment)
```

# Create `tidySummarizedExperiment`, the best of both worlds!

This is a SummarizedExperiment object that is evaluated as a tibble, so it is fully compatible with both the SummarizedExperiment and tidyverse APIs.

```{r}
pasilla_tidy <- tidySummarizedExperiment::pasilla
```

**It looks like a tibble**

```{r}
pasilla_tidy
```

**But it is a SummarizedExperiment object after all**

```{r}
assays(pasilla_tidy)
```

# Tidyverse commands

We can use tidyverse commands to explore the tidy SummarizedExperiment object.

We can use `slice` to choose rows by position, for example to choose the first row.

```{r}
pasilla_tidy %>%
    slice(1)
```

We can use `filter` to choose rows by criteria.

```{r}
pasilla_tidy %>%
    filter(condition == "untreated")
```

We can use `select` to choose columns.

```{r}
pasilla_tidy %>%
    select(.sample)
```

We can use `count` to count how many rows we have for each sample.

```{r}
pasilla_tidy %>%
    count(.sample)
```

We can use `distinct` to see what distinct sample information we have.

```{r}
pasilla_tidy %>%
    distinct(.sample, condition, type)
```

We could use `rename` to rename a column. For example, we could rename the type column to sequencing.

```{r}
pasilla_tidy %>%
    rename(sequencing=type)
```

We could use `mutate` to create a column. For example, we could create a new type column that contains single and paired instead of single_end and paired_end.

```{r}
pasilla_tidy %>%
    mutate(type=gsub("_end", "", type))
```

We could use `unite` to combine multiple columns into a single column.

```{r}
pasilla_tidy %>%
    unite("group", c(condition, type))
```

We can also combine commands with the tidyverse pipe `%>%`.

For example, we could combine `group_by` and `summarise` to get the total counts for each sample.

```{r}
pasilla_tidy %>%
    group_by(.sample) %>%
    summarise(total_counts=sum(counts))
```

We could combine `group_by`, `mutate` and `filter` to keep only the transcripts whose mean count across samples is greater than 0.

```{r}
pasilla_tidy %>%
    group_by(.feature) %>%
    mutate(mean_count=mean(counts)) %>%
    filter(mean_count > 0)
```

# Plotting

First we define a custom theme to reuse across plots.

```{r}
# Colour scales plus a minimal black-and-white theme, bundled as a list
# so they can be added to any ggplot with `+ my_theme`
my_theme <-
    list(
        scale_fill_brewer(palette="Set1"),
        scale_color_brewer(palette="Set1"),
        theme_bw() +
            theme(
                panel.border=element_blank(),
                axis.line=element_line(),
                # `linewidth` replaces the deprecated `size` argument of
                # element_line() (requires ggplot2 >= 3.4.0)
                panel.grid.major=element_line(linewidth=0.2),
                panel.grid.minor=element_line(linewidth=0.1),
                text=element_text(size=12),
                legend.position="bottom",
                aspect.ratio=1,
                strip.background=element_blank(),
                axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
                axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))
            )
    )
```

We can treat `pasilla_tidy` as a normal tibble for plotting.

Here we plot the distribution of counts per sample, coloured by sequencing type.

```{r plot1}
pasilla_tidy %>%
    tidySummarizedExperiment::ggplot(aes(counts + 1, group=.sample, color=`type`)) +
    geom_density() +
    scale_x_log10() +
    my_theme
```

# Session Info

```{r}
sessionInfo()
```

# References