Brings SummarizedExperiment to the tidyverse!
website: stemangiola.github.io/tidySummarizedExperiment/
Please also have a look at
tidySummarizedExperiment provides a bridge between Bioconductor SummarizedExperiment [@morgan2020summarized] and the tidyverse [@wickham2019welcome]. It creates an invisible layer that enables viewing the Bioconductor SummarizedExperiment object as a tidyverse tibble, and provides SummarizedExperiment-compatible dplyr, tidyr, ggplot and plotly functions. This allows users to get the best of both Bioconductor and tidyverse worlds.
SummarizedExperiment-compatible Functions | Description |
---|---|
all |
After all tidySummarizedExperiment is a SummarizedExperiment object, just better |
tidyverse Packages | Description |
---|---|
dplyr |
Almost all dplyr APIs like for any tibble |
tidyr |
Almost all tidyr APIs like for any tibble |
ggplot2 |
ggplot like for any tibble |
plotly |
plot_ly like for any tibble |
Utilities | Description |
---|---|
as_tibble |
Convert cell-wise information to a tbl_df |
if (!requireNamespace("BiocManager", quietly=TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("tidySummarizedExperiment")
From Github (development)
devtools::install_github("stemangiola/tidySummarizedExperiment")
Load libraries used in the examples.
library(ggplot2)
library(tidySummarizedExperiment)
tidySummarizedExperiment
, the best of both worlds!This is a SummarizedExperiment object but it is evaluated as a tibble. So it is fully compatible both with SummarizedExperiment and tidyverse APIs.
pasilla_tidy <- tidySummarizedExperiment::pasilla
It looks like a tibble
pasilla_tidy
## # A SummarizedExperiment-tibble abstraction: 102,193 × 7
## # [90mFeatures=14599 | Samples=7 | Assays=counts[0m
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
## 2 FBgn0000008 untrt1 92 untreated single_end
## 3 FBgn0000014 untrt1 5 untreated single_end
## 4 FBgn0000015 untrt1 0 untreated single_end
## 5 FBgn0000017 untrt1 4664 untreated single_end
## 6 FBgn0000018 untrt1 583 untreated single_end
## 7 FBgn0000022 untrt1 0 untreated single_end
## 8 FBgn0000024 untrt1 10 untreated single_end
## 9 FBgn0000028 untrt1 0 untreated single_end
## 10 FBgn0000032 untrt1 1446 untreated single_end
## # … with 40 more rows
But it is a SummarizedExperiment object after all
assays(pasilla_tidy)
## List of length 1
## names(1): counts
We can use tidyverse commands to explore the tidy SummarizedExperiment object.
We can use slice
to choose rows by position, for example to choose the first row.
pasilla_tidy %>%
slice(1)
## # A SummarizedExperiment-tibble abstraction: 1 × 1
## # [90mFeatures=1 | Samples=1 | Assays=counts[0m
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
We can use filter
to choose rows by criteria.
pasilla_tidy %>%
filter(condition == "untreated")
## # A SummarizedExperiment-tibble abstraction: 58,396 × 4
## # [90mFeatures=14599 | Samples=4 | Assays=counts[0m
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
## 2 FBgn0000008 untrt1 92 untreated single_end
## 3 FBgn0000014 untrt1 5 untreated single_end
## 4 FBgn0000015 untrt1 0 untreated single_end
## 5 FBgn0000017 untrt1 4664 untreated single_end
## 6 FBgn0000018 untrt1 583 untreated single_end
## 7 FBgn0000022 untrt1 0 untreated single_end
## 8 FBgn0000024 untrt1 10 untreated single_end
## 9 FBgn0000028 untrt1 0 untreated single_end
## 10 FBgn0000032 untrt1 1446 untreated single_end
## # … with 40 more rows
We can use select
to choose columns.
pasilla_tidy %>%
select(.sample)
## # A tibble: 102,193 × 1
## .sample
## <chr>
## 1 untrt1
## 2 untrt1
## 3 untrt1
## 4 untrt1
## 5 untrt1
## 6 untrt1
## 7 untrt1
## 8 untrt1
## 9 untrt1
## 10 untrt1
## # … with 102,183 more rows
We can use count
to count how many rows we have for each sample.
pasilla_tidy %>%
count(.sample)
## # A tibble: 7 × 2
## .sample n
## <chr> <int>
## 1 trt1 14599
## 2 trt2 14599
## 3 trt3 14599
## 4 untrt1 14599
## 5 untrt2 14599
## 6 untrt3 14599
## 7 untrt4 14599
We can use distinct
to see what distinct sample information we have.
pasilla_tidy %>%
distinct(.sample, condition, type)
## # A tibble: 7 × 3
## .sample condition type
## <chr> <chr> <chr>
## 1 untrt1 untreated single_end
## 2 untrt2 untreated single_end
## 3 untrt3 untreated paired_end
## 4 untrt4 untreated paired_end
## 5 trt1 treated single_end
## 6 trt2 treated paired_end
## 7 trt3 treated paired_end
We could use rename
to rename a column. For example, to modify the type column name.
pasilla_tidy %>%
rename(sequencing=type)
## # A SummarizedExperiment-tibble abstraction: 102,193 × 7
## # [90mFeatures=14599 | Samples=7 | Assays=counts[0m
## .feature .sample counts condition sequencing
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
## 2 FBgn0000008 untrt1 92 untreated single_end
## 3 FBgn0000014 untrt1 5 untreated single_end
## 4 FBgn0000015 untrt1 0 untreated single_end
## 5 FBgn0000017 untrt1 4664 untreated single_end
## 6 FBgn0000018 untrt1 583 untreated single_end
## 7 FBgn0000022 untrt1 0 untreated single_end
## 8 FBgn0000024 untrt1 10 untreated single_end
## 9 FBgn0000028 untrt1 0 untreated single_end
## 10 FBgn0000032 untrt1 1446 untreated single_end
## # … with 40 more rows
We could use mutate
to create a column. For example, we could create a new type column that contains single
and paired instead of single_end and paired_end.
pasilla_tidy %>%
mutate(type=gsub("_end", "", type))
## # A SummarizedExperiment-tibble abstraction: 102,193 × 7
## # [90mFeatures=14599 | Samples=7 | Assays=counts[0m
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single
## 2 FBgn0000008 untrt1 92 untreated single
## 3 FBgn0000014 untrt1 5 untreated single
## 4 FBgn0000015 untrt1 0 untreated single
## 5 FBgn0000017 untrt1 4664 untreated single
## 6 FBgn0000018 untrt1 583 untreated single
## 7 FBgn0000022 untrt1 0 untreated single
## 8 FBgn0000024 untrt1 10 untreated single
## 9 FBgn0000028 untrt1 0 untreated single
## 10 FBgn0000032 untrt1 1446 untreated single
## # … with 40 more rows
We could use unite
to combine multiple columns into a single column.
pasilla_tidy %>%
unite("group", c(condition, type))
## # A SummarizedExperiment-tibble abstraction: 102,193 × 7
## # [90mFeatures=14599 | Samples=7 | Assays=counts[0m
## .feature .sample counts group
## <chr> <chr> <int> <chr>
## 1 FBgn0000003 untrt1 0 untreated_single_end
## 2 FBgn0000008 untrt1 92 untreated_single_end
## 3 FBgn0000014 untrt1 5 untreated_single_end
## 4 FBgn0000015 untrt1 0 untreated_single_end
## 5 FBgn0000017 untrt1 4664 untreated_single_end
## 6 FBgn0000018 untrt1 583 untreated_single_end
## 7 FBgn0000022 untrt1 0 untreated_single_end
## 8 FBgn0000024 untrt1 10 untreated_single_end
## 9 FBgn0000028 untrt1 0 untreated_single_end
## 10 FBgn0000032 untrt1 1446 untreated_single_end
## # … with 40 more rows
We can also combine commands with the tidyverse pipe %>%
.
For example, we could combine group_by
and summarise
to get the total counts for each sample.
pasilla_tidy %>%
group_by(.sample) %>%
summarise(total_counts=sum(counts))
## # A tibble: 7 × 2
## .sample total_counts
## <chr> <int>
## 1 trt1 18670279
## 2 trt2 9571826
## 3 trt3 10343856
## 4 untrt1 13972512
## 5 untrt2 21911438
## 6 untrt3 8358426
## 7 untrt4 9841335
We could combine group_by
, mutate
and filter
to get the transcripts with mean count > 0.
pasilla_tidy %>%
group_by(.feature) %>%
mutate(mean_count=mean(counts)) %>%
filter(mean_count > 0)
## # A tibble: 86,513 × 6
## # Groups: .feature [12,359]
## .feature .sample counts condition type mean_count
## <chr> <chr> <int> <chr> <chr> <dbl>
## 1 FBgn0000003 untrt1 0 untreated single_end 0.143
## 2 FBgn0000008 untrt1 92 untreated single_end 99.6
## 3 FBgn0000014 untrt1 5 untreated single_end 1.43
## 4 FBgn0000015 untrt1 0 untreated single_end 0.857
## 5 FBgn0000017 untrt1 4664 untreated single_end 4672.
## 6 FBgn0000018 untrt1 583 untreated single_end 461.
## 7 FBgn0000022 untrt1 0 untreated single_end 0.143
## 8 FBgn0000024 untrt1 10 untreated single_end 7
## 9 FBgn0000028 untrt1 0 untreated single_end 0.429
## 10 FBgn0000032 untrt1 1446 untreated single_end 1085.
## # … with 86,503 more rows
my_theme <-
list(
scale_fill_brewer(palette="Set1"),
scale_color_brewer(palette="Set1"),
theme_bw() +
theme(
panel.border=element_blank(),
axis.line=element_line(),
panel.grid.major=element_line(size=0.2),
panel.grid.minor=element_line(size=0.1),
text=element_text(size=12),
legend.position="bottom",
aspect.ratio=1,
strip.background=element_blank(),
axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))
)
)
We can treat pasilla_tidy
as a normal tibble for plotting.
Here we plot the distribution of counts per sample.
pasilla_tidy %>%
tidySummarizedExperiment::ggplot(aes(counts + 1, group=.sample, color=`type`)) +
geom_density() +
scale_x_log10() +
my_theme
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] tidySummarizedExperiment_1.8.1 SummarizedExperiment_1.28.0
## [3] Biobase_2.58.0 GenomicRanges_1.50.2
## [5] GenomeInfoDb_1.34.9 IRanges_2.32.0
## [7] S4Vectors_0.36.2 BiocGenerics_0.44.0
## [9] MatrixGenerics_1.10.0 matrixStats_0.63.0
## [11] ggplot2_3.4.1 knitr_1.42
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 xfun_0.37 purrr_1.0.1
## [4] lattice_0.20-45 colorspace_2.1-0 vctrs_0.5.2
## [7] generics_0.1.3 htmltools_0.5.4 viridisLite_0.4.1
## [10] utf8_1.2.3 plotly_4.10.1 rlang_1.1.0
## [13] pillar_1.8.1 glue_1.6.2 withr_2.5.0
## [16] RColorBrewer_1.1-3 GenomeInfoDbData_1.2.9 lifecycle_1.0.3
## [19] stringr_1.5.0 zlibbioc_1.44.0 munsell_0.5.0
## [22] gtable_0.3.1 htmlwidgets_1.6.1 evaluate_0.20
## [25] labeling_0.4.2 fastmap_1.1.1 fansi_1.0.4
## [28] highr_0.10 scales_1.2.1 DelayedArray_0.24.0
## [31] jsonlite_1.8.4 XVector_0.38.0 farver_2.1.1
## [34] digest_0.6.31 stringi_1.7.12 dplyr_1.1.0
## [37] grid_4.2.2 cli_3.6.0 tools_4.2.2
## [40] bitops_1.0-7 magrittr_2.0.3 RCurl_1.98-1.10
## [43] lazyeval_0.2.2 tibble_3.2.0 tidyr_1.3.0
## [46] pkgconfig_2.0.3 ellipsis_0.3.2 Matrix_1.5-3
## [49] data.table_1.14.8 httr_1.4.5 R6_2.5.1
## [52] compiler_4.2.2