Main Function: summarize_pathway_level
You can then access the main function provided by the package, summarize_pathway_level with the type of pooling operator desired to be applied for each molecular set. This function has several options for adjusting parameters and settings to suit specific research questions and data types. The available aggregation operators and other parameters options are described in detail in the package documentation. You can also see the documentation for this function with the command:
Let’s see two examples of usage, with a simulated gene expression matrix X
of dimensions g*s
(g
genes and s
samples), and a list of 100 gene sets pathways
. The expression values are random positive values sampled from a standard normal distribution. These examples mimic gene expression data and pathway gene sets, but funOmics
can be used to aggregate other types of omics data and molecular sets, such as metabolomics or proteomics. Go to section Real data: Where to find molecular sets?, to see where to find real molecular sets for different types of omics. Note that if your actual omics data is in SummarizedExperiment
format, the assay matrix can easily be retrieved as a matrix format (assay(SEobject)
).
Generate simulation data
# Example usage:
set.seed(1)
g <- 10000
s <- 20
X <- matrix(abs(rnorm(g * s)), nrow = g, dimnames = list(paste0("g", 1:g), paste0("s", 1:s)))
pathways <- as.list(sample(10:100, size = 100, replace = TRUE))
pathways <- lapply(pathways, function(s, g) paste0("g", sample(1:g, size = s, replace = FALSE)), g)
names(pathways) <- paste0("pathway", seq_along(pathways))
print(paste("Dimensions of omics matrix X:", dim(X)[1], "*", dim(X)[2], "; Length of molecular sets list:", length(pathways)))
## [1] "Dimensions of omics matrix X: 10000 * 20 ; Length of molecular sets list: 100"
## s1 s2 s3 s4 s5 s6 s7
## g1 0.6264538 0.8043316 0.2353485 0.6179223 0.2212571 0.5258908 0.3413341
## g2 0.1836433 1.0565257 0.2448250 0.8935057 0.3517935 0.4875444 0.4136665
## g3 0.8356286 1.0353958 0.6421869 0.4277562 0.1606019 1.1382508 0.1220357
## g4 1.5952808 1.1855604 1.9348085 0.2999012 0.1240523 1.2151344 1.5893806
## g5 0.3295078 0.5004395 1.0386957 0.5319833 0.6598739 0.4248307 0.7874385
## g6 0.8204684 0.5249887 0.2835501 1.7059816 0.5038493 1.4508403 1.5920640
## s8 s9 s10 s11 s12 s13 s14
## g1 1.00203611 1.55915937 0.09504307 0.7914415 0.0145724495 0.8663644 0.1604426
## g2 0.02590761 0.20166217 0.38805939 0.3921679 1.7854043337 0.9476952 0.9241849
## g3 0.44814178 1.04017610 2.13657003 0.4726670 0.0002997544 0.4522428 1.5561751
## g4 0.84323332 0.07195772 0.55661945 0.4579517 0.4356948690 0.2782408 0.8812202
## g5 0.21846310 0.01526544 0.59094164 0.1681319 1.4076452475 1.4175945 0.5263595
## g6 0.47678629 0.33938598 1.52014345 0.5856737 0.6929698698 0.6329981 0.4627372
## s15 s16 s17 s18 s19 s20
## g1 0.249371112 1.2520152 2.3150930 0.4414410 0.82485558 1.37086468
## g2 0.335346796 0.3351313 1.0603800 0.4130862 0.74402087 0.54569610
## g3 0.004405287 0.1080678 0.3970672 0.8660777 0.69009734 1.62446330
## g4 0.986768348 0.4717051 0.4840034 2.2615708 1.76900681 0.06247283
## g5 0.543575705 2.5070607 1.3584146 0.1787018 0.55215640 0.57021255
## g6 0.626142823 1.2451344 0.6370574 0.4713582 0.03257056 0.31350574
## $pathway1
## [1] "g6248" "g3566" "g2064" "g8249" "g8532" "g5047" "g4018" "g7193" "g2574"
## [10] "g3728" "g4493" "g4394" "g9017" "g2334" "g7652" "g9604" "g9035" "g4100"
## [19] "g408" "g9042" "g1229" "g2484" "g7442" "g7239" "g3059" "g811" "g509"
## [28] "g2182" "g8420" "g5947" "g1425" "g7218" "g7758" "g1236" "g6966" "g9457"
## [37] "g2668" "g9397" "g7394" "g2975" "g4304" "g9533" "g5287" "g5723" "g6432"
## [46] "g3826" "g3774" "g5906" "g6948" "g295" "g6675" "g2471" "g4504" "g5162"
## [55] "g3110" "g1901" "g7837" "g5189" "g4422" "g2063" "g7950" "g2301" "g1107"
## [64] "g856" "g6523" "g6524" "g1880" "g1675" "g7082" "g9914" "g2523" "g6892"
## [73] "g3631" "g8880" "g5560" "g6779" "g5713" "g5485" "g6659" "g6694" "g9592"
## [82] "g2434" "g3335" "g7725" "g3508" "g9399" "g7404" "g7160" "g9628" "g9345"
##
## $pathway2
## [1] "g1046" "g9504" "g6237" "g4026" "g2799" "g1764" "g8273" "g9165" "g6322"
## [10] "g9264" "g6996" "g3352" "g6635" "g2801" "g4064" "g2936" "g1823" "g2693"
## [19] "g3927" "g9267" "g3880" "g4245" "g2254" "g1570" "g6507" "g7158" "g7553"
## [28] "g292" "g7602" "g7571" "g2353" "g3741" "g5694" "g706" "g9877" "g6889"
## [37] "g4524" "g9776" "g4015" "g3747" "g6639" "g2807" "g2101" "g6546" "g8113"
## [46] "g847" "g9948" "g6219" "g4743" "g7975" "g466" "g869" "g5913" "g8362"
## [55] "g7665" "g838" "g8119" "g167" "g2999" "g2484" "g8541" "g7628" "g8437"
## [64] "g6932" "g3017" "g6458" "g8875" "g7429" "g6948" "g8641" "g2004" "g338"
## [73] "g9654" "g1781" "g5126" "g9549" "g469" "g5229" "g115" "g7608" "g2606"
## [82] "g3479" "g5674"
##
## $pathway3
## [1] "g9331" "g5832" "g5311" "g6300" "g4939" "g3028" "g8170" "g277" "g2162"
## [10] "g1887" "g1011" "g9797" "g6396" "g7073" "g3353" "g6757" "g3167" "g3991"
## [19] "g3457" "g7550" "g5707" "g5096" "g7510" "g1334" "g6667" "g7433" "g7005"
## [28] "g3474" "g6809" "g9081" "g6952" "g3525" "g6561" "g8581" "g4255" "g1702"
## [37] "g9013" "g8298" "g2499" "g2690" "g6702" "g6605" "g7020" "g1743" "g8261"
## [46] "g9191" "g6258" "g7717" "g4973" "g3082" "g9346" "g2522" "g6260" "g9108"
## [55] "g7869" "g6858" "g7192" "g9378"
##
## $pathway4
## [1] "g7398" "g6306" "g4012" "g6673" "g7713" "g2518" "g9017" "g3360" "g4784"
## [10] "g6259" "g9059" "g1799" "g9813" "g5004" "g4810" "g4174" "g2439" "g8029"
## [19] "g5730" "g7516" "g154" "g7419" "g19" "g4068" "g4232" "g9767" "g9163"
## [28] "g311" "g1916" "g3805" "g8445" "g3603" "g6501" "g153" "g2922" "g8097"
## [37] "g9985" "g9719" "g7827" "g1428" "g2116" "g2994" "g2918" "g8054" "g2103"
## [46] "g3184" "g5540" "g8584" "g8271" "g9014" "g960" "g1951" "g2607" "g8926"
## [55] "g7844" "g1999" "g1731" "g7758" "g5109" "g9245" "g6640" "g7569" "g8124"
## [64] "g394" "g7910" "g8973" "g366" "g2027" "g1070" "g8787" "g5606" "g5249"
## [73] "g7203" "g6650" "g1617"
##
## $pathway5
## [1] "g6454" "g2022" "g9627" "g8471" "g9677" "g3516" "g3262" "g4189" "g4912"
## [10] "g3286" "g2037" "g821" "g3957" "g2482" "g8656" "g9962" "g3550" "g5620"
## [19] "g2042" "g1207" "g8654" "g9313" "g8064" "g2393" "g757" "g1574" "g9199"
## [28] "g7132" "g9324" "g1657" "g8535" "g5200" "g4639" "g8986" "g6650" "g488"
## [37] "g6241" "g5407" "g1038" "g198" "g8992" "g9396" "g9200" "g4567" "g6191"
## [46] "g9970" "g4035" "g1762" "g815" "g1205" "g5063" "g4883" "g2993" "g3763"
## [55] "g1555" "g5455" "g8534" "g1242" "g3209" "g2687" "g3786" "g9385" "g6187"
## [64] "g3120" "g1013" "g7091" "g6423" "g1548" "g8316" "g4777" "g8392" "g2101"
## [73] "g9991" "g8915" "g3369" "g1075" "g3770" "g9501" "g7476" "g5178" "g9552"
## [82] "g1877" "g6608" "g1399" "g5689" "g4526" "g1043" "g5942" "g2524" "g5958"
## [91] "g8162"
##
## $pathway6
## [1] "g748" "g2887" "g2237" "g6300" "g562" "g1823" "g1476" "g3523" "g5496"
## [10] "g4240" "g7780" "g2982" "g3004" "g6110" "g2512" "g6411" "g7219" "g8294"
Example usage 1: apply summarize_pathway_level
to summarize data with mean pooling and a minimum size of sets
Now we can apply the function summarize_pathway_level
. In this example, pathway activity is summarized using the mean pooling aggregation for those sets containing at least 12 genes. Note that you can adjust the minsize and type of aggregation as desired.
##
## 100 functional sets read.
## iteration 100
## 95 successful functional aggregations over minsize
## 5 failed functional aggregations under minsize
print(paste("Pathway activity score matrix has dimensions:", nrow(pathway_activity), ",", ncol(pathway_activity)))
## [1] "Pathway activity score matrix has dimensions: 95 , 20"
From the original matrix X
with dimensions [g,s]
(in this example, 1000 genes and 20 samples), summarize_pathway_level
has generated a pathway-level activity score for each of the 20 samples, for 95 pathways having containing more than 12 genes. Let’s see how this matrix looks like:
## s1 s2 s3 s4 s5 s6 s7
## pathway1 0.8856707 0.7523236 0.8549943 0.7388130 0.7925286 0.8334914 0.7992642
## pathway2 0.8195863 0.8626590 0.8864375 0.7387156 0.7385007 0.7833466 0.7602959
## pathway3 0.7850385 0.7178379 0.8601687 0.7368610 0.7706303 0.7247713 0.8604497
## pathway4 0.9223525 0.6942613 0.8204792 0.8153523 0.7955479 0.9152442 0.7992134
## pathway5 0.9114610 0.7112581 0.8054646 0.8684732 0.8134853 0.7694264 0.6587641
## pathway6 0.6227958 0.8245890 0.8223787 0.7626611 0.8641242 0.9292614 0.6487087
## s8 s9 s10 s11 s12 s13 s14
## pathway1 0.8360021 0.6835664 0.7858110 0.7760340 0.7035882 0.7978041 0.7516856
## pathway2 0.8074307 0.6878013 0.7672425 0.7680346 0.8845340 0.7153691 0.8658347
## pathway3 0.7472542 0.7420240 0.7557788 0.8714399 0.6596007 0.7748371 0.7146510
## pathway4 0.8153612 0.8172311 0.7705315 0.8365182 0.7158393 0.7362310 0.9941277
## pathway5 0.8208152 0.8176990 0.7511963 0.8832247 0.7476202 0.7715769 0.8993376
## pathway6 0.9917968 0.6278624 0.8573334 0.6735154 0.4910625 0.7790914 0.8918141
## s15 s16 s17 s18 s19 s20
## pathway1 0.7707884 0.8048002 0.8139325 0.7562304 0.8257812 0.7988866
## pathway2 0.7450440 0.8073472 0.8396782 0.7724498 0.7732119 0.6771755
## pathway3 0.7427666 0.9419171 0.9125146 0.7504836 0.7127936 0.7270405
## pathway4 0.7121663 0.7625575 0.7675355 0.8329739 0.8193932 0.7584810
## pathway5 0.8388787 0.8025472 0.8337763 0.7940322 0.6808545 0.7773564
## pathway6 0.7292842 0.7034269 0.6653677 0.7941846 0.7447829 0.8574432
The resulting matrix of higher-level functional representations looks very similar to the original one, except that the original had many more features (1000 instead of 95).
In this example, 5 of the gene sets in pathways
has size < 12. You can check which pathways have been left out and how many genes they had by running the following command:
length_sets <- sapply(pathways, function(p) length(p))
short_sets <- names(length_sets[length_sets < min])
print(length_sets[short_sets])
## pathway20 pathway29 pathway56 pathway60 pathway100
## 10 11 11 11 11
You can also check which genes were found in these sets:
## $pathway20
## [1] "g7619" "g7228" "g4784" "g7604" "g2909" "g550" "g3498" "g9766" "g1379"
## [10] "g9024"
##
## $pathway29
## [1] "g9593" "g7338" "g9592" "g1214" "g5061" "g2451" "g2666" "g2299" "g836"
## [10] "g3948" "g4898"
##
## $pathway56
## [1] "g1402" "g2296" "g2731" "g5182" "g3993" "g7093" "g6375" "g3336" "g1762"
## [10] "g2398" "g1667"
##
## $pathway60
## [1] "g3429" "g5579" "g6744" "g4102" "g7061" "g5217" "g1025" "g2838" "g1540"
## [10] "g8468" "g6575"
##
## $pathway100
## [1] "g3438" "g443" "g7243" "g8206" "g2375" "g9002" "g625" "g5807" "g6973"
## [10] "g8004" "g1757"
Example usage 2: apply summarize_pathway_level
to summarize data with pca pooling and no minimum size of sets
Using the same matrix X
and gene sets pathways
, let’s generate aggregated representations using dimension-reduction derived scores from the PCA. The pca-aggregated activity scores values represent the projection of the overall expression of all genes in each pathway onto the first principal component.
##
## 100 functional sets read.
## iteration 100
## 100 successful functional aggregations over minsize
## 0 failed functional aggregations under minsize
print(paste("Pathway activity score matrix has dimensions:", nrow(pathway_activity), ",", ncol(pathway_activity)))
## [1] "Pathway activity score matrix has dimensions: 100 , 20"
Now from the original matrix X
with dimensions [g,s]
(1000 genes and 20 samples), summarize_pathway_level
has generated a pathway-level activity score for each of the 20 samples for all 100 pathways, since the size of the molecular sets was not restricted. Let’s see how this matrix looks like:
## s1 s2 s3 s4 s5 s6
## pathway1 -0.51596316 4.4081016 -3.6172441 0.9078345 -4.3632274 4.47148780
## pathway2 -6.35344708 4.4721898 -0.1072179 -5.6146386 -0.4361479 3.43595874
## pathway3 -3.69833166 1.8035359 0.9002908 -0.7440726 3.7495110 1.25453327
## pathway4 -5.71654009 1.1753201 2.3176206 1.9626837 -2.9970946 0.08046059
## pathway5 -2.45806955 -0.9199807 -6.6012194 3.9444841 -0.5009367 2.02176596
## pathway6 0.02525345 0.3790888 1.4559051 1.2151299 -4.6422525 -2.02724802
## s7 s8 s9 s10 s11 s12
## pathway1 0.1786907 2.68548927 0.6952866 0.2297664 4.0740206 1.1389743
## pathway2 2.0274377 -0.05016894 3.8767232 1.0640257 -1.7338074 2.4601009
## pathway3 2.1352637 -1.38294820 1.0078658 4.1406329 0.2845910 -1.6128825
## pathway4 2.8826719 -2.24804495 -2.5774802 5.7096576 2.4964751 2.6572455
## pathway5 -1.0535810 -1.78555013 -0.1863068 1.6203119 4.3788157 -0.1565894
## pathway6 -2.0024241 2.17669923 -0.3979514 -1.7318485 0.6773351 -0.8882183
## s13 s14 s15 s16 s17 s18
## pathway1 -0.2447407 -4.932566 2.6488582 0.5928979 -2.2855632 1.0699588
## pathway2 1.4911608 -1.204245 1.1324795 -3.3654006 -0.6572542 -1.8564556
## pathway3 -0.8852759 -4.948525 -1.5682527 4.3381664 1.0671926 -0.6944561
## pathway4 0.2437641 -2.979152 0.8372801 -1.5266307 2.8745670 -0.5905516
## pathway5 -2.0571340 6.301765 2.0743195 -3.5761567 2.6095770 0.6137905
## pathway6 -2.4032898 2.959492 2.1802264 -0.1962708 0.8000977 2.0549126
## s19 s20
## pathway1 -5.2955798 -1.8464827
## pathway2 -0.2952237 1.7139303
## pathway3 -2.6939543 -2.4528843
## pathway4 -2.7427071 -1.8595451
## pathway5 -1.9449648 -2.3243405
## pathway6 -0.2149931 0.5803559
Packages & Session information
The funOmics
package was developed for R version >= 4.0.3. However, BioConductor release 3.19 runs on R-4.4. See session information and loaded packages below:
## R Under development (unstable) (2024-03-18 r86148)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] funOmics_0.99.3 bigmemory_4.6.4 Biobase_2.63.0
## [4] BiocGenerics_0.49.1
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3
## [4] stringi_1.8.3 digest_0.6.35 magrittr_2.0.3
## [7] evaluate_0.23 grid_4.4.0 RColorBrewer_1.1-3
## [10] iterators_1.0.14 fastmap_1.1.1 R.oo_1.26.0
## [13] foreach_1.5.2 doParallel_1.0.17 plyr_1.8.9
## [16] jsonlite_1.8.8 princurve_2.1.6 bigmemory.sri_0.1.8
## [19] fansi_1.0.6 scales_1.3.0 codetools_0.2-19
## [22] jquerylib_0.1.4 cli_3.6.2 registry_0.5-1
## [25] rlang_1.1.3 R.methodsS3_1.8.2 munsell_0.5.0
## [28] cachem_1.0.8 yaml_2.3.8 tools_4.4.0
## [31] parallel_4.4.0 uuid_1.2-0 reshape2_1.4.4
## [34] dplyr_1.1.4 colorspace_2.1-0 ggplot2_3.5.0
## [37] rngtools_1.5.2 NMF_0.27 pathifier_1.41.0
## [40] vctrs_0.6.5 R6_2.5.1 lifecycle_1.0.4
## [43] stringr_1.5.1 cluster_2.1.6 pkgconfig_2.0.3
## [46] pillar_1.9.0 bslib_0.6.2 gtable_0.3.4
## [49] glue_1.7.0 Rcpp_1.0.12 xfun_0.42
## [52] tibble_3.2.1 tidyselect_1.2.1 knitr_1.45
## [55] htmltools_0.5.7 rmarkdown_2.26 compiler_4.4.0
## [58] gridBase_0.4-7