1 Introduction

Mitochondria are cellular organelles whose main role is aerobic respiration, which supplies energy to the cell. They are also involved in other tasks, such as signaling, cellular differentiation, and cell death, as well as control of the cell cycle and cell growth. Mitochondrial dysfunction has been implicated in several human disorders and conditions. Studying mitochondrial behavior in health and disease is opening new possibilities for understanding disease mechanisms and developing new treatments. Indeed, targeting mitochondria for cancer therapy is increasingly becoming a reality in biomedical research.

The analysis of high-resolution transcriptomic data can help to better understand mitochondrial activity in relation to gene expression dynamics. Although a curated mitochondria-specific pathway resource exists (MitoCarta3.0, which provides a list of mitochondrial genes organized into mitochondrial pathways [1]), computational tools for mitochondria-focused pathway analysis are still lacking. In the field of gene set analysis, the general-purpose resources are pathway databases such as Reactome [5] and the Gene Ontology (GO) project [4], which aim to provide gene set categories for high-throughput data analysis across all cellular contexts. In the vast majority of the pathways in these public databases, the mitochondria-specific component of cellular signaling constitutes only a small portion, which is often overshadowed by non-mitochondrial signaling. Moreover, even when information on protein localization is available, pathway analysis typically does not account for whether the activated signal involves the mitochondrion.

We therefore reasoned that a tool focusing specifically on the mitochondrial part of pathways would allow a tailored and more specific analysis of this organelle, and we developed mitology to provide fast and actionable interpretation of mitochondrial activity from high-throughput data.

This vignette shows some ways to perform mitochondrial pathway analyses using mitology. It also presents the visualization functions implemented to display the resulting scores, which can help in interpreting the results.

2 Installation

To install this package:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("mitology")

3 Mitochondrial Genes

A list of mitochondrial genes was obtained by joining four public resources developed in recent years to define the mitochondrial proteome, with the aim of studying mitochondrial functions, dysfunctions and diseases. Specifically, we collected the genes from MitoCarta3.0 [1], the Integrated Mitochondrial Protein Index (IMPI) [2], the Mitochondrial Disease Sequence Data Resource (MSeqDR) [3] and the Gene Ontology (GO) [4].

The MitoGenesDB table contains the full list of genes, along with the database or databases from which each gene was retrieved.

library(mitology)
data(MitoGenesDB)
head(MitoGenesDB)
##           ENSEMBL SYMBOL        DB
## 1 ENSG00000000003 TSPAN6      IMPI
## 2 ENSG00000000419   DPM1      MSDR
## 3 ENSG00000000938    FGR      GOCC
## 4 ENSG00000001084   GCLC IMPI,GOCC
## 5 ENSG00000001460  STPG1      GOCC
## 6 ENSG00000001626   CFTR      IMPI
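
As a minimal sketch of how the DB column can be summarized, the snippet below counts how many genes each resource contributes. It works on a toy table that only copies the six rows printed above, and it assumes that multiple sources are separated by a comma without spaces, as in the GCLC row. The object names (toy_db, sources, source_counts, multi_source) are illustrative and not part of mitology.

```r
# Toy table mirroring the rows printed above (not the full MitoGenesDB)
toy_db <- data.frame(
    ENSEMBL = c("ENSG00000000003", "ENSG00000000419", "ENSG00000000938",
                "ENSG00000001084", "ENSG00000001460", "ENSG00000001626"),
    SYMBOL = c("TSPAN6", "DPM1", "FGR", "GCLC", "STPG1", "CFTR"),
    DB = c("IMPI", "MSDR", "GOCC", "IMPI,GOCC", "GOCC", "IMPI"),
    stringsAsFactors = FALSE
)

# Split the comma-separated DB field into one character vector per gene
sources <- strsplit(toy_db$DB, ",", fixed = TRUE)

# Number of genes supported by each resource
source_counts <- table(unlist(sources))

# Genes reported by more than one resource
multi_source <- toy_db$SYMBOL[lengths(sources) > 1]
```

The same three steps should apply to the real table after data(MitoGenesDB), provided the separator assumption holds.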

4 Gene sets collected in mitology

4.1 Gene sets from MitoCarta3.0

The original structure of the MitoCarta3.0 database was kept in full and included in mitology, organized in a three-tier hierarchy.

In addition to MitoCarta3.0, we provide three more comprehensive mitochondria-oriented gene set resources derived from Reactome, GO Biological Processes (GO-BP), and GO Cellular Components (GO-CC).

4.2 Gene sets from GO

We explored the GO-CC and GO-BP terms and filtered them with a four-step selection. First, terms were filtered by size, keeping all those with at least 10 mitochondrial genes. Second, we pruned the CC and BP trees by graph level, filtering out levels 0 to 3 and keeping the terms from level 4 onwards. Third, we tested the enrichment of the remaining sets for our mitochondrial gene list with the enrichGO function from the clusterProfiler R package (v4.14.3); only sets with an FDR lower than 0.05 were passed to the next step. Finally, the last selection was topological: exploiting the hierarchical organization of GO, we selected the most general enriched gene sets and filtered out their offspring terms.
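
The final, topological step can be illustrated with a small sketch. It is not the code used to build mitology: the term names, the parents list and the get_ancestors helper are all hypothetical, and the toy graph stands in for the real GO hierarchy. The idea is to keep an enriched term only if none of its ancestors is itself enriched.

```r
# Hypothetical toy hierarchy: each term lists its direct parents.
# Term D has two parents, as can happen in GO.
parents <- list(B = "A", C = "A", D = c("B", "C"), E = "C", F = "E")

# Terms assumed to have passed the size, level and FDR filters
enriched <- c("B", "D", "E", "F")

# Collect every ancestor of a term by walking up the parent links
get_ancestors <- function(term, parents) {
    seen <- character(0)
    queue <- parents[[term]]
    while (length(queue) > 0) {
        current <- queue[1]
        queue <- queue[-1]
        if (!(current %in% seen)) {
            seen <- c(seen, current)
            queue <- c(queue, parents[[current]])
        }
    }
    seen
}

# A term is dropped when at least one of its ancestors is also enriched
has_enriched_ancestor <- vapply(
    enriched,
    function(term) any(get_ancestors(term, parents) %in% enriched),
    logical(1)
)
most_general <- enriched[!has_enriched_ancestor]
```

In this toy example D is removed because its parent B is enriched, and F is removed because its parent E is enriched, so only B and E remain.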

4.3 Gene sets from Reactome

The Reactome pathways were retrieved with the graphite R package (v1.52.0). Reactome has a simpler structure with far fewer levels of nested pathways, so we applied only three filtering steps: more than 10 mitochondrial genes; FDR below 0.05 (enrichment computed with the phyper function); and removal of the offspring pathways.
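
A hedged sketch of the size and enrichment filters is given below. The text only states that phyper was used, so the universe size, the number of mitochondrial genes, the pathway sizes, the overlaps and the Benjamini-Hochberg correction are all assumptions made for illustration. The test asks how likely it is to draw at least the observed number of mitochondrial genes when sampling a pathway at random from the gene universe.

```r
# Hypothetical numbers, chosen only to illustrate the calculation
universe_size <- 20000L   # all annotated genes
n_mito <- 2000L           # genes in the mitochondrial list
pathway_size <- c(P1 = 100L, P2 = 100L, P3 = 100L)
n_overlap <- c(P1 = 30L, P2 = 12L, P3 = 10L)  # mitochondrial genes per pathway

# Upper-tail hypergeometric probability P(X >= n_overlap)
p_values <- phyper(
    q = n_overlap - 1, m = n_mito, n = universe_size - n_mito,
    k = pathway_size, lower.tail = FALSE
)
names(p_values) <- names(n_overlap)

# Multiple-testing correction (Benjamini-Hochberg is an assumption here)
fdr <- p.adjust(p_values, method = "BH")

# Keep pathways with more than 10 mitochondrial genes and FDR below 0.05
selected <- names(fdr)[fdr < 0.05 & n_overlap > 10]
```

Subtracting 1 from the overlap is needed because, with lower.tail = FALSE, phyper returns the probability of strictly more than q successes.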

Once we had obtained the final terms and pathways from GO and Reactome, respectively, we extracted the mitochondrial gene sets by keeping only the genes included in the mitochondrial list. The gene sets included in mitology keep the names of the original terms/pathways. The final gene sets and the corresponding tree structures of the four databases are included in the mitology package.
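
The extraction step amounts to intersecting each selected term or pathway with the mitochondrial gene list. A minimal sketch follows, with toy pathway names and a toy gene list that are purely illustrative:

```r
# Toy mitochondrial gene list (illustrative subset, not MitoGenesDB)
mito_genes <- c("ATP5F1A", "NDUFS1", "SDHA", "TOMM20")

# Toy pathways mixing mitochondrial and non-mitochondrial genes
pathways <- list(
    "Toy pathway 1" = c("ATP5F1A", "NDUFS1", "GAPDH"),
    "Toy pathway 2" = c("SDHA", "ACTB", "TOMM20", "TP53")
)

# Keep only the mitochondrial genes; the original names are preserved
mito_sets <- lapply(pathways, intersect, y = mito_genes)
```

Because lapply keeps the names of the input list, each resulting gene set carries the name of the pathway it was derived from.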

5 How to use mitology

5.1 Access mitochondrial gene sets

The getGeneSets function returns the mitochondrial gene sets for one of the four available databases (MitoCarta, Reactome, GO-CC and GO-BP), selected with the database argument. The nametype argument specifies the type of gene identifier: SYMBOL, ENTREZID or ENSEMBL. The objectType argument sets whether the gene sets are returned as a list or as a data frame.

MC_df <- getGeneSets(
    database = "MitoCarta", nametype = "SYMBOL", objectType = "dataframe")

MC_list <- getGeneSets(
    database = "MitoCarta", nametype = "SYMBOL", objectType = "list")
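
The two object types carry the same information in different shapes. The sketch below shows the relationship on a toy named list; the column names gene_set and gene are hypothetical, and the data frame actually returned by getGeneSets may be laid out differently.

```r
# Toy gene sets in list form: one character vector of genes per set
toy_list <- list(SetA = c("G1", "G2"), SetB = c("G2", "G3", "G4"))

# Long data frame form: one row per (gene set, gene) pair
toy_df <- data.frame(
    gene_set = rep(names(toy_list), lengths(toy_list)),
    gene = unlist(toy_list, use.names = FALSE),
    stringsAsFactors = FALSE
)

# Back to a list by splitting the genes on the gene set column
back_to_list <- split(toy_df$gene, toy_df$gene_set)

# Number of genes per set, useful as a quick sanity check
set_sizes <- lengths(toy_list)
```

The list form is the one passed to ssgseaParam later in this vignette, while the long form can be convenient for joining with other annotation tables.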

5.2 Enrichment analyses

In the following section, we use an example bulk expression dataset of ovarian cancer to show how mitology can be used to analyze mitochondrial activity.

# loading packages
library(SummarizedExperiment)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(GSVA)
library(Biobase)
# load data
data(ovse)
ovse
## class: SummarizedExperiment 
## dim: 2388 40 
## metadata(0):
## assays(1): norm_expr
## rownames(2388): ABCA4 ABCB10 ... USP9X WDR45
## rowData names(2): PROvsIMR_logFC PROvsIMR_FDR
## colnames(40): sample1 sample2 ... sample39 sample40
## colData names(1): OV_subtype

5.2.1 Enrichment analyses of the mitochondrial gene sets

We can perform an over-representation analysis with enrichMito.

genes <- rownames(ovse)[elementMetadata(ovse)$PROvsIMR_FDR < 0.01]
genes <- mapIds(
    org.Hs.eg.db, keys = genes, column = "ENSEMBL",
    keytype = "SYMBOL", multiVals = "first")
## 'select()' returned 1:many mapping between keys and columns
enrichresMC <- enrichMito(genes = genes, database = "MitoCarta")
enrichresRT <- enrichMito(genes = genes, database = "Reactome")
## 'select()' returned 1:many mapping between keys and columns

We can also run a gene set enrichment analysis (GSEA) with gseaMito.

geneslFC <- elementMetadata(ovse)$PROvsIMR_logFC
names(geneslFC) <- rownames(ovse)
names(geneslFC) <- mapIds(
    org.Hs.eg.db, keys = names(geneslFC), column = "ENSEMBL",
    keytype = "SYMBOL", multiVals = "first")
## 'select()' returned 1:many mapping between keys and columns
geneslFC <- sort(geneslFC, decreasing = TRUE)
geneslFC <- geneslFC[!is.na(names(geneslFC))]

gsearesMC <- gseaMito(genes = geneslFC, database = "MitoCarta")
## using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
gsearesRT <- gseaMito(genes = geneslFC, database = "Reactome")
## using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
## 'select()' returned 1:many mapping between keys and columns
## using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
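
Identifier mapping can leave missing or repeated names in the ranked vector, as the 1:many message above suggests. The following sketch shows one possible way to clean such a vector before GSEA on toy values. Keeping the entry with the largest absolute log fold change for a repeated identifier is a convention assumed here, not something mitology prescribes, and the identifiers are placeholders.

```r
# Toy log fold changes with one unmapped gene and one repeated identifier
log_fc <- c(1.2, -0.4, 2.5, 0.3, -1.8)
names(log_fc) <- c("ENSG_A", NA, "ENSG_B", "ENSG_A", "ENSG_C")

# Drop genes that could not be mapped
ranked <- log_fc[!is.na(names(log_fc))]

# For repeated identifiers keep the entry with the largest absolute value
ranked <- ranked[order(abs(ranked), decreasing = TRUE)]
ranked <- ranked[!duplicated(names(ranked))]

# GSEA expects the statistics sorted in decreasing order
ranked <- sort(ranked, decreasing = TRUE)
```

After these steps every identifier appears once and the vector is ordered from the most up-regulated to the most down-regulated gene.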

Alternatively, we can compute a single-sample GSEA (ssGSEA) with GSVA.

gsvaPar <- ssgseaParam(exprData = ovse, geneSets = MC_list)
## ℹ No assay name provided; using default assay 'norm_expr'
res_ssGSEA <- gsva(gsvaPar)
## ℹ GSVA version 2.3.4
## ! 5 genes with constant values throughout the samples
## ! Some gene sets have size one. Consider setting minSize > 1
## ℹ Calculating  ssGSEA scores for 119 gene sets
## ℹ Calculating ranks
## ℹ Calculating rank weights
## ℹ Normalizing ssGSEA scores
## ✔ Calculations finished

5.3 Visualization

The results obtained with the over-representation analysis can be visualized with a dot plot over the database tree hierarchy.

mitoTreePoint(data = enrichresMC, database = "MitoCarta", pvalCutoff = .9, color = "pvalue")
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Warning: Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.
## Warning: Removed 117 rows containing missing values or values outside the scale range
## (`geom_interactive_point_g_gtree()`).
## Warning: Removed 149 rows containing missing values or values outside the scale range
## (`geom_text()`).

mitoTreePoint(data = enrichresRT, database = "Reactome", pvalCutoff = .4, color = "pvalue")
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Warning: Unknown or uninitialised column: `subgroup`.
## Warning: Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.
## Warning: Removed 85 rows containing missing values or values outside the scale range
## (`geom_interactive_point_g_gtree()`).
## Warning: Removed 66 rows containing missing values or values outside the scale range
## (`geom_interactive_point_g_gtree()`).
## Warning: Removed 174 rows containing missing values or values outside the scale range
## (`geom_text()`).

When we instead obtain a matrix of scores for each sample in each gene set, we can visualize it as a heatmap.

Because a separate score is obtained for every sample, which would be difficult to visualize, we can summarize the samples by taking the mean value for each OV subtype.

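# Average the ssGSEA scores of the samples belonging to each OV subtype
subtypes <- unique(ovse$OV_subtype)
res_ssGSEA_subtype <- do.call(
    cbind, lapply(subtypes, function(x){
        # drop = FALSE keeps a matrix even if a subtype has a single sample
        rowMeans(assay(res_ssGSEA)[, ovse$OV_subtype == x, drop = FALSE])
    }))
colnames(res_ssGSEA_subtype) <- subtypes
rownames(res_ssGSEA_subtype) <- rownames(res_ssGSEA)
# Scale each gene set (row) across the subtypes
res_ssGSEA_subtype <- t(scale(t(res_ssGSEA_subtype)))
mitoHeatmap(data = res_ssGSEA_subtype, database = "MitoCarta")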

mitoHeatmap(data = res_ssGSEA_subtype, database = "MitoCarta", splitSections = TRUE)
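
The row-wise scaling used before plotting can be checked on a small example. The sketch below uses a toy score matrix with hypothetical gene set and subtype names. It also shows a side effect worth keeping in mind: a gene set with identical values in every column has zero standard deviation, so scaling returns NaN for that row.

```r
# Toy matrix of mean scores: gene sets in rows, subtypes in columns
scores <- matrix(
    c(0.10, 0.30, 0.20,
      0.50, 0.50, 0.50,
      -0.20, 0.00, 0.40),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("SetA", "SetB", "SetC"), c("S1", "S2", "S3"))
)

# scale() works on columns, so transpose to standardize each row
z <- t(scale(t(scores)))

# Rows with constant values become NaN after scaling
constant_rows <- rownames(z)[!is.finite(rowSums(z))]

# Optionally drop them before plotting
z_clean <- z[is.finite(rowSums(z)), , drop = FALSE]
```

Each remaining row of z_clean has mean 0 and standard deviation 1, so the colors in the heatmap compare subtypes within a gene set rather than absolute scores across gene sets.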

We can also visualize it by plotting a circular heatmap on the gene set tree hierarchy.

mitoTreeHeatmap(
    data = res_ssGSEA_subtype, database = "MitoCarta",
    labelNames = "leaves", font.size = 1)
## Warning: Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.

It is possible to plot only the names of the main sections instead of the individual gene set names.

mitoTreeHeatmap(
    data = res_ssGSEA_subtype, database = "MitoCarta",
    labelNames = "sections", font.size = 3)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggtree package.
##   Please report the issue at <https://github.com/YuLab-SMU/ggtree/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.
## Unknown or uninitialised column: `subgroup`.

These analyses can also be performed with the mitochondrial gene sets from Reactome, GO-CC and GO-BP.

6 Bibliography

[1] Rath S, Sharma R, Gupta R, et al. MitoCarta3.0: an updated mitochondrial proteome now with sub-organelle localization and pathway annotations. Nucleic Acids Res 2020; 49:D1541–D1547

[2] Peck P. IMPI. 2021

[3] Shen L, Diroma MA, Gonzalez M, et al. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease. Hum Mutat 2016; 37:540–548

[4] The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res 2019; 47:D330–D338

[5] Rothfels K, Milacic M, Matthews L, et al. Using the Reactome Database. Curr Protoc 2023; 3:e722

7 Session Info

Here is the output of sessionInfo() on the system on which this document was compiled.

sessionInfo()
## R version 4.5.1 Patched (2025-08-23 r88802)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] GSVA_2.3.4                  org.Hs.eg.db_3.22.0        
##  [3] AnnotationDbi_1.71.2        SummarizedExperiment_1.39.2
##  [5] Biobase_2.69.1              GenomicRanges_1.61.6       
##  [7] Seqinfo_0.99.3              IRanges_2.43.5             
##  [9] S4Vectors_0.47.4            BiocGenerics_0.55.4        
## [11] generics_0.1.4              MatrixGenerics_1.21.0      
## [13] matrixStats_1.5.0           mitology_1.1.3             
## [15] BiocStyle_2.37.1           
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.5.1               ggplotify_0.1.3            
##   [3] tibble_3.3.0                R.oo_1.27.1                
##   [5] polyclip_1.10-7             graph_1.87.0               
##   [7] XML_3.99-0.19               lifecycle_1.0.4            
##   [9] doParallel_1.0.17           lattice_0.22-7             
##  [11] MASS_7.3-65                 magrittr_2.0.4             
##  [13] sass_0.4.10                 rmarkdown_2.30             
##  [15] jquerylib_0.1.4             yaml_2.3.10                
##  [17] ggtangle_0.0.7              cowplot_1.2.0              
##  [19] DBI_1.2.3                   RColorBrewer_1.1-3         
##  [21] abind_1.4-8                 purrr_1.1.0                
##  [23] R.utils_2.13.0              ggraph_2.2.2               
##  [25] yulab.utils_0.2.1           tweenr_2.0.3               
##  [27] rappdirs_0.3.3              gdtools_0.4.4              
##  [29] circlize_0.4.16             enrichplot_1.29.4          
##  [31] ggrepel_0.9.6               irlba_2.3.5.1              
##  [33] tidytree_0.4.6              reactome.db_1.94.0         
##  [35] annotate_1.87.0             codetools_0.2-20           
##  [37] DelayedArray_0.35.3         DOSE_4.3.0                 
##  [39] ggforce_0.5.0               tidyselect_1.2.1           
##  [41] shape_1.4.6.1               aplot_0.2.9                
##  [43] farver_2.1.2                ScaledMatrix_1.17.0        
##  [45] viridis_0.6.5               jsonlite_2.0.0             
##  [47] GetoptLong_1.0.5            tidygraph_1.3.1            
##  [49] iterators_1.0.14            systemfonts_1.3.1          
##  [51] foreach_1.5.2               tools_4.5.1                
##  [53] treeio_1.33.0               Rcpp_1.1.0                 
##  [55] glue_1.8.0                  gridExtra_2.3              
##  [57] SparseArray_1.9.1           xfun_0.53                  
##  [59] qvalue_2.41.0               dplyr_1.1.4                
##  [61] HDF5Array_1.37.0            withr_3.0.2                
##  [63] BiocManager_1.30.26         fastmap_1.2.0              
##  [65] rhdf5filters_1.21.4         digest_0.6.37              
##  [67] rsvd_1.0.5                  R6_2.6.1                   
##  [69] gridGraphics_0.5-1          colorspace_2.1-2           
##  [71] Cairo_1.6-5                 GO.db_3.22.0               
##  [73] dichromat_2.0-0.1           RSQLite_2.4.3              
##  [75] R.methodsS3_1.8.2           h5mread_1.1.1              
##  [77] tidyr_1.3.1                 fontLiberation_0.1.0       
##  [79] data.table_1.17.8           graphlayouts_1.2.2         
##  [81] httr_1.4.7                  htmlwidgets_1.6.4          
##  [83] S4Arrays_1.9.1              graphite_1.55.3            
##  [85] pkgconfig_2.0.3             gtable_0.3.6               
##  [87] blob_1.2.4                  ComplexHeatmap_2.25.2      
##  [89] S7_0.2.0                    SingleCellExperiment_1.31.1
##  [91] XVector_0.49.1              clusterProfiler_4.17.0     
##  [93] htmltools_0.5.8.1           fontBitstreamVera_0.1.1    
##  [95] bookdown_0.45               fgsea_1.35.8               
##  [97] GSEABase_1.71.1             clue_0.3-66                
##  [99] scales_1.4.0                png_0.1-8                  
## [101] SpatialExperiment_1.19.1    ggfun_0.2.0                
## [103] knitr_1.50                  reshape2_1.4.4             
## [105] rjson_0.2.23                nlme_3.1-168               
## [107] cachem_1.1.0                rhdf5_2.53.6               
## [109] GlobalOptions_0.1.2         stringr_1.5.2              
## [111] parallel_4.5.1              ReactomePA_1.53.0          
## [113] pillar_1.11.1               grid_4.5.1                 
## [115] vctrs_0.6.5                 BiocSingular_1.25.0        
## [117] beachmat_2.25.5             xtable_1.8-4               
## [119] cluster_2.1.8.1             evaluate_1.0.5             
## [121] tinytex_0.57                magick_2.9.0               
## [123] cli_3.6.5                   compiler_4.5.1             
## [125] rlang_1.1.6                 crayon_1.5.3               
## [127] labeling_0.4.3              plyr_1.8.9                 
## [129] fs_1.6.6                    ggiraph_0.9.2              
## [131] stringi_1.8.7               viridisLite_0.4.2          
## [133] BiocParallel_1.43.4         Biostrings_2.77.2          
## [135] lazyeval_0.2.2              GOSemSim_2.35.2            
## [137] fontquiver_0.2.1            Matrix_1.7-4               
## [139] patchwork_1.3.2             sparseMatrixStats_1.21.0   
## [141] bit64_4.6.0-1               ggplot2_4.0.0              
## [143] Rhdf5lib_1.31.1             KEGGREST_1.49.2            
## [145] igraph_2.2.0                memoise_2.0.1              
## [147] bslib_0.9.0                 ggtree_3.99.2              
## [149] fastmatch_1.1-6             bit_4.6.0                  
## [151] ape_5.8-1                   gson_0.1.0