There are four testing scenarios, depending on whether the query set and the database sets are continuous or discrete. The table below shows the test used in each scenario. testEnrichment and testEnrichmentSEA perform Fisher’s exact test and Set Enrichment Analysis, respectively.

Four knowYourCG Testing Scenarios

                   Continuous Database Set    Discrete Database Set
Continuous Query   Correlation-based          Set Enrichment Analysis
Discrete Query     Set Enrichment Analysis    Fisher’s Exact Test
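The table can be read as a simple decision rule. The sketch below is only an illustration of that rule: pick_scenario is a hypothetical helper, not a knowYourCG function, and it assumes that "continuous" means a numeric vector and "discrete" means anything else (for example, a character vector of probe IDs). The probe IDs and values are made up.

```r
# Hypothetical helper (not part of knowYourCG): returns the scenario name
# from the table, given whether each input is numeric (continuous) or not.
pick_scenario <- function(query, database) {
  q_cont <- is.numeric(query)
  d_cont <- is.numeric(database)
  if (q_cont && d_cont) {
    "Correlation-based"
  } else if (q_cont || d_cont) {
    "Set Enrichment Analysis"
  } else {
    "Fisher's Exact Test"
  }
}

probes <- c("cg01", "cg02", "cg03")              # made-up discrete set
betas  <- c(cg01 = 0.1, cg02 = 0.8, cg03 = 0.5)  # made-up continuous values

pick_scenario(probes, probes)  # both discrete
pick_scenario(betas, probes)   # continuous query, discrete database
pick_scenario(probes, betas)   # discrete query, continuous database
pick_scenario(betas, betas)    # both continuous
```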

CONTINUOUS VARIABLE ENRICHMENT

The query may be a named continuous vector. In that case, either a set enrichment score is calculated (if the database is discrete) or a Spearman correlation is calculated (if the database is also continuous). The set enrichment cases are shown below using biologically relevant examples.
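For the case where both the query and the database are continuous, the idea can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes that both inputs are named numeric vectors and that the comparison is restricted to the names (probe IDs) they share. All IDs and values are made up.

```r
# Made-up named vectors; the names stand in for probe IDs.
query    <- c(cg01 = 0.10, cg02 = 0.35, cg03 = 0.60, cg04 = 0.90, cg05 = 0.20)
database <- c(cg02 = 12,   cg03 = 30,   cg04 = 55,   cg05 = 8,    cg06 = 70)

# Keep only probes present in both vectors, in the same order.
shared <- intersect(names(query), names(database))

# Rank-based (Spearman) correlation on the shared probes.
rho <- cor(query[shared], database[shared], method = "spearman")
rho
```

Because Spearman correlation depends only on ranks, it is unaffected by the different scales of the two vectors.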

To demonstrate set enrichment analysis against continuous databases, let’s load two numeric database sets. One is a database set for CpG density, and the other is a database set for the distance from each probe to the nearest transcription start site (TSS). The query is the set of probes in the TSS design group.

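library(knowYourCG)
library(sesameData)  # provides sesameDataCache()
# Discrete query: probes in the "TSS" design group
query <- getDBs("KYCG.MM285.designGroup")[["TSS"]]
# Cache the continuous databases (CpG density, distance to TSS)
sesameDataCache(data_titles = c("KYCG.MM285.seqContextN.20210630"))
res <- testEnrichmentSEA(query, "MM285.seqContextN")
# Columns to display
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[, main_stats]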
   dbname       test                   estimate    FDR         nQ     nD      overlap
   <chr>        <chr>                  <dbl>       <dbl>       <int>  <int>   <int>
2  distToTSS    Set Enrichment Score    0.7486501  0.00000000  69236  303421  69236
1  CpGDesity50  Set Enrichment Score   -0.2626335  0.01783162  69236  297415  69236

The estimate column here is the enrichment score.

NOTE: A negative enrichment score suggests that the categorical set is enriched for higher values of the numerical set. A positive enrichment score represents enrichment for smaller values. As expected, the designed TSS CpGs are significantly enriched for smaller TSS distances and higher CpG density.
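The sign convention can be made concrete with a toy running-sum statistic. The sketch below is a simplified Kolmogorov-Smirnov-style score written for illustration only; toy_enrichment_score is a hypothetical function, and it is an assumption that this resembles what testEnrichmentSEA computes internally. It sorts the numeric values in ascending order, steps up at set members and down at non-members, and reports the signed maximum deviation from zero, so a set concentrated at small values scores positive.

```r
# Toy KS-style enrichment score; a simplified stand-in, not knowYourCG's code.
toy_enrichment_score <- function(values, in_set) {
  stopifnot(length(values) == length(in_set), any(in_set), !all(in_set))
  ord  <- order(values)              # ascending: smallest values first
  hit  <- in_set[ord]
  # Step up for set members, down for non-members; steps sum to zero.
  step <- ifelse(hit, 1 / sum(hit), -1 / sum(!hit))
  running <- cumsum(step)
  running[which.max(abs(running))]   # signed maximum deviation from zero
}

values   <- c(1, 2, 3, 4, 5, 6)      # made-up numeric database values
low_set  <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)  # members at small values
high_set <- !low_set                                  # members at large values

es_low  <- toy_enrichment_score(values, low_set)
es_high <- toy_enrichment_score(values, high_set)
c(low = es_low, high = es_high)
```

In this toy setting, es_low is positive and es_high is negative, which mirrors the convention described in the note.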

Alternatively, one can test the enrichment of a continuous query against discrete databases. Here we use the methylation levels from one sample as the query and test them against the chromHMM chromatin states.

library(sesame)  # provides getBetas()
sesameDataCache(data_titles = c("MM285.1.SigDF"))
# Continuous query: per-probe methylation levels (beta values) of one sample
beta_values <- getBetas(sesameDataGet("MM285.1.SigDF"))
res <- testEnrichmentSEA(beta_values, "MM285.chromHMM")
# Columns to display
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[, main_stats]
   dbname   test                   estimate    FDR            nQ     nD      overlap
   <chr>    <chr>                  <dbl>       <dbl>          <int>  <int>   <int>
14 Tss      Set Enrichment Score    0.8010037  0.000000e+00   41675  296070  41672
15 TssBiv   Set Enrichment Score    0.6609816  0.000000e+00   12278  296070  12278
10 Quies4   Set Enrichment Score    0.3407788  0.000000e+00    6751  296070   6751
1  Enh      Set Enrichment Score    0.3277562  0.000000e+00    8269  296070   8269
5  EnhPr    Set Enrichment Score    0.2930447  0.000000e+00    5912  296070   5912
16 TssFlnk  Set Enrichment Score    0.2873390  0.000000e+00    9462  296070   9461
12 ReprPC   Set Enrichment Score    0.2365804  0.000000e+00    8858  296070   8858
3  EnhLo    Set Enrichment Score    0.1898612  4.798845e-94    1808  296070   1808
6  Het      Set Enrichment Score   -0.1748840  3.230284e-02    3575  296070   3575
17 Tx       Set Enrichment Score   -0.4111345  3.588231e-02   17801  296070  17801

As expected, the chromatin states Tss and Enh have positive enrichment scores, meaning these databases are associated with small values of the query (low DNA methylation levels). In contrast, the Het and Tx states have negative scores and are associated with high methylation levels.
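One way to summarize such a result is to keep the significant databases and label each by the direction of its score. The sketch below works on a small hand-built data frame whose rows are copied from the output above, so it runs without downloading any data. The 0.05 FDR cutoff and the direction labels are assumptions made for illustration, not package defaults.

```r
# Four rows copied from the output above; res_toy is a stand-in for `res`.
res_toy <- data.frame(
  dbname   = c("Tss", "Enh", "Het", "Tx"),
  estimate = c(0.8010037, 0.3277562, -0.1748840, -0.4111345),
  FDR      = c(0, 0, 3.230284e-02, 3.588231e-02)
)

# Keep databases passing an assumed FDR cutoff of 0.05.
sig <- res_toy[res_toy$FDR < 0.05, ]

# Positive score: associated with lower query values; negative: higher values.
sig$direction <- ifelse(sig$estimate > 0,
                        "lower methylation",
                        "higher methylation")
sig[, c("dbname", "estimate", "direction")]
```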

SESSION INFO

sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] sesame_1.25.3               knitr_1.49                 
##  [3] gprofiler2_0.2.3            SummarizedExperiment_1.37.0
##  [5] Biobase_2.67.0              GenomicRanges_1.59.1       
##  [7] GenomeInfoDb_1.43.2         IRanges_2.41.2             
##  [9] S4Vectors_0.45.2            MatrixGenerics_1.19.0      
## [11] matrixStats_1.5.0           sesameData_1.25.0          
## [13] ExperimentHub_2.15.0        AnnotationHub_3.15.0       
## [15] BiocFileCache_2.15.0        dbplyr_2.5.0               
## [17] BiocGenerics_0.53.3         generics_0.1.3             
## [19] knowYourCG_1.3.15          
## 
## loaded via a namespace (and not attached):
##  [1] DBI_1.2.3               bitops_1.0-9            rlang_1.1.4            
##  [4] magrittr_2.0.3          compiler_4.5.0          RSQLite_2.3.9          
##  [7] png_0.1-8               vctrs_0.6.5             reshape2_1.4.4         
## [10] stringr_1.5.1           pkgconfig_2.0.3         crayon_1.5.3           
## [13] fastmap_1.2.0           XVector_0.47.2          fontawesome_0.5.3      
## [16] rmarkdown_2.29          tzdb_0.4.0              UCSC.utils_1.3.0       
## [19] preprocessCore_1.69.0   purrr_1.0.2             bit_4.5.0.1            
## [22] xfun_0.50               cachem_1.1.0            jsonlite_1.8.9         
## [25] blob_1.2.4              DelayedArray_0.33.3     BiocParallel_1.41.0    
## [28] parallel_4.5.0          R6_2.5.1                bslib_0.8.0            
## [31] stringi_1.8.4           RColorBrewer_1.1-3      jquerylib_0.1.4        
## [34] Rcpp_1.0.13-1           wheatmap_0.2.0          readr_2.1.5            
## [37] Matrix_1.7-1            tidyselect_1.2.1        abind_1.4-8            
## [40] yaml_2.3.10             codetools_0.2-20        curl_6.1.0             
## [43] lattice_0.22-6          tibble_3.2.1            plyr_1.8.9             
## [46] withr_3.0.2             KEGGREST_1.47.0         evaluate_1.0.1         
## [49] Biostrings_2.75.3       pillar_1.10.1           BiocManager_1.30.25    
## [52] filelock_1.0.3          plotly_4.10.4           RCurl_1.98-1.16        
## [55] BiocVersion_3.21.1      hms_1.1.3               ggplot2_3.5.1          
## [58] munsell_0.5.1           scales_1.3.0            glue_1.8.0             
## [61] lazyeval_0.2.2          tools_4.5.0             data.table_1.16.4      
## [64] grid_4.5.0              tidyr_1.3.1             AnnotationDbi_1.69.0   
## [67] colorspace_2.1-1        GenomeInfoDbData_1.2.13 cli_3.6.3              
## [70] rappdirs_0.3.3          S4Arrays_1.7.1          viridisLite_0.4.2      
## [73] dplyr_1.1.4             gtable_0.3.6            sass_0.4.9             
## [76] digest_0.6.37           SparseArray_1.7.2       ggrepel_0.9.6          
## [79] htmlwidgets_1.6.4       memoise_2.0.1           htmltools_0.5.8.1      
## [82] lifecycle_1.0.4         httr_1.4.7              bit64_4.5.2