DegCre Introduction and Examples

1 Introduction

DegCre associates differentially expressed genes (DEGs) with cis-regulatory elements (CREs) in a probabilistic, non-parametric approach. DegCre is intended to be applied on differential expression and regulatory signal data derived from responses to perturbations such as drug or natural ligand treatmnents. As an example used here, we have obtained data from McDowell et al. (McDowell et al. 2018) which was generated by treating A549 cells with dexamethasone and measuring RNA-seq and ChIP-seq data at several time points. Data from RNA-seq and NR3C1 ChIP-seq at four hours versus control is stored in the list DexNR3C1.

DegCre uses the GenomicRanges framework for handling genomic regions and some calculations. As one input, DegGR, users generate differential expression statistics for genes with methods such as DESeq2 or edgeR. These values should then be associated with gene TSSs such as those available from EPDNew (Dreos et al. 2015; Meylan et al. 2020) in a GRanges. The second input, CreGR, is differential regulatory signal data (in the example, NR3C1 ChIP-seq data) associated with genomic regions in a GRanges such as those generated by csaw.

A complete description of the mathematical basis of the DegCre core algorithms is provided in (Roberts, Cooper, and Myers 2023). DegCre generates a Hits object of all associations between DegGR and CreGR within a specified maximum distance. Associations are then binned by TSS-to-CRE distance according to an algorithm that balances resolution (many bins with few members) versus minimization of the deviance of each bin’s CRE p-value distribution from the global distribution, selecting an optimal bin size.

Next, DegCre applies a non-parametric algorithm to find concordance between DEG and CRE differential effects within bins and derives an association probability. For all association probabilities involving one given CRE, the probabilities are adjusted to favor associations across shorter distances. An FDR of the association probability is then estimated. Results are returned in list containing a Hits object and both input GRanges.

2 Generating DegCre associations

To begin, we load the necessary packages:

library(GenomicRanges)
library(InteractionSet)
library(plotgardener)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(DegCre)

DegCre is run on the two input GRanges, DegGR and CreGR, and differential p-values associated with each. To require the effect directions be concordant between the changes in gene expression and CRE signal, users must also supply log fold-changes for each and set reqEffectDirConcord=TRUE (default value). As example data, we use DexNR3C1 derived from McDowell et al. (McDowell et al. 2018) using DESeq2 and csaw:

data(DexNR3C1)
lapply(DexNR3C1,head)
#> $DegGR
#> GRanges object with 6 ranges and 7 metadata columns:
#>       seqnames          ranges strand | promGeneName    GeneSymb
#>          <Rle>       <IRanges>  <Rle> |  <character> <character>
#>   [1]     chr1   959255-959256      - |      NOC2L_1       NOC2L
#>   [2]     chr1   960632-960633      + |     KLHL17_1      KLHL17
#>   [3]     chr1   966481-966482      + |    PLEKHN1_1     PLEKHN1
#>   [4]     chr1   976680-976681      - |      PERM1_1       PERM1
#>   [5]     chr1 1000096-1000097      - |       HES4_1        HES4
#>   [6]     chr1 1000510-1000511      + |      ISG15_2       ISG15
#>                   GeneID   baseMean      logFC       pVal      pAdj
#>              <character>  <numeric>  <numeric>  <numeric> <numeric>
#>   [1]  ENSG00000188976.9 1404.01811 -0.0148658 0.89202335 0.9804131
#>   [2] ENSG00000187961.12  182.81254 -0.5189710 0.00183145 0.0235433
#>   [3]  ENSG00000187583.9   56.95208 -0.1598313 0.50313987 0.8747718
#>   [4]  ENSG00000187642.8    1.41588 -1.0589853 0.49900530        NA
#>   [5]  ENSG00000188290.9   76.33487 -0.7547663 0.00184199 0.0236628
#>   [6]  ENSG00000187608.7   97.50041 -0.0390943 0.88895013 0.9803767
#>   -------
#>   seqinfo: 23 sequences from an unspecified genome; no seqlengths
#> 
#> $CreGR
#> GRanges object with 6 ranges and 3 metadata columns:
#>       seqnames          ranges strand |     logFC      pVal      pAdj
#>          <Rle>       <IRanges>  <Rle> | <numeric> <numeric> <numeric>
#>   [1]     chr1   941905-941960      * |  1.875569 0.3827979  0.947530
#>   [2]     chr1   943129-943526      * |  3.713827 0.0154015  0.120363
#>   [3]     chr1 1686385-1686476      * |  2.030911 0.4654492  1.000000
#>   [4]     chr1 1741105-1741250      * |  2.061714 0.2426531  0.732045
#>   [5]     chr1 1777591-1777610      * | -0.366914 0.6429142  1.000000
#>   [6]     chr1 1777645-1777700      * |  0.241421 1.0000000  1.000000
#>   -------
#>   seqinfo: 23 sequences from an unspecified genome

We now run DegCre on this data with default values:

myDegCreResList <- runDegCre(DegGR=DexNR3C1$DegGR,
                             DegP=DexNR3C1$DegGR$pVal,
                             DegLfc=DexNR3C1$DegGR$logFC,
                             CreGR=DexNR3C1$CreGR,
                             CreP=DexNR3C1$CreGR$pVal,
                             CreLfc=DexNR3C1$CreGR$logFC,
                             reqEffectDirConcord=TRUE,
                             verbose=FALSE)

DegCre produces a list object with several items. These include the inputs DegGR and CreGR. It also contains a Hits object degCreHits. The queryHits of degCreHits index DegGR and the subjectHits index CreGR. degCreHits also has several metadata columns that include the key resuls of the algorithm. These include assocProb which is the probabilty of the association being true, and assocProbFDR which is the FDR of assocProb being greater than a null model based on TSS to CRE distance alone. Also, the results of the distance bin optimization heuristic, binHeurOutputs, and the input DEG significance threshold alphaVal are returned.

names(myDegCreResList)
#> [1] "degCreHits"     "binHeurOutputs" "alphaVal"       "DegGR"         
#> [5] "CreGR"
head(myDegCreResList$degCreHits)
#> Hits object with 6 hits and 10 metadata columns:
#>       queryHits subjectHits | assocDist assocProb assocProbFDR rawAssocProb
#>       <integer>   <integer> | <integer> <numeric>    <numeric>    <numeric>
#>   [1]         1           5 |    818334         0            1    0.0458060
#>   [2]         2           5 |    816957         0            1    0.0458060
#>   [3]         3           5 |    811108         0            1    0.0459890
#>   [4]         4           5 |    800909         0            1    0.0460580
#>   [5]         5           5 |    777493         0            1    0.0643178
#>   [6]         6           5 |    777079         0            1    0.0643178
#>            CreP       DegP   DegPadj binAssocDist    numObs distBinId
#>       <numeric>  <numeric> <numeric>    <numeric> <numeric> <integer>
#>   [1]  0.642914 0.89202335         1       821065      3896       167
#>   [2]  0.642914 0.00183145         1       821065      3896       167
#>   [3]  0.642914 0.50313987         1       815644      3896       166
#>   [4]  0.642914 0.49900530         1       804825      3896       164
#>   [5]  0.642914 0.00184199         1       778623      3896       159
#>   [6]  0.642914 0.88895013         1       778623      3896       159
#>   -------
#>   queryLength: 28196 / subjectLength: 51750

To find associations driven by non-random changes in CRE signal, we subset the degCreHits by assocProbFDR. We can also further subset by assocProb to find those also more likely to confirm in a suitable validation experiment:

passFDR <- which(mcols(myDegCreResList$degCreHits)$assocProbFDR <= 0.05)
passProb <- which(mcols(myDegCreResList$degCreHits)$assocProb >= 0.25)

keepDegCreHits <- myDegCreResList$degCreHits[intersect(passFDR,passProb)]

keepDegCreHits
#> Hits object with 559 hits and 10 metadata columns:
#>         queryHits subjectHits | assocDist assocProb assocProbFDR rawAssocProb
#>         <integer>   <integer> | <integer> <numeric>    <numeric>    <numeric>
#>     [1]       110          13 |       133  0.300401  0.00000e+00     0.300401
#>     [2]       111          13 |      1110  0.300401  0.00000e+00     0.300401
#>     [3]       148          28 |       989  0.429231  0.00000e+00     0.429231
#>     [4]       148          69 |    316979  0.660000  4.44579e-08     0.660000
#>     [5]       154          57 |    879235  0.371250  8.63207e-05     0.495000
#>     ...       ...         ... .       ...       ...          ...          ...
#>   [555]     27797       51609 |     15772  0.281560  1.16942e-09     0.281560
#>   [556]     27797       51619 |     25192  0.330000  3.33047e-09     0.330000
#>   [557]     27798       51611 |         0  0.313685  0.00000e+00     0.313685
#>   [558]     27798       51612 |      1022  0.529535  1.36602e-12     0.529535
#>   [559]     27798       51619 |     24638  0.330000  3.33047e-09     0.330000
#>                CreP         DegP      DegPadj binAssocDist    numObs distBinId
#>           <numeric>    <numeric>    <numeric>    <numeric> <numeric> <integer>
#>     [1] 2.67114e-02  9.18081e-10  2.37902e-05         1596      3896         1
#>     [2] 2.67114e-02  9.18081e-10  2.37902e-05         1596      3896         1
#>     [3] 8.83113e-04  1.41619e-24  3.66977e-20         1596      3896         1
#>     [4] 2.64168e-06  1.41619e-24  3.66977e-20       317788      3896        69
#>     [5] 9.41473e-07  3.36800e-11  8.72750e-07       880878      3896       178
#>     ...         ...          ...          ...          ...       ...       ...
#>   [555] 3.95196e-04 2.42291e-131 6.27849e-127        16085      3896         5
#>   [556] 7.13843e-05 2.42291e-131 6.27849e-127        28796      3896         8
#>   [557] 1.72325e-02 2.42291e-131 6.27849e-127         1596      3896         1
#>   [558] 4.74327e-05 2.42291e-131 6.27849e-127         1596      3896         1
#>   [559] 7.13843e-05 2.42291e-131 6.27849e-127        28796      3896         8
#>   -------
#>   queryLength: 28196 / subjectLength: 51750

We can then view the subsets of DegGR and CreGR that comprise the passing associations:


myDegCreResList$DegGR[queryHits(keepDegCreHits)]
#> GRanges object with 559 ranges and 7 metadata columns:
#>         seqnames              ranges strand | promGeneName    GeneSymb
#>            <Rle>           <IRanges>  <Rle> |  <character> <character>
#>     [1]     chr1     6259438-6259439      - |     GPR153_2      GPR153
#>     [2]     chr1     6261063-6261064      - |     GPR153_1      GPR153
#>     [3]     chr1     8026308-8026309      - |     ERRFI1_1      ERRFI1
#>     [4]     chr1     8026308-8026309      - |     ERRFI1_1      ERRFI1
#>     [5]     chr1     9069634-9069635      - |     SLC2A5_1      SLC2A5
#>     ...      ...                 ...    ... .          ...         ...
#>   [555]     chrX 107716507-107716508      - |    TSC22D3_2     TSC22D3
#>   [556]     chrX 107716507-107716508      - |    TSC22D3_2     TSC22D3
#>   [557]     chrX 107717061-107717062      - |    TSC22D3_1     TSC22D3
#>   [558]     chrX 107717061-107717062      - |    TSC22D3_1     TSC22D3
#>   [559]     chrX 107717061-107717062      - |    TSC22D3_1     TSC22D3
#>                     GeneID  baseMean     logFC         pVal         pAdj
#>                <character> <numeric> <numeric>    <numeric>    <numeric>
#>     [1]  ENSG00000158292.6   98.8953   1.86950  9.18081e-10  2.37902e-05
#>     [2]  ENSG00000158292.6   98.8953   1.86950  9.18081e-10  2.37902e-05
#>     [3] ENSG00000116285.11 7231.1025   1.76693  1.41619e-24  3.66977e-20
#>     [4] ENSG00000116285.11 7231.1025   1.76693  1.41619e-24  3.66977e-20
#>     [5] ENSG00000142583.16   34.6201   2.17051  3.36800e-11  8.72750e-07
#>     ...                ...       ...       ...          ...          ...
#>   [555] ENSG00000157514.15   2076.06   3.84171 2.42291e-131 6.27849e-127
#>   [556] ENSG00000157514.15   2076.06   3.84171 2.42291e-131 6.27849e-127
#>   [557] ENSG00000157514.15   2076.06   3.84171 2.42291e-131 6.27849e-127
#>   [558] ENSG00000157514.15   2076.06   3.84171 2.42291e-131 6.27849e-127
#>   [559] ENSG00000157514.15   2076.06   3.84171 2.42291e-131 6.27849e-127
#>   -------
#>   seqinfo: 23 sequences from an unspecified genome; no seqlengths

myDegCreResList$CreGR[subjectHits(keepDegCreHits)]
#> GRanges object with 559 ranges and 3 metadata columns:
#>         seqnames              ranges strand |     logFC        pVal        pAdj
#>            <Rle>           <IRanges>  <Rle> | <numeric>   <numeric>   <numeric>
#>     [1]     chr1     6259573-6259952      * |   5.14967 2.67114e-02 0.171365372
#>     [2]     chr1     6259573-6259952      * |   5.14967 2.67114e-02 0.171365372
#>     [3]     chr1     8027299-8027912      * |   6.35323 8.83113e-04 0.020694028
#>     [4]     chr1     8343289-8344100      * |   8.30635 2.64168e-06 0.000865390
#>     [5]     chr1     8189839-8190398      * |   8.45963 9.41473e-07 0.000490121
#>     ...      ...                 ...    ... .       ...         ...         ...
#>   [555]     chrX 107700193-107700734      * |   6.71290 3.95196e-04  0.01459899
#>   [556]     chrX 107741701-107742296      * |   7.36680 7.13843e-05  0.00568158
#>   [557]     chrX 107716969-107717312      * |   5.23997 1.72325e-02  0.12909414
#>   [558]     chrX 107718085-107718644      * |   7.46588 4.74327e-05  0.00435653
#>   [559]     chrX 107741701-107742296      * |   7.36680 7.13843e-05  0.00568158
#>   -------
#>   seqinfo: 23 sequences from an unspecified genome

The choice of the threshold for DEG significance (DEG \(\alpha\)) is important in DegCre calculations. Accordingly, DegCre can be run over a set of thresholds to determine an optimal value. Here we select DEG \(\alpha\) by finding the value that maximizes a normalized Precision-Recall Area Under the Curve (AUC):

alphaOptList <- optimizeAlphaDegCre(DegGR=DexNR3C1$DegGR,
                                    DegP=DexNR3C1$DegGR$pVal,
                                    DegLfc=DexNR3C1$DegGR$logFC,
                                    CreGR=DexNR3C1$CreGR,
                                    CreP=DexNR3C1$CreGR$pVal,
                                    CreLfc=DexNR3C1$CreGR$logFC,
                                    reqEffectDirConcord=TRUE,
                                    verbose=FALSE)

alphaOptList$alphaPRMat
#>             alphaVal         AUC    deltaAUC normDeltaAUC
#> alpha_0.005    0.005 0.009187438 0.006378657  0.006396623
#> alpha_0.01     0.010 0.009699195 0.006606708  0.006627203
#> alpha_0.02     0.020 0.009783394 0.006607637  0.006628688
#> alpha_0.03     0.030 0.010063426 0.006787191  0.006809501
#> alpha_0.05     0.050 0.010220304 0.006841781  0.006864975
#> alpha_0.1      0.100 0.009113340 0.006014935  0.006033630

bestAlphaId <- which.max(alphaOptList$alphaPRMat[,4])

bestDegCreResList <- alphaOptList$degCreResListsByAlpha[[bestAlphaId]]

3 Visualizing DegCre results

With an optimal DegCre result obtained, we can plot the association probabilities versus binned association distance to see the relationship between the two:

par(mai=c(0.4,0.55,0.05,0.05))
par(mgp=c(1,0.4,0))
par(bty="l")

binStats <- plotDegCreAssocProbVsDist(degCreResList=bestDegCreResList,
                                      assocProbFDRThresh=0.05)

The total number of associations passing a reasonable FDR (0.05 in the figure) drops fairly quickly as bin TSS to CRE distance increases (top panel). Similarly, the median association probability (black line in lower panel) drops with distance as well. The blue range is the interquartile range of the association probabilities. The red line in the above plot represents the null association probability, calculated as the probability if all CRE p-values in a given bin were identical.

The binning of the associations by distance is also important to the operation of the DegCre algorithm. The choice of the bin size (number of associations per bin) is done by an optimization heuristic that tries a wide range of bin sizes and calculates the KS test statistic for each bin’s CRE p-value distribution compared against the global (unbinned) CRE p-value distribution. The algorithm finds the smallest bin size for which the median KS-statistic is within a specified fraction, fracMinKsMedianThresh, of the range of that for all tested bin sizes. We visualize the results of this process:

par(mai=c(0.5,0.5,0.05,0.3))
par(mgp=c(1,0.4,0))
par(bty="l")
par(cex.lab=0.6)
par(cex.axis=0.5)
plotDegCreBinHeuristic(degCreResList=bestDegCreResList)

The red dot indicates the chosen bin size run with the default value of ‘fracMinKsMedianThresh = 0.2’.

Another way to evaluate the performance of the DegCre on a data set is to make a Precision-Recall (PR) curve. To calculate these values, we define as a perfect set of associations, i.e. one that would generate a PR Area Under the Curve (AUC) of 1, as containing associations to all significant DEGs with uniform association probabilities of 1. Such a set will not result from real data, but it serves as a useful benchmark. Comparison of relative values of PR AUCs for DegCre results will be most informative. We plot the PR curve for the example data:

par(mai=c(0.5,0.5,0.05,0.2))
par(mgp=c(1,0.4,0))
par(cex.lab=0.6)
par(cex.axis=0.5)
par(bty="l")
degCrePRAUC(degCreResList=bestDegCreResList)

The black line is the performance of the DegCre associations and the red region is values obtained from random shuffles.

4 Obtaining the number of associations per DEG

Since DegCre calculates association probabilities, one can estimate the number of expected associations per significant DEG by summing the total associations that pass a chosen FDR. DEGs with many expected associations are more likely to have true associations. This can be done with the getExpectAssocPerDEG() function:

expectAssocPerDegDf <- getExpectAssocPerDEG(degCreResList=bestDegCreResList,
                                            geneNameColName="GeneSymb",
                                            assocAlpha=0.05)

The result is a DataFrame with one entry per gene, sorted by the expected number of DEGs, `expectAssocs’.

head(expectAssocPerDegDf)
#> DataFrame with 6 rows and 11 columns
#>   promGeneName    GeneSymb             GeneID  baseMean     logFC        pVal
#>    <character> <character>        <character> <numeric> <numeric>   <numeric>
#> 1      ITPKC_3       ITPKC  ENSG00000086544.2  658.0739  1.175359 1.07581e-18
#> 2       SIX3_1        SIX3  ENSG00000138083.4   43.6237  1.735667 5.28313e-11
#> 3      FOSL2_1       FOSL2 ENSG00000075426.10 7380.5301  0.795594 6.41842e-12
#> 4      ZFP36_1       ZFP36  ENSG00000128016.5 1230.1540  2.428890 3.93093e-38
#> 5      HMGA2_5       HMGA2 ENSG00000149948.12 1588.2157  0.830439 1.10844e-06
#> 6     AKAP13_1      AKAP13 ENSG00000170776.18 7365.4462  1.926333 2.51488e-36
#>          pAdj expectAssocs   nAssocs assocAlpha  degAlpha
#>     <numeric>    <numeric> <numeric>  <numeric> <numeric>
#> 1 2.78774e-14      29.1690       308       0.05      0.05
#> 2 1.36902e-06      16.6287       143       0.05      0.05
#> 3 1.66320e-07      16.5910       182       0.05      0.05
#> 4 1.01862e-33      13.3113       130       0.05      0.05
#> 5 2.87229e-02      12.4434        77       0.05      0.05
#> 6 6.51680e-32      12.3565       101       0.05      0.05

This result can be plotted as a histogram by the function plotExpectedAssocsPerDeg():

par(mai=c(0.5,0.5,0.15,0.05))
par(mgp=c(1,0.4,0))
par(bty="l")
par(cex.lab=0.6)
par(cex.axis=0.5)
plotExpectedAssocsPerDeg(expectAssocPerDegDf=expectAssocPerDegDf)

The dashed line indicates the median expected associations per DEG.

5 Browser views of DegCre results for genes or regions

One of the genes with a high number of expected associations and also a known target of NR3C1 (Glucocorticoid receptor) is ERRFI1. Our test data DexNR3C1 is derived from ChIP-seq of NR3C1 in cells treated with the powerful glucocorticoid, dexamethasone, so it is not surprising to find it with many expected associations. To visualize these associations, we can make a browser plot of the associations and underlying data with plotBrowserDegCre() which relies on the plotgardener package.

browserOuts <- plotBrowserDegCre(degCreResList=bestDegCreResList,
                                 geneName="ERRFI1",
                                 geneNameColName="GeneSymb",
                                 CreSignalName="NR3C1",
                                 plotWidth=3.5,
                                 plotHeight=2)

We now want to zoom in closer to the TSS of ERRFI1. To do this we specify the region we want to plot as a GRanges of length 1:

zoomGR <- GRanges(seqnames="chr1",
                  ranges=IRanges(start=7900e3,end=8400e3))

We then supply the GRanges to the plotRegionGR argument of plotBrowserDegCre():

zoomBrowserOuts <- plotBrowserDegCre(degCreResList=bestDegCreResList,
                                                   plotRegionGR=zoomGR,
                                                   geneNameColName="GeneSymb",
                                                   CreSignalName="NR3C1",
                                                   plotWidth=3.5,
                                                   plotHeight=2)

Since we did not specify geneName it is not highlighted. We can force highlights by supplying a DataFrame of gene names and colors to the geneHighlightDf argument. This DataFrame should conform to the requirements of plotgardener::plotGenes().

geneHighDf <- data.frame(gene=bestDegCreResList$DegGR$GeneSymb,
                         col="gray")
geneHighDf[which(geneHighDf$gene=="ERRFI1"),2] <- "black"

Then geneHighDf is supplied to the geneHighlightDf argument to produce:

highLightBrowserOuts <- plotBrowserDegCre(degCreResList=bestDegCreResList,
                                          plotRegionGR=zoomGR,
                                          CreSignalName="NR3C1",
                                          geneNameColName="GeneSymb",
                                          geneHighlightDf=geneHighDf,
                                          plotWidth=3.5,
                                          plotHeight=2)

6 Conversion of DegCre results to other formats

The list of results that DegCre() produces has many useful features for downstream analysis. However, conversion to other formats will be useful in some situations. The function convertdegCreResListToGInteraction() will convert the DegCre list to a GInteractions object from the InteractionSet package:

degCreGInter <-
    convertdegCreResListToGInteraction(degCreResList=bestDegCreResList,
                                       assocAlpha=0.05)

degCreGInter
#> GInteractions object with 10650 interactions and 20 metadata columns:
#>           seqnames1             ranges1     seqnames2             ranges2 |
#>               <Rle>           <IRanges>         <Rle>           <IRanges> |
#>       [1]      chr1     6180123-6180124 ---      chr1     6157801-6158216 |
#>       [2]      chr1     6180123-6180124 ---      chr1     5982949-5983094 |
#>       [3]      chr1     6180123-6180124 ---      chr1     5727367-5727386 |
#>       [4]      chr1     6259438-6259439 ---      chr1     6259573-6259952 |
#>       [5]      chr1     6261063-6261064 ---      chr1     6259573-6259952 |
#>       ...       ...                 ... ...       ...                 ... .
#>   [10646]      chrX 110003113-110003114 ---      chrX 109840609-109840934 |
#>   [10647]      chrX 110003113-110003114 ---      chrX 110173231-110173646 |
#>   [10648]      chrX 110003113-110003114 ---      chrX 109830871-109831214 |
#>   [10649]      chrX 110003113-110003114 ---      chrX 110180791-110181224 |
#>   [10650]      chrX 110003113-110003114 ---      chrX 109739917-109740692 |
#>           assocDist assocProb assocProbFDR rawAssocProb        CreP        DegP
#>           <integer> <numeric>    <numeric>    <numeric>   <numeric>   <numeric>
#>       [1]     21906 0.1627315  1.88664e-03    0.1627315 0.000925164 3.15116e-07
#>       [2]    197028 0.1055556  9.02893e-05    0.1055556 0.015637885 3.15116e-07
#>       [3]    452736 0.0797052  6.41878e-04    0.0797052 0.012445876 3.15116e-07
#>       [4]       133 0.3063931  0.00000e+00    0.3063931 0.026711376 9.18081e-10
#>       [5]      1110 0.3063931  0.00000e+00    0.3063931 0.026711376 9.18081e-10
#>       ...       ...       ...          ...          ...         ...         ...
#>   [10646]    162178 0.0760000   0.00507117    0.0760000 1.65078e-01 5.43159e-21
#>   [10647]    170116 0.0935961   0.03435877    0.0935961 9.76967e-03 5.43159e-21
#>   [10648]    171898 0.0926250   0.03434557    0.0926250 6.30914e-03 5.43159e-21
#>   [10649]    177676 0.0941441   0.02537595    0.0941441 1.01861e-02 5.43159e-21
#>   [10650]    262420 0.1225806   0.01059937    0.1225806 1.03791e-05 5.43159e-21
#>               DegPadj binAssocDist    numObs distBinId Deg_promGeneName
#>             <numeric>    <numeric> <numeric> <integer>      <character>
#>       [1] 8.16560e-03        24489      3896         7           CHD5_1
#>       [2] 8.16560e-03       199266      3896        45           CHD5_1
#>       [3] 8.16560e-03       453947      3896        96           CHD5_1
#>       [4] 2.37902e-05         1596      3896         1         GPR153_2
#>       [5] 2.37902e-05         1596      3896         1         GPR153_1
#>       ...         ...          ...       ...       ...              ...
#>   [10646] 1.40749e-16       165477      3896        38        TMEM164_1
#>   [10647] 1.40749e-16       170366      3896        39        TMEM164_1
#>   [10648] 1.40749e-16       175209      3896        40        TMEM164_1
#>   [10649] 1.40749e-16       180010      3896        41        TMEM164_1
#>   [10650] 1.40749e-16       262767      3896        58        TMEM164_1
#>           Deg_GeneSymb         Deg_GeneID Deg_baseMean Deg_logFC    Deg_pVal
#>            <character>        <character>    <numeric> <numeric>   <numeric>
#>       [1]         CHD5 ENSG00000116254.16      52.5996   1.50353 3.15116e-07
#>       [2]         CHD5 ENSG00000116254.16      52.5996   1.50353 3.15116e-07
#>       [3]         CHD5 ENSG00000116254.16      52.5996   1.50353 3.15116e-07
#>       [4]       GPR153  ENSG00000158292.6      98.8953   1.86950 9.18081e-10
#>       [5]       GPR153  ENSG00000158292.6      98.8953   1.86950 9.18081e-10
#>       ...          ...                ...          ...       ...         ...
#>   [10646]      TMEM164 ENSG00000157600.10      3934.32   1.13948 5.43159e-21
#>   [10647]      TMEM164 ENSG00000157600.10      3934.32   1.13948 5.43159e-21
#>   [10648]      TMEM164 ENSG00000157600.10      3934.32   1.13948 5.43159e-21
#>   [10649]      TMEM164 ENSG00000157600.10      3934.32   1.13948 5.43159e-21
#>   [10650]      TMEM164 ENSG00000157600.10      3934.32   1.13948 5.43159e-21
#>              Deg_pAdj Cre_logFC    Cre_pVal   Cre_pAdj
#>             <numeric> <numeric>   <numeric>  <numeric>
#>       [1] 8.16560e-03   6.06528 0.000925164  0.0210432
#>       [2] 8.16560e-03   5.02309 0.015637885  0.1215536
#>       [3] 8.16560e-03   4.36742 0.012445876  0.1047255
#>       [4] 2.37902e-05   5.14967 0.026711376  0.1713654
#>       [5] 2.37902e-05   5.14967 0.026711376  0.1713654
#>       ...         ...       ...         ...        ...
#>   [10646] 1.40749e-16   2.43940 1.65078e-01 0.57537625
#>   [10647] 1.40749e-16   5.42918 9.76967e-03 0.08969937
#>   [10648] 1.40749e-16   5.48361 6.30914e-03 0.06738242
#>   [10649] 1.40749e-16   5.41784 1.01861e-02 0.09198129
#>   [10650] 1.40749e-16   7.97116 1.03791e-05 0.00183883
#>   -------
#>   regions: 8511 ranges and 0 metadata columns
#>   seqinfo: 23 sequences from an unspecified genome

The DegCre results list can can also be converted to a DataFrame in roughly a BEDPE format. This is useful for writing out DegCre results as text files:

degCreDf <- convertDegCreDataFrame(degCreResList=bestDegCreResList,
                                   assocAlpha=0.05)

head(degCreDf)
#>   Deg_chr Deg_start Deg_end Deg_strand Cre_chr Cre_start Cre_end Cre_strand
#> 1    chr1   6180123 6180124          -    chr1   6157801 6158216          *
#> 2    chr1   6180123 6180124          -    chr1   5982949 5983094          *
#> 3    chr1   6180123 6180124          -    chr1   5727367 5727386          *
#> 4    chr1   6259438 6259439          -    chr1   6259573 6259952          *
#> 5    chr1   6261063 6261064          -    chr1   6259573 6259952          *
#> 6    chr1   6419918 6419919          -    chr1   6412897 6413132          *
#>   assocDist  assocProb assocProbFDR rawAssocProb binAssocDist numObs distBinId
#> 1     21906 0.16273148 1.886640e-03   0.16273148        24489   3896         7
#> 2    197028 0.10555556 9.028929e-05   0.10555556       199266   3896        45
#> 3    452736 0.07970522 6.418779e-04   0.07970522       453947   3896        96
#> 4       133 0.30639313 0.000000e+00   0.30639313         1596   3896         1
#> 5      1110 0.30639313 0.000000e+00   0.30639313         1596   3896         1
#> 6      6785 0.23011457 0.000000e+00   0.23011457         8029   3896         3
#>   Deg_promGeneName Deg_GeneSymb         Deg_GeneID Deg_baseMean Deg_logFC
#> 1           CHD5_1         CHD5 ENSG00000116254.16     52.59957  1.503528
#> 2           CHD5_1         CHD5 ENSG00000116254.16     52.59957  1.503528
#> 3           CHD5_1         CHD5 ENSG00000116254.16     52.59957  1.503528
#> 4         GPR153_2       GPR153  ENSG00000158292.6     98.89534  1.869499
#> 5         GPR153_1       GPR153  ENSG00000158292.6     98.89534  1.869499
#> 6           HES2_1         HES2 ENSG00000069812.10     34.96616  2.055450
#>       Deg_pVal     Deg_pAdj Cre_logFC     Cre_pVal   Cre_pAdj
#> 1 3.151162e-07 8.165605e-03  6.065277 0.0009251642 0.02104324
#> 2 3.151162e-07 8.165605e-03  5.023088 0.0156378846 0.12155357
#> 3 3.151162e-07 8.165605e-03  4.367424 0.0124458764 0.10472552
#> 4 9.180810e-10 2.379023e-05  5.149673 0.0267113763 0.17136537
#> 5 9.180810e-10 2.379023e-05  5.149673 0.0267113763 0.17136537
#> 6 1.344031e-09 3.482787e-05  3.273808 0.0255184446 0.16628814

7 Session Info

sessionInfo()
#> R Under development (unstable) (2024-01-16 r85808)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] DegCre_0.99.4                           
#>  [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.18.0
#>  [3] GenomicFeatures_1.55.3                  
#>  [4] AnnotationDbi_1.65.2                    
#>  [5] plotgardener_1.9.3                      
#>  [6] InteractionSet_1.31.0                   
#>  [7] SummarizedExperiment_1.33.3             
#>  [8] Biobase_2.63.0                          
#>  [9] MatrixGenerics_1.15.0                   
#> [10] matrixStats_1.2.0                       
#> [11] GenomicRanges_1.55.2                    
#> [12] GenomeInfoDb_1.39.6                     
#> [13] IRanges_2.37.1                          
#> [14] S4Vectors_0.41.3                        
#> [15] BiocGenerics_0.49.1                     
#> 
#> loaded via a namespace (and not attached):
#>  [1] DBI_1.2.1                bitops_1.0-7             httr2_1.0.0             
#>  [4] biomaRt_2.59.1           rlang_1.1.3              magrittr_2.0.3          
#>  [7] compiler_4.4.0           RSQLite_2.3.5            png_0.1-8               
#> [10] vctrs_0.6.5              stringr_1.5.1            pkgconfig_2.0.3         
#> [13] crayon_1.5.2             fastmap_1.1.1            dbplyr_2.4.0            
#> [16] XVector_0.43.1           utf8_1.2.4               Rsamtools_2.19.3        
#> [19] rmarkdown_2.25           strawr_0.0.91            purrr_1.0.2             
#> [22] bit_4.0.5                xfun_0.42                zlibbioc_1.49.0         
#> [25] cachem_1.0.8             jsonlite_1.8.8           progress_1.2.3          
#> [28] blob_1.2.4               highr_0.10               DelayedArray_0.29.1     
#> [31] BiocParallel_1.37.0      parallel_4.4.0           prettyunits_1.2.0       
#> [34] R6_2.5.1                 plyranges_1.23.0         bslib_0.6.1             
#> [37] stringi_1.8.3            RColorBrewer_1.1-3       rtracklayer_1.63.0      
#> [40] jquerylib_0.1.4          Rcpp_1.0.12              knitr_1.45              
#> [43] Matrix_1.6-5             tidyselect_1.2.0         abind_1.4-5             
#> [46] yaml_2.3.8               codetools_0.2-19         curl_5.2.0              
#> [49] lattice_0.22-5           tibble_3.2.1             withr_3.0.0             
#> [52] KEGGREST_1.43.0          evaluate_0.23            gridGraphics_0.5-1      
#> [55] BiocFileCache_2.11.1     xml2_1.3.6               Biostrings_2.71.2       
#> [58] pillar_1.9.0             filelock_1.0.3           generics_0.1.3          
#> [61] RCurl_1.98-1.14          hms_1.1.3                ggplot2_3.4.4           
#> [64] munsell_0.5.0            scales_1.3.0             glue_1.7.0              
#> [67] tools_4.4.0              BiocIO_1.13.0            data.table_1.15.0       
#> [70] GenomicAlignments_1.39.2 fs_1.6.3                 XML_3.99-0.16.1         
#> [73] grid_4.4.0               colorspace_2.1-0         GenomeInfoDbData_1.2.11 
#> [76] restfulr_0.0.15          cli_3.6.2                rappdirs_0.3.3          
#> [79] fansi_1.0.6              S4Arrays_1.3.3           dplyr_1.1.4             
#> [82] gtable_0.3.4             yulab.utils_0.1.4        sass_0.4.8              
#> [85] digest_0.6.34            SparseArray_1.3.4        ggplotify_0.1.2         
#> [88] org.Hs.eg.db_3.18.0      rjson_0.2.21             memoise_2.0.1           
#> [91] htmltools_0.5.7          lifecycle_1.0.4          httr_1.4.7              
#> [94] bit64_4.0.5

8 References

Dreos, Rene, Giovanna Ambrosini, Rouayda Cavin Perier, and Philipp Bucher. 2015. “The Eukaryotic Promoter Database: Expansion of Epdnew and New Promoter Analysis Tools.” Nucleic Acids Research 43 (D1): D92–D96. https://doi.org/10.1093/nar/gku1111.

McDowell, Ian C., Alejandro Barrera, Anthony M. D’Ippolito, Christopher M. Vockley, Linda K. Hong, Sarah M. Leichter, Luke C. Bartelt, et al. 2018. “Glucocorticoid Receptor Recruits to Enhancers and Drives Activation by Motif-Directed Binding.” Genome Research 28 (9): 1272–84. https://doi.org/10.1101/gr.233346.117.

Meylan, Patrick, Rene Dreos, Giovanna Ambrosini, Romain Groux, and Philipp Bucher. 2020. “EPD in 2020: Enhanced Data Visualization and Extension to ncRNA Promoters.” Nucleic Acids Research 48 (D1): D65–D69. https://doi.org/10.1093/nar/gkz1014.

Roberts, Brian S., Gregory M. Cooper, and Richard M. Myers. 2023. “DegCre: Probabilistic Association of Differential Gene Expression with Regulatory Regions.” bioRxiv. https://doi.org/10.1101/2023.10.04.560923.