DegCre associates differentially expressed genes (DEGs) with cis-regulatory elements (CREs) in a probabilistic, non-parametric approach. DegCre is intended to be applied on differential expression and regulatory signal data derived from responses to perturbations such as drug or natural ligand treatmnents. As an example used here, we have obtained data from McDowell et al. (McDowell et al. 2018) which was generated by treating A549 cells with dexamethasone and measuring RNA-seq and ChIP-seq data at several time points. Data from RNA-seq and NR3C1 ChIP-seq at four hours versus control is stored in the list DexNR3C1
.
DegCre uses the GenomicRanges framework for handling genomic regions and some calculations. As one input, DegGR
, users generate differential expression statistics for genes with methods such as DESeq2 or edgeR. These values should then be associated with gene TSSs such as those available from EPDNew (Dreos et al. 2015; Meylan et al. 2020) in a GRanges
. The second input, CreGR
, is differential regulatory signal data (in the example, NR3C1 ChIP-seq data) associated with genomic regions in a GRanges
such as those generated by csaw.
A complete description of the mathematical basis of the DegCre core algorithms is provided in (Roberts, Cooper, and Myers 2023). DegCre generates a Hits
object of all associations between DegGR
and CreGR
within a specified maximum distance. Associations are then binned by TSS-to-CRE distance according to an algorithm that balances resolution (many bins with few members) versus minimization of the deviance of each bin’s CRE p-value distribution from the global distribution, selecting an optimal bin size.
Next, DegCre applies a non-parametric algorithm to find concordance between DEG and CRE differential effects within bins and derives an association probability. For all association probabilities involving one given CRE, the probabilities are adjusted to favor associations across shorter distances. An FDR of the association probability is then estimated. Results are returned in list containing a Hits
object and both input GRanges
.
To begin, we load the necessary packages:
library(GenomicRanges)
library(InteractionSet)
library(plotgardener)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(DegCre)
DegCre is run on the two input GRanges
, DegGR
and CreGR
, and differential p-values associated with each. To require the effect directions be concordant between the changes in gene expression and CRE signal, users must also supply log fold-changes for each and set reqEffectDirConcord=TRUE
(default value). As example data, we use DexNR3C1
derived from McDowell et al. (McDowell et al. 2018) using DESeq2 and csaw:
data(DexNR3C1)
lapply(DexNR3C1,head)
#> $DegGR
#> GRanges object with 6 ranges and 7 metadata columns:
#> seqnames ranges strand | promGeneName GeneSymb
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] chr1 959255-959256 - | NOC2L_1 NOC2L
#> [2] chr1 960632-960633 + | KLHL17_1 KLHL17
#> [3] chr1 966481-966482 + | PLEKHN1_1 PLEKHN1
#> [4] chr1 976680-976681 - | PERM1_1 PERM1
#> [5] chr1 1000096-1000097 - | HES4_1 HES4
#> [6] chr1 1000510-1000511 + | ISG15_2 ISG15
#> GeneID baseMean logFC pVal pAdj
#> <character> <numeric> <numeric> <numeric> <numeric>
#> [1] ENSG00000188976.9 1404.01811 -0.0148658 0.89202335 0.9804131
#> [2] ENSG00000187961.12 182.81254 -0.5189710 0.00183145 0.0235433
#> [3] ENSG00000187583.9 56.95208 -0.1598313 0.50313987 0.8747718
#> [4] ENSG00000187642.8 1.41588 -1.0589853 0.49900530 NA
#> [5] ENSG00000188290.9 76.33487 -0.7547663 0.00184199 0.0236628
#> [6] ENSG00000187608.7 97.50041 -0.0390943 0.88895013 0.9803767
#> -------
#> seqinfo: 23 sequences from an unspecified genome; no seqlengths
#>
#> $CreGR
#> GRanges object with 6 ranges and 3 metadata columns:
#> seqnames ranges strand | logFC pVal pAdj
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
#> [1] chr1 941905-941960 * | 1.875569 0.3827979 0.947530
#> [2] chr1 943129-943526 * | 3.713827 0.0154015 0.120363
#> [3] chr1 1686385-1686476 * | 2.030911 0.4654492 1.000000
#> [4] chr1 1741105-1741250 * | 2.061714 0.2426531 0.732045
#> [5] chr1 1777591-1777610 * | -0.366914 0.6429142 1.000000
#> [6] chr1 1777645-1777700 * | 0.241421 1.0000000 1.000000
#> -------
#> seqinfo: 23 sequences from an unspecified genome
We now run DegCre on this data with default values:
myDegCreResList <- runDegCre(DegGR=DexNR3C1$DegGR,
DegP=DexNR3C1$DegGR$pVal,
DegLfc=DexNR3C1$DegGR$logFC,
CreGR=DexNR3C1$CreGR,
CreP=DexNR3C1$CreGR$pVal,
CreLfc=DexNR3C1$CreGR$logFC,
reqEffectDirConcord=TRUE,
verbose=FALSE)
DegCre produces a list object with several items. These include the inputs DegGR
and CreGR
. It also contains a Hits
object degCreHits
. The queryHits
of degCreHits
index DegGR
and the subjectHits
index CreGR
. degCreHits
also has several metadata columns that include the key resuls of the algorithm. These include assocProb
which is the probabilty of the association being true, and assocProbFDR
which is the FDR of assocProb
being greater than a null model based on TSS to CRE distance alone. Also, the results of the distance bin optimization heuristic, binHeurOutputs
, and the input DEG significance threshold alphaVal
are returned.
names(myDegCreResList)
#> [1] "degCreHits" "binHeurOutputs" "alphaVal" "DegGR"
#> [5] "CreGR"
head(myDegCreResList$degCreHits)
#> Hits object with 6 hits and 10 metadata columns:
#> queryHits subjectHits | assocDist assocProb assocProbFDR rawAssocProb
#> <integer> <integer> | <integer> <numeric> <numeric> <numeric>
#> [1] 1 5 | 818334 0 1 0.0458060
#> [2] 2 5 | 816957 0 1 0.0458060
#> [3] 3 5 | 811108 0 1 0.0459890
#> [4] 4 5 | 800909 0 1 0.0460580
#> [5] 5 5 | 777493 0 1 0.0643178
#> [6] 6 5 | 777079 0 1 0.0643178
#> CreP DegP DegPadj binAssocDist numObs distBinId
#> <numeric> <numeric> <numeric> <numeric> <numeric> <integer>
#> [1] 0.642914 0.89202335 1 821065 3896 167
#> [2] 0.642914 0.00183145 1 821065 3896 167
#> [3] 0.642914 0.50313987 1 815644 3896 166
#> [4] 0.642914 0.49900530 1 804825 3896 164
#> [5] 0.642914 0.00184199 1 778623 3896 159
#> [6] 0.642914 0.88895013 1 778623 3896 159
#> -------
#> queryLength: 28196 / subjectLength: 51750
To find associations driven by non-random changes in CRE signal, we subset the degCreHits
by assocProbFDR
. We can also further subset by assocProb
to find those also more likely to confirm in a suitable validation experiment:
passFDR <- which(mcols(myDegCreResList$degCreHits)$assocProbFDR <= 0.05)
passProb <- which(mcols(myDegCreResList$degCreHits)$assocProb >= 0.25)
keepDegCreHits <- myDegCreResList$degCreHits[intersect(passFDR,passProb)]
keepDegCreHits
#> Hits object with 559 hits and 10 metadata columns:
#> queryHits subjectHits | assocDist assocProb assocProbFDR rawAssocProb
#> <integer> <integer> | <integer> <numeric> <numeric> <numeric>
#> [1] 110 13 | 133 0.300401 0.00000e+00 0.300401
#> [2] 111 13 | 1110 0.300401 0.00000e+00 0.300401
#> [3] 148 28 | 989 0.429231 0.00000e+00 0.429231
#> [4] 148 69 | 316979 0.660000 4.44579e-08 0.660000
#> [5] 154 57 | 879235 0.371250 8.63207e-05 0.495000
#> ... ... ... . ... ... ... ...
#> [555] 27797 51609 | 15772 0.281560 1.16942e-09 0.281560
#> [556] 27797 51619 | 25192 0.330000 3.33047e-09 0.330000
#> [557] 27798 51611 | 0 0.313685 0.00000e+00 0.313685
#> [558] 27798 51612 | 1022 0.529535 1.36602e-12 0.529535
#> [559] 27798 51619 | 24638 0.330000 3.33047e-09 0.330000
#> CreP DegP DegPadj binAssocDist numObs distBinId
#> <numeric> <numeric> <numeric> <numeric> <numeric> <integer>
#> [1] 2.67114e-02 9.18081e-10 2.37902e-05 1596 3896 1
#> [2] 2.67114e-02 9.18081e-10 2.37902e-05 1596 3896 1
#> [3] 8.83113e-04 1.41619e-24 3.66977e-20 1596 3896 1
#> [4] 2.64168e-06 1.41619e-24 3.66977e-20 317788 3896 69
#> [5] 9.41473e-07 3.36800e-11 8.72750e-07 880878 3896 178
#> ... ... ... ... ... ... ...
#> [555] 3.95196e-04 2.42291e-131 6.27849e-127 16085 3896 5
#> [556] 7.13843e-05 2.42291e-131 6.27849e-127 28796 3896 8
#> [557] 1.72325e-02 2.42291e-131 6.27849e-127 1596 3896 1
#> [558] 4.74327e-05 2.42291e-131 6.27849e-127 1596 3896 1
#> [559] 7.13843e-05 2.42291e-131 6.27849e-127 28796 3896 8
#> -------
#> queryLength: 28196 / subjectLength: 51750
We can then view the subsets of DegGR
and CreGR
that comprise the passing associations:
myDegCreResList$DegGR[queryHits(keepDegCreHits)]
#> GRanges object with 559 ranges and 7 metadata columns:
#> seqnames ranges strand | promGeneName GeneSymb
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] chr1 6259438-6259439 - | GPR153_2 GPR153
#> [2] chr1 6261063-6261064 - | GPR153_1 GPR153
#> [3] chr1 8026308-8026309 - | ERRFI1_1 ERRFI1
#> [4] chr1 8026308-8026309 - | ERRFI1_1 ERRFI1
#> [5] chr1 9069634-9069635 - | SLC2A5_1 SLC2A5
#> ... ... ... ... . ... ...
#> [555] chrX 107716507-107716508 - | TSC22D3_2 TSC22D3
#> [556] chrX 107716507-107716508 - | TSC22D3_2 TSC22D3
#> [557] chrX 107717061-107717062 - | TSC22D3_1 TSC22D3
#> [558] chrX 107717061-107717062 - | TSC22D3_1 TSC22D3
#> [559] chrX 107717061-107717062 - | TSC22D3_1 TSC22D3
#> GeneID baseMean logFC pVal pAdj
#> <character> <numeric> <numeric> <numeric> <numeric>
#> [1] ENSG00000158292.6 98.8953 1.86950 9.18081e-10 2.37902e-05
#> [2] ENSG00000158292.6 98.8953 1.86950 9.18081e-10 2.37902e-05
#> [3] ENSG00000116285.11 7231.1025 1.76693 1.41619e-24 3.66977e-20
#> [4] ENSG00000116285.11 7231.1025 1.76693 1.41619e-24 3.66977e-20
#> [5] ENSG00000142583.16 34.6201 2.17051 3.36800e-11 8.72750e-07
#> ... ... ... ... ... ...
#> [555] ENSG00000157514.15 2076.06 3.84171 2.42291e-131 6.27849e-127
#> [556] ENSG00000157514.15 2076.06 3.84171 2.42291e-131 6.27849e-127
#> [557] ENSG00000157514.15 2076.06 3.84171 2.42291e-131 6.27849e-127
#> [558] ENSG00000157514.15 2076.06 3.84171 2.42291e-131 6.27849e-127
#> [559] ENSG00000157514.15 2076.06 3.84171 2.42291e-131 6.27849e-127
#> -------
#> seqinfo: 23 sequences from an unspecified genome; no seqlengths
myDegCreResList$CreGR[subjectHits(keepDegCreHits)]
#> GRanges object with 559 ranges and 3 metadata columns:
#> seqnames ranges strand | logFC pVal pAdj
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
#> [1] chr1 6259573-6259952 * | 5.14967 2.67114e-02 0.171365372
#> [2] chr1 6259573-6259952 * | 5.14967 2.67114e-02 0.171365372
#> [3] chr1 8027299-8027912 * | 6.35323 8.83113e-04 0.020694028
#> [4] chr1 8343289-8344100 * | 8.30635 2.64168e-06 0.000865390
#> [5] chr1 8189839-8190398 * | 8.45963 9.41473e-07 0.000490121
#> ... ... ... ... . ... ... ...
#> [555] chrX 107700193-107700734 * | 6.71290 3.95196e-04 0.01459899
#> [556] chrX 107741701-107742296 * | 7.36680 7.13843e-05 0.00568158
#> [557] chrX 107716969-107717312 * | 5.23997 1.72325e-02 0.12909414
#> [558] chrX 107718085-107718644 * | 7.46588 4.74327e-05 0.00435653
#> [559] chrX 107741701-107742296 * | 7.36680 7.13843e-05 0.00568158
#> -------
#> seqinfo: 23 sequences from an unspecified genome
The choice of the threshold for DEG significance (DEG \(\alpha\)) is important in DegCre calculations. Accordingly, DegCre can be run over a set of thresholds to determine an optimal value. Here we select DEG \(\alpha\) by finding the value that maximizes a normalized Precision-Recall Area Under the Curve (AUC):
alphaOptList <- optimizeAlphaDegCre(DegGR=DexNR3C1$DegGR,
DegP=DexNR3C1$DegGR$pVal,
DegLfc=DexNR3C1$DegGR$logFC,
CreGR=DexNR3C1$CreGR,
CreP=DexNR3C1$CreGR$pVal,
CreLfc=DexNR3C1$CreGR$logFC,
reqEffectDirConcord=TRUE,
verbose=FALSE)
alphaOptList$alphaPRMat
#> alphaVal AUC deltaAUC normDeltaAUC
#> alpha_0.005 0.005 0.009187438 0.006362677 0.006380701
#> alpha_0.01 0.010 0.009699195 0.006609697 0.006630181
#> alpha_0.02 0.020 0.009783394 0.006609131 0.006630177
#> alpha_0.03 0.030 0.010063426 0.006785667 0.006807982
#> alpha_0.05 0.050 0.010220304 0.006829475 0.006852711
#> alpha_0.1 0.100 0.009113340 0.006023078 0.006041749
bestAlphaId <- which.max(alphaOptList$alphaPRMat[,4])
bestDegCreResList <- alphaOptList$degCreResListsByAlpha[[bestAlphaId]]
With an optimal DegCre result obtained, we can plot the association probabilities versus binned association distance to see the relationship between the two:
par(mai=c(0.4,0.55,0.05,0.05))
par(mgp=c(1,0.4,0))
par(bty="l")
binStats <- plotDegCreAssocProbVsDist(degCreResList=bestDegCreResList,
assocProbFDRThresh=0.05)
The total number of associations passing a reasonable FDR (0.05 in the figure) drops fairly quickly as bin TSS to CRE distance increases (top panel). Similarly, the median association probability (black line in lower panel) drops with distance as well. The blue range is the interquartile range of the association probabilities. The red line in the above plot represents the null association probability, calculated as the probability if all CRE p-values in a given bin were identical.
The binning of the associations by distance is also important to the operation of the DegCre algorithm. The choice of the bin size (number of associations per bin) is done by an optimization heuristic that tries a wide range of bin sizes and calculates the KS test statistic for each bin’s CRE p-value distribution compared against the global (unbinned) CRE p-value distribution. The algorithm finds the smallest bin size for which the median KS-statistic is within a specified fraction, fracMinKsMedianThresh
, of the range of that for all tested bin sizes. We visualize the results of this process:
par(mai=c(0.5,0.5,0.05,0.3))
par(mgp=c(1,0.4,0))
par(bty="l")
par(cex.lab=0.6)
par(cex.axis=0.5)
plotDegCreBinHeuristic(degCreResList=bestDegCreResList)
The red dot indicates the chosen bin size run with the default value of ‘fracMinKsMedianThresh = 0.2’.
Another way to evaluate the performance of the DegCre on a data set is to make a Precision-Recall (PR) curve. To calculate these values, we define as a perfect set of associations, i.e. one that would generate a PR Area Under the Curve (AUC) of 1, as containing associations to all significant DEGs with uniform association probabilities of 1. Such a set will not result from real data, but it serves as a useful benchmark. Comparison of relative values of PR AUCs for DegCre results will be most informative. We plot the PR curve for the example data:
par(mai=c(0.5,0.5,0.05,0.2))
par(mgp=c(1,0.4,0))
par(cex.lab=0.6)
par(cex.axis=0.5)
par(bty="l")
degCrePRAUC(degCreResList=bestDegCreResList)
The black line is the performance of the DegCre associations and the red regions in values obtained from random shuffles.
Since DegCre calculates association probabilities, one can estimate the number of expected associations per significant DEG by summing the total associations that pass a chosen FDR. DEGs with many expected associations are more likely to have true associations. This can be done with the getExpectAssocPerDEG()
function:
expectAssocPerDegDf <- getExpectAssocPerDEG(degCreResList=bestDegCreResList,
geneNameColName="GeneSymb",
assocAlpha=0.05)
The result is a DataFrame
with one entry per gene, sorted by the expected number of DEGs, `expectAssocs’.
head(expectAssocPerDegDf)
#> DataFrame with 6 rows and 11 columns
#> promGeneName GeneSymb GeneID baseMean logFC pVal
#> <character> <character> <character> <numeric> <numeric> <numeric>
#> 1 ITPKC_3 ITPKC ENSG00000086544.2 658.0739 1.175359 1.07581e-18
#> 2 SIX3_1 SIX3 ENSG00000138083.4 43.6237 1.735667 5.28313e-11
#> 3 FOSL2_1 FOSL2 ENSG00000075426.10 7380.5301 0.795594 6.41842e-12
#> 4 ZFP36_1 ZFP36 ENSG00000128016.5 1230.1540 2.428890 3.93093e-38
#> 5 HMGA2_5 HMGA2 ENSG00000149948.12 1588.2157 0.830439 1.10844e-06
#> 6 AKAP13_1 AKAP13 ENSG00000170776.18 7365.4462 1.926333 2.51488e-36
#> pAdj expectAssocs nAssocs assocAlpha degAlpha
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 1 2.78774e-14 29.1690 308 0.05 0.05
#> 2 1.36902e-06 16.6287 143 0.05 0.05
#> 3 1.66320e-07 16.5910 182 0.05 0.05
#> 4 1.01862e-33 13.3113 130 0.05 0.05
#> 5 2.87229e-02 12.4434 77 0.05 0.05
#> 6 6.51680e-32 12.3565 101 0.05 0.05
This result can be plotted as a histogram by the function plotExpectedAssocsPerDeg()
:
par(mai=c(0.5,0.5,0.15,0.05))
par(mgp=c(1,0.4,0))
par(bty="l")
par(cex.lab=0.6)
par(cex.axis=0.5)
plotExpectedAssocsPerDeg(expectAssocPerDegDf=expectAssocPerDegDf)
The dashed line indicates the median expected associations per DEG.
One of the genes with a high number of expected associations and also a known target of NR3C1 (Glucocorticoid receptor) is ERRFI1. Our test data DexNR3C1
is derived from ChIP-seq of NR3C1 in cells treated with the powerful glucocorticoid, dexamethasone, so it is not surprising to find it with many expected associations. To visualize these associations, we can make a browser plot of the associations and underlying data with plotBrowserDegCre()
which relies on the plotgardener package.
browserOuts <- plotBrowserDegCre(degCreResList=bestDegCreResList,
geneName="ERRFI1",
geneNameColName="GeneSymb",
CreSignalName="NR3C1",
plotWidth=3.5,
plotHeight=2)
We now want to zoom in closer to the TSS of ERRFI1. To do this we specify the region we want to plot as a GRanges
of length 1:
We then supply the GRanges
to the plotRegionGR
argument of plotBrowserDegCre()
:
{r plot browser2, echo=TRUE, dpi=200, fig.height=2, fig.width=3.5, message=FALSE, warning=FALSE, fig.align='center'} zoomBrowserOuts <- plotBrowserDegCre(degCreResList=bestDegCreResList, plotRegionGR=zoomGR, geneNameColName="GeneSymb", CreSignalName="NR3C1", plotWidth=3.5, plotHeight=2)
Since we did not specify geneName
it is not highlighted. We can force highlights by supplying a DataFrame
of gene names and colors to the geneHighlightDf
argument. This DataFrame
should conform to the requirements of plotgardener::plotGenes()
.
geneHighDf <- data.frame(gene=bestDegCreResList$DegGR$GeneSymb,
col="gray")
geneHighDf[which(geneHighDf$gene=="ERRFI1"),2] <- "black"
Then geneHighDf
is supplied to the geneHighlightDf
argument to produce:
{r plot browser3, echo=TRUE, dpi=200, fig.height=2, fig.width=3.5, message=FALSE, warning=FALSE, fig.align='center'} highLightBrowserOuts <- plotBrowserDegCre(degCreResList=bestDegCreResList, plotRegionGR=zoomGR, CreSignalName="NR3C1", geneNameColName="GeneSymb", geneHighlightDf=geneHighDf, plotWidth=3.5, plotHeight=2)
The list of results that DegCre()
produces has many useful features for downstream analysis. However, conversion to other formats will be useful in some situations. The function convertdegCreResListToGInteraction()
will convert the DegCre list to a GInteractions
object from the InteractionSet package:
degCreGInter <-
convertdegCreResListToGInteraction(degCreResList=bestDegCreResList,
assocAlpha=0.05)
degCreGInter
#> GInteractions object with 10650 interactions and 20 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 |
#> <Rle> <IRanges> <Rle> <IRanges> |
#> [1] chr1 6180123-6180124 --- chr1 6157801-6158216 |
#> [2] chr1 6180123-6180124 --- chr1 5982949-5983094 |
#> [3] chr1 6180123-6180124 --- chr1 5727367-5727386 |
#> [4] chr1 6259438-6259439 --- chr1 6259573-6259952 |
#> [5] chr1 6261063-6261064 --- chr1 6259573-6259952 |
#> ... ... ... ... ... ... .
#> [10646] chrX 110003113-110003114 --- chrX 109840609-109840934 |
#> [10647] chrX 110003113-110003114 --- chrX 110173231-110173646 |
#> [10648] chrX 110003113-110003114 --- chrX 109830871-109831214 |
#> [10649] chrX 110003113-110003114 --- chrX 110180791-110181224 |
#> [10650] chrX 110003113-110003114 --- chrX 109739917-109740692 |
#> assocDist assocProb assocProbFDR rawAssocProb CreP DegP
#> <integer> <numeric> <numeric> <numeric> <numeric> <numeric>
#> [1] 21906 0.1627315 1.88664e-03 0.1627315 0.000925164 3.15116e-07
#> [2] 197028 0.1055556 9.02893e-05 0.1055556 0.015637885 3.15116e-07
#> [3] 452736 0.0797052 6.41878e-04 0.0797052 0.012445876 3.15116e-07
#> [4] 133 0.3063931 0.00000e+00 0.3063931 0.026711376 9.18081e-10
#> [5] 1110 0.3063931 0.00000e+00 0.3063931 0.026711376 9.18081e-10
#> ... ... ... ... ... ... ...
#> [10646] 162178 0.0760000 0.00507117 0.0760000 1.65078e-01 5.43159e-21
#> [10647] 170116 0.0935961 0.03435877 0.0935961 9.76967e-03 5.43159e-21
#> [10648] 171898 0.0926250 0.03434557 0.0926250 6.30914e-03 5.43159e-21
#> [10649] 177676 0.0941441 0.02537595 0.0941441 1.01861e-02 5.43159e-21
#> [10650] 262420 0.1225806 0.01059937 0.1225806 1.03791e-05 5.43159e-21
#> DegPadj binAssocDist numObs distBinId Deg_promGeneName
#> <numeric> <numeric> <numeric> <integer> <character>
#> [1] 8.16560e-03 24489 3896 7 CHD5_1
#> [2] 8.16560e-03 199266 3896 45 CHD5_1
#> [3] 8.16560e-03 453947 3896 96 CHD5_1
#> [4] 2.37902e-05 1596 3896 1 GPR153_2
#> [5] 2.37902e-05 1596 3896 1 GPR153_1
#> ... ... ... ... ... ...
#> [10646] 1.40749e-16 165477 3896 38 TMEM164_1
#> [10647] 1.40749e-16 170366 3896 39 TMEM164_1
#> [10648] 1.40749e-16 175209 3896 40 TMEM164_1
#> [10649] 1.40749e-16 180010 3896 41 TMEM164_1
#> [10650] 1.40749e-16 262767 3896 58 TMEM164_1
#> Deg_GeneSymb Deg_GeneID Deg_baseMean Deg_logFC Deg_pVal
#> <character> <character> <numeric> <numeric> <numeric>
#> [1] CHD5 ENSG00000116254.16 52.5996 1.50353 3.15116e-07
#> [2] CHD5 ENSG00000116254.16 52.5996 1.50353 3.15116e-07
#> [3] CHD5 ENSG00000116254.16 52.5996 1.50353 3.15116e-07
#> [4] GPR153 ENSG00000158292.6 98.8953 1.86950 9.18081e-10
#> [5] GPR153 ENSG00000158292.6 98.8953 1.86950 9.18081e-10
#> ... ... ... ... ... ...
#> [10646] TMEM164 ENSG00000157600.10 3934.32 1.13948 5.43159e-21
#> [10647] TMEM164 ENSG00000157600.10 3934.32 1.13948 5.43159e-21
#> [10648] TMEM164 ENSG00000157600.10 3934.32 1.13948 5.43159e-21
#> [10649] TMEM164 ENSG00000157600.10 3934.32 1.13948 5.43159e-21
#> [10650] TMEM164 ENSG00000157600.10 3934.32 1.13948 5.43159e-21
#> Deg_pAdj Cre_logFC Cre_pVal Cre_pAdj
#> <numeric> <numeric> <numeric> <numeric>
#> [1] 8.16560e-03 6.06528 0.000925164 0.0210432
#> [2] 8.16560e-03 5.02309 0.015637885 0.1215536
#> [3] 8.16560e-03 4.36742 0.012445876 0.1047255
#> [4] 2.37902e-05 5.14967 0.026711376 0.1713654
#> [5] 2.37902e-05 5.14967 0.026711376 0.1713654
#> ... ... ... ... ...
#> [10646] 1.40749e-16 2.43940 1.65078e-01 0.57537625
#> [10647] 1.40749e-16 5.42918 9.76967e-03 0.08969937
#> [10648] 1.40749e-16 5.48361 6.30914e-03 0.06738242
#> [10649] 1.40749e-16 5.41784 1.01861e-02 0.09198129
#> [10650] 1.40749e-16 7.97116 1.03791e-05 0.00183883
#> -------
#> regions: 8511 ranges and 0 metadata columns
#> seqinfo: 23 sequences from an unspecified genome
The DegCre results list can can also be converted to a DataFrame
in roughly a BEDPE format. This is useful for writing out DegCre results as text files:
degCreDf <- convertDegCreDataFrame(degCreResList=bestDegCreResList,
assocAlpha=0.05)
head(degCreDf)
#> Deg_chr Deg_start Deg_end Deg_strand Cre_chr Cre_start Cre_end Cre_strand
#> 1 chr1 6180123 6180124 - chr1 6157801 6158216 *
#> 2 chr1 6180123 6180124 - chr1 5982949 5983094 *
#> 3 chr1 6180123 6180124 - chr1 5727367 5727386 *
#> 4 chr1 6259438 6259439 - chr1 6259573 6259952 *
#> 5 chr1 6261063 6261064 - chr1 6259573 6259952 *
#> 6 chr1 6419918 6419919 - chr1 6412897 6413132 *
#> assocDist assocProb assocProbFDR rawAssocProb binAssocDist numObs distBinId
#> 1 21906 0.16273148 1.886640e-03 0.16273148 24489 3896 7
#> 2 197028 0.10555556 9.028929e-05 0.10555556 199266 3896 45
#> 3 452736 0.07970522 6.418779e-04 0.07970522 453947 3896 96
#> 4 133 0.30639313 0.000000e+00 0.30639313 1596 3896 1
#> 5 1110 0.30639313 0.000000e+00 0.30639313 1596 3896 1
#> 6 6785 0.23011457 0.000000e+00 0.23011457 8029 3896 3
#> Deg_promGeneName Deg_GeneSymb Deg_GeneID Deg_baseMean Deg_logFC
#> 1 CHD5_1 CHD5 ENSG00000116254.16 52.59957 1.503528
#> 2 CHD5_1 CHD5 ENSG00000116254.16 52.59957 1.503528
#> 3 CHD5_1 CHD5 ENSG00000116254.16 52.59957 1.503528
#> 4 GPR153_2 GPR153 ENSG00000158292.6 98.89534 1.869499
#> 5 GPR153_1 GPR153 ENSG00000158292.6 98.89534 1.869499
#> 6 HES2_1 HES2 ENSG00000069812.10 34.96616 2.055450
#> Deg_pVal Deg_pAdj Cre_logFC Cre_pVal Cre_pAdj
#> 1 3.151162e-07 8.165605e-03 6.065277 0.0009251642 0.02104324
#> 2 3.151162e-07 8.165605e-03 5.023088 0.0156378846 0.12155357
#> 3 3.151162e-07 8.165605e-03 4.367424 0.0124458764 0.10472552
#> 4 9.180810e-10 2.379023e-05 5.149673 0.0267113763 0.17136537
#> 5 9.180810e-10 2.379023e-05 5.149673 0.0267113763 0.17136537
#> 6 1.344031e-09 3.482787e-05 3.273808 0.0255184446 0.16628814
sessionInfo()
#> R Under development (unstable) (2024-01-16 r85808)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] DegCre_0.99.2
#> [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.18.0
#> [3] GenomicFeatures_1.55.3
#> [4] AnnotationDbi_1.65.2
#> [5] plotgardener_1.9.3
#> [6] InteractionSet_1.31.0
#> [7] SummarizedExperiment_1.33.3
#> [8] Biobase_2.63.0
#> [9] MatrixGenerics_1.15.0
#> [10] matrixStats_1.2.0
#> [11] GenomicRanges_1.55.2
#> [12] GenomeInfoDb_1.39.5
#> [13] IRanges_2.37.1
#> [14] S4Vectors_0.41.3
#> [15] BiocGenerics_0.49.1
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.1 bitops_1.0-7 httr2_1.0.0
#> [4] biomaRt_2.59.1 rlang_1.1.3 magrittr_2.0.3
#> [7] compiler_4.4.0 RSQLite_2.3.5 png_0.1-8
#> [10] vctrs_0.6.5 stringr_1.5.1 pkgconfig_2.0.3
#> [13] crayon_1.5.2 fastmap_1.1.1 dbplyr_2.4.0
#> [16] XVector_0.43.1 utf8_1.2.4 Rsamtools_2.19.3
#> [19] rmarkdown_2.25 strawr_0.0.91 purrr_1.0.2
#> [22] bit_4.0.5 xfun_0.41 zlibbioc_1.49.0
#> [25] cachem_1.0.8 jsonlite_1.8.8 progress_1.2.3
#> [28] blob_1.2.4 highr_0.10 DelayedArray_0.29.1
#> [31] BiocParallel_1.37.0 parallel_4.4.0 prettyunits_1.2.0
#> [34] R6_2.5.1 plyranges_1.23.0 bslib_0.6.1
#> [37] stringi_1.8.3 RColorBrewer_1.1-3 rtracklayer_1.63.0
#> [40] jquerylib_0.1.4 Rcpp_1.0.12 knitr_1.45
#> [43] Matrix_1.6-5 tidyselect_1.2.0 abind_1.4-5
#> [46] yaml_2.3.8 codetools_0.2-19 curl_5.2.0
#> [49] lattice_0.22-5 tibble_3.2.1 withr_3.0.0
#> [52] KEGGREST_1.43.0 evaluate_0.23 gridGraphics_0.5-1
#> [55] BiocFileCache_2.11.1 xml2_1.3.6 Biostrings_2.71.2
#> [58] pillar_1.9.0 filelock_1.0.3 generics_0.1.3
#> [61] RCurl_1.98-1.14 hms_1.1.3 ggplot2_3.4.4
#> [64] munsell_0.5.0 scales_1.3.0 glue_1.7.0
#> [67] tools_4.4.0 BiocIO_1.13.0 data.table_1.15.0
#> [70] GenomicAlignments_1.39.2 fs_1.6.3 XML_3.99-0.16.1
#> [73] grid_4.4.0 colorspace_2.1-0 GenomeInfoDbData_1.2.11
#> [76] restfulr_0.0.15 cli_3.6.2 rappdirs_0.3.3
#> [79] fansi_1.0.6 S4Arrays_1.3.3 dplyr_1.1.4
#> [82] gtable_0.3.4 yulab.utils_0.1.4 sass_0.4.8
#> [85] digest_0.6.34 SparseArray_1.3.3 ggplotify_0.1.2
#> [88] org.Hs.eg.db_3.18.0 rjson_0.2.21 memoise_2.0.1
#> [91] htmltools_0.5.7 lifecycle_1.0.4 httr_1.4.7
#> [94] bit64_4.0.5
Dreos, Rene, Giovanna Ambrosini, Rouayda Cavin Perier, and Philipp Bucher. 2015. “The Eukaryotic Promoter Database: Expansion of Epdnew and New Promoter Analysis Tools.” Nucleic Acids Research 43 (D1): D92–D96. https://doi.org/10.1093/nar/gku1111.
McDowell, Ian C., Alejandro Barrera, Anthony M. D’Ippolito, Christopher M. Vockley, Linda K. Hong, Sarah M. Leichter, Luke C. Bartelt, et al. 2018. “Glucocorticoid Receptor Recruits to Enhancers and Drives Activation by Motif-Directed Binding.” Genome Research 28 (9): 1272–84. https://doi.org/10.1101/gr.233346.117.
Meylan, Patrick, Rene Dreos, Giovanna Ambrosini, Romain Groux, and Philipp Bucher. 2020. “EPD in 2020: Enhanced Data Visualization and Extension to ncRNA Promoters.” Nucleic Acids Research 48 (D1): D65–D69. https://doi.org/10.1093/nar/gkz1014.
Roberts, Brian S., Gregory M. Cooper, and Richard M. Myers. 2023. “DegCre: Probabilistic Association of Differential Gene Expression with Regulatory Regions.” bioRxiv. https://doi.org/10.1101/2023.10.04.560923.