Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
Bioinformatics Research Group in Epidemiolgy (BRGE)
(http://www.creal.cat/brge.htm)
In this vignette, the workflow to analyse methylation and expression data will be presented. The package also allows considering SNP data since it has been demonstrated the existence of meQTL and eQTL that could confound the results. The example data will be taken from MEALData
package (available at BRGE web), a data package that contains a subset from GEO GSE53261. This dataset contains data from untransformed human fibroblasts. The script with the processing steps from the GEO to the final datasets can be found in the folder scripts of MEALData package.
Let’s start by loading the required packages.
library(MEAL)
library(MEALData)
library(GenomicRanges)
There are four objects in MEALData: betavals, pheno, eset and snps. Let’s take a look to each of these objects.
betavals and pheno belongs to the methylation experiment. betavals is a matrix of beta values corresponding to an Infinium HumanMethylation 450K BeadChip and pheno a data.frame with the phenotypes.
betavals[1:5, 1:5]
## GM02704 GM02706 GM01650 GM01653 GM02640
## cg00000029 0.2864929 0.2069152 0.5525335 0.5176491 0.2791274
## cg00000108 0.9293251 0.9429289 0.8961195 0.9325227 0.9286574
## cg00000109 0.7168331 0.7419590 0.7320755 0.5347986 0.8196660
## cg00000165 0.2935756 0.2475510 0.4311649 0.3937953 0.3349943
## cg00000236 0.9026300 0.9034649 0.8755394 0.8710790 0.8867925
dim(betavals)
## [1] 451448 62
summary(pheno)
## gender source
## female:31 Coriell cell line:26
## male :31 other :36
betavals contains data from 451448 probes and 62 samples. Pheno contains the same number of samples and two variables: gender and source. There is the same number of cells coming from male and female donnors. Source is composed of Coriell cell lines and cells with unknown source.
eset is an ExpressionSet
with the expression data corresponding to an Illumina HumanRef-8 v3.0 expression beadchip.
eset
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 21916 features, 64 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: GM00016 GM00022 ... WG3466 (64 total)
## varLabels: gender
## varMetadata: labelDescription
## featureData
## featureNames: ILMN_1343291 ILMN_1651209 ... ILMN_2415949 (21916
## total)
## fvarLabels: chr start end genes
## fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL6883
summary(pData(eset))
## gender
## female:32
## male :32
In this case, there are 21916 expression probes and 64 samples. As it can be seen, there are two more samples in the expression set. This is very common in datasets downloaded from GEO and it is not a problem if the analysis is going to be performed separately. On the other hand, the phenotype data only contains the variable gender and the number of female and male donnors is balanced.
Finally, let’s take a look at snps data. snps is a list containing two objects, the genotypes and the mapping.
str(snps)
## List of 2
## $ genotypes: num [1:100000, 1:98] 0 0.0952 0 0 0 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:100000] "rs2895914-126_B_R_1514138430" "rs9531337-119_T_F_1502299533" "rs8125266-126_B_F_1502281523" "rs7236498-126_B_R_1502260358" ...
## .. ..$ : chr [1:98] "WG2275" "WG2276" "WG2274" "WG2066" ...
## $ map :'data.frame': 100000 obs. of 5 variables:
## ..$ Chromosome: chr [1:100000] "12" "13" "20" "18" ...
## ..$ snp.name : Factor w/ 1199188 levels "200003","200006",..: 655945 1144767 1094637 986877 392450 931588 678765 822223 980058 429353 ...
## ..$ position : num [1:100000] 81254953 35479506 59293703 21765775 7472441 ...
## ..$ SNP : Factor w/ 12 levels "[A/C]","[A/G]",..: 7 1 7 7 7 2 1 2 2 7 ...
## ..$ chromosome: chr [1:100000] "chr12" "chr13" "chr20" "chr18" ...
Genotypes are enclosed in a matrix and can be retrieved by:
snps$genotypes[1:5, 1:5]
## WG2275 WG2276 WG2274 WG2066 WG2065
## rs2895914-126_B_R_1514138430 0.0000000 2 0 2 2
## rs9531337-119_T_F_1502299533 0.0952381 2 2 2 2
## rs8125266-126_B_F_1502281523 0.0000000 0 1 2 2
## rs7236498-126_B_R_1502260358 0.0000000 1 1 1 1
## rs1593933-127_B_R_1501677465 0.0000000 0 0 0 1
Mapping of the SNP probes is can be retrieved by typing snps$map
. Here, the original object has been modified to fit the requirements of MEAL objects. Snps annotation data.frame must contain a column called position
and a column called chromosome
with the name of the chromosome.
head(snps$map)
## Chromosome snp.name position SNP
## rs2895914-126_B_R_1514138430 12 rs2895914 81254953 [T/C]
## rs9531337-119_T_F_1502299533 13 rs9531337 35479506 [A/C]
## rs8125266-126_B_F_1502281523 20 rs8125266 59293703 [T/C]
## rs7236498-126_B_R_1502260358 18 rs7236498 21765775 [T/C]
## rs1593933-127_B_R_1501677465 5 rs1593933 7472441 [T/C]
## rs6845251-127_T_R_1501827315 4 rs6845251 184359014 [A/G]
## chromosome
## rs2895914-126_B_R_1514138430 chr12
## rs9531337-119_T_F_1502299533 chr13
## rs8125266-126_B_F_1502281523 chr20
## rs7236498-126_B_R_1502260358 chr18
## rs1593933-127_B_R_1501677465 chr5
## rs6845251-127_T_R_1501827315 chr4
All in all, this list can be reused for other analyses, just changing the SNP data but maintaining the structure.
Let’s illustrate the use of this package taking the gender of the donnor as the outcome of interest. Methylation and expression analyses will be described including hypothesis testing and visualitzation.
The first analysis will be the methylation analysis. This demonstration will be focused on the analysis workflow while a more extended description of the functions and their arguments can be found at MEAL vignette.
Let’s start the analysis. The first step is to create the MethylationSet
with the betas, the phenotype and the snps.
methSet <- prepareMethylationSet(matrix = betavals, phenotypes = pheno)
table(fData(methSet)$chr)
##
## chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2
## 43479 22631 26764 22703 11338 13942 14145 20605 25956 5515 23896 32322
## chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX chrY
## 9674 3976 8044 23469 19036 22727 33557 27898 19596 9211 10558 341
As it can be seen, more than 10000 probes are placed in sexual chromosomes and they will be very affected by the gender of the sample. These probes, that we know that will be differentiated and that don’t provide any interesting information, can confound the multitesting analysis by adding a huge ammount of really small p-values that complicate the adjustement. For these reasons, it is very recommended to remove them, specially in this situation that we want to evaluate the effect of gender on methylation.
Rigth now, MEAL doesn’t include a function to automatically perform this filtering. However, it can be easily done with the following steps:
annot <- fData(methSet)
autosomiccpgs <- rownames(annot)[!annot$chr %in% c("chrX", "chrY")]
methSet <- methSet[autosomiccpgs, ]
methSet
## MethylationSet (storageMode: lockedEnvironment)
## assayData: 440484 features, 62 samples
## element names: meth
## protocolData: none
## phenoData
## rowNames: GM02704 GM02706 ... WG1983 (62 total)
## varLabels: gender source
## varMetadata: labelDescription
## featureData
## featureNames: cg13869341 cg14008030 ... cg02366642 (440484
## total)
## fvarLabels: chromosome position ... DHS (17 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: IlluminaHumanMethylation450kanno.ilmn12.hg19
Once the object is filtered, we are now ready to run the analysis. DAPipeline
are the function in MEAL that performs the analysis. It can work with an ExpressionSet
and a MethylationSet
. In both cases, for each feature, it computes the effect of the variables set using the parameter variable_names, and it allows the inclusion of adjustment variables with the parameter covariable_names. In addition, if a MethylationSet
is used, regional analysis will be performed (such as Bumphunter or DMRcate).
In our case, we will set variable_names to gender, but we could use a vector with all the variables to analyse. Finally, probe_method is set to “ls” (lineal regression) to speed up the demonstration. In order to use robust linear regression, the option is “robust”.
methRes <- DAPipeline(set = methSet, variable_names = "gender", probe_method = "ls")
methRes
## class: AnalysisResults
## original class: MethylationSet
## features(251): cg04462931 cg27540865 ... cg08906898 cg20811988
## samples(62): GM02704 GM02706 ... WG1984 WG1983
## variables: gender
## model variables names: gender
## covariables: none
## Number of bumps: 226774
## Number of blocks: NULL
## Number of regions: 0
## Number of differentially methylated probes: 251
The analysis generates an AnalysisResults
objects with all the results. 239 probes has been detected as differentially methylated. To get an overview of the distribution of these probes, a Manhattan plot can be done:
plotEWAS(methRes)
In figure 1, we can see that the differentially methylated probes are distributed all around the genome. A quick look to the QQplot can show us if there are problems in the model of the analysis.
plotQQ(methRes)
In figure 2, most of the points are on the theoretical line. The last points are quite separated but this can be due to the high number of differentially methylated probes. Consequently, no further adjustement seems to be required.
MEAL incorporates a new class MultiDataSet
, designed to handle data from the same samples but from different omic experiments. This class works with eSet
derived classes. In this tutorial, we will show how to create a MultiDataSet
containing our methylation and SNPs data.
The first step is to create an empty MultiDataSet
.
multi <- new("MultiDataSet")
MultiDataSet
contains a generic function add.set
which allows to add any kind of object to a labeled slot. In addition, there are three predifined functions (add.genexp, add.methy, add.snps) which adds a given object to a given slot. Rigth now, we will add methylation to this new MultiDataSet
.
multi <- add.methy(multi, methSet)
add.methy
has the same parameters as add.genexp
or add.snps
: the first parameter is the MultiDataSet
where the new object will be added and the second is the object to be added. The difference between these funcions is that add.methy
requires a MethylationSet
, add.genexp
an ExpressionSet
and add.snps
a SnpSet
. Rigth now, snps data is in the form of a list with a SnpMatrix
and a annotation, so we should convert it to a SnpSet
prior to including in the MultiDataSet
.
SnpSet <- new("SnpSet", call = snps$genotypes)
fData(SnpSet) <- snps$map
multi <- add.snps(multi, SnpSet)
## Warning in add.set(object, snpSet, "snps"): The samples of multiDataSet
## are not the same that samples in the new set. Only common samples will be
## retained.
One of the most interesting features of MEAL is the deep analysis of a region of the genome with DARegionAnalysis
. This function evaluates the effect of a variable on the pattern of methylation of the whole region (using a redundancy analysis). In addition, if SNPs are included, meQTLs are detected and used to adjust the model.
To show these features, we will look for a region enriched in differentially methylated probes. Thus, we will take a glance to the most significant probes:
head(probeResults(methRes)[[1]], 10)
## intercept gendermale sd t P.Value
## cg04462931 0.88654958 -0.32827193 0.07044937 -37.30759 1.526961e-44
## cg27540865 0.01320814 0.30640554 0.18312809 28.03077 3.426455e-37
## cg20926353 0.57644894 -0.38817128 0.09900076 -25.78547 4.273411e-35
## cg03911306 0.93468579 -0.25549244 0.10692549 -25.78332 4.293921e-35
## cg11643285 0.94724727 -0.12214849 0.08543173 -22.57237 8.107946e-32
## cg14095100 0.31521330 -0.19734347 0.08786228 -20.30986 2.788190e-29
## cg08656326 0.33178828 -0.18272313 0.07811972 -19.24051 5.227778e-28
## cg25568337 0.16694882 -0.07327334 0.05241064 -18.22693 9.386919e-27
## cg03691818 0.08622923 -0.05777246 0.09324709 -18.10084 1.354923e-26
## cg09118400 0.79983697 -0.15652139 0.06864569 -16.71849 8.518652e-25
## adj.P.Val chromosome position genes
## cg04462931 6.726017e-39 chr7 64300061
## cg27540865 7.546494e-32 chr1 243053934
## cg20926353 4.728509e-30 chr9 84303358 TLE1;TLE1
## cg03911306 4.728509e-30 chr3 16648294 DAZL
## cg11643285 7.142841e-27 chr3 16411667 RFTN1
## cg14095100 2.046922e-24 chr9 84303413 TLE1;TLE1
## cg08656326 3.289647e-23 chr9 84304414 TLE1
## cg25568337 5.168484e-22 chr6 157098338 ARID1B;ARID1B;ARID1B
## cg03691818 6.631354e-22 chr12 53085038 KRT77
## cg09118400 3.752330e-20 chr7 17402192
## group
## cg04462931
## cg27540865
## cg20926353 5'UTR;1stExon
## cg03911306 TSS1500
## cg11643285 Body
## cg14095100 5'UTR;1stExon
## cg08656326 TSS1500
## cg25568337 TSS1500;TSS1500;TSS1500
## cg03691818 Body
## cg09118400
From the top 10 probes, 2 are placed in chromosome 3 around position 16500000. Therefore, we will analyse a region of 1Mb enclosing these probes.
Before starting the analysis, we will repeat the Manhattan plot but now highlighting the probes of the range. It should be noticed that MEAL uses GRanges
objects to define the genomic ranges:
range <- GRanges(seqnames = Rle("chr3"),
ranges = IRanges(16000000, end = 17000000))
plotEWAS(methRes, range = range)
In figure 3, the most significant cpgs of chromosome 9 are highlighted. There are more highlighted points in chromosome 9 but there are covered by the others.
Now we are ready to perform the analysis. DARegionAnalysis
works very simmilar to DAPipeline
with the difference that a range is needed. DARegionAnalysis
can work with a MethylationSet
/ ExpressionSet
or with a MultiDataSet
. If a MultiDataSet
is used, SNPs are required and they are included in the analysis. This will be our case:
methRegion <- DARegionAnalysis(set = multi, range = range, variable_names = "gender", probe_method = "ls")
## coercing object of mode numeric to SnpMatrix
methRegion
## class: AnalysisRegionResults
## original class: MethylationSet
## features(156): cg03624438 cg20216309 ... cg24159436 cg18422055
## samples(61): WG2275 WG2276 ... GM03073 GM00439
## variables: gender
## model variables names: gender
## covariables: snpPC1, snpPC2, snpPC3, snpPC4, snpPC5, snpPC6
## Number of bumps: 90
## Number of blocks: 0
## Number of regions: 0
## Number of differentially methylated probes: 7
## Relevant snps(17): rs4643692-121_B_R_1502228399
## rs9880307-127_B_R_1502398918 ... rs9825009-127_T_R_1513914912
## rs11710881-126_T_R_1502098290
## Snps Variance: 0.6982974
## Range:
## Chr: chr3 start: 16000000 end: 17000000
## Rsquared: 0.075
## P-value: 0.001
When SNPs are used, first SNPs that are significantly associated to a cpg are selected. On one hand, these SNPs will be used to compute the correlation with the cpgs. On the other, a PCA with the significant SNPs will be calculated and the six first components used as covariables.
The R2 between the cpgs and the SNPs can be retrieved using regionLM and plotted using plotRegionR2. In this example, we will use cpg number 18 in order to show the plot.
regionLM(methRegion)[["cg11610140"]]
## gender rs4643692.121_B_R_1502228399
## 0.5335796 0.1062914
plotRegionR2(methRegion, "cg11610140")
In figure 4, the plot of the R2 shows that gender explains more than 50% of the variability of the cpg while the SNP doesn’t explain almost anything of the variability.
plotRDA(methRegion)
In figure 5, in the RDA plot, male and female samples are clearly separated so gender changes the methylation pattern of the region. The R2 and the p-value quantitave shows this fact.
The analysis of expression can be performed in the same way than methylation. The main difference is that an ExpressionSet
will be used intead of a MethylationSet
. The hypothesis to be tested will be the same, so the variable used will be source again.
In order to assure consistent results, ExpressionSet
passed to DAPipeline
needs to pass some constraints: its fData should contain columns chromosome, start and end. Let’s check out our object:
eset
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 21916 features, 64 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: GM00016 GM00022 ... WG3466 (64 total)
## varLabels: gender
## varMetadata: labelDescription
## featureData
## featureNames: ILMN_1343291 ILMN_1651209 ... ILMN_2415949 (21916
## total)
## fvarLabels: chr start end genes
## fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL6883
fvarLabels(eset)
## [1] "chr" "start" "end" "genes"
As it can be seen, we have this data in our object. However, the column containing the chromosomes is called chr insted of chromosome. In the following lines, we will fix this and we will run the analysis.
fData(eset)$chromosome <- fData(eset)$chr
expRes <- DAPipeline(eset, variable_names = "gender", probe_method = "ls")
expRes
## class: AnalysisResults
## original class: ExpressionSet
## features(50): ILMN_2382974 ILMN_1789464 ... ILMN_1802923
## ILMN_1775937
## samples(64): GM00016 GM00022 ... WG3465 WG3466
## variables: gender
## model variables names: gender
## covariables: none
## Number of bumps: NULL
## Number of blocks: NULL
## Number of regions: NULL
## Number of differentially expressed probes: 0
plotEWAS(expRes)
In figure 6, there are no significant probes after multiple testing. Let’s take a look at the QQplot.
plotQQ(expRes)
In figure 7, the QQplot shows a great deflation. This situation ussualy happens when there are hidden variables (such as batch) that we are not taking into account. Given that we don’t have any other variables, we could try surrogate variables analysis (SVA). SVA tries to infer these hidden variables, which we can include in our model. DAPipeline
performs this process automatically only by setting sva = TRUE
.
expResSVA <- DAPipeline(eset, variable_names = "gender", probe_method = "ls", sva = TRUE)
## Number of significant surrogate variables is: 12
## Iteration (out of 5 ):1 2 3 4 5
expResSVA
## class: AnalysisResults
## original class: ExpressionSet
## features(50): ILMN_2305225 ILMN_1759495 ... ILMN_1700975
## ILMN_1734194
## samples(64): GM00016 GM00022 ... WG3465 WG3466
## variables: gender
## model variables names: gender
## covariables: none
## Number of bumps: NULL
## Number of blocks: NULL
## Number of regions: NULL
## Number of differentially expressed probes: 1
plotEWAS(expResSVA)
In figure 8, we can see that an expression probe is signicant (after bonferroni correction).
plotQQ(expResSVA)
In figure 9, the QQ-plot show a bit of inflation but the points are close to the theoretical values. Consequently, SVA has improved the analysis and has allowed to detect a differentially expressed probe.
plotVolcano(expResSVA)
head(probeResults(expResSVA)[[1]])
## intercept gendermale sd t P.Value
## ILMN_2305225 9.114120 -0.52570929 0.09640841 -5.452940 1.364261e-06
## ILMN_1759495 9.012309 -0.08977251 0.01873313 -4.792178 1.403132e-05
## ILMN_1789464 7.349925 0.04843598 0.01045058 4.634766 2.412246e-05
## ILMN_1796216 8.854934 -0.13788617 0.03019995 -4.565774 3.052922e-05
## ILMN_1804743 7.685935 0.05865438 0.01325459 4.425213 4.913782e-05
## ILMN_1720844 8.311187 -0.17429851 0.04046218 -4.307690 7.283973e-05
## adj.P.Val chr start end genes chromosome
## ILMN_2305225 0.02989915 chr16 57104605 57104654 NDRG4 chr16
## ILMN_1759495 0.15375516 chr6 43598330 43598379 XPO5 chr6
## ILMN_1789464 0.16726960 chr5 79443077 79443126 SERINC5 chr5
## ILMN_1796216 0.16726960 chr14 76318923 76318972 VASH1 chr14
## ILMN_1804743 0.21538091 chr17 55655692 55658193 USP32 chr17
## ILMN_1720844 0.25095186 chr1 84882502 84882551 SSX2IP chr1
In figure 10, a Volcano plot of the results is shown. Given that none of the points has a log fold change bigger than 1, none of the points is labeled.
It should be noticed that although an ExpressionSet
can be the input for DAPipeline
and a AnalysisResults
object is generated, not all functions will be available. The two functions that might not work properly are plotRegionR2
and plotRegion
.
In this last step, the correlation between expression and methylation will be studied. There are two functions in MEAL
that can compute the relationship between this two omics data. correlationMethExprs
makes pairs of cpg-expression probe and test if they are correlated. On the other hand, multiCorrMethExprs
test if, in a chromosomic region, the methylation pattern can explain the expression pattern.
Both functions requires a MultiDataSet
, so let’s create a new MultiDataSet
with expression and methylation data.
multi2 <- new("MultiDataSet")
multi2 <- add.genexp(multi2, eset)
multi2 <- add.methy(multi2, methSet)
## Warning in add.set(object, methySet, "methylation"): The samples of
## multiDataSet are not the same that samples in the new set. Only common
## samples will be retained.
It should be noticed that add.genexp
share the same constraints for ExpressionSet
, so only objects with fData containing chromosomes, start and end columns will be accepted. Another interesting feature is that MultiDataSet
adding functions ensure that sample names are the same for all sets. To do so, when a new set is added, samples of the new set and samples of the multiset are filtered to only contain the common samples.
The function correlationMethExprs
performs a linear regression between the methylation and the expression values. First, cpgs are paired to expression probes if the expression probe is in a range of \(\pm\) 250kb from the cpg (this parameter is called flank). After that, for each pair, the following regression is performed: \[ e_{ij} = \beta_i m_{ij} + \epsilon \] where \(e_{ij}\) is the expression value of individual j on pair i expression probe, \(m_{ij}\) the methylation value of individual j on pair i methylation probe and \(\beta_i\) the change of expression produced by the methylation on pair i.
In addition, the effect of other variables can be substracted for the methylation or the expression data. To do this, a linear regression of the data versus the variables is performed: \[ x_{ij} = \sum_{k = 1}^p{\beta_{ik} z_{jk}} + r_{ij} \] where \(x_{ij}\) is the methylation value at cpg i for individual j, \(\beta_{ik}\) is the effect of variable z at cpg, \(z_{jk}\) is the value of variable z for individual j and \(r_{ij}\) is the residual value at cpg i for individual j. Then, the residuals are computed: \[ r_{ij} = x_{ij} - \sum_{k = 1}^p{\beta_{ik} z_{jk}} \] These residuals will be used as the methylation values in the first equation. The same equations can be applied with expression data.
This process is a bit computionally expensive. For this reason, we will use a MultiDataSet
with a smaller methylation set with only the differentially methylated probes. Given that cpgs are ordered in the AnalysisResults
, we will take the top 20 differentially methylated probes. Using add.methy
in a MultiDataSet
containing methylation, the methylation data is substituted.
topcpgs <- feats(methRes)[1:20]
minimeth <- methSet[topcpgs, ]
multi3 <- add.methy(multi2, minimeth)
## Warning in add.set(object, methySet, "methylation"): The samples of
## multiDataSet are not the same that samples in the new set. Only common
## samples will be retained.
## Warning in add.set(object, methySet, "methylation"): Slot 'methylation' is
## already set in 'MultiDataSet'. Previous content will be overwritten.
methExprs <- correlationMethExprs(multi3)
## Calculating cpg-expression probe pairs
## Computing residuals
## Computing correlation Methylation-Expression
head(methExprs)
## cpg exprs Beta se P.Value adj.P.Val
## 1 cg27540865 ILMN_1660840 0.09878267 0.1438691 0.49515892 0.9509347
## 2 cg03691818 ILMN_1663447 -0.31654861 0.5708385 0.58142236 0.9509347
## 3 cg23778841 ILMN_1671742 -1.91368783 0.8485435 0.02804386 0.9509347
## 4 cg03691818 ILMN_1674250 -0.12242894 0.3048949 0.68954736 0.9509347
## 5 cg11643285 ILMN_1679912 0.48294784 0.4015975 0.23420407 0.9509347
## 6 cg23778841 ILMN_1705515 -0.94025857 0.9759049 0.33945209 0.9509347
The results shows the cpg name, the expression probe, the change of the relationship and se, and the p-value and adjusted p-value (using B&H). However, in our data there are no correlated cpgs-expression probes.
The function multiCorrMethExprs
performs a redundancy analysis (RDA) between the methylation and the expression values. This analysis gives us an estimate of the proportion of variance of expression that methylation is able to explain. In our example, we will use the same range used in methylation region analysis.
methExprsRDA <- multiCorrMethExprs(multiset = multi2, range = range)
methExprsRDA
## Call: rda(X = exprsres, Y = methpc)
##
## Inertia Proportion Rank
## Total 0.4844 1.0000
## Constrained 0.2631 0.5432 5
## Unconstrained 0.2213 0.4568 5
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4 RDA5
## 0.14854 0.08426 0.02401 0.00528 0.00105
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5
## 0.14801 0.04747 0.02018 0.00441 0.00122
multiCorrMethExprs
returns a RDA object from vegan
package. In the next following lines, some useful functions related to this object will be explained while a more deep explanation can be found in the documentation of vegan
package.
rda objects can directly generate a plot:
library(vegan)
## Loading required package: permute
## This is vegan 2.3-2
plot(methExprsRDA)
In figure 11, blue arrows are methylation probes, black labels expression probes and red points the samples. The closer the points the more simmilar they are. So, expression probe WG1977 and cg14199629 are very simmilar.
The following code gives us quantitative information about the global relathionship. The first code, performs an anove and returns a p.value. In this case, the relathionship is not significant. The other code gives us an estimate of the R square. Again, after taking into account the high number of variables used, the value is very small (0.0903).
anova.cca(methExprsRDA)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = exprsres, Y = methpc)
## Df Variance F Pr(>F)
## Model 23 0.26314 1.7579 0.019 *
## Residual 34 0.22128
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RsquareAdj(methExprsRDA)
## $r.squared
## [1] 0.5432093
##
## $adj.r.squared
## [1] 0.2342038
All in all, this analysis shows us that in our region, the expression and methylation levels are not correlated.