---
title: "stageR: stage-wise analysis of high-throughput gene expression data in R"
author: "Koen Van den Berge and Lieven Clement"
date: "`r Sys.Date()`"
bibliography: stageR.bib
output:
  BiocStyle::html_document:
    toc: true
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{stageR: stage-wise analysis of high-throughput gene expression data in R}
---
  
  This vignette describes how to use the stageR package that has been developed for stage-wise analysis of high throughput gene expression data in R. A stage-wise analysis was shown to be beneficial in terms of biological interpretation and statistical performance when multiple hypotheses per gene are of interest.
  The stage-wise analysis has been adopted from [@Heller2009] and consists of a screening stage and a confirmation stage. In the screening stage, genes are screened by calculating p-values that aggregate evidence across the different hypotheses of interest for the gene. The screening p-values are then adjusted for FDR control after which significance of the screening hypothesis is assessed.
  In the confirmation stage, only genes passing the screening stage are considered for analysis. For those genes, every hypothesis of interest is assessed separately and multiple testing correction is performed across hypotheses within a gene to control the FWER on the BH-adjusted significance level of the screening stage.
  `stageR` provides an automated way to perform stage-wise testing, given p-values for the screening and confirmation stages. A number of FWER control procedures that take into account the logical relations among the hypotheses are implemented. Since the logical relations may be specific to the experiment, the user can also specify an adjustment deemed appropriate.
  
  The vignette analyses two datasets. The Hammer dataset [@Hammer2010] is a differential gene expression analysis for an experiment with a complex design. This type of analyses are supported by the `stageR` class. The Ren dataset [@Ren2012] analyses differential transcript usage (DTU) in tumoral versus normal tissue in Chinese patients. Transcript-level analyses are supported by the `stageRTx` class. 
  
# Installing and loading the package
  
The release version of the package is hosted on Bioconductor, and can be installed with the following code

```{r}
#if (!requireNamespace("BiocManager", quietly=TRUE))
#    install.packages("BiocManager")
#BiocManager::install("stageR")
```
  
The development version of the package is hosted on GitHub and can be installed with the `devtools` library using `devtools::install_github("statOmics/stageR")`.

After installing, we will load the package.
  
```{r}
library(stageR)
```


# Differential gene expression: Hammer dataset

```{r,echo=TRUE,warning=FALSE}
library(edgeR) ; library(Biobase) ; library(limma) ; library(utils) ; library(DEXSeq)
```

As a case study for differential gene expression analysis, we analyse the Hammer dataset [@Hammer2010]. The dataset is provided with the stageR package and was originally downloaded from the ReCount project [website](http://bowtie-bio.sourceforge.net/recount) [@Frazee2011].

```{r}
data(hammer.eset, package="stageR")
eset <- hammer.eset ; rm(hammer.eset)
```

The Hammer experiment investigated the effect of a spinal nerve ligation (SNL) versus control samples in rats at two weeks and two months after treatment. For every time $\times$ treatment combination, 2 biological replicates were used. The hypotheses of interest are

- the treatment effect at the first timepoint,
- the treatment effect at the second timepoint and
- assessing whether the effect of the treatment is different between the two timepoints (i.e. the treatment-time interaction) 

We use a contrast for the differential expression at the first and second timepoint and a difference in fold change between the two timepoints, respectively. 
Therefore we create a design matrix consisting of two timepoints, two treatments and two biological replicates in every treatment $\times$ time combination. Note there has been a typo in the phenoData, so we will correct this first.

```{r}
pData(eset)$Time #typo. Will do it ourself
time <- factor(rep(c("mo2","w2"),each=4),levels=c("w2","mo2"))
pData(eset)$protocol
treat <- factor(c("control","control","SNL","SNL","control","control","SNL","SNL"),levels=c("control","SNL"))
design <- model.matrix(~time*treat)
rownames(design) = paste0(time,treat,rep(1:2,4))
colnames(design)[4] = "timeMo2xTreatSNL"
design
```

We perform indpendent filtering [@Bourgon2010] of the genes and retain genes that are expressed with at least 2 counts per million in 2 samples.
The data is then normalised with TMM normalisation [@Robinson2010] to correct for differences in sequencing depth and RNA population between the samples.

```{r}
cpmOffset <- 2
keep <- rowSums(cpm(exprs(eset))>cpmOffset)>=2 #2cpm in 2 samples
dge <- DGEList(exprs(eset)[keep,])
colnames(dge) = rownames(design)
dge <- calcNormFactors(dge)
```

## Conventional analysis

We will first analyse the data with limma-voom [@Law2014] in a standard way: the three contrasts are assessed separately on an FDR level of $5\%$. 

```{r}
## regular analysis
voomObj <- voom(dge,design,plot=TRUE)
fit <- lmFit(voomObj,design)
contrast.matrix <- makeContrasts(treatSNL, treatSNL+timeMo2xTreatSNL, timeMo2xTreatSNL, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
res <- decideTests(fit2)
summary.TestResults(res) #nr of significant up-/downregulated genes
colSums(summary.TestResults(res)[c(1,3),]) #total nr of significant genes
```

The conventional analysis does not find any genes that have a different effect of the treatment between the two timepoints (i.e. the interaction effect test), while many genes are differentially expressed between treatment and control within every timepoint.

To get a global picture of the effect of SNL on the transcriptome, we can check how many genes are significantly altered following SNL.

```{r}
uniqueGenesRegular <- which(res[,1]!=0 | res[,2]!=0 | res[,3]!=0)
length(uniqueGenesRegular) #total nr of significant genes
```

In total, `r length(uniqueGenesRegular)` genes are found to be differentially expressed following a spinal nerve ligation. However, FDR was only controlled at the contrast level and not at the gene level so we cannot state a target FDR level together with this number.

## Stage-wise analysis

The stage-wise analysis first considers an omnibus test that tests whether any of the three contrasts are significant, i.e. it tests whether there has been any effect of the treatment whatsoever.
For the screening hypothesis, we use the `topTableF` function from the `limma` package to perform an F-test across the three contrasts. The screening hypothesis p-values are then stored in the vector `pScreen`.

```{r}
alpha <- 0.05
nGenes <- nrow(dge)
tableF <- topTableF(fit2, number=nGenes, sort.by="none") #screening hypothesis
pScreen <- tableF$P.Value
names(pScreen) = rownames(tableF)
```

In the confirmation stage, every contrast is assessed separately. The confirmation stage p-values are adjusted to control the FWER across the hypotheses within a gene and are subsequently corrected to the BH-adjusted significance level of the screening stage. This allows a direct comparison of the adjusted p-values to the provided significance level `alpha` for both screening and confirmation stage adjusted p-values. The function `stageR` constructs an object from the `stageR` class and requires a (preferably named) vector of p-values for the screening hypothesis `pScreen` and a (preferably named) matrix of p-values for the confirmation stage `pConfirmation` with columns corresponding to the different contrasts of interest. Note that the rows in `pConfirmation` correspond to features (genes) and the features should be identically sorted in `pScreen` and `pConfirmation`. The constructor function will check whether the length of `pScreen` is identical to the number of rows in `pConfirmation` and return an error if this is not the case. Finally, the `pScreenAdjusted` argument specifies whether the screening p-values have already been adjusted according to FDR control.

```{r}
pConfirmation <- sapply(1:3,function(i) topTable(fit2, coef=i, number=nGenes, sort.by="none")$P.Value)
dimnames(pConfirmation) <- list(rownames(fit2),c("t1","t2","t1t2"))
stageRObj <- stageR(pScreen=pScreen, pConfirmation=pConfirmation, pScreenAdjusted=FALSE)
```

The function `stageWiseAdjustment` then adjusts the p-values according to a stage-wise analysis. The `method` argument specifies the FWER correction procedure to be used in the confirmation stage. More details on the different methods can be found in the help file for `stageWiseAdjustment`. The `alpha` argument specifies the target OFDR level that is used for controlling the fraction of false positive genes across all rejected genes over the entire stage-wise testing procedure. The adjusted p-values for genes that did not pass the screening stage are by default set to `NA`.

Note that when a gene passed the screening hypothesis in the Hammer experiment, only one null hypothesis can still be true: there has to be DE at timepoint 1 or timepoint 2; if the DE only occurs on one timepoint there also exist an interaction; if DE occurs at both timepoints, the $H_0$ of no interaction can still be true. Thus, according to Shaffer's MSRB procedure [@Shaffer1986], no correction is required in the confirmation stage for this experiment to control the FWER. This can be specified with the `method="none"` argument.


```{r}
stageRObj <- stageWiseAdjustment(object=stageRObj, method="none", alpha=0.05)
```

We can explore the results of the stage-wise analysis by querying the object returned by `stageWiseAdjustment`. **Note that the confirmation stage adjusted p-values returned by the function are only valid for the OFDR level provided. If a different OFDR level is of interest, the stage-wise testing adjustment of p-values should be re-run entirely with the other OFDR level specified in `stageWiseAdjustment`.** The adjusted p-values from the confirmation stage can be accessed with the `getAdjustedPValues` function

```{r}
head(getAdjustedPValues(stageRObj, onlySignificantGenes=FALSE, order=FALSE))
head(getAdjustedPValues(stageRObj, onlySignificantGenes=TRUE, order=TRUE))
```

and may either return all p-values or only those from the significant genes, as specified by the `onlySignificantGenes` argument which can then be ordered or not as specified by the `order` argument. 

Finally, the `getResults` function returns a binary matrix where rows correspond to features and columns correspond to hypotheses, including the screening hypothesis. For every feature $\times$ hypothesis combination, it indicates whether the test is significant (1) or not (0) according to the stage-wise testing procedure.

```{r}
res <- getResults(stageRObj)
head(res)
colSums(res) #stage-wise analysis results
```

The `adjustment` argument from the `stageWiseAdjustment` function allows the user to specify the FWER adjustment correction. It requires a numeric vector of the same length as the number of columns in `pConfirmation`. The first element of the vector is the adjustment for the most significant p-value of the gene, the second element for the second most significant p-value etc. Since the Hammer dataset did not require any adjustment, identical results are obtained when manually specifying the adjustments to equal $1$.

```{r}
stageRObj <- stageR(pScreen=pScreen, pConfirmation=pConfirmation, pScreenAdjusted=FALSE)
adjustedPSW <- stageWiseAdjustment(object=stageRObj, method="user", alpha=0.05, adjustment=c(1,1,1))
res <- getResults(adjustedPSW)
colSums(res)
```

# Differential transcript expression/usage

Multiple hypotheses of interest per gene also arise in transcript-level studies, where the different hypotheses correspond to the different isoforms from a gene.
We analyse differential transcript usage for a case study that investigated expression in prostate cancer tumoral tissue versus normal tissue in 14 Chinese patients [@Ren2012].
The raw sequences have been preprocessed with kallisto [@Bray2016] and transcript-level abundance estimates can be downloaded from The Lair project [@Pimentel2016b] [website](http://pachterlab.github.io/lair/). We used the unnormalized, unfiltered abundances for the analysis.
A subset of the dataset comes with the `stageR` package and can be accessed with `data(esetProstate)` after loading `stageR`. The `ExpressionSet` contains the metadata for the samples in `pData(esetProstate)` and corresponding gene identifiers for the transcripts are stored in `fData(esetProstate)`. The dataset contains 945 transcripts from 456 genes.

```{r}
data("esetProstate", package="stageR") #from stageR package
head(pData(esetProstate))
head(fData(esetProstate))
```

We will perform some basic data exploration on the transcripts in the dataset. Since the dataset was preprocessed for the purposes of this vignette, every gene has at least two transcripts, and all transcripts are expressed in at least 1 sample.


```{r}
tx2gene <- fData(esetProstate)
colnames(tx2gene) <- c("transcript","gene")
barplot(table(table(tx2gene$gene)), main="Distribution of number of tx per gene")

#the dataset contains
length(unique(tx2gene$gene)) #nr genes
median(table(as.character(tx2gene$gene))) #median nr of tx/gene
```

## Conventional analysis

We will show how to use the `stageR` package to analyse DTU with a stage-wise approach. We start with a regular DEXseq analysis to obtain p-values for every transcript and q-values for every gene. Since both control and tumoral tissue are derived from the same patient for all 14 patients, we add a block effect for the patient to account for the correlation between samples within every patient.

```{r}
### regular DEXSeq analysis
sampleData <- pData(esetProstate)
geneForEachTx <- tx2gene[match(rownames(exprs(esetProstate)),tx2gene[,1]),2]
dxd <- DEXSeqDataSet(countData = exprs(esetProstate),
                     sampleData = sampleData,
                     design = ~ sample + exon + patient + condition:exon,
                     featureID = rownames(esetProstate),
                     groupID = as.character(geneForEachTx))
dxd <- estimateSizeFactors(dxd)
dxd <- estimateDispersions(dxd)
dxd <- testForDEU(dxd, reducedModel=~ sample + exon + patient)
dxr <- DEXSeqResults(dxd)
qvalDxr <- perGeneQValue(dxr)
```

## Stage-wise analysis

The code above is a conventional `DEXSeq` analysis for analysing differential transcript usage. It would proceed by either assessing the significant genes according to the gene-wise q-values or by assessing the significant transcripts according to the transcript-level p-values, after adjustment for multiple testing. Performing and interpreting both analyses does not provide appropriate FDR control and thus should be avoided. However, interpretation on the gene level combined with transcript-level results can provide useful biological insights and this can be achieved through stage-wise testing. In the following code, we show how to automatically perform a stage-wise analysis using `stageR`. We start by constructing

- a named vector of gene-wise q-values `pScreen`
- a named matrix with transcript-level p-values `pConfirmation`
- a `data.frame` with transcript identifiers and corresponding gene identifiers `tx2gene`

These three objects provide everything we need to construct an instance from the `stageRTx` class for the stage-wise analysis. Note that a different class and thus a different constructor function is used for transcript-level analyses in comparison to DE analysis for complex designs.

```{r}
pConfirmation <- matrix(dxr$pvalue,ncol=1)
dimnames(pConfirmation) <- list(c(dxr$featureID),c("transcript"))
pScreen <- qvalDxr
tx2gene <- fData(esetProstate)
```

Next we build an object from the `stageRTx` class and indicate that the screening hypothesis p-values were already adjusted by setting `pScreenAdjusted=TRUE`. Similar as in the DGE example, we port this object to the `stageWiseAdjustment` function for correcting the p-values. We control the analysis on a $5\%$ target OFDR (`alpha=0.05`). `method="dtu"` indicates the adapted Holm-Shaffer FWER correction that was specifically tailored for DTU analysis as described in the manuscript. In brief, the Holm procedure [@Holm1979] is used from the third transcript onwards and the two most significant p-values are tested on a $\alpha_I/(n_g-2)$ significance level, with $\alpha_I$ the BH adjusted significance level from the screening stage and $n_g$ the number of transcripts for gene $g$. The method will return `NA` p-values for genes with only one transcript if the stage-wise testing method equals `"dtu"`.

```{r}
stageRObj <- stageRTx(pScreen=pScreen, pConfirmation=pConfirmation, pScreenAdjusted=TRUE, tx2gene=tx2gene)
stageRObj <- stageWiseAdjustment(object=stageRObj, method="dtu", alpha=0.05)
```

We can then explore the results using a range of accessor functions. The significant genes can be returned with the `getSignificantGenes` function.

```{r}
head(getSignificantGenes(stageRObj))
```

Similar, the significant transcripts can be returned with `getSignificantTx`.

```{r}
head(getSignificantTx(stageRObj))
```


The stage-wise adjusted p-values are returned using the `getAdjustedPValues` function. The screening (gene) hypothesis p-values were adjusted according to the BH FDR criterion, and the confirmation (transcript) hypothesis p-values were adjusted to control for the full stage-wise analysis, by adopting the correction method specified in `stageWiseAdjustment`. Hence, the confirmation adjusted p-values returned from this function can be directly compared to the significance level `alpha` as provided in the `stageWiseAdjustment` function. `getAdjustedPValues` returns a matrix where the different rows correspond to transcripts and the respective gene and transcript identifiers are given in the first two columns. Transcript-level adjusted p-values for genes not passing the screening stage are set to `NA` by default. Note, that the stage-wise adjusted p-values are only valid for the provided significance level and must not be compared to a different significance level. If this would be of interest, the entire stage-wise testing adjustment should be re-run with the other significance level provided in `alpha`.

```{r}
padj <- getAdjustedPValues(stageRObj, order=TRUE, onlySignificantGenes=FALSE)
head(padj)
```

The output indeed shows that 2 genes and three transcripts are significant because their adjusted p-values are below the specified `alpha` level of $0.05$. The third gene in the list is not significant and thus the p-value of the transcript is set to `NA`.

### Note on a stage-wise DEXSeq analysis.

By default, DEXSeq performs an independent filtering step. This may result in a number of genes that have been filtered and thus no q-value for these genes is given in the output of `perGeneQValue`. This can cause an error in the stage-wise analysis, since we have confirmation stage p-values for transcripts but no q-value for their respective genes. In order to avoid this, one should filter these transcripts in the `pConfirmation` and `tx2gene` objects.

```{r}
rowsNotFiltered <- tx2gene[,2]%in%names(qvalDxr)
pConfirmation <- matrix(pConfirmation[rowsNotFiltered,],ncol=1,dimnames=list(dxr$featureID[rowsNotFiltered],"transcript"))
tx2gene <- tx2gene[rowsNotFiltered,]
```

After which the stage-wise analysis may proceed.

# References