Introduction

consensusOV is a package for molecular subtyping of ovarian cancer. It is intended for whole-transcriptome gene expression datasets from patients with high-grade serous ovarian carcinoma. The package includes implementations of four previously published subtype classifiers [@konecny2014prognostic; @verhaak2013prognostically; @bentink2012angiogenic; @helland2011deregulation] and a consensus random forest classifier.

The get.subtypes() function is a wrapper for the package's other subtyping functions: get.consensus.subtypes(), get.konecny.subtypes(), get.verhaak.subtypes(), get.bentink.subtypes(), and get.helland.subtypes(). It accepts either a matrix of gene expression values together with a vector of Entrez IDs, or a Biobase::ExpressionSet following the format of MetaGxOvarian [@gendoo2016metagxdata]. If expression.dataset is a matrix, it should have genes as rows and patients as columns, and entrez.ids should be a vector whose length equals nrow(expression.dataset). The method argument specifies which of the five subtype classifiers to use.
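
The input requirements above can be checked before calling a classifier. The following is a minimal base-R sketch and not part of consensusOV: the helper name check.subtype.inputs and the toy matrix are hypothetical, and serve only to illustrate the expected layout (genes as rows, patients as columns, one Entrez ID per row).

```r
# Hypothetical helper (not part of consensusOV): validate the matrix-style input.
check.subtype.inputs <- function(expression.dataset, entrez.ids) {
    stopifnot(is.matrix(expression.dataset), is.numeric(expression.dataset))
    # One Entrez ID is expected per gene (row) of the expression matrix
    if (length(entrez.ids) != nrow(expression.dataset)) {
        stop("entrez.ids must have one entry per row of expression.dataset")
    }
    invisible(TRUE)
}

# Toy data: 3 genes (rows) x 4 patients (columns); values are arbitrary
toy.matrix <- matrix(
    seq(6, 11.5, length.out = 12), nrow = 3, ncol = 4,
    dimnames = list(paste0("geneid.", c("10397", "65108", "8655")),
                    paste0("patient", 1:4)))
toy.entrez.ids <- c("10397", "65108", "8655")

# Passes silently; a length mismatch would raise an error instead
check.subtype.inputs(toy.matrix, toy.entrez.ids)
```

Running a check like this first turns a shape mismatch into an explicit error message rather than a failure inside the classifier.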

Load Data

library(consensusOV)
library(Biobase)
library(genefu)

The package contains a subset of the ovarian cancer microarray dataset GSE14764 as example data.

data(GSE14764.eset)
dim(GSE14764.eset)
## Features  Samples 
##     1175       41
GSE14764.expression.matrix <- exprs(GSE14764.eset)
GSE14764.expression.matrix[1:5,1:5]
##              GSE14764_GSM368661 GSE14764_GSM368662 GSE14764_GSM368664
## geneid.10397          10.856712          10.445412          11.976560
## geneid.65108          10.856441          10.312760          12.499419
## geneid.8655           11.518799          11.897707          11.782895
## geneid.22919           8.608944           8.756986           9.170513
## geneid.3925            7.658680           6.698586           7.159795
##              GSE14764_GSM368665 GSE14764_GSM368668
## geneid.10397          11.651318          10.907453
## geneid.65108          11.377340          11.088542
## geneid.8655           11.799197          11.958500
## geneid.22919           8.627511           8.849757
## geneid.3925            7.466107           6.566558
GSE14764.entrez.ids <- fData(GSE14764.eset)$EntrezGene.ID
head(GSE14764.entrez.ids)
## [1] "10397" "65108" "8655"  "22919" "3925"  "1718"

Subtyping

bentink.subtypes <- get.subtypes(GSE14764.eset, method = "Bentink")
bentink.subtypes$Bentink.subtypes
##  [1] Angiogenic    nonAngiogenic nonAngiogenic Angiogenic    Angiogenic   
##  [6] nonAngiogenic Angiogenic    nonAngiogenic nonAngiogenic Angiogenic   
## [11] nonAngiogenic nonAngiogenic Angiogenic    nonAngiogenic nonAngiogenic
## [16] nonAngiogenic nonAngiogenic Angiogenic    nonAngiogenic nonAngiogenic
## [21] Angiogenic    Angiogenic    Angiogenic    nonAngiogenic nonAngiogenic
## [26] Angiogenic    nonAngiogenic nonAngiogenic nonAngiogenic nonAngiogenic
## [31] nonAngiogenic Angiogenic    nonAngiogenic nonAngiogenic nonAngiogenic
## [36] nonAngiogenic nonAngiogenic Angiogenic    nonAngiogenic nonAngiogenic
## [41] nonAngiogenic
## Levels: Angiogenic nonAngiogenic
konecny.subtypes <- get.subtypes(GSE14764.eset, method = "Konecny")
konecny.subtypes$Konecny.subtypes
##  [1] C3_profL C1_immL  C2_diffL C4_mescL C1_immL  C1_immL  C4_mescL
##  [8] C3_profL C3_profL C1_immL  C2_diffL C2_diffL C4_mescL C2_diffL
## [15] C3_profL C2_diffL C1_immL  C4_mescL C1_immL  C2_diffL C4_mescL
## [22] C4_mescL C4_mescL C1_immL  C3_profL C3_profL C2_diffL C2_diffL
## [29] C3_profL C2_diffL C3_profL C1_immL  C1_immL  C2_diffL C1_immL 
## [36] C2_diffL C3_profL C3_profL C2_diffL C3_profL C2_diffL
## Levels: C1_immL C2_diffL C3_profL C4_mescL
helland.subtypes <- get.subtypes(GSE14764.eset, method = "Helland")
helland.subtypes$Helland.subtypes
##  [1] C1 C2 C5 C1 C1 C2 C4 C1 C5 C2 C4 C4 C1 C5 C4 C4 C2 C1 C4 C5 C1 C1 C1
## [24] C2 C5 C1 C5 C2 C5 C4 C5 C2 C2 C4 C2 C4 C1 C1 C4 C1 C4
## Levels: C2 C4 C5 C1
verhaak.subtypes <- get.subtypes(GSE14764.eset, method = "Verhaak")
verhaak.subtypes$Verhaak.subtypes
##  [1] MES IMR DIF DIF IMR IMR DIF MES PRO IMR DIF DIF MES DIF DIF DIF DIF
## [18] MES DIF PRO MES DIF DIF DIF MES MES DIF IMR MES DIF PRO IMR IMR DIF
## [35] IMR DIF PRO MES DIF MES DIF
## Levels: IMR DIF PRO MES
consensus.subtypes <- get.subtypes(GSE14764.eset, method = "consensusOV")
## Loading training data
## Training Random Forest...
consensus.subtypes$consensusOV.subtypes
##  [1] MES_consensus IMR_consensus PRO_consensus DIF_consensus IMR_consensus
##  [6] IMR_consensus MES_consensus MES_consensus PRO_consensus IMR_consensus
## [11] DIF_consensus DIF_consensus MES_consensus PRO_consensus DIF_consensus
## [16] DIF_consensus IMR_consensus MES_consensus DIF_consensus PRO_consensus
## [21] MES_consensus MES_consensus MES_consensus DIF_consensus PRO_consensus
## [26] MES_consensus DIF_consensus IMR_consensus PRO_consensus DIF_consensus
## [31] PRO_consensus IMR_consensus IMR_consensus DIF_consensus IMR_consensus
## [36] DIF_consensus PRO_consensus PRO_consensus DIF_consensus MES_consensus
## [41] DIF_consensus
## Levels: IMR_consensus DIF_consensus PRO_consensus MES_consensus
## Alternatively, e.g.
bentink.subtypes <- get.subtypes(GSE14764.expression.matrix, GSE14764.entrez.ids, method = "Bentink")

Each subtyping function returns a list with two elements. The first is a factor of subtype labels. The second holds classifier-specific values: for the Konecny, Helland, Verhaak, and Consensus classifiers it is a data frame of subtype-specific scores, and for the Bentink classifier it is the output of the genefu function call.
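
To illustrate this two-element structure without running a classifier, the sketch below builds a mock result by hand and inspects it with base R. The object mock.result and all of its labels and scores are invented for illustration; only the element names Helland.subtypes and subtype.scores are taken from the calls used in this vignette.

```r
# Mock object mimicking the shape of a get.subtypes() result (values are invented)
mock.scores <- data.frame(
    C2 = c(0.9, 0.1, 0.2),
    C4 = c(0.3, 0.8, 0.1),
    C5 = c(0.2, 0.4, 0.7),
    C1 = c(0.1, 0.2, 0.3))
mock.result <- list(
    Helland.subtypes = factor(c("C2", "C4", "C5"),
                              levels = c("C2", "C4", "C5", "C1")),
    subtype.scores = mock.scores)

# First element: factor of subtype labels, one per sample
subtype.counts <- table(mock.result$Helland.subtypes)
print(subtype.counts)

# Second element: per-sample scores; find the highest-scoring column per row.
# Whether a real classifier's label always equals this column is not assumed here.
top.scoring <- colnames(mock.result$subtype.scores)[
    max.col(as.matrix(mock.result$subtype.scores), ties.method = "first")]
print(top.scoring)
```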

Each classifier can also be called directly through its own function.

bentink.subtypes <- get.bentink.subtypes(GSE14764.expression.matrix, GSE14764.entrez.ids)

Subtype Scores

The Konecny, Helland, Verhaak, and Consensus classifiers produce real-valued subtype scores. These can be used in various ways; here, for example, we compute correlations between corresponding subtype scores.

We can compare the subtype scores between the Verhaak and Helland classifiers:

vest <- verhaak.subtypes$gsva.out
vest <- vest[,c("IMR", "DIF", "PRO", "MES")]
hest <- helland.subtypes$subtype.scores
hest <- hest[, c("C2", "C4", "C5", "C1")]
dat <- data.frame(
    as.vector(vest), 
    rep(colnames(vest), each=nrow(vest)),
    as.vector(hest), 
    rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Verhaak", "vsc", "Helland", "hsc")
## plot
library(ggplot2)
ggplot(dat, aes(Verhaak, Helland)) + geom_point() + facet_wrap(vsc~hsc, nrow = 2, ncol = 2)

[Figure: scatter plots of Verhaak versus Helland subtype scores, one panel per subtype pair]

The correlations between corresponding subtype scores (IMR/C2, DIF/C4, PRO/C5, and MES/C1, following the column order used above) are 0.77, 0.79, 0.37, and 0.88.
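
These correlations pair each Verhaak column with the Helland column in the same position. A minimal sketch of such a column-wise computation follows, run on simulated score matrices. The toy data, the names toy.vest and toy.hest, and the use of Pearson correlation (the default of cor()) are assumptions, so the resulting numbers will not match those reported here.

```r
set.seed(1)
n.samples <- 41
# Simulated scores: toy.hest is toy.vest plus noise, so matching columns correlate
toy.vest <- matrix(rnorm(n.samples * 4), ncol = 4,
                   dimnames = list(NULL, c("IMR", "DIF", "PRO", "MES")))
toy.hest <- toy.vest + matrix(rnorm(n.samples * 4, sd = 0.5), ncol = 4)
colnames(toy.hest) <- c("C2", "C4", "C5", "C1")

# Correlate column i of one matrix with column i of the other
paired.cor <- sapply(seq_len(ncol(toy.vest)), function(i) {
    cor(toy.vest[, i], toy.hest[, i])
})
names(paired.cor) <- paste(colnames(toy.vest), colnames(toy.hest), sep = "/")
round(paired.cor, 2)
```

The same values are the diagonal of cor(toy.vest, toy.hest); the explicit loop just makes the column pairing visible.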

Likewise, we can compare the subtype scores between the Konecny and Helland classifiers:

kost <- konecny.subtypes$spearman.cc.vals
hest <- helland.subtypes$subtype.scores
hest <- hest[, c("C2", "C4", "C5", "C1")]
dat <- data.frame(
    as.vector(kost), 
    rep(colnames(kost), each=nrow(kost)),
    as.vector(hest), 
    rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Konecny", "ksc", "Helland", "hsc")
## plot
ggplot(dat, aes(Konecny, Helland)) + geom_point() + facet_wrap(ksc~hsc, nrow = 2, ncol = 2)

[Figure: scatter plots of Konecny versus Helland subtype scores, one panel per subtype pair]

Corresponding correlation values are 0.95, 0.84, 0.7, and 0.95.

References