--- title: "Shearwater ML" author: "Inigo Martincorena and Moritz Gerstung" date: "6 March 2015" output: html_document --- # Shearwater ML #### Inigo Martincorena and Moritz Gerstung 6 March 2015 ShearwaterML is a maximum-likelihood adaptation of the original Shearwater algorithm. Unlike the original algorithm, ShearwaterML does not use prior information and yields p-values, instead of Bayes factors, using a Likelihood-Ratio Test. This allows using standard multiple testing correction methods to obtain a list of significant variants with a controlled false discovery rate. For a detailed description of the algorithm see: Martincorena I, Roshan A, Gerstung M, et al. (2015). High burden and pervasive positive selection of somatic mutations in normal human skin. _Science_ (2015). Load data from `deepSNV` example ```{r} library(deepSNV) regions <- GRanges("B.FR.83.HXB2_LAI_IIIB_BRU_K034", IRanges(start = 3120, end=3140)) files <- c(system.file("extdata", "test.bam", package="deepSNV"), system.file("extdata", "control.bam", package="deepSNV")) counts <- loadAllData(files, regions, q=30) ``` ShearwaterML: "betabinLRT" calculates p-values for each possible mutation ```{r} pvals <- betabinLRT(counts, rho=1e-4, maxtruncate = 1)$pvals qvals <- p.adjust(pvals, method="BH") dim(qvals) = dim(pvals) vcfML = qvals2Vcf(qvals, counts, regions, samples = files, mvcf = TRUE) ``` Original Shearwater: "bbb" computes the original Bayes factors ```{r} bf <- bbb(counts, model = "OR", rho=1e-4) vcfBF <- bf2Vcf(bf, counts, regions, samples = files, prior = 0.5, mvcf = TRUE) plot(pvals[1,,], bf[1,,]/(1+bf[1,,]), log="xy") ```