--- title: "TCGAbiolinks: Searching, downloading and visualizing mutation files" date: "`r BiocStyle::doc_date()`" vignette: > %\VignetteIndexEntry{"5. Mutation data"} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) knitr::opts_knit$set(progress = FALSE) ``` ```{r message=FALSE, warning=FALSE, include=FALSE} library(TCGAbiolinks) library(SummarizedExperiment) library(dplyr) library(DT) ``` # Search and Download **TCGAbiolinks** has provided a few functions to download mutation data from GDC. There are two options to download the data: 1. Use `GDCquery`, `GDCdownload` and `GDCpreprare` to download MAF aligned against hg38 2. Use `GDCquery`, `GDCdownload` and `GDCpreprare` to download MAF aligned against hg19 3. Use `getMC3MAF()`, to download MC3 MAF from https://gdc.cancer.gov/about-data/publications/mc3-2017 ## Mutation data (hg38) This example will download Aggregate GDC MAFs. For more information please access https://github.com/NCI-GDC/gdc-maf-tool and [GDC docs](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/). ```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=F} query <- GDCquery( project = "TCGA-CHOL", data.category = "Simple Nucleotide Variation", access = "open", data.type = "Masked Somatic Mutation", workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking" ) GDCdownload(query) maf <- GDCprepare(query) ``` ```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=T,include=F} maf <- chol_maf@data ``` ```{r echo = TRUE, message = FALSE, warning = FALSE} # Only first 50 to make render faster datatable(maf[1:20,], filter = 'top', options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), rownames = FALSE) ``` ## Mutation data MC3 file This will download the MC3 MAF file from https://gdc.cancer.gov/about-data/publications/mc3-2017, and add project each sample belongs. ```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=FALSE} maf <- getMC3MAF() ``` # Visualize the data To visualize the data you can use the Bioconductor package [maftools](https://bioconductor.org/packages/release/bioc/html/maftools.html). For more information, please check its [vignette](https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html#rainfall-plots). ```{r results = "hide",echo = TRUE, message = FALSE, warning = FALSE, eval=FALSE} library(maftools) library(dplyr) query <- GDCquery( project = "TCGA-CHOL", data.category = "Simple Nucleotide Variation", access = "open", data.type = "Masked Somatic Mutation", workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking" ) GDCdownload(query) maf <- GDCprepare(query) maf <- maf %>% maftools::read.maf ``` ```{r message=FALSE, warning=FALSE, include=FALSE} library(maftools) library(dplyr) maf <- chol_maf ``` ```{r results = "hide",echo = TRUE, message = FALSE, warning = FALSE} datatable(getSampleSummary(maf), filter = 'top', options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), rownames = FALSE) plotmafSummary(maf = maf, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE) ``` ```{r echo = TRUE, message = FALSE,eval = FALSE, warning = FALSE} oncoplot(maf = maf, top = 10, removeNonMutated = TRUE) titv = titv(maf = maf, plot = FALSE, useSyn = TRUE) #plot titv summary plotTiTv(res = titv) ```