---
output:
  html_document:
    self_contained: true
    number_sections: no
    theme: flatly
    highlight: tango
    mathjax: null
    toc: true
    toc_float: true
    toc_depth: 2
    css: style.css
bibliography: bibliography.bib
vignette: >
  %\VignetteIndexEntry{"3.6 - TCGA.pipe: Running ELMER for TCGA data in a compact way"}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

# TCGA.pipe: Running ELMER for TCGA data in a compact way

`TCGA.pipe` is a function for easily downloading TCGA data from GDC using the TCGAbiolinks package [@TCGAbiolinks] and performing all the analyses in ELMER. For illustration purposes, we skip the downloading step. The user can use the `getTCGA` function to download TCGA data, or use `TCGA.pipe` with "download" included in the `analysis` argument.

The following command performs distal DNA methylation analysis, predicts putative target genes, runs motif analysis, and identifies regulatory transcription factors.

```{r, fig.height = 6, eval = FALSE}
TCGA.pipe("LUSC",
          wd = "./ELMER.example",
          cores = parallel::detectCores()/2,  # use half of the available cores
          mode = "unsupervised",
          permu.size = 300,
          Pe = 0.01,
          analysis = c("distal.probes","diffMeth","pair","motif","TF.search"),
          diff.dir = "hypo",
          rm.chr = paste0("chr",c("X","Y")))  # exclude chrX and chrY
```

## TCGA.pipe: Mode argument

In this version, we added the `mode` argument to the `TCGA.pipe` function. It automatically sets `minSubgroupFrac` to the values listed below.

Modes available:

- `unsupervised`:
    * Use 20% of each group to identify differentially methylated regions (`minSubgroupFrac` = 0.2 in `get.diff.meth`)
    * Use 40% of all samples to create the Unmethylated (U) and Methylated (M) groups in the other steps (the lowest quintile of samples is the U group and the highest quintile of samples is the M group) (`minSubgroupFrac` = 0.4 in the `get.pairs` and `get.TFs` functions)
- `supervised`:
    * Use all samples in all functions and set each of the Unmethylated (U) and Methylated (M) groups to one of the groups selected in the analysis.

The `unsupervised` mode should be used when you want to detect a specific (possibly unknown) molecular subtype among tumors; these subtypes often make up only a minority of samples, and 20% was chosen as a lower bound for the purposes of statistical power. If you are using pre-defined group labels, such as treated replicates vs. untreated replicates, use `supervised` mode (all samples). For more information, please read the analysis section of the vignette.

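The mode description above can be sketched in base R. This is only an illustration, not ELMER's implementation: `mode_to_min_subgroup_frac` and `select_extreme_groups` are hypothetical helpers, the value 1 for `supervised` is an assumption derived from "use all samples", and the beta values are made up.

```r
# Hypothetical helper (not part of ELMER): returns the minSubgroupFrac value
# that the text associates with each function for a given mode.
mode_to_min_subgroup_frac <- function(mode = c("unsupervised", "supervised")) {
  mode <- match.arg(mode)
  if (mode == "unsupervised") {
    list(get.diff.meth = 0.2, get.pairs = 0.4, get.TFs = 0.4)
  } else {
    # Assumption: "use all samples" corresponds to a fraction of 1.
    list(get.diff.meth = 1, get.pairs = 1, get.TFs = 1)
  }
}

# Hypothetical helper: for one probe, split a fraction of the samples evenly
# between the least methylated (U) and the most methylated (M) extremes.
select_extreme_groups <- function(beta, min_subgroup_frac = 0.4) {
  n_each <- max(1, round(length(beta) * min_subgroup_frac / 2))
  ranked <- names(sort(beta))  # sample names ordered by increasing methylation
  list(U = head(ranked, n_each), M = tail(ranked, n_each))
}

# Toy beta values for ten samples (made up).
beta <- c(s1 = 0.10, s2 = 0.85, s3 = 0.40, s4 = 0.05, s5 = 0.95,
          s6 = 0.50, s7 = 0.30, s8 = 0.70, s9 = 0.20, s10 = 0.60)

groups <- select_extreme_groups(beta, mode_to_min_subgroup_frac("unsupervised")$get.pairs)
groups$U  # the two least methylated samples (lowest quintile)
groups$M  # the two most methylated samples (highest quintile)
```

With ten samples and a fraction of 0.4, each extreme group holds two samples, which matches the "lowest quintile" and "highest quintile" wording.
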
# Using mutation data to identify groups

The download step of `TCGA.pipe` includes an option to identify mutant samples so that a WT vs. Mutant analysis can be performed. It downloads an open [MAF file](https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/) from the GDC database [@grossman2016toward], selects a gene, and identifies the mutant samples based on the following classification (which can be changed using the argument `mutant_variant_classification`).

Mutations classification

| Variant classification   | Group  |
|--------------------------|--------|
| Frame_Shift_Del          | Mutant |
| Frame_Shift_Ins          | Mutant |
| Missense_Mutation        | Mutant |
| Nonsense_Mutation        | Mutant |
| Splice_Site              | Mutant |
| In_Frame_Del             | Mutant |
| In_Frame_Ins             | Mutant |
| Translation_Start_Site   | Mutant |
| Nonstop_Mutation         | Mutant |
| Silent                   | WT     |
| 3'UTR                    | WT     |
| 5'UTR                    | WT     |
| 3'Flank                  | WT     |
| 5'Flank                  | WT     |
| IGR1 (intergenic region) | WT     |
| Intron                   | WT     |
| RNA                      | WT     |
| Target_region            | WT     |

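As a rough illustration of how the classification above translates into sample labels, the base R sketch below labels samples for one gene from a toy MAF-like table. It is not ELMER's code: `label_samples` is a hypothetical helper, the sample barcodes are made up, and only the column names (`Hugo_Symbol`, `Variant_Classification`, `Tumor_Sample_Barcode`) follow the MAF format.

```r
# Variant classes treated as "Mutant", copied from the classification table.
mutant_classes <- c("Frame_Shift_Del", "Frame_Shift_Ins", "Missense_Mutation",
                    "Nonsense_Mutation", "Splice_Site", "In_Frame_Del",
                    "In_Frame_Ins", "Translation_Start_Site", "Nonstop_Mutation")

# Toy MAF-like table (made-up barcodes).
maf <- data.frame(
  Hugo_Symbol            = c("TP53", "TP53", "TP53", "KRAS"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Intron",
                             "Missense_Mutation"),
  Tumor_Sample_Barcode   = c("sample_A", "sample_B", "sample_A", "sample_C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a sample is "Mutant" for a gene if it carries at least
# one variant of that gene whose class is in mutant_classes, otherwise "WT".
label_samples <- function(maf, gene, samples, mutant_classes) {
  hits <- maf$Hugo_Symbol == gene &
    maf$Variant_Classification %in% mutant_classes
  mutant <- unique(maf$Tumor_Sample_Barcode[hits])
  setNames(ifelse(samples %in% mutant, "Mutant", "WT"), samples)
}

labels <- label_samples(maf, "TP53",
                        samples = c("sample_A", "sample_B", "sample_C"),
                        mutant_classes = mutant_classes)
labels
```

Here `sample_B` stays WT because its only TP53 variant is `Silent`, and `sample_C` stays WT because its missense mutation is in a different gene.
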
The arguments to be used are below:
`TCGA.pipe` mutation arguments

| Argument | Description |
|----------|-------------|
| genes | List of genes for which mutations will be verified. A column in the MAE with the name of the gene will be created with the values WT (tumor samples without mutation), MUT (tumor samples with mutation), and NA (non-tumor samples). |
| mutant_variant_classification | List of GDC variant classifications from MAF files used to consider a sample mutant. Only used when the argument `genes` is set. |
| group.col | A column defining the groups of the samples. You can view the available columns using `colnames(MultiAssayExperiment::colData(data))`. |
| group1 | A group from group.col. ELMER will run group1 vs group2. That means, if direction is hyper, get probes hypermethylated in group 1 compared to group 2. |
| group2 | A group from group.col. ELMER will run group1 vs group2. That means, if direction is hyper, get probes hypermethylated in group 1 compared to group 2. |

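The group1 vs group2 direction convention can be illustrated with a toy example in base R. This sketch only compares group means and is not the statistical test ELMER runs: `probes_by_direction` is a hypothetical helper, the `min_diff` threshold of 0.3 is an arbitrary assumption for the example, and the beta values and group labels are made up.

```r
# Toy beta values: rows are probes, columns are samples (all values made up).
beta <- rbind(
  probe_1 = c(0.80, 0.85, 0.20, 0.25),
  probe_2 = c(0.15, 0.10, 0.70, 0.75),
  probe_3 = c(0.50, 0.52, 0.49, 0.51)
)
colnames(beta) <- c("s1", "s2", "s3", "s4")

# Group label of each sample, standing in for the group.col column.
group <- c(s1 = "Mutant", s2 = "Mutant", s3 = "WT", s4 = "WT")

# Hypothetical helper: "hyper" keeps probes whose mean methylation is higher
# in group1 than in group2; "hypo" keeps probes where it is lower.
probes_by_direction <- function(beta, group, group1, group2,
                                direction = c("hypo", "hyper"),
                                min_diff = 0.3) {
  direction <- match.arg(direction)
  delta <- rowMeans(beta[, group == group1, drop = FALSE]) -
    rowMeans(beta[, group == group2, drop = FALSE])
  if (direction == "hyper") {
    names(delta)[delta > min_diff]
  } else {
    names(delta)[delta < -min_diff]
  }
}

probes_by_direction(beta, group, "Mutant", "WT", direction = "hyper")
probes_by_direction(beta, group, "Mutant", "WT", direction = "hypo")
```

Swapping `group1` and `group2` swaps which probes are reported for each direction, which is why the order of the two arguments matters.
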

Here is an example in which TCGA-LUSC data is downloaded and TP53 Mutant samples are compared with TP53 WT samples.

```{r, fig.height = 6, eval = FALSE}
TCGA.pipe("LUSC",
          wd = "./ELMER.example",
          cores = parallel::detectCores()/2,  # use half of the available cores
          mode = "supervised",
          genes = "TP53",
          group.col = "TP53",
          group1 = "Mutant",
          group2 = "WT",
          permu.size = 300,
          Pe = 0.01,
          analysis = c("download","diffMeth","pair","motif","TF.search"),
          diff.dir = "hypo",
          rm.chr = paste0("chr",c("X","Y")))  # exclude chrX and chrY
```

# Session Info

```{r sessioninfo, eval=TRUE}
sessionInfo()
```

# Bibliography