---
title: "Using the `RTCGA` package to download the methylation data included in the `RTCGA.methylation` package"
subtitle: "Date of datasets release: 2015-11-01"
author: "Marcin Kosiński"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using RTCGA to download methylation data as included in RTCGA.methylation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo=FALSE}
library(knitr)
options(width = 150)
opts_chunk$set(comment = "", message = FALSE, warning = FALSE,
               tidy.opts = list(keep.blank.line = TRUE, width.cutoff = 150),
               eval = FALSE)
```

# RTCGA package

> The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high level sequence analysis of the tumor genomes. The key is to understand genomics to improve cancer care.

The `RTCGA` package supports downloading and integrating the variety and volume of TCGA data, using the patient barcode as a key, which makes the data easier to obtain. This may have a beneficial influence on the development of science and on the improvement of patients' treatment. `RTCGA` is an open-source R package, available to download from Bioconductor

```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("RTCGA")
```

or from GitHub

```{r, eval=FALSE}
BiocManager::install("RTCGA/RTCGA")
```

Furthermore, the `RTCGA` package transforms TCGA data into a form that is convenient to use in the R statistical environment. These data transformations can be part of a statistical analysis pipeline, which `RTCGA` can make more reproducible. Use cases and examples are shown in the `RTCGA` package vignettes:

```{r, eval=FALSE}
browseVignettes("RTCGA")
```

# How to download methylation data to gain the same datasets as in RTCGA.methylation package?

TCGA data have been released on many dates.
To see them all, type:

```{r, eval=FALSE}
library(RTCGA)
library(magrittr) # provides the %>% pipe used throughout
checkTCGA('Dates')
```

Version 1.0 of the `RTCGA.methylation` package contains methylation datasets that were released on `2015-11-01`. They were downloaded in the following way (mainly copied from [http://rtcga.github.io/RTCGA/](http://rtcga.github.io/RTCGA/)):

## Available cohorts

All cohort names can be checked using:

```{r, eval=FALSE}
(cohorts <- infoTCGA() %>%
    rownames() %>%
    sub("-counts", "", x=.))
```

For all cohorts, the following code downloads the methylation data.

## Downloading tarred files

```{r, eval=FALSE}
# dir.create("data2") # uncomment if the destination directory does not exist yet
releaseDate <- "2015-11-01"
sapply(cohorts, function(element){
    # try() lets the loop continue when a cohort has no such dataset
    try({
        downloadTCGA(cancerTypes = element,
                     dataSet = "Merge_methylation__humanmethylation27",
                     destDir = "data2",
                     date = releaseDate)
    })
})
```

## Reading downloaded methylation dataset

### Shortening paths and directories

```{r, eval=FALSE}
# truncate each downloaded folder path to its first 50 characters
list.files("data2") %>%
    file.path("data2", .) %>%
    file.rename(to = substr(., start=1, stop=50))
```

### Removing `NA` files from data2

The existence of `NA` files means that no methylation data were available for those cohorts.

```{r, eval=FALSE}
list.files("data2") %>%
    file.path("data2", .) %>%
    sapply(function(x){
        if (x == "data2/NA")
            file.remove(x)
    })
```

### Paths to methylation data

The code below removes the unneeded `MANIFEST.txt` file from each methylation cohort folder.

```{r}
list.files("data2") %>%
    file.path("data2", .) %>%
    sapply(function(x){
        file.path(x, list.files(x)) %>%
            grep(pattern = "MANIFEST.txt", x = ., value=TRUE) %>%
            file.remove()
    })
```

The code below automatically assigns the path to the data file of every methylation cohort downloaded to the `data2` folder. Each path is stored in the global environment under a name of the form `<cohort>.methylation.path`.

```{r}
list.files("data2") %>%
    file.path("data2", .) %>%
    sapply(function(y){
        file.path(y, list.files(y)) %>%
            assign(value = .,
                   # object name: file name up to the first dot, with "-" replaced by "_"
                   x = paste0(list.files(y) %>%
                                  gsub(x = ., pattern = "\\..*", replacement = "") %>%
                                  gsub(x = ., pattern = "-", replacement = "_"),
                              ".methylation.path"),
                   envir = .GlobalEnv)
    })
```

### Reading methylation data using `readTCGA`

Because the methylation data are transposed in the downloaded files, the `readTCGA` function was prepared to read and transpose them automatically:

```{r, eval=FALSE}
ls() %>%
    grep("methylation\\.path", x = ., value = TRUE) %>%
    sapply(function(element){
        try({
            readTCGA(get(element, envir = .GlobalEnv),
                     dataType = "methylation") %>%
                # store the data under the path object's name without ".path"
                assign(value = .,
                       x = sub("\\.path", "", x = element),
                       envir = .GlobalEnv)
        })
        invisible(NULL)
    })
```

# Saving methylation data to `RTCGA.methylation` package

```{r, eval=FALSE}
# split the OV cohort into two objects
OV.methylation[1:300,] -> OV.methylation1
OV.methylation[301:612,] -> OV.methylation2
rm(OV.methylation)

# print the names of all methylation objects (excluding the path objects)
grep("methylation", ls(), value = TRUE) %>%
    grep("path", x=., value = TRUE, invert = TRUE) %>%
    cat(sep=",")
# can one do it better? as from use_data documentation:
# ... Unquoted names of existing objects to save
# (in devtools >= 2.0.0 this function lives in usethis::use_data())
devtools::use_data(BRCA.methylation, COAD.methylation,
                   COADREAD.methylation, GBMLGG.methylation,
                   GBM.methylation, KIPAN.methylation,
                   KIRC.methylation, KIRP.methylation,
                   LAML.methylation, LUAD.methylation,
                   LUSC.methylation, OV.methylation1, OV.methylation2,
                   READ.methylation, STAD.methylation,
                   STES.methylation, UCEC.methylation,
                   # overwrite = TRUE,
                   compress="xz")
```
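
The chunk above splits `OV.methylation` at hard-coded row indices. As a minimal sketch of a more general approach, assuming one simply wants consecutive blocks of a fixed maximum number of rows, the split could be written with base R only. The helper `split_rows` and the toy data frame are hypothetical illustrations and are not part of `RTCGA` or `RTCGA.methylation`.

```r
# Hypothetical helper (not part of RTCGA): split a data frame into
# consecutive row blocks of at most `chunk_size` rows each.
split_rows <- function(df, chunk_size) {
    stopifnot(is.data.frame(df), chunk_size >= 1)
    if (nrow(df) == 0) return(list())
    # block index for every row: 1, 1, ..., 2, 2, ...
    block <- ceiling(seq_len(nrow(df)) / chunk_size)
    unname(split(df, block))
}

# Toy stand-in for a cohort data frame (made-up values).
toy <- data.frame(id = 1:7, value = seq(0.1, 0.7, by = 0.1))
parts <- split_rows(toy, chunk_size = 3)

# Binding the blocks back together restores the original rows in order.
restored <- do.call(rbind, parts)
rownames(restored) <- NULL
```

With a real cohort, `split_rows(OV.methylation, 300)` would return a list of blocks whose elements could then be assigned to separate objects before saving.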