---
title: "Using the `RTCGA` package to download rna-seq data that are included in the `RTCGA.rnaseq` package"
subtitle: "Date of datasets release: 2015-08-21"
author: "Marcin Kosiński"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using RTCGA to download rna-seq data as included in RTCGA.rnaseq}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo=FALSE}
library(knitr)
options(width = 150)
opts_chunk$set(comment = "", message = FALSE, warning = FALSE, tidy.opts = list(keep.blank.line = TRUE, width.cutoff = 150), eval = FALSE)
```

# RTCGA package

> The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high level sequence analysis of the tumor genomes. The key is to understand genomics to improve cancer care.

The `RTCGA` package downloads and integrates the variety and volume of TCGA data using the patient barcode as a key, which makes the data easier to obtain. This may benefit the development of science and the improvement of patients' treatment. `RTCGA` is an open-source R package, available to download from Bioconductor:

```{r, eval=FALSE}
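# biocLite() has been superseded by BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("RTCGA")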
```

or from GitHub:
```{r, eval=FALSE}
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
# install the development version from GitHub
devtools::install_github("MarcinKosinski/RTCGA")
```

Furthermore, the `RTCGA` package transforms TCGA data into a form that is convenient to use in the R statistical environment. These data transformations can be part of a statistical analysis pipeline, which `RTCGA` can make more reproducible.

Use cases and examples are shown in the `RTCGA` package vignettes:
```{r, eval=FALSE}
browseVignettes("RTCGA")
```


# How to download rna-seq data to obtain the same datasets as in the `RTCGA.rnaseq` package?

TCGA data are published in many dated releases. To list all available release dates, run:
```{r, eval=FALSE}
library(RTCGA)
checkTCGA('Dates')
```

Version 1.0.0 of the `RTCGA.rnaseq` package contains rna-seq datasets from the `2015-08-21` release.
They were downloaded as follows:

## Available cohorts

All cohort names can be checked using:
```{r, eval=FALSE}
library(magrittr) # provides the %>% pipe used in this vignette
(cohorts <- infoTCGA() %>% 
   rownames() %>% 
   sub("-counts", "", x=.))
```
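
As a small illustration of the name clean-up above, the sketch below applies the same kind of `sub()` call to a hypothetical vector of row names of the form `<cohort>-counts`. The values are invented for illustration and are not actual `infoTCGA()` output; the `$` anchor is an optional addition that restricts the match to the end of the name.

```{r, eval=FALSE}
# Hypothetical row names, assumed to mimic the "<cohort>-counts" pattern
example_rownames <- c("ACC-counts", "BRCA-counts", "OV-counts")

# Strip the "-counts" suffix; "$" limits the match to the end of each name
example_cohorts <- sub("-counts$", "", example_rownames)
print(example_cohorts)
```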

The following code downloads the rna-seq data for every cohort; cohorts for which the download fails are reported and skipped.

## Downloading tarred files
```{r, eval=FALSE}
# dir.create( "data2" ) # run once if the folder does not exist yet
releaseDate <- "2015-08-21"
# download the RSEM normalized gene-level archive for each cohort;
# a failed download is reported and the loop moves on to the next cohort
sapply( cohorts, function(element){
   tryCatch({
      downloadTCGA( cancerTypes = element, 
                    dataSet = "rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level",
                    destDir = "data2/", 
                    date = releaseDate )},
   error = function(cond){
      cat("Error: Maybe there weren't rnaseq data for ", element, " cancer.\n")
   })
})
```
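
The loop above only prints a message when a download fails. A possible variation, sketched here with a hypothetical `fake_download()` stand-in so that it runs without network access, records success or failure per cohort so that the failed cohorts can be inspected afterwards. Neither `fake_download()` nor the cohort `"XYZ"` is part of `RTCGA`; both are assumptions made for the example.

```{r, eval=FALSE}
# Hypothetical stand-in for downloadTCGA(): fails for one cohort on purpose
fake_download <- function(cohort) {
  if (cohort == "XYZ") stop("no rna-seq archive found")
  paste0("data2/", cohort)
}

example_cohorts <- c("ACC", "XYZ", "OV")

# Return TRUE on success and FALSE on failure for every cohort
download_ok <- vapply(example_cohorts, function(element) {
  tryCatch({
    fake_download(element)
    TRUE
  }, error = function(cond) {
    message("Skipping ", element, ": ", conditionMessage(cond))
    FALSE
  })
}, logical(1))

# Names of the cohorts whose download failed
failed_cohorts <- names(download_ok)[!download_ok]
```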

## Reading downloaded rna-seq datasets

### Shortening paths and directories 

```{r, eval=FALSE}
# truncate the path of every downloaded directory to its first 50 characters
list.files( "data2") %>% 
   file.path( "data2", .) %>%
   file.rename( to = substr(.,start=1,stop=50))
```
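
Truncating names can, in principle, map two directories to the same target. The sketch below shows one way to guard against that. It works in a temporary directory with invented directory names, and it truncates the directory name to 20 characters rather than the full path to 50, purely to keep the example small.

```{r, eval=FALSE}
# Work in a throwaway directory so nothing in "data2" is touched
demo_dir <- file.path(tempdir(), "shorten_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)

# Two hypothetical, deliberately long directory names
long_names <- c(paste0("cohortA_", strrep("x", 40)),
                paste0("cohortB_", strrep("y", 40)))
for (d in file.path(demo_dir, long_names)) dir.create(d)

old_names <- list.files(demo_dir)
new_names <- substr(old_names, 1, 20)   # keep the first 20 characters of each name

# Stop before renaming if truncation would produce duplicate names
stopifnot(!anyDuplicated(new_names))

renamed <- file.rename(file.path(demo_dir, old_names),
                       file.path(demo_dir, new_names))
```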


### Paths to rna-seq data
The code below finds the path to the `*rnaseqv2*` file for every cohort downloaded to the `data2` folder and stores it in a global variable named `<cohort>.rnaseq.path`.

```{r, eval=FALSE}
sapply( cohorts, function( element ){
   # folder of the given cohort (regular or FFPE) that holds rnaseqv2 data
   folder <- grep( paste0( "(_",element,"\\.", "|","_",element,"-FFPE)", ".*rnaseqv2"), 
                   list.files("data2"),value = TRUE)
   # rnaseqv2 data file inside that folder
   file <- grep( paste0(element, ".*rnaseqv2"), list.files( file.path( "data2",folder ) ),
                 value = TRUE)
   path <- file.path( "data2", folder, file )
   # store the path in the global variable <cohort>.rnaseq.path
   assign( value = path, x = paste0(element, ".rnaseq.path"), envir = .GlobalEnv)
}) 
```
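
To see why the folder pattern is anchored with `_` before and `\\.` after the cohort name, the sketch below applies it to a few invented folder names. The folder names and the helper `find_rnaseq_folder()` are illustrative assumptions; the pattern construction is the same as in the chunk above. The example also collects results in a named list, which is one alternative to creating global variables.

```{r, eval=FALSE}
# Hypothetical folder names, invented to mimic the downloaded directories
example_folders <- c("gdac_COAD.Merge_rnaseqv2",
                     "gdac_COADREAD.Merge_rnaseqv2",
                     "gdac_COAD.Merge_clinical")

# Same pattern construction as above, wrapped in an illustrative helper
find_rnaseq_folder <- function(element, folders) {
  pattern <- paste0("(_", element, "\\.|_", element, "-FFPE).*rnaseqv2")
  grep(pattern, folders, value = TRUE)
}

# "COAD" must not pick up the "COADREAD" folder, and vice versa
example_paths <- lapply(c(COAD = "COAD", COADREAD = "COADREAD"),
                        find_rnaseq_folder, folders = example_folders)
```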

### Reading rna-seq data 

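The code below reads each file with `readTCGA()` and assigns the result to a global object named `<cohort>.rnaseq`. Cohorts whose data cannot be read are reported and skipped.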


```{r, eval=FALSE}
sapply( cohorts, function(element){
   tryCatch({
      assign( value = readTCGA( get( paste0(element, ".rnaseq.path"),
                                     envir = .GlobalEnv ),
                                "rnaseq" ),
              x = paste0(element, ".rnaseq"),
              envir = .GlobalEnv )
   },
   error = function(cond){
      cat("Error: Maybe there weren't rnaseq data for ", element, " cancer.\n")
   })
})
```
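
The same assign-by-name pattern can target a dedicated environment instead of `.GlobalEnv`, which keeps the workspace tidy. The following is a minimal sketch under stated assumptions: `fake_read()` is a hypothetical stand-in for `readTCGA()`, and the paths are invented.

```{r, eval=FALSE}
# Hypothetical stand-in for readTCGA(): fails when there is no file to read
fake_read <- function(path) {
  if (length(path) == 0) stop("no file to read")
  data.frame(source = path, stringsAsFactors = FALSE)
}

# Invented paths; the second cohort has no file
example_paths <- list(ACC = "data2/ACC/file.txt", XYZ = character(0))

# Collect results in a dedicated environment rather than .GlobalEnv
store <- new.env()
for (element in names(example_paths)) {
  tryCatch(
    assign(paste0(element, ".rnaseq"),
           fake_read(example_paths[[element]]),
           envir = store),
    error = function(cond) message("Skipping ", element, ": ", conditionMessage(cond))
  )
}

# Names of the objects that were read successfully
loaded <- ls(store)
```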



# Saving rna-seq data to `RTCGA.rnaseq` package


```{r, eval=FALSE}
# list the rna-seq objects, skipping the *.rnaseq.path helpers
grep( "\\.rnaseq$", ls(), value = TRUE) %>%
   cat( sep="," ) # can one do it better? From the use_data documentation:
   # ...	Unquoted names of existing objects to save
# use_data() has moved from devtools to usethis
usethis::use_data(ACC.rnaseq,BLCA.rnaseq,BRCA.rnaseq,CESC.rnaseq,CHOL.rnaseq,COAD.rnaseq,DLBC.rnaseq,ESCA.rnaseq,GBM.rnaseq,HNSC.rnaseq,KICH.rnaseq,KIPAN.rnaseq,KIRC.rnaseq,KIRP.rnaseq,LAML.rnaseq,LGG.rnaseq,LIHC.rnaseq,LUAD.rnaseq,LUSC.rnaseq,MESO.rnaseq,OV.rnaseq,PAAD.rnaseq,PCPG.rnaseq,PRAD.rnaseq,READ.rnaseq,SKCM.rnaseq,STES.rnaseq,TGCT.rnaseq,THCA.rnaseq,THYM.rnaseq,UCEC.rnaseq,UCS.rnaseq,UVM.rnaseq, overwrite = TRUE, compress = "xz")
```
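
The object names passed to `use_data()` above are typed by hand. One possible way to avoid that is base R's `save(list = ...)`, which accepts object names as strings. This is only a sketch and not how the package data were actually produced: it writes to a temporary directory and uses two small invented objects in place of the cohort datasets.

```{r, eval=FALSE}
# Two small hypothetical objects standing in for the cohort datasets
ACC.demo <- data.frame(x = 1:3)
OV.demo  <- data.frame(x = 4:6)

# Throwaway output directory, assumed here in place of the package's data/ folder
out_dir <- file.path(tempdir(), "data_demo")
unlink(out_dir, recursive = TRUE)
dir.create(out_dir)

object_names <- c("ACC.demo", "OV.demo")

# Save each object to its own xz-compressed .rda file, looked up by name
for (nm in object_names) {
  save(list = nm,
       file = file.path(out_dir, paste0(nm, ".rda")),
       compress = "xz",
       envir = environment())
}

saved_files <- list.files(out_dir, pattern = "\\.rda$")
```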