---
title: "curatedTCGAData"
date: "`r BiocStyle::doc_date()`"
vignette: |
  %\VignetteIndexEntry{curatedTCGAData}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    toc_float: true
Package: curatedTCGAData
---

# Installation

```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("curatedTCGAData")
```

Load packages:

```{r,include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(curatedTCGAData)
library(MultiAssayExperiment)
library(TCGAutils)
```

# Downloading datasets

Check the available cancer codes and assays in the TCGA data:

```{r}
curatedTCGAData(diseaseCode = "*", assays = "*", dry.run = TRUE)
```

Check which files would be downloaded for a given cancer code and assay pattern:

```{r}
curatedTCGAData(diseaseCode = "COAD", assays = "RPPA*", dry.run = TRUE)
```

# Caveats for working with TCGA data

Not all TCGA samples are tumors; each of the 33 cancer types contains a mix of sample types. Use `sampleTables` on the `MultiAssayExperiment` object along with `data(sampleTypes, package = "TCGAutils")` to see which sample types are present in the data.

Some tumors may have contributed more than one sample to an assay, producing technical replicates. These should be resolved with the appropriate helper functions, such as `mergeReplicates`.

Primary tumors should be selected with `TCGAutils::TCGAsampleSelect`, and the result used as input to the subsetting mechanisms. See the "Samples in Assays" section of this vignette.

## ACC dataset example

```{r, message=FALSE}
(accmae <- curatedTCGAData("ACC", c("CN*", "Mutation"), dry.run = FALSE))
```

**Note**: For more on using a `MultiAssayExperiment`, see the `MultiAssayExperiment` vignette.

### Subtype information

Some cancer datasets include subtype information within the clinical data provided. This subtype information is stored in the metadata of the `colData` of the `MultiAssayExperiment` object.
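
As a concrete illustration of what "the metadata of `colData`" means: `colData()` on a `MultiAssayExperiment` returns an S4Vectors `DataFrame`, and every `DataFrame` carries a free-form `metadata()` list in addition to its rows and columns. The minimal sketch below builds a toy `DataFrame` by hand and stores a small name-mapping table in that list. The element name `subtypes`, the columns `source_name` and `colData_name`, the `hypothetical_subtype` column, and all values are hypothetical stand-ins, not the exact contents that curatedTCGAData stores.

```{r}
suppressPackageStartupMessages(library(S4Vectors))

# Toy clinical table standing in for colData(accmae); all values are made up
clin <- DataFrame(
    patientID = c("TCGA-OR-A5J1", "TCGA-OR-A5J2", "TCGA-OR-A5J3"),
    hypothetical_subtype = c("A", "B", "A")
)

# Store a hypothetical map from a source annotation name to the colData
# column that holds it, inside the DataFrame's metadata() list
metadata(clin)[["subtypes"]] <- data.frame(
    source_name = "Subtype (hypothetical)",
    colData_name = "hypothetical_subtype",
    stringsAsFactors = FALSE
)

# Retrieve the map and use it to look up and tabulate the subtype column
smap <- metadata(clin)[["subtypes"]]
counts <- table(clin[[smap$colData_name]])
counts
```

With the real data there is normally no need to index `metadata()` by hand; the `getSubtypeMap` helper retrieves the stored map for you.
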
To obtain the names of these subtype variables, use the `getSubtypeMap` function from TCGAutils:

```{r}
head(getSubtypeMap(accmae))
```

### Typical clinical variables

Another TCGAutils helper, `getClinicalNames`, provides a set of clinical variable names that are consistent across several cancer types. It returns a character vector of common clinical variables such as vital status, years to birth, and days to death:

```{r}
head(getClinicalNames("ACC"))

colData(accmae)[, getClinicalNames("ACC")][1:5, 1:5]
```

### Samples in Assays

The `sampleTables` function gives an overview of the sample types (codes) present in the data:

```{r}
sampleTables(accmae)
```

Analyses often compare two groups of samples. To separate them, the TCGAutils function `splitAssays` identifies the sample types in the assays and moves each type into its own experiment. By default, every discoverable sample type is split into a separate experiment. Here we request only primary solid tumors (code "01") and blood-derived normal samples (code "10"), as listed in the `sampleTypes` reference dataset:

```{r}
data(sampleTypes, package = "TCGAutils")
sampleTypes[sampleTypes[["Code"]] %in% c("01", "10"), ]

splitAssays(accmae, c("01", "10"))
```

To obtain a logical index that can be used to subset a `MultiAssayExperiment`, use `TCGAsampleSelect`. To select only primary tumors, apply the function to the colnames of the `MultiAssayExperiment`:

```{r}
tums <- TCGAsampleSelect(colnames(accmae), "01")
```

You can then pass this result to the subsetting function to keep only primary tumors:

```{r}
accmae[, tums, ]
```

## Exporting Data

`MultiAssayExperiment` gives users a convenient, integrative representation of multi-omic TCGA data.
For users who wish to work in other environments, we provide the `exportClass` function, which extracts all the data from a `MultiAssayExperiment` instance and writes them to a series of files:

```{r}
td <- tempdir()
tempd <- file.path(td, "ACCMAE")
if (!dir.exists(tempd))
    dir.create(tempd)

exportClass(accmae, dir = tempd, fmt = "csv", ext = ".csv")
```

This works for all data classes stored in the `MultiAssayExperiment` (e.g., `RaggedExperiment`, `HDF5Matrix`, `SummarizedExperiment`) because it relies on the `assays` method, which converts each class to `matrix` format through the individual `assay` methods.

# Session Information

```{r}
sessionInfo()
```