--- title: "The depmap data" author: - name: Theo Killian affiliation: Computational Biology, UCLouvain - name: Laurent Gatto affiliation: Computational Biology, UCLouvain date: "`r Sys.Date()`" output: BiocStyle::html_document: toc_float: true vignette: > %\VignetteIndexEntry{depmap} %\VignetteEngine{knitr::rmarkdown} %\usepackage[utf8]{inputenc} --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() ``` ```{r, echo = FALSE} suppressPackageStartupMessages(library("dplyr")) ``` # Introduction The `depmap` package aims to provide a reproducible research framework to cancer dependency data described by [Tsherniak, Aviad, et al. "Defining a cancer dependency map." Cell 170.3 (2017): 564-576.](https://www.ncbi.nlm.nih.gov/pubmed/28753430). The data found in the [depmap](https://bioconductor.org/packages/devel/data/experiment/html/depmap.html) package has been formatted to facilitate the use of common R packages such as `dplyr` and `ggplot2`. We hope that this package will allow researchers to more easily mine, explore and visually illustrate dependency data taken from the Depmap cancer genomic dependency study. # Installation instructions To install [depmap](https://bioconductor.org/packages/devel/data/experiment/html/depmap.html), the [BiocManager](https://cran.r-project.org/web/packages/BiocManager/index.html) Bioconductor Project Package Manager is required. If [BiocManager](https://cran.r-project.org/web/packages/BiocManager/index.html) is not already installed, it will need to be done so beforehand. Type (within R) install.packages("BiocManager") (This needs to be done just once.) ```{r install, eval=FALSE} install.packages("BiocManager") BiocManager::install("depmap") ``` The `depmap` package fully depends on the `ExperimentHub` Bioconductor package, which allows the data accessed in this package to be stored and retrieved from the cloud. ```{r import_EH, message = FALSE} library("depmap") library("ExperimentHub") ``` # Available data The [depmap](https://bioconductor.org/packages/devel/data/experiment/html/depmap.html) package currently contains eight datasets available through `ExperimentHub`. The data found in this R package has been converted from a "wide" format `.csv` file to "long" format .rda file. None of the values taken from the original datasets have been changed, although the columns have been re-arranged. Descriptions of the changes made are described under the `Details` section after querying the relevant dataset. ```{r ehquery} ## create ExperimentHub query object eh <- ExperimentHub() query(eh, "depmap") ``` Each dataset has a `ExperimentHub` accession number, (e.g. *EH2260* refers to the `rnai` dataset from the 19Q1 release). ## RNA inference knockout data ```{r, echo = FALSE} rnai <- eh[["EH2260"]] ``` The `rnai` dataset contains the combined genetic dependency data for RNAi - induced gene knockdown for select genes and cancer cell lines. This data corresponds to the `D2_combined_genetic_dependency_scores.csv` file found in the `r depmap::depmap_release()` depmap release and includes `r length(unique(rnai$gene))` genes, `r length(unique(rnai$cell_line))` cell lines, 30 primary diseases and 31 lineages. ```{r, eval = FALSE} ## access `rnai_19Q1` by EH number rnai <- eh[["EH2260"]] ``` ```{r} rnai ``` The `rnai` dataset can also be accessed by using the `depmap_rnai` function. ```{r} # depmap_rnai() ``` ```{r, echo = FALSE} rm(rnai) ``` ## CRISPR-Cas9 knockout data ```{r, echo = FALSE} crispr <- eh[["EH2261"]] ``` The `crispr` dataset contains the (batch corrected CERES inferred gene effect) CRISPR-Cas9 knockout data of select genes and cancer cell lines. This data corresponds to the `gene_effect_corrected.csv` file from the `r depmap::depmap_release()` depmap release. Data from this dataset includes `r length(unique(crispr$gene_name))` genes, `r length(unique(crispr$cell_line))` cell lines, 26 primary diseases, 28 lineages. ```{r, eval = FALSE} ## access `crispr_19Q1` by EH number crispr <- eh[["EH2261"]] ``` ```{r} crispr ``` The `crispr` dataset can also be accessed by using the `depmap_crispr` function. ```{r} # depmap_crispr() ``` ```{r, echo = FALSE} rm(crispr) ``` ## WES copy number data ```{r, echo = FALSE} copyNumber <- eh[["EH2262"]] ``` The `copyNumber` dataset contains the WES copy number data, relating to the numerical log-fold copy number change measured against the baseline copy number of select genes and cell lines. This dataset corresponds to the `public_19Q1_gene_cn.csv` from the `r depmap::depmap_release()` depmap release. This dataset includes `r length(unique(copyNumber$gene_name))` genes, `r length(unique(copyNumber$cell_line))` cell lines, 38 primary diseases and 33 lineages. ```{r, eval = FALSE} ## access `copyNumber_19Q1` by EH number copyNumber <- eh[["EH2262"]] ``` ```{r} copyNumber ``` The `copyNumber` dataset can also be accessed by using the `depmap_copyNumber` function. ```{r} # depmap_copyNumber() ``` ```{r, echo = FALSE} rm(copyNumber) ``` ## CCLE Reverse Phase Protein Array data ```{r, echo = FALSE} RPPA <- eh[["EH2263"]] ``` The `RPPA` dataset contains the CCLE Reverse Phase Protein Array (RPPA) data which corresponds to the `CCLE_RPPA_20180123.csv` file from the `r depmap::depmap_release()` depmap release. This dataset includes 214 genes, `r length(unique(RPPA$cell_line))` cell lines, 28 primary diseases, 28 lineages. ```{r, eval = FALSE} ## access `RPPA_19Q1` by EH number RPPA <- eh[["EH2263"]] ``` ```{r} RPPA ``` The `RPPA` dataset can also be accessed by using the `depmap_RPPA` function. ```{r} # depmap_RPPA() ``` ```{r, echo = FALSE} rm(RPPA) ``` ## CCLE RNAseq gene expression data ```{r, echo = FALSE} TPM <- eh[["EH2264"]] ``` The `TPM` dataset contains the CCLE RNAseq gene expression data. This shows expression data only for protein coding genes (using scale log2(TPM+1)). This data corresponds to the `CCLE_depMap_19Q1_TPM.csv` file from the `r depmap::depmap_release()` depmap release. This dataset includes `r length(unique(TPM$gene_name))` genes, `r length(unique(TPM$cell_line))` cell lines, 33 primary Diseases, 32 lineages. ```{r, eval = FALSE} ## access `TPM_19Q1` by EH number TPM <- eh[["EH2264"]] ``` ```{r} TPM ``` The `TPM` dataset can also be accessed by using the `depmap_TPM` function. ```{r} # depmap_TPM() ``` ```{r, echo = FALSE} rm(TPM) ``` ## Cancer cell lines ```{r, echo = FALSE} metadata <- eh[["EH2266"]] ``` The `metadata` dataset contains the metadata about all of the cancer cell lines. It corresponds to the `depmap_19Q1_cell_lines.csv` file found in the `r depmap::depmap_release()` depmap release. This dataset includes 0 genes, `r length(unique(metadata$cell_line))` cell lines, 38 primary diseases and 33 lineages. ```{r, eval = FALSE} ## access `metadata_19Q1` by EH number metadata <- eh[["EH2266"]] ``` ```{r} metadata ``` The `metadat` dataset can also be accessed by using the `depmap_metadata` function. ```{r} # depmap_metadata() ``` ```{r, echo = FALSE} rm(metadata) ``` ## Mutation calls ```{r, echo = FALSE} mutationCalls <- eh[["EH2265"]] ``` The `mutationCalls` dataset contains all merged mutation calls (coding region, germline filtered) found in the depmap dependency study. This dataset corresponds with the `depmap_19Q1_mutation_calls.csv` file found in the `r depmap::depmap_release()` depmap release and includes `r length(unique(mutationCalls$gene_name))` genes, `r length(unique(mutationCalls$depmap_id))` cell lines, 37 primary diseases and 33 lineages. ```{r, eval = FALSE} ## access `mutationCalls_19Q1` by EH number mutationCalls <- eh[["EH2265"]] ``` ```{r} mutationCalls ``` The `mutationCalls` dataset can also be accessed by using the `depmap_mutationCalls` function. ```{r} # depmap_mutationCalls() ``` ```{r, echo = FALSE} rm(mutationCalls) ``` # The [Broad Institute](https://depmap.org/portal/download/) data If desired, the original data from which the [depmap](https://bioconductor.org/packages/devel/data/experiment/html/depmap.html) package were derived from can be downloaded from the [Broad Institute](https://depmap.org/portal/download/) website. The instructions on how to download these files and how the data was transformed and loaded into the [depmap](https://bioconductor.org/packages/devel/data/experiment/html/depmap.html) package can be found in the `make_data.R` file found in `./inst/scripts`. (It should be noted that the original uncompressed *.csv* files are >1.5GB in total and take a moderate amount of time to download remotely.) # Session information ```{r echo = FALSE} sessionInfo()