--- title: "cgdsr to cBioPortalData: Migration Tutorial" author: "Karim Mezhoud & Marcel Ramos" date: "`r format(Sys.time(), '%B %d, %Y')`" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{cgdsr to cBioPortalData Migration} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: code_folding: show number_sections: no toc: yes toc_depth: 4 --- ```{r, setup, include=FALSE} knitr::opts_chunk$set(cache = TRUE) ``` # Introduction This vignette aims to help developers migrate from the now defunct `cgdsr` CRAN package. Note that the `cgdsr` package code is shown for comparison but it is not guaranteed to work. If you have questions regarding the contents, please create an issue at the GitHub repository: https://github.com/waldronlab/cBioPortalData/issues # Loading the package ```{r,include=TRUE,results="hide",message=FALSE,warning=FALSE} library(cBioPortalData) ``` # Discovering studies {.tabset .tabset-fade .tabset-pills} ## `cBioPortalData` setup Here we show the default inputs to the cBioPortal function for clarity. ```{r , comment=FALSE, warning=FALSE, message=FALSE} cbio <- cBioPortal( hostname = "www.cbioportal.org", protocol = "https", api. = "/api/api-docs" ) getStudies(cbio) ``` Note that the `studyId` column is important for further queries. ```{r} head(getStudies(cbio)[["studyId"]]) ``` ## `cgdsr` setup ```{r,eval=FALSE} library(cgdsr) cgds <- CGDS("http://www.cbioportal.org/") getCancerStudies.CGDS(cgds) ``` # Obtaining Cases {.tabset .tabset-fade .tabset-pills} ## `cBioPortalData` (Cases) ### Notes * Each patient is identified by a `patientId`. * `sampleListId` identifies groups of `patientId` based on profile type * The `sampleLists` function uses `studyId` input to return `sampleListId` ### sampleLists For the sample list identifiers, you can use `sampleLists` and inspect the `sampleListId` column. ```{r} samps <- sampleLists(cbio, "gbm_tcga_pub") samps[, c("category", "name", "sampleListId")] ``` ### samples from sampleLists It is possible to get `case_ids` directly when using the `samplesInSampleLists` function. The function handles multiple `sampleList` identifiers. ```{r} samplesInSampleLists( api = cbio, sampleListIds = c( "gbm_tcga_pub_expr_classical", "gbm_tcga_pub_expr_mesenchymal" ) ) ``` ### getSampleInfo To get more information about patients, we can query with `getSampleInfo` function. ```{r} getSampleInfo(api = cbio, studyId = "gbm_tcga_pub", projection = "SUMMARY") ``` ## `cgdsr` (Cases) ### Notes * 'Cases' and 'Patients' are synonymous. * Each patient is identified by a `case_id`. * Queries only handle a single `cancerStudy` identifier * The `case_list_description` describes the assays ### `getCaseLists` and `getClinicalData` We obtain the first `case_list_id` in the `cgds` object from above and the corresponding clinical data for that case list (`gbm_tcga_pub_all` as the case list in this example). ```{r,eval=FALSE} clist1 <- getCaseLists.CGDS(cgds, cancerStudy = "gbm_tcga_pub")[1, "case_list_id"] getClinicalData.CGDS(cgds, clist1) ``` # Obtaining Clinical Data {.tabset .tabset-fade .tabset-pills} ## `cBioPortalData` (Clinical) ### All clinical data Note that a `sampleListId` is not required when using the `fetchAllClinicalDataInStudyUsingPOST` internal endpoint. Data for all patients can be obtained using the `clinicalData` function. ```{r} clinicalData(cbio, "gbm_tcga_pub") ``` ### By sample data You can use a different endpoint to obtain data for a single sample. First, obtain a single `sampleId` with the `samplesInSampleLists` function. ```{r} clist1 <- "gbm_tcga_pub_all" samplist <- samplesInSampleLists(cbio, clist1) onesample <- samplist[["gbm_tcga_pub_all"]][1] onesample ``` Then we use the API endpoint to retrieve the data. Note that you would run `httr::content` on the output to extract the data. ```{r} cbio$getAllClinicalDataOfSampleInStudyUsingGET( sampleId = onesample, studyId = "gbm_tcga_pub" ) ``` ## `cgdsr` (Clinical) ### Notes * `getClinicalData` uses `case_list_id` as input without specifying the `study_id` as case list identifiers are unique to each study. ### getClinicalData We query clinical data for the `gbm_tcga_pub_expr_classical` case list identifier which is part of the `gbm_tcga_pub` study. ```{r, eval = FALSE} getClinicalData.CGDS(x = cgds, caseList = "gbm_tcga_pub_expr_classical" ) ``` # Clinical Data Summary `cgdsr` allows you to obtain clinical data for a case list subset (54 cases with `gbm_tcga_pub_expr_classical`) and `cBioPortalData` provides clinical data for all 206 samples in `gbm_tcga_pub` using the `clinicalData` function. * `cgdsr` returns a `data.frame` with `sampleId` (TCGA.02.0009.01) but not `patientId` (TCGA.02.0009) * `cBioPortalData` returns `sampleId` (TCGA-02-0009-01) and `patientId` (TCGA-02-0009). * Note the differences in identifier delimiters between the two packages, `cgdsr` provides `case_id`s with `.` and `cBioPortalData` returns `patientId`s with `-`. You may be interested in other clinical data endpoints. For a list, use the `searchOps` function. ```{r} searchOps(cbio, "clinical") ``` # Molecular or Genetic Profiles {.tabset .tabset-fade .tabset-pills} ## `cBioPortalData` (molecularProfiles) ```{r} molecularProfiles(api = cbio, studyId = "gbm_tcga_pub") ``` Note that we want to pull the `molecularProfileId` column to use in other queries. ## `cgdsr` (getGeneticProfiles) ```{r,eval=FALSE} getGeneticProfiles.CGDS(cgds, cancerStudy = "gbm_tcga_pub") ``` # Genomic Profile Data for a set of genes {.tabset .tabset-fade .tabset-pills} ## `cBioPortalData` (Indentify samples and genes) Currently, some conversion is needed to directly use the `molecularData` function, if you only have Hugo symbols. First, convert to Entrez gene IDs and then obtain all the samples in the sample list of interest. ### Convert `hugoGeneSymbol` to `entrezGeneId` ```{r} genetab <- queryGeneTable(cbio, by = "hugoGeneSymbol", genes = c("NF1", "TP53", "ABL1") ) genetab entrez <- genetab[["entrezGeneId"]] ``` ### Obtain all samples in study ```{r} allsamps <- samplesInSampleLists(cbio, "gbm_tcga_pub_all") ``` In the next section, we will show how to use the genes and sample identifiers to obtain the molecular profile data. ## `cgdsr` (Profile Data) The `getProfileData` function allows for straightforward retrieval of the molecular profile data with only a case list and genetic profile identifiers. ```{r,eval=FALSE} getProfileData.CGDS(x = cgds, genes = c("NF1", "TP53", "ABL1"), geneticProfiles = "gbm_tcga_pub_mrna", caseList = "gbm_tcga_pub_all" ) ``` # Molecular Data with `cBioPortalData` `cBioPortalData` provides a number of options for retrieving molecular profile data depending on the use case. Note that `molecularData` is mostly used internally and that the `cBioPortalData` function is the user-friendly method for downloading such data. ## `molecularData` We use the translated `entrez` identifiers from above. ```{r} molecularData(cbio, "gbm_tcga_pub_mrna", entrezGeneIds = entrez, sampleIds = unlist(allsamps)) ``` ## `getDataByGenes` The `getDataByGenes` function automatically figures out all the sample identifiers in the study and it allows Hugo and Entrez identifiers, as well as `genePanelId` inputs. ```{r} getDataByGenes( api = cbio, studyId = "gbm_tcga_pub", genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol", molecularProfileIds = "gbm_tcga_pub_mrna" ) ``` ## `cBioPortalData`: the main end-user function It is important to note that end users who wish to obtain the data as easily as possible should use the main `cBioPortalData` function: ```{r} gbm_pub <- cBioPortalData( api = cbio, studyId = "gbm_tcga_pub", genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol", molecularProfileIds = "gbm_tcga_pub_mrna" ) assay(gbm_pub[["gbm_tcga_pub_mrna"]])[, 1:4] ``` # Mutation Data {.tabset .tabset-fade .tabset-pills} ## `cBioPortalData` (mutationData) Similar to `molecularData`, mutation data can be obtained with the `mutationData` function or the `getDataByGenes` function. ```{r} mutationData( api = cbio, molecularProfileIds = "gbm_tcga_pub_mutations", entrezGeneIds = entrez, sampleIds = unlist(allsamps) ) ``` ```{r} getDataByGenes( api = cbio, studyId = "gbm_tcga_pub", genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol", molecularProfileIds = "gbm_tcga_pub_mutations" ) ``` ## `cgdsr` (getMutationData) ```{r,eval=FALSE} getMutationData.CGDS( x = cgds, caseList = "getMutationData", geneticProfile = "gbm_tcga_pub_mutations", genes = c("NF1", "TP53", "ABL1") ) ``` # Copy Number Alteration (CNA) {.tabset .tabset-fade .tabset-pills} ## `cBioPortalData` (CNA) Copy Number Alteration data can be obtained with the `getDataByGenes` function or by the main `cBioPortal` function. ```{r} getDataByGenes( api = cbio, studyId = "gbm_tcga_pub", genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol", molecularProfileIds = "gbm_tcga_pub_cna_rae" ) ``` ```{r} cBioPortalData( api = cbio, studyId = "gbm_tcga_pub", genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol", molecularProfileIds = "gbm_tcga_pub_cna_rae" ) ``` ## `cgdsr` (CNA) ```{r, eval=FALSE} getProfileData.CGDS( x = cgds, genes = c("NF1", "TP53", "ABL1"), geneticProfiles = "gbm_tcga_pub_cna_rae", caseList = "gbm_tcga_pub_cna" ) ``` # Methylation Data {.tabset .tabset-fade .tabset-pills} ## `cBioPortalData` (Methylation) Similar to Copy Number Alteration, Methylation can be obtained by `getDataByGenes` function or by 'cBioPortalData' function. ```{r} getDataByGenes( api = cbio, studyId = "gbm_tcga_pub", genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol", molecularProfileIds = "gbm_tcga_pub_methylation_hm27" ) ``` ```{r} cBioPortalData( api = cbio, studyId = "gbm_tcga_pub", genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol", molecularProfileIds = "gbm_tcga_pub_methylation_hm27" ) ``` ## `cgdsr` (Methylation) ```{r,eval=FALSE} getProfileData.CGDS( x = cgds, genes = c("NF1", "TP53", "ABL1"), geneticProfiles = "gbm_tcga_pub_methylation_hm27", caseList = "gbm_tcga_pub_methylation_hm27" ) ``` # sessionInfo ```{r} sessionInfo() ```