---
title: "4. HPAanalyze use case: Working with Human Protein Atlas (HPA) xml files offline"
author:
- name: Anh N. Tran
  affiliation: Northwestern University, Illinois, USA
  email: trannhatanh89@gmail.com
date: 6/7/2019
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
    toc_float: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{"4. HPAanalyze use case: Working with Human Protein Atlas (HPA) xml files offline"}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
    collapse=TRUE,
    comment="#>",
    warning=FALSE,
    error=FALSE,
    eval=FALSE
)
```

```{r library, message=FALSE, warning=FALSE, error=FALSE}
library(BiocStyle)
library(HPAanalyze)
library(dplyr)
library(xml2)
```

# The case

The Human Protein Atlas allows you to download very detailed data for each protein in the form of an xml file, and `hpaXmlGet` and `hpaXml` allow you to retrieve those files automatically from the HPA server and parse them. However, due to a technical limitation, you cannot save those `"xml_document"/"xml_node"` objects to disk the way you would save ordinary R objects. The question is: how do you keep a version of these files to use when you are not connected to the internet, or for reproducibility?

# The solution

## Download and keep a local version of the xml files for yourself

The ["Downloadable data"](https://www.proteinatlas.org/about/download) page on the HPA website explains how these files can be downloaded. Basically, you append `[ensembl_id].xml` to `http://www.proteinatlas.org/` to download individual entries (that's what `hpaXmlGet` does behind the scenes), or you download the [whole big set](https://www.proteinatlas.org/download/proteinatlas.xml.gz). From there, you can import the file using `xml2::read_xml()`. The output should be exactly the same as that of `hpaXmlGet`.
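The individual-entry download described above can also be scripted. The following is only a minimal sketch: `hpa_xml_url` and `hpa_xml_download` are hypothetical helper names that are not part of HPAanalyze, the `data` directory is an assumption chosen to match the paths used in this vignette, and the base URL uses the `https` form seen in the links above.

```{r}
## Hypothetical helpers for illustration; they are NOT part of HPAanalyze.

## Build the URL of a single entry by appending "<ensembl_id>.xml"
## to the HPA base URL, following the pattern described above.
hpa_xml_url <- function(ensembl_id,
                        base_url = "https://www.proteinatlas.org") {
    paste0(base_url, "/", ensembl_id, ".xml")
}

## Download one entry into a local directory (assumed here to be "data")
## and return the path of the saved file invisibly.
## An existing file is kept unless overwrite = TRUE.
hpa_xml_download <- function(ensembl_id, dir = "data", overwrite = FALSE) {
    dir.create(dir, showWarnings = FALSE, recursive = TRUE)
    destfile <- file.path(dir, paste0(ensembl_id, ".xml"))
    if (overwrite || !file.exists(destfile)) {
        utils::download.file(hpa_xml_url(ensembl_id), destfile,
                             mode = "wb", quiet = TRUE)
    }
    invisible(destfile)
}
```

With these helpers defined, calling `hpa_xml_download("ENSG00000134057")` while online would write `data/ENSG00000134057.xml`, which is the file imported in the next chunk. Skipping files that already exist keeps repeated runs from contacting the server again.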

```{r}
## same as hpaXmlGet("ENSG00000134057")
CCNB1xml <- xml2::read_xml("data/ENSG00000134057.xml")
```

## Business as usual with `hpaXml` functions

Since the umbrella function `hpaXml` takes either an *Ensembl ID* or an imported `xml_document` object, you can feed it what you just imported and get the expected result.

```{r}
CCNB1_parsed <- hpaXml(CCNB1xml)
```

You can use the other `hpaXml` functions in the same way.

```{r}
hpaXmlProtClass(CCNB1xml)
hpaXmlTissueExprSum(CCNB1xml)
hpaXmlAntibody(CCNB1xml)
hpaXmlTissueExpr(CCNB1xml)
```

## Save your parsed objects

It is recommended that you save your parsed objects for reproducibility. Unlike the `xml_document` object, these parsed objects are just regular R lists of standard vectors or data frames, so you can save them as usual.

```{r}
saveRDS(CCNB1_parsed, "data/CCNB1_parsed.rds")
```

# Copyright

```{r child = 'data/copyright', eval = TRUE}
```