--- title: "phastCons30way.UCSC.hg38 AnnotationHub Resource Metadata" author: - name: Robert Castelo affiliation: - &id Dept. of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain email: robert.castelo@upf.edu package: "`r pkg_ver('phastCons30way.UCSC.hg38')`" abstract: > Store phastCons30way.UCSC.hg38 AnnotationHub Resource Metadata. vignette: > %\VignetteIndexEntry{phastCons30way.UCSC.hg38 AnnotationHub Resource Metadata} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: toc: true toc_float: true number_sections: true --- ```{r setup, echo=FALSE} options(width=80) ``` # Retrieval of phastCons30way.UCSC.hg38 genomic scores through AnnotationHub resources The `phastCons30way.UCSC.hg38` package provides metadata for the `r Biocpkg("AnnotationHub")` resources associated with human phastCons conservation scores calculated from multiple genome alignments of the human genome with 29 other mammal genomes, including 26 primates. The original data can be found at the UCSC download [site](http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz30way). Details about how those original data were processed into `r Biocpkg("AnnotationHub")` resources can be found in the source file: ``` phastCons30way.UCSC.hg38/scripts/make-metadata_phastCons30way.UCSC.hg38.R ``` The genomic scores for `phastCons30way.UCSC.hg38` can be retrieved using the `r Biocpkg("AnnotationHub")`, which is a web resource that provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard (e.g., UCSC, Ensembl) and distributed sites, can be found. A Bioconductor `r Biocpkg("AnnotationHub")` web resource creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access. While the `r Biocpkg("AnnotationHub")` API can be used to query those resources, we encourage to use the `r Biocpkg("GenomicScores")` API, as follows. The first step to retrieve genomic scores is to check the ones available to download. ```{r, echo=FALSE} avgs <- readRDS(system.file("extdata", "avgs.rds", package="GenomicScores")) ``` ```{r retrieve2, message=FALSE, cache=FALSE, eval=FALSE} availableGScores() ``` ```{r, echo=FALSE} avgs ``` The selected resource can be downloaded with the function getGScores(). After the resource is downloaded the first time, the cached copy will enable a quicker retrieval later. ```{r retrieve3, message=FALSE, cache=FALSE, eval=FALSE} phast <- getGScores("phastCons30way.UCSC.hg38") ``` Finally, the phastCons score of a particular genomic position is retrieved using the function 'gscores()'. Please consult the the documentation of the `r Biocpkg("GenomicScores")` package for details on how to use it. ```{r retrieve4, message=FALSE, eval=FALSE} gscores(phast, GRanges(seqnames="chr22", IRanges(start=50967020:50967025, width=1))) ``` ## Building an annotation package from a GScores object Retrieving genomic scores through `AnnotationHub` resources requires an internet connection and we may want to work with such resources offline. For that purpose, we can create ourselves an annotation package, such as [phastCons100way.UCSC.hg19](https://bioconductor.org/packages/phastCons100way.UCSC.hg19), from a `GScores` object corresponding to a downloaded `AnnotationHub` resource. To do that we use the function `makeGScoresPackage()` as follows: ```{r eval=FALSE} makeGScoresPackage(phast, maintainer="Me ", author="Me", version="1.0.0") ``` ```{r echo=FALSE} cat("Creating package in ./phastCons30way.UCSC.hg38\n") ``` An argument, `destDir`, which by default points to the current working directory, can be used to change where in the filesystem the package is created. Afterwards, we should still build and install the package via, e.g., `R CMD build` and `R CMD INSTALL`, to be able to use it offline. # Session information ```{r session_info, cache=FALSE} sessionInfo() ```