%\VignetteIndexEntry{ensemblVEP} %\VignetteDepends{GenomicRanges, VariantAnnotation, Biostrings} %\VignetteKeywords{annotation, variants} %\VignettePackage{ensemblVEP} \documentclass[10pt]{article} \usepackage{times} \usepackage{hyperref} \usepackage[margin=0.65in]{geometry} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textsf{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rcode}[1]{{\texttt{#1}}} \newcommand{\software}[1]{\textsf{#1}} \newcommand{\R}{\software{R}} \newcommand{\Bioconductor}{\software{Bioconductor}} \SweaveOpts{keep.source=TRUE} \title{Overview of \Rpackage{ensemblVEP}} \author{Lori Shepherd and Valerie Obenchain} \begin{document} \maketitle \tableofcontents %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Ensembl provides the facility to predict functional consequences of known and unknown variants using the Variant Effect Predictor (VEP). The \Rpackage{ensemblVEP} package wraps Ensembl VEP and returns the results as \R objects or a file on disk. To use this package the Ensembl VEP perl script must be installed in your path. See the package README for details. NOTE: As of Ensembl version 88 the VEP script has been renamed from variant\_effect\_predictor.pl to vep. The ensemblVEP package code and documentation have been updated to reflect this change. \noindent Downloads: \url{http://uswest.ensembl.org/info/docs/tools/vep/index.html} \\ \noindent Complete documentation for runtime options: \url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html} \\ \noindent To test that Ensembl VEP is properly installed, enter the name of the script from the command line: {\it vep} \\ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Results as \R{} objects} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% <>= library(ensemblVEP) @ The \Rfunction{ensemblVEP} function can return variant consequences from Ensembl VEP as \R objects (\Robject{GRanges} or \Robject{VCF}) or write them to a file. The default behavior returns a \Robject{GRanges}. Runtime options are stored in a \Robject{VEPFlags} object and allow a great deal of control over the content and format of the results. See the man pages for more details. <>= ?ensemblVEP ?VEPFlags @ The default runtime options can be inspected by creating a \Robject{VEPFlags}. <>= param <- VEPFlags() param flags(param) @ Of note is the 'host'. The default 'host' for queries is character() and therefore is set to the vep default of 'ensembldb.ensembl.org'. Users in the US may find connection and transfer speeds quicker using our East coast mirror, 'useastdb.ensembl.org'. <>= param <- VEPFlags(flags=list(host="useastdb.ensembl.org")) @ Using a vcf file from \Rpackage{VariantAnnotation} as input, we query Ensembl VEP with the default runtime parameters. Consequence data are parsed into the metadata columns of the \Robject{GRanges}. To control the type and amount of data returned see the output options at \url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html}. <>= fl <- system.file("extdata", "gl_chr1.vcf", package="VariantAnnotation") gr <- ensemblVEP(fl, param) head(gr, 3) @ Next we request that a \Robject{VCF} object be returned by setting the {\it vcf} option in the {\it flags} slot to TRUE. <>= param <- VEPFlags(flags=list(vcf=TRUE, host="useastdb.ensembl.org")) vep <- ensemblVEP(fl, param) @ Success! When a \Robject{VCF} is returned, consequence data are included as an unparsed INFO column labeled {\it CSQ}. <>= info(vep)$CSQ @ The \Rfunction{parseCSQToGRanges} function parses these data into a \Robject{GRanges}. When the rownames of the original VCF are provided as \Rcode{VCFRowID} a metadata column of the same name is included in the output. <>= vcf <- readVcf(fl, "hg19") csq <- parseCSQToGRanges(vep, VCFRowID=rownames(vcf)) head(csq, 3) @ The \Rcode{VCFRowID} columns maps the expanded {\it CSQ} data back to the rows in the \Rclass{VCF} object. This index can be used to subset the original VCF. <>= vcf[csq$"VCFRowID"] @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Write results to a file} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% In the previous section we saw Ensembl VEP results returned as \R{} objects in the workspace. Alternatively, these results can be written directly to a file. The flag that controls how the data are returned is the {\it output\_file} flag. When {\it output\_file} is NULL (default), the results are returned as either a \Rclass{GRanges} or \Rclass{VCF} object. <>= flags(param)$output_file @ To write results directly to a file, specify a file name for the {\it output\_file} flag. <>= flags(param)$output_file <- "/mypath/myfile" @ The file can be written as a {\it vcf} or {\it gvf} by setting the options of the slot to TRUE. If neither of {\it vcf} or {\it gvf} are TRUE the file is written out as tab delimited. <>= ## Write a vcf file to myfile.vcf: myparam <- VEPFlags(flags=list(vcf=TRUE, output_file="/path/myfile.vcf")) ## Write a gvf file to myfile.gvf: myparam <- VEPFlags(flags=list(gvf=TRUE, output_file="/path/myfile.gvf")) ## Write a tab delimited file to myfile.txt: myparam <- VEPFlags(flags=list(output_file="/path/myfile.txt")) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Configuring runtime options} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The Ensembl VEP web page has complete descriptions of all runtime options. \url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html} Below are examples of how to configure the runtime options in the \Rclass{VEPFlags} for specific situations. Investigate the differences in results using a sample file from \Rpackage{VariantAnnotation}. <>= fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") @ \begin{itemize} \item Add regulatory region consequences: <>= param <- VEPFlags(flags=list(regulatory=TRUE, host="useastdb.ensembl.org")) gr <- ensemblVEP(fl, param) @ \item Specify input file format as VCF, add HGNC gene identifiers, output SO consequence terms: <>= param <- VEPFlags(flag=list(format="vcf", terms="SO", symbol=TRUE, host="useastdb.ensembl.org")) gr <- ensemblVEP(fl, param) @ \item Check for co-located variants, output only coding sequence consequences, output HGVS names: <>= param <- VEPFlags(flags=list(coding_only=TRUE, check_existing=TRUE, symbol=TRUE, host="useastdb.ensembl.org")) gr <- ensemblVEP(fl, param) @ \item Add SIFT score and prediction, PolyPhen prediction only, output results as \Rcode{VCF}: \begin{verbatim} fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation") param <- VEPFlags(flags=list(sift="b", polyphen="p", vcf=TRUE, host="useastdb.ensembl.org")) vcf <- ensemblVEP(fl, param) csq <- parseCSQToGRanges(vcf) > head(levels(mcols(csq)$SIFT)) [1] "deleterious(0.01)" "deleterious(0.02)" "deleterious(0.03)" [4] "deleterious(0.04)" "deleterious(0.05)" "deleterious(0)" > levels(mcols(csq)$PolyPhen) [1] "benign" "possibly_damaging" "probably_damaging" [4] "unknown" \end{verbatim} \end{itemize} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{\Rcode{sessionInfo()}} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% <>= sessionInfo() @ \end{document}