%\VignetteIndexEntry{PreV90EnsemblVEP} %\VignetteDepends{GenomicRanges, VariantAnnotation, Biostrings} %\VignetteKeywords{annotation, variants} %\VignettePackage{ensemblVEP} \documentclass[10pt]{article} \usepackage{times} \usepackage{hyperref} \usepackage[margin=0.65in]{geometry} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textsf{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rcode}[1]{{\texttt{#1}}} \newcommand{\software}[1]{\textsf{#1}} \newcommand{\R}{\software{R}} \newcommand{\Bioconductor}{\software{Bioconductor}} \SweaveOpts{keep.source=TRUE} \title{Overview of \Rpackage{ensemblVEP} Pre Ensembl 90} \author{Valerie Obenchain and Lori Shepherd} \begin{document} \maketitle \tableofcontents %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Ensembl provides the facility to predict functional consequences of known and unknown variants using the Variant Effect Predictor (VEP). The \Rpackage{ensemblVEP} package wraps Ensembl VEP and returns the results as \R objects or a file on disk. To use this package the Ensembl VEP perl script must be installed in your path. See the package README for details. NOTE: As of Ensembl version 88 the VEP script has been renamed from variant\_effect\_predictor.pl to vep. The ensemblVEP package code and documentation have been updated to reflect this change. \noindent Downloads: \url{http://uswest.ensembl.org/info/docs/tools/vep/index.html} \\ \noindent Complete documentation for runtime options: \url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html} \\ \noindent To test that Ensembl VEP is properly installed, enter the name of the script from the command line: {\it vep} \\ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Results as \R{} objects} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% <>= library(ensemblVEP) @ The \Rfunction{ensemblVEP} function can return variant consequences from Ensembl VEP as \R objects (\Robject{GRanges} or \Robject{VCF}) or write them to a file. The default behavior returns a \Robject{GRanges}. Runtime options are stored in a \Robject{VEPParam} object and allow a great deal of control over the content and format of the results. See the man pages for more details. <>= ?ensemblVEP ?VEPParam @ The default runtime options can be inspected by creating a \Robject{VEPParam}. <>= param <- VEPParam(version=88) param basic(param) @ Using a vcf file from \Rpackage{VariantAnnotation} as input, we query Ensembl VEP with the default runtime parameters. <>= fl <- system.file("extdata", "gl_chr1.vcf", package="VariantAnnotation") gr <- ensemblVEP(fl) @ Consequence data are parsed into the metadata columns of the \Robject{GRanges}. To control the type and amount of data returned see the options in \Rcode{output(VEPParam())}. \begin{verbatim} > head(gr, 3) GRanges object with 3 ranges and 23 metadata columns: seqnames ranges strand | Allele | rs6054257 20 [ 14370, 14370] * | A 20:17330_T/A 20 [ 17330, 17330] * | A rs6040355 20 [1110696, 1110696] * | G Consequence IMPACT SYMBOL Gene rs6054257 intergenic_variant MODIFIER 20:17330_T/A intergenic_variant MODIFIER rs6040355 upstream_gene_variant MODIFIER PSMF1 ENSG00000125818 Feature_type Feature BIOTYPE EXON rs6054257 20:17330_T/A rs6040355 Transcript ENST00000479715 processed_transcript INTRON HGVSc HGVSp cDNA_position CDS_position rs6054257 20:17330_T/A rs6040355 Protein_position Amino_acids Codons Existing_variation rs6054257 20:17330_T/A rs6040355 DISTANCE STRAND FLAGS SYMBOL_SOURCE HGNC_ID rs6054257 20:17330_T/A rs6040355 2610 1 HGNC HGNC:9571 ------- seqinfo: 1 sequence from genome \end{verbatim} Next we use a vcf of structural variants as input <>= fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation") @ and request that a \Robject{VCF} object be returned by setting the {\it vcf} option in the {\it dataformat} slot to TRUE. <>= param <- VEPParam(dataformat=c(vcf=TRUE), version=88) @ An call to \Rfunction{ensemblVEP} results in an error. \begin{verbatim} > vcf <- ensemblVEP(fl, param) 2012-12-03 16:40:55 - Starting... ERROR: Could not detect input file format \end{verbatim} In most situations Ensembl VEP can auto-detect the input format. In this case, however, it cannot so we explicitly set the {\it format} option to 'vcf'. <>= input(param)$format <- "vcf" @ Try again. <>= vep <- ensemblVEP(fl, param) @ Success! When a \Robject{VCF} is returned, consequence data are included as an unparsed INFO column labeled {\it CSQ}. <>= info(vep)$CSQ @ The \Rfunction{parseCSQToGRanges} function parses these data into a \Robject{GRanges}. When the rownames of the original VCF are provided as \Rcode{VCFRowID} a metadata column of the same name is included in the output. <>= vcf <- readVcf(fl, "hg19") csq <- parseCSQToGRanges(vep, VCFRowID=rownames(vcf)) head(csq, 3) @ The \Rcode{VCFRowID} columns maps the expanded {\it CSQ} data back to the rows in the \Rclass{VCF} object. This index can be used to subset the original VCF. <>= vcf[csq$"VCFRowID"] @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Write results to a file} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% In the previous section we saw Ensembl VEP results returned as \R{} objects in the workspace. Alternatively, these results can be written directly to a file. The flag that controls how the data are returned is the {\it output\_file} flag in the {\it input} options. When {\it output\_file} is an empty character (default), the results are returned as either a \Rclass{GRanges} or \Rclass{VCF} object. <>= input(param)$output_file @ To write results directly to a file, specify a file name for the {\it output\_file} flag. <>= input(param)$output_file <- "/mypath/myfile" @ The file can be written as a {\it vcf} or {\it gvf} by setting the options in the {\it dataformat} slot to TRUE. If neither of {\it vcf} or {\it gvf} are TRUE the file is written out as tab delimited. <>= ## Write a vcf file to myfile.vcf: myparam <- VEPParam(dataformat=c(vcf=TRUE), input=c(output_file="/path/myfile.vcf"), version=88) ## Write a gvf file to myfile.gvf: myparam <- VEPParam(dataformat=c(gvf=TRUE), input=c(output_file="/path/myfile.gvf"), version=88) ## Write a tab delimited file to myfile.txt: myparam <- VEPParam(input=c(output_file="/path/myfile.txt"), version=88) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Configuring runtime options} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The Ensembl VEP web page has complete descriptions of all runtime options. \url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html} Below are examples of how to configure the runtime options in the \Rclass{VEPParam} for specific situations. Investigate the differences in results using a sample file from \Rpackage{VariantAnnotation}. <>= fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") @ \begin{itemize} \item Add regulatory region consequences: <>= param <- VEPParam(output=c(regulatory=TRUE), version=88) gr <- ensemblVEP(fl, param) @ \item Specify input file format as VCF, add HGNC gene identifiers, output SO consequence terms: <>= param <- VEPParam(input=c(format="vcf"), output=c(terms="so"), identifiers=c(symbol=TRUE), version=88) gr <- ensemblVEP(fl, param) @ \item Check for co-located variants, output only coding sequence consequences, output HGVS names: <>= param <- VEPParam(filterqc=c(coding_only=TRUE), colocatedVariants=c(check_existing=TRUE), identifiers=c(symbol=TRUE), version=88) gr <- ensemblVEP(fl, param) @ \item Add SIFT score and prediction, PolyPhen prediction only, output results as \Rcode{VCF}: \begin{verbatim} fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation") param <- VEPParam(output=c(sift="b", polyphen="p"), dataformat=c(vcf=TRUE), version=88) vcf <- ensemblVEP(fl, param) csq <- parseCSQToGRanges(vcf) > head(levels(mcols(csq)$SIFT)) [1] "deleterious(0.01)" "deleterious(0.02)" "deleterious(0.03)" [4] "deleterious(0.04)" "deleterious(0.05)" "deleterious(0)" > levels(mcols(csq)$PolyPhen) [1] "benign" "possibly_damaging" "probably_damaging" [4] "unknown" \end{verbatim} \end{itemize} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{\Rcode{sessionInfo()}} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% <>= sessionInfo() @ \end{document}