%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
%\VignetteIndexEntry{ceu1kg overview}
%\VignetteDepends{GGtools}
%\VignetteKeywords{genetics}
%\VignettePackage{ceu1kg}
\documentclass[12pt]{article}

\usepackage{amsmath,pstricks}
% With MikTeX 2.9, using pstricks also requires using auto-pst-pdf; otherwise
% pdflatex will fail. Note that auto-pst-pdf requires setting the environment
% variable MIKTEX_ENABLEWRITE18=t on Windows, and setting shell_escape = t in
% texmf.cnf for TeX Live on Unix/Linux/Mac.
\usepackage{auto-pst-pdf}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}

\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}

\textwidth=6.2in

\bibliographystyle{plainnat}

\begin{document}
%\setkeys{Gin}{width=0.55\textwidth}

\title{\textit{ceu1kg}: resources for exploring the 1000 genomes data on
individuals of central European ancestry in Bioconductor}
\author{VJ Carey}
\maketitle

\section{Introduction}

Using results of next generation sequencing experiments,
a consortium of geneticists produced SNP calls at approximately
8 million loci in the genomes of individuals of central European ancestry.
Full genotype calls are held in a folder of \Rclass{SnpMatrix} instances:

<<>>=
library(ceu1kg)
# list the serialized SnpMatrix files shipped with the package
dir(system.file("parts", package="ceu1kg"))
# load the first file; load() returns the name of the restored object
lk = load(dir(system.file("parts", package="ceu1kg"), full.names=TRUE)[1])
c1gt = get(lk)
c1gt
@

Metadata about the loci are provided in \Rclass{GRanges} instances
available from SNPlocs packages. Here we consider the November 2010
release.
<<>>=
library(SNPlocs.Hsapiens.dbSNP.20101109)
if (!exists("c1loc")) c1loc = getSNPlocs("ch1", as.GRanges=TRUE)
c1loc
# count 1KG loci whose rs identifiers appear in dbSNP
rsn1 = paste("rs", elementMetadata(c1loc)$RefSNP_id, sep="")
length(intersect(rsn1, colnames(c1gt)))
# for loci named by position (chr1:nnn), check positions against dbSNP
ext1 = grep("chr", colnames(c1gt))
ext1 = as.numeric(gsub("chr1:", "", colnames(c1gt)[ext1]))
length(intersect(ext1, start(c1loc)))
@

The last computation shows that most of the 1KG locations
are not in this release of dbSNP.

The Bioconductor \textit{GGdata} package includes HapMap phase II
genotypes on 90 CEU individuals in 30 trios, coupled with
expression data as distributed at the Sanger GENEVAR
project (\url{ftp://ftp.sanger.ac.uk/pub/genevar/}).
The 1KG genotypes are available for 43 of these 90 individuals, and
the associated genotype plus expression data for these 43
can be acquired using \Rfunction{getSS}, for any chromosome or set of
chromosomes.

<<>>=
c20 = getSS("ceu1kg", "chr20")
c20
@

The code above issues a warning because genotype data
are present for 60 individuals, but only 43 of them have
expression values. To create a structure of the same kind without
a warning, shown here for chromosome 1, restrict the genotypes to the
samples that have expression data:

<<>>=
# assumes the eset from ceu1kg is found first; creates 'ex' in the
# global environment
data(eset)
c1m = c1gt[sampleNames(ex),]
c1ss = make_smlSet( ex, list(chr1=c1m) )
c1ss
@

\section{Session information}

<<>>=
sessionInfo()
@

\end{document}