--- title: "ldblock package: linkage disequilibrium data structures" author: "Vincent J. Carey, stvjc at channing.harvard.edu" date: "February 2015" vignette: > %\VignetteIndexEntry{ldblock package: linkage disequilibrium data structures} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} output: BiocStyle::html_document: highlight: pygments number_sections: yes theme: united toc: yes --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() ``` # Introduction There is a nice vignette in `r Biocpkg("snpStats")` concerning linkage disequilibrium (LD) analysis as supported by software in that package. The purpose of this package is to simplify handling of existing population-level data on LD for the purpose of flexibly defining LD blocks. # Import of HapMap LD data The `hmld` function imports gzipped tabular data from hapmap's repository \url{hapmap.ncbi.nlm.nih.gov/downloads/ld_data/2009-02_phaseIII_r2/}. ```{r lkd, cache=TRUE} suppressPackageStartupMessages({ library(ldblock) library(GenomeInfoDb) }) path = dir(system.file("hapmap", package="ldblock"), full=TRUE) ceu17 = hmld(path, poptag="CEU", chrom="chr17") ceu17 ``` # A view of the block structure For some reason knitr/render will not display this image nicely. ```{r abc, fig=TRUE, fig.width=7, fig.height=7} library(Matrix) image(ceu17@ldmat[1:400,1:400], col.reg=heat.colors(120), colorkey=TRUE, useRaster=TRUE) ``` This ignores physical distance and MAF. The bright stripes are probably due to SNP with low MAF. # Collecting SNPs exhibiting linkage to selected SNP We'll use `ceu17` and the `gwascat` package to enumerate SNP that are in LD with GWAS hits. ```{r getg} library(gwascat) data(ebicat37) seqlevelsStyle(ebicat37) = "NCBI" e17 = ebicat37[ which(as.character(seqnames(ebicat37)) == "17") ] ``` Some dbSNP names for GWAS hits on chr17 are ```{r getrs} rsh17 = unique(e17$SNPS) head(rsh17) ``` We will use `expandSnpSet` to obtain names for SNP that were found in HapMap CEU to have which $D' > .9$ with any of these hits. These names are added to the input set. ```{r doexpa} length(rsh17) exset = ldblock::expandSnpSet( rsh17, ldstruct= ceu17, lb=.9 ) length(exset) all(rsh17 %in% exset) ``` Not all GWAS SNP are in the HapMap LD resource. You can use your own LD data as long as the format agrees with that of the HapMap distribution.