%\VignetteIndexEntry{BeadDataPackR Vignette} %\VignetteDepends{} %\VignetteKeywords{BeadDataPackR Illumina Microarray Compression} %\VignettePackage{BeadDataPackR} %\VignetteEngine{knitr::knitr} \documentclass{article} %\usepackage{hyperref} %\usepackage{Sweave} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\classdef}[1]{ {\em #1} } <>= BiocStyle::latex() @ \begin{document} %\SweaveOpts{concordance=TRUE} \title{\Rpackage{BeadDataPackR}: Compression Tools for Raw Illumina Beadarray Data} \author{Mike L. Smith and Andy G. Lynch} \maketitle \section*{Introduction} Raw Illumina BeadArray data consists of \textit{.tif} images produced by the scanner, accompanied by \textit{.txt} and \textit{.locs} files containing details of bead locations and their intensities within the image. For a single channel human expression array these data typically occupy $\approx$ 125MB of storage. The size of these files can prove a hinderance to both their storage and distribution. Whilst the images can be compressed using a variety of common tools, the \Rpackage{BeadDataPackR} package aims to provide tools for the efficient compression of the \textit{.txt} and \textit{.locs} files.\\ \noindent Disclaimer: \Rpackage{BeadDataPackR} has been tested on data from a variety of Illumina BeadArray platforms and, to the best of our knowledge, data compressed using lossless settings has always been restored successfully. However, we cannot take responsibility for any loss or damage to data that results from its use. \subsection*{Citing the package} If you make use of \Rpackage{BeadDataPackR} to ease distribution of your data please cite the following paper:\\ \noindent BeadDataPackR: A Tool to Facilitate the Sharing of Raw Data from Illumina BeadArray Studies. \textit{Cancer Informatics}, 9:217-227, 2010. \subsection*{iScan Data} Bead-level data from Illumina's iScan system comes in a different format to that from the older BeadScan system. The primary difference is that for each array section two images and accompanying \textit{.locs} files are produced, labelled Swath1 and Swath2, along with a single \textit{.txt}. \Rpackage{BeadDataPackR} is currently unable to compress data in this format. \section*{Compressing Data} The first step before using any of the functionality is to load the package. For the purpose of this vignette we also get the path to the example data that is distributed with the package. <>= library(BeadDataPackR) dataPath <- system.file("extdata", package = "BeadDataPackR") tempPath <- tempdir() @ \Rpackage{BeadDataPackR} has two primary functions, namely to compress raw Illumina data, or decompress a file already created with the package. We'll begin with file compression, which is carried out using the following commands: <>= compressBeadData(txtFile = "example.txt", locsGrn = "example_Grn.locs", outputFile = "example.bab", path = tempPath, nBytes = 4, nrow = 326, ncol = 4) @ The \Rfunarg{txtFile} and \Rfunarg{locsGrn} arguments specify the names of the files to be compressed. For two channel data there is an additional argument, \Rfunarg{locsRed}, giving the name of the \textit{.locs} file for the red channel. These files should be found within the directory specified in the \Rfunarg{path} argument. A future revision of the package will hopefully alter this behaviour, so all arrays within a specified folder will be automatically identified and compressed. The argument \Rfunarg{nBytes} specifies how many bytes should be used to store the fractional parts of the bead coordinates. For a single channel array the maximum value is 4 (8 for a two channel array). If the maximum value is used the coordinates are stored in the \textit{.bab} file as single precision floating point numbers, as they are in the \textit{.locs} files. If a value smaller than the maximum is choosen then the integer parts of each coordinate are stored seperately. The requested number of bytes are then used to store the fractional parts, with a corresponding loss of precision as the number of bytes decreases. The \Rfunarg{nrow} and \Rfunarg{ncol} arguments can normally be left blank. They specify the dimensions of each grid segment on the array and, if left blank, can normally be infered from the grid coordinates. However, this can fail for particularly small grids or cases of misregistration where segments overlap. If one wants or needs to specify them explicitly, these values can be found in the \textit{.sdf} which accompanies the bead level output from the scanner. The number of columns and rows per segment can be found within the tags \texttt{} and \texttt{} respectively. \section*{Decompressing Data} To decompress a \textit{.bab} file that was created by \Rpackage{BeadDataPackR}, use the following function: <>= decompressBeadData(inputFile = "example.bab", inputPath = tempPath, outputMask = "restored", outputPath = tempPath, outputNonDecoded = FALSE, roundValues = TRUE ) @ The \Rfunarg{inputFile} argument specifies the name of the \textit{.bab} that should be decompressed. This file should be located in the folder indicated by \Rfunarg{inputPath}, which by default is the current working directory. When an array is compressed its name is stored in the resulting \textit{.bab} file. By default when it is decompressed this name is used for the restored files. However, this can be troublesome if you don't want to overwrite an exisiting file. The \Rfunarg{outputMask} argument allows the user to define the names of the restored \textit{.txt} and \textit{.locs} files. In this case the restored files will be named `restored.txt' and `restored\_Grn.locs'. If this was two channel data a further file, `restored\_Red.locs', would also be produced. The files are created in the location specified by the \Rfunarg{outputPath} argument. If this is left blank the current working directory is used. The \textit{.txt} file that are produced by Illumina's scanner do not included the locations of beads that failed their decoding process. Since their location are retained in the \textit{.locs} file, we have to option of including them in the restored files. \Rfunarg{outputNonDecoded} toggles whether to include them or not. Illumina's \textit{.txt} files also give the bead centre coordinates to 7 significant figures, resulting in a different number of decimal places as we move across the array, rather than the single precision values held in the \textit{.locs} files. \Rfunarg{roundValues} allows the user to choose between mimicing this behaviour when recreating the \textit{.txt} files or ouputing the maximum available precision. The default for both these arguments is to reproduce the original Illumina files. \subsection*{Extracting data directly} Rather than decompressing the \textit{.bab} file into the two original files, it may be useful to extract data directly from it. This can be achieved in the following way: <>= readCompressedData(inputFile = "example.bab", path = tempPath, probeIDs = c(10008, 10010) ) @ This function takes a \textit{.bab} file found in the directory specified by the \Rfunarg{path} argument. In this case it extracts the data relating to the beads with IDs 10008 and 10010, which is returned as a matrix in the same format as the \textit{.txt} file that would be created if the file were to be decompressed. If the \Rfunarg{probeIDs} argument is left NULL then a matrix containing data for every probe on the array is returned (including the non-decoded beads). This mechanism is used within the \Rpackage{beadarray} package, from version \texttt{2.1.11}, to read compressed data directly from \textit{.bab} file into \Rpackage{beadarray} for analysis.\\ Similarly, we can extract the information contained in the original \textit{.locs} file, i.e. the bead centre coordinates in \textit{.locs} file order, by using the following command: <>= locs <- extractLocsFile(inputFile = "example.bab", path = tempPath) @ Using coordinates in the grid order provided by the \textit{.locs} can be useful for quickly determining neighbouring beads and is used in several instances by the \Rpackage{beadarray} package. \newpage \section*{Session Info} Here is the output of \Rfunction{sessionInfo} on the system on which this document was compiled: <>= sessionInfo() @ \end{document}