---
title: "Creating probe packages"
author: "Wolfgang Huber and Robert Gentleman"
date: "`r format(Sys.time(), '%B %d, %Y')`"
package: AnnotationForge
vignette: >
  %\VignetteIndexEntry{Creating probe packages}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
output:
    BiocStyle::html_document:
bibliography:
- references.bib
link-citations: true
---

# Overview

This document describes how to create a Bioconductor *probe* package from the
reporter sequence information of a particular chip. Probe packages are a
convenient way for distributing and storing the probe sequences and related
information.

First, let us load the `r Biocpkg("AnnotationForge")` package.

```{r results=FALSE, message=FALSE}
library("AnnotationForge")
```

## For Affymetrix genechips 

In this section we see how a probe package can be created for Affymetrix
genechips from the tabulator-separated sequence files that can be obtained from
the vendor (at <https://www.thermofisher.com/us/en/home/support.html>). As an
example, the file `HG-U95Av2_probe_tab.gz` is provided in the `extdata`
subdirectory of the `r Biocpkg("AnnotationForge")` package.

```{r makeprobepackage}
filename <- system.file("extdata", "HG-U95Av2_probe_tab.gz", 
                        package="AnnotationForge")
outdir   <- tempdir()
me       <- "Wolfgang Huber <w.huber@dkfz.de>"
species  <- "Homo_sapiens"
makeProbePackage("HG-U95Av2",
                 datafile   = gzfile(filename, open="r"),
                 outdir     = outdir,
                 maintainer = me,
                 species    = species,
                 version    = "0.0.1")
dir(outdir)
```

## For other chiptypes {#subsec.otherarraytypes}

To deal with different file formats and additional types of probe annotation
data from public or in-house databases, the function `makeProbePackage` offers a
great deal of flexibility. The user can specify her own import function through
the `importfun` argument. By default, its value is `getProbeDataAffy`, a
function that reads tabular Affymetrix genechip sequence files. Import functions
for other types of arrays can be adapted from this prototype.

The help pages and R code contained in the produced packages are derived from a
template directory that obeys the usual R package conventions
[-@Rfoundation1999]. The input parameters of an import function are A prototype
for such a directory is provided within the package
`r Biocpkg("AnnotationForge")`. To facilitate the automated production of large
numbers of similar packages, we provide a text substitution mechanism similar to
the one used in the GNU `configure` system.

- a character string naming the array type

- the input data files

- a character string with the directory name of a package template directory,
containing at least a file `DESCRIPTION`, a directory `man` with a file
`package.Rd`, a directory `data`, and possible other directories and files,
conforming to the usual R package conventions.

- ... any sort of further parameters, as necessary.

The output of an import function is a named list with elements

- `pkgname`: a character string with the package name.

- `dataEnv`: an environment containing an arbitrary number of data objects;
these make the core of the package. Among these, there should be a data frame
whose name is the value of `pkgname` with a column `sequence`. This data frame
may have other columns such as the $x$- and $y$-position of the probes on the
array, identifiers linking a probe to genomic databases, or its length and
relative position within the gene it represents. The objects in the environment
will be saved as individual `.rda` files of the same name into the data
directory of the produced package. Documentation needs to be provided for the
columns of the data frame, as well as for the other objects.

- `symVals`: a named list, containing the symbol-value substitutions that are
  used in the text processing. It must at least contain the elements

  -  \`ARRAYTYPE`: name of the array

  -  `DATASOURCE`: a textual description of how the data were
     obtained. It should contain the URL, or the name of the company
     / person.

For more details, please refer to the help files for the functions
`makeProbePackage` and `getProbeDataAffy`. For an example, refer to the source 
code of `getProbeDataAffy`.

# References {-}