<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Making New Organism Packages}
-->


```{r setup, echo=FALSE}
library(knitr)
options(width=80)
```
```{r wrap-hook, echo=FALSE}
hook_output = knit_hooks$get('output')
knit_hooks$set(output = function(x, options) {
  # this hook is used only when the linewidth option is not NULL
  if (!is.null(n <- options$linewidth)) {
    x = knitr:::split_lines(x)
    # any lines wider than n should be wrapped
    if (any(nchar(x) > n)) x = strwrap(x, width = n)
    x = paste(x, collapse = '\n')
  }
  hook_output(x, options)
})
```


# Making Organism packages

### by Marc Carlson

## Overview

Making Organism Packages is a straightforward process using the helper
functions makeOrgPackageFromNCBI() and makeOrgPackage().  If your
package is available at NCBI with an identifiable NCBI Taxonomy ID you
can try makeOrgPackageFromNCBI().  However, even if this fails, the
second and more general makeOrgPackage() function will allow you to
make a database package using only data.frames of data.


## Making use of makeOrgPackageFromNCBI()

The makeOrgPackageFromNCBI() function, will take several different
arguments to help declare who made the package and what species it is
for.  But the most important arguement is the tax_id arguement.  That
arguement is uses to search for gene records from NCBI that go with
that species.  So when calling this function it is important to choose
the correct NCBI taxonomny ID for this argument.  You can look up
information about the taconomy ID at NCBIs website.  Here is an
example of calling this function to make a package for zebrafinch.


```{r makeOrgPackageFromNCBI, eval=FALSE}
library(AnnotationForge)
makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some One <so@someplace.org>",
                       maintainer = "Some One <so@someplace.org>",
                       outputDir = ".",
                       tax_id = "59729",
                       genus = "Taeniopygia",
                       species = "guttata")
```


## Making use of makeOrgPackage()

Sometimes you may not find what you need at NCBI, most commonly this
is because they may just not have enough data about the organism you
are interested in.  But often other resources will have annotation
data that you want to make into an organism package.  When this
happens you can use the much more general makeOrgPackage() function.
This function takes more arguments, but it does not rely on NCBI in
order to run.  We do however still ask for you to provice a tax ID for
the metadata (even though we are not using it to look up data from
NCBI).  Many of the other arguments are also the same as the
makeOrgPackageFromNCBI() function.  But a key difference is that the
1st argument for this function is (...).  For that argument, we want
you to provide named arguments corresponding to data.frames of
data. Each named argument will become a table name in the resulting
database, and each field name (for the data.frames) will become the
field names of the database as well as the names looked up by the
columns() and keytypes() methods.  With the exception of any table
that is named by the goTable argument (more on this below), there are
not too many restrictions on what kind of data you can put into the
data.frame.  But one rule you must follow is that the 1st collumn of
each data.frame has to correspond to a central gene ID and be labeled
"GID".

Finally, the goTable method is also new.  That argument indicates when
one of the data.frames contains GO information.  If you choose to use
this argument, makeOrgPackage() will post-process your GO data to 1)
remove IDs that are too new and 2) create a second table to also
represent the GOALL, EVIDENCEALL and ONTOLOGYALL fields for the select
method etc.  However to use the goTable argument, you have to follow a
strict convention with the data.  Such a data.frame must have three
columns only and these must correspond to the gene id, GO id and
evidence codes.  These columns also have to be named as "GID", "GO"
and "EVIDENCE" Below is an example that parses an example file into
three data.frame and that makes use of the goTable argument.



```{r makeOrgPackage, eval=FALSE}
## Makes an organism package for Zebra Finch data.frames:
finchFile <- system.file("extdata","finch_info.txt",
		         package="AnnotationForge")
finch <- read.table(finchFile,sep="\t")
    
## Now prepare some data.frames
fSym <- finch[,c(2,3,9)]
fSym <- fSym[fSym[,2]!="-",]
fSym <- fSym[fSym[,3]!="-",]
colnames(fSym) <- c("GID","SYMBOL","GENENAME")
     
fChr <- finch[,c(2,7)]
fChr <- fChr[fChr[,2]!="-",]
colnames(fChr) <- c("GID","CHROMOSOME")
     
finchGOFile <- system.file("extdata","GO_finch.txt",
   			   package="AnnotationForge")
fGO <- read.table(finchGOFile,sep="\t")
fGO <- fGO[fGO[,2]!="",]
fGO <- fGO[fGO[,3]!="",]
colnames(fGO) <- c("GID","GO","EVIDENCE")
     
## Then call the function
makeOrgPackage(gene_info=fSym, chromosome=fChr, go=fGO,
               version="0.1",
               maintainer="Some One <so@someplace.org>",
               author="Some One <so@someplace.org>",
               outputDir = ".",
               tax_id="59729",
               genus="Taeniopygia",
               species="guttata",
               goTable="go")
     
## then you can call install.packages based on the return value
install.packages("./org.Tguttata.eg.db", repos=NULL)
```