---
title: "Customize BioCarta Pathway Images"
author: "Zuguang Gu (z.gu@dkfz.de)"
date: '`r Sys.Date()`'
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Customize BioCarta Pathway Images}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, eval = TRUE, echo = FALSE}
library(knitr)
knitr::opts_chunk$set(
    fig.width = 7,
    fig.height = 7,
    error = FALSE,
    tidy  = FALSE,
    message = FALSE,
    crop = NULL
)
```

```{r, echo = FALSE}
knitr::knit_hooks$set(pngquant = knitr::hook_pngquant)

knitr::opts_chunk$set(
  message = FALSE,
  dev = "ragg_png",
  fig.align = "center",
  pngquant = "--speed=10 --quality=30"
)
```

<style>
p img {
    border: 1px solid black;
    padding: 0px;
}
</style>

# Introduction

BioCarta is a valuable source of biological pathways which not only provides
well manually curated pathways, but also remarkable and intuitive pathway images.
One useful features of pathway analysis which is to highlight genes of
interest on the pathway images is lost. Since the original source of
BioCarta (biocarte.com) is lost from the internet, we digged out the data from
the internet archive and formatted it into a package.

# Preprocessing

The BioCarta data is collected from
[web.archive.org](https://web.archive.org/web/20170122225118/https://cgap.nci.nih.gov/Pathways/BioCarta_Pathways).
This is an archive of BioCarta's successor website cgap.nci.nih.gov which is
also retired from internet. The snapshot was taken on 2017-01-22. The script is
also shipped in the package:

```{r}
system.file("script", "process.R", package = "BioCartaImage")
```

The core data of this package is the coordinates of proteins in the pathway images. This information
is included in the HTML code (in the `<map>/<area>` tags) of the web page of a certain pathway. We use the
**rvest** package to extract such information.

# Get pathways

The total pathways in the BioCarta database:

```{r}
library(BioCartaImage)
ap = all_pathways()
length(ap)
head(ap)
```

A single pathway can be obtained by providing the pathway ID. It prints two numbers:

- The number of nodes without removing duplicated ones. 
- The number of unique genes that are mapped to the pathway.


```{r}
p = get_pathway("h_RELAPathway")
p
```

[MSigDB](https://www.gsea-msigdb.org/gsea/msigdb/human/genesets.jsp?collection=CP:BIOCARTA) is also a popular resource for
BioCarta pathway analysis. Here we also support MSigSB IDs for the BioCarta pathways. The MSigDB ID is very similar
to the original BioCarta ID:

```{r}
# MSigDB ID
get_pathway("BIOCARTA_RELA_PATHWAY")
```


The pathway object `p` is actually a very simple list which contains
coordinates of member nodes.

```{r}
str(p)
```

As the users, they do not need to touch the internal part of `p`, but the elements in the list are explained as follows:

- `id`: The pathway ID.
- `name`: The pathway name.
- `bc`: The nodes in the original BioCarta pathways are proteins and some of them do not have one-to-one
            mapping to genes, such as protein families or complex. Here `bc` contains the primary IDs of proteins/single nodes in 
            the pathway. The mapping to genes can be obtained by `genes_in_pathway()`.
- `shape`: The shape of the corresponding protein/node in the pathway image.
- `coords`: It is a list of integer vectors, which contains coordinates of the corresponding shapes, in the unit of pixels.
           This information is retrieved from the HTML source code (in the `<area>` tag), so the the coordinates start from 
           the top left of the image. The format of the coordinate vectors is `c(x1, y1, x2, y2, ...)`.
- `image_file`: The file name of the pathway image.

As we have already explained in the previous text, the basic units in pathways
are proteins/nodes, while not directly genes. Thus, the so-called "bc_id" is
used as the primary ID in the package. However, for users, they do not need
to touch all these details. They just directly interact with genes and pathways, the mapping
from genes to "bc_ids" and then to pathways is done automatically in the package.

Similar as many other packages which contain BioCarta gene sets, the member
genes of a pathway can be obtained by `genes_in_pathway()`. You can provide
the pathway ID or the pathway object. The EntreZ ID is used as the gene ID
type.

```{r}
genes_in_pathway("h_RELAPathway")
genes_in_pathway(p)
```

# Plot the pathway

Next, let's move to the main functionality of this package: customizing the
pathway.

First, as many other **grid** plotting functions, `grid.biocarta()` draws a
pathway (where the pathway image is imported as a `raster` object internally).

```{r, crop = NULL}
library(grid)
grid.newpage()
grid.biocarta("h_RELAPathway", color = c("1387" = "yellow"))
```

You can specify the location and how the image is aligned to the anchor point.


```{r, crop = NULL}
grid.newpage()
grid.biocarta("h_RELAPathway", 
    x = unit(0.2, "npc"), y = unit(0.9, "npc"),
    just = c("left", "top"),
    color = c("1387" = "yellow"),
    width = unit(6, "cm"))
```

You can also first create a viewport, then draw the pathway inside it.


```{r, crop = NULL}
grid.newpage()
pushViewport(viewport(width = 0.7, height = 0.5))
grid.biocarta("h_RELAPathway", color = c("1387" = "yellow"))
popViewport()
```

As the aspect ratio of the image is fixed, you can either set `width` or
`height`. If both are set, the size of the image is internally adjusted to let
the image maximally fill the plotting region.

One of the main use of the pathway image is to highlight genes of interest.
The simple use is to set the `color` argument which is a named vector where
gene EntreZ ID are names. When the colors are set, the genes are highlighted
with dashed colored borders.

```{r, crop = NULL}
grid.newpage()
grid.biocarta("h_RELAPathway", color = c("1387" = "yellow"))
```

As normally BioCarta pathway images are colorful, it is quite difficult to
find a proper color to be distinguished from other genes. There is a more
flexible way in the package which allows to add self-defined graphics over or besides
the genes.

To edit the pathway image, we create the pathway grob first ("grob" is short for "graphic object").

```{r}
grob = biocartaGrob("h_RELAPathway")
```

The object `grob` basically contains a viewport and a raster image object. Later we
can add more graphics for single genes to it.

Graphics for single genes are added by the function `mark_gene()`. You need to
provide the pathway grob, the gene EntreZ ID and a self-defined graphics
function. As you can imagine, the input of the function is the coordinate of
the polygon of the gene in forms of two vectors: the x-coordinates and the
y-coordinates.

There are two ways to implement the graphics function. First, the function
directly returns a grob object. Later this grob is inserted to the global
pathway grob.

There is a helper function `pos_by_polygon()` which returns the position of a
certain side of the polygon.

In the following code, we add a yellow point to the left side of gene "1387"
(CBP in the image).

The graphics are drawn in the pathway image viewport which already has a
coordinate system associated. the "xscale" and "yscale" correspond to the
numbers of pixels horizontally and vertically. So `unit(1, "native")` means 1
pixel in the original image.

```{r, crop = NULL}
grid.newpage()
grob2 = mark_gene(grob, "1387", function(x, y) {
    pos = pos_by_polygon(x, y, where = "left")
    pointsGrob(pos[1], pos[2], default.units = "native",
        pch = 16, gp = gpar(col = "yellow"))
})
grid.draw(grob2)
```

If you have complicated graphics, you can consider to use `gTree()` and
`gList()` to combine them.

If you are not familiar with `gTree()` and `gList()` or `*Grob()` functions.
You can directly use the grid plotting functions such as `grid.points()` or
`grid.lines()`. In this case, you have to set `capture` to `TRUE`, then the
graphics will be captured as grobs internally.


```{r, crop = NULL}
grid.newpage()
grob3 = mark_gene(grob, "1387", function(x, y) {
    pos = pos_by_polygon(x, y, where = "left")
    grid.points(pos[1], pos[2], default.units = "native",
        pch = 16, gp = gpar(col = "yellow"))
}, capture = TRUE)
grid.draw(grob3)
```

With this functionality, you can implement complicated graphics to associate
 a gene. In the following example, we create a viewport and put it to the
left of the gene.

```{r, crop = NULL}
grid.newpage()
grob4 = mark_gene(grob, "1387", function(x, y) {
    pos = pos_by_polygon(x, y)
    pushViewport(viewport(x = pos[1] - 10, y = pos[2], 
        width = unit(4, "cm"), height = unit(4, "cm"), 
        default.units = "native", just = "right"))
    grid.rect(gp = gpar(fill = "red"))
    grid.text("add whatever\nyou want here")
    popViewport()
}, capture = TRUE)
grid.draw(grob4)
```


# Session info

```{r}
sessionInfo()
```