---
title: "Loading and re-analysing public data through ReactomeGSA"
author: "Johannes Griss"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Loading and re-analysing public data through ReactomeGSA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

Since October 2023, ReactomeGSA was extended to simplify the reuse of public data. As key features,
ReactomeGSA can now directly load data from **EBI's ExpressionAtlas**, and **NCBI's GREIN**. Both of 
these resources reprocess available public datasets using consistent pipelines.

Additionally, a search function was integrated into ReactomeGSA that can search for datasets simultaneously
in all of these supported resources.

The ReactomeGSA R package now also has all required functions to directly access this web-based service. It
is thereby possible to search for public datasets directly and download them as **ExpressionSet** objects.

## Installation

The `ReactomeGSA` package can be directly installed from Bioconductor:

```{r, eval=FALSE }
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!require(ReactomeGSA))
  BiocManager::install("ReactomeGSA")
```

For more information, see https://bioconductor.org/install/.

## Searching for Public Datasets

The `find_public_datasets` function uses ReactomeGSA's web service to search for public datasets
in all supported resources.

By default, the datasets are limited to human studies. This can be changed by setting the `species`
parameter. The complete list of available species is returned by the `get_public_species` function.

```{r}
library(ReactomeGSA)

# get all available species found in the datasets
all_species <- get_public_species()

head(all_species)
```

The `search_term` parameter takes a single string as an argument. Words separated by a space are
logically combined using an **AND**.
 
```{r}
# search for datasets on BRAF and melanoma
datasets <- find_public_datasets("melanoma BRAF")

# the function returns the found datasets as a data.frame
datasets[1:4, c("id", "title")]
```

## Load a public dataset

Datasets found through the `find_public_datasets` function can subsequently
loaded using the `load_public_dataset` function.

```{r}
# find the correct entry in the search result
# this must be the complete row of the data.frame returned
# by the find_public_datasets function
dataset_search_entry <- datasets[datasets$id == "E-MTAB-7453", ]

str(dataset_search_entry)
```

The selected dataset can now be loaded through the `load_public_dataset` function.

```{r}
# this function only takes one argument, which must be
# a single row from the data.frame returned by the
# find_public_datasets function
mel_cells_braf <- load_public_dataset(dataset_search_entry, verbose = TRUE)
```

The returned object is an `ExpressionSet` object that already contains
all available metada.

```{r}
# use the biobase functions to access the metadata
library(Biobase)

# basic metadata
pData(mel_cells_braf)
```

Detailed descriptions of the loaded study are further stored in the metadata slot.

```{r}
# access the stored metadata using the experimentData function
experimentData(mel_cells_braf)

# for some datasets, longer descriptions are available. These
# can be accessed using the abstract function
abstract(mel_cells_braf)
```


Additionally, you can use the `table` function to quickly get the number
of available samples for a specific metadata field.

```{r}
table(mel_cells_braf$compound)
```

## Perform the pathway analysis using ReactomeGSA

This object is now directly compatible with ReactomeGSA's pathway
analysis functions. A detailed explanation of how to perform
this analysis, please have a look at the respective vignette.

```{r}
# create the analysis request
my_request <-ReactomeAnalysisRequest(method = "Camera")

# do not create a visualization for this example
my_request <- set_parameters(request = my_request, create_reactome_visualization = FALSE)

# add the dataset using the loaded object
my_request <- add_dataset(request = my_request, 
                          expression_values = mel_cells_braf, 
                          name = "E-MTAB-7453", 
                          type = "rnaseq_counts",
                          comparison_factor = "compound", 
                          comparison_group_1 = "PLX4720", 
                          comparison_group_2 = "none")

my_request
```

The analysis can now started using the standard workflow:

```{r}
# perform the analysis using ReactomeGSA
res <- perform_reactome_analysis(my_request)

# basic overview of the result
print(res)

# key pathways
res_pathways <- pathways(res)

head(res_pathways)
```

## Session Info

```{r}
sessionInfo()
```