--- title: "An Introduction to the ImmuneSpaceR Package" author: "Renan Sauteraud" date: "`r Sys.Date()`" output: html_vignette: toc: true number_sections: true vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{An Introduction to the ImmuneSpaceR Package} --- ```{r knitr, echo=FALSE, cache = FALSE} library(knitr) library(rmarkdown) opts_chunk$set(cache = FALSE, fig.align = "center", fig.width = 7, fig.height = 5) ``` ```{r netrc_req, echo = FALSE} # This chunk is only useful for BioConductor checks and shouldn't affect any other setup if (!any(file.exists("~/.netrc", "~/_netrc"))) { labkey.netrc.file <- ImmuneSpaceR:::.get_env_netrc() labkey.url.base <- ImmuneSpaceR:::.get_env_url() } ``` This package provides a *thin* wrapper around [`Rlabkey`](https://cran.r-project.org/web/packages/Rlabkey/index.html) and connects to the [ImmuneSpace](https://www.immunespace.org) database, making it easier to fetch datasets, including gene expression data, HAI, and so forth, from specific studies. # Configuration In order to connect to ImmuneSpace, you will need a netrc file in your home directory. **Set up your netrc file now!** If you're not familiar with the command-line interface, there is the `interactive_netrc()` function to set up your netrc file. See [the `interactive_netrc` vignette](https://rglab.github.io/ImmuneSpaceR/articles/interactiveNetrc.html). Or create netrc file in the computer running R: * On a **UNIX** system, this file should be named `.netrc` * On **Windows**, it sould be named `_netrc` * The file should be located in the user's **home** directory * To determine home directory, run `Sys.getenv("HOME")` in R * The permissions on this file should be unreadable for everybody except the owner The following three lines must be included in the `.netrc` or `_netrc` file either separated by white space (spaces, tabs, or newlines) or commas. ``` machine www.immunespace.org login myUser@mySite.com password superSecretPassword ``` Multiple such blocks can exist in one file. Please ensure that the machine name in the netrc file contains the "www" prefix as that is how the package connects to immunespace by default. A mismatch will lead to connection failures. See [the official LabKey documentation](https://www.labkey.org/wiki/home/Documentation/page.view?name=netrc) for more information. # Instantiate a connection We'll be looking at study `SDY269`. If you want to use a different study, change that string. The connections have state, so you can instantiate multiple connections to different studies simultaneously. ```{r CreateConnection, cache=FALSE, message=FALSE} library(ImmuneSpaceR) sdy269 <- CreateConnection(study = "SDY269") sdy269 ``` The call to `CreateConnection` instantiates the connection. Printing the object shows where it's connected, to what study, and the available data sets and gene expression matrices. Note that when a script is running on ImmuneSpace, some variables set in the global environments will automatically indicate which study should be used and the `study` argument can be skipped. # Fetching datasets We can grab any of the datasets listed in the connection. ```{r getDataset} sdy269$getDataset("hai") ``` The *sdy269* object is an **R6** class, so it behaves like a true object. Methods (like `getDataset`) are members of the object, thus the `$` semantics to access member functions. The first time you retrieve a dataset, it will contact the database. The data is cached in the object, so the next time you call `getDataset` on the same dataset, it will retrieve the cached local copy. This is much faster. To get only a subset of the data and speed up the download, filters can be passed to `getDataset`. The filters are created using the `makeFilter` function of the `Rlabkey` package. ```{r getDataset-filter, message = FALSE} library(Rlabkey) myFilter <- makeFilter(c("gender", "EQUAL", "Female")) hai <- sdy269$getDataset("hai", colFilter = myFilter) ``` See `?Rlabkey::makeFilter` for more information on the syntax. For more information about `getDataset`'s options, refer to the dedicated vignette. # Fetching expression matrices We can also grab a gene expression matrix ```{r getGEMatrix} sdy269$getGEMatrix("SDY269_PBMC_LAIV_Geo") ``` The object contacts the database and downloads the matrix file. This is stored and cached locally as a `data.table`. The next time you access it, it will be much faster since it won't need to contact the database again. It is also possible to call this function using multiple matrix names. In this case, all the matrices are downloaded and combined into a single `ExpressionSet`. ```{r getGEMatrix-multiple} sdy269$getGEMatrix(c("SDY269_PBMC_TIV_Geo", "SDY269_PBMC_LAIV_Geo")) ``` Finally, the summary argument will let you download the matrix with gene symbols in place of probe ids. ```{r getGEMatrix-summary} gs <- sdy269$getGEMatrix("SDY269_PBMC_TIV_Geo", outputType = "summary", annotation = "latest") ``` If the connection was created with `verbose = TRUE`, some methods will display additional informations such as the valid dataset names. # Plotting A plot of a dataset can be generated using the `plot` method which automatically chooses the type of plot depending on the selected dataset. ```{r plotting} sdy269$plot("hai") sdy269$plot("elisa") ``` However, the `type` argument can be used to manually select from "boxplot", "heatmap", "violin" and "line". # Cross study connections To fetch data from multiple studies, simply create a connection at the project level. ```{r cross-connection} con <- CreateConnection("") ``` This will instantiate a connection at the `Studies` level. Most functions work cross study connections just like they do on single studies. You can get a list of datasets and gene expression matrices available accross all studies. ```{r cross-connection-print} con ``` In cross-study connections, `getDataset` and `getGEMatrix` will combine the requested datasets or expression matrices. See the dedicated vignettes for more information. Likewise, `plot` will visualize accross studies. Note that in most cases the datasets will have too many cohorts/subjects, making the filtering of the data a necessity. The `colFilter` argument can be used here, as described in the `getDataset` section. ```{r cross-connection-qplot} plotFilter <- makeFilter( c("cohort", "IN", "TIV 2010;TIV Group 2008"), c("study_time_collected", "EQUALS", "7") ) con$plot("elispot", filter = plotFilter) ``` The figure above shows the ELISPOT results for two different years of TIV vaccine cohorts from two different studies. # Session info ```{r sessionInfo} sessionInfo() ```