--- title: "optimalFlowData: a data package for optimalFlow" author: - name: Hristo Inouzhe affiliation: Universidad de Valladolid, Spain email: hristo.inouzhe@gmail.com date: "`r format(Sys.time(), '%d %B, %Y')`" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{optimalFlow: optimal-transport approach to Flow Cytometry analysis} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} references: - id: optimalFlow title: 'optimalFlow: optimal-transport approach to Flow Cytometry analysis' author: - family: del Barrio given: Eustasio - family: Inouzhe given: Hristo - family: Loubes given: Jean-Michel - family: Mayo-Iscar given: Agustin - family: Matran given: Carlos URL: 'https://arxiv.org/abs/1907.08006' type: article-journal issued: year: 2019 month: 7 --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` # Introduction *optimalFlowData* is a package containing 40 simulated flow cytometry datasets, saved as data frames, used for testing and developping examples for the package *optimalFlow* based on the results in @optimalFlow. The simulated cytometries are based on data that come from flow cytometry measurements obtained following the Euroflow protocols and kindly provided by Centro de Investigación del Cancer (CIC) in Salamanca, Spain. The artificial cytometries mimic 31 cytometries from healthy individuals and 9 cytometries from patients with different types of cancer. # Installation Installation procedure: ```{r ej0, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("optimalFlowData") ``` # Use ```{r ej01, eval = TRUE} library(optimalFlowData) head(Cytometry1) ``` We can create a database of gated cytometries containing. For simplicity and visualisation we only choose 4 cell types. For an example of a database, we select some of the cytometries, as is usual in machine learning, where a subset of the data is the learning set. ```{r ej1, eval = TRUE} database <- buildDatabase( dataset_names = paste0('Cytometry', c(2:5, 7:9, 12:17, 19, 21)), population_ids = c('Monocytes', 'CD4+CD8-', 'Mature SIg Kappa', 'TCRgd-')) ``` A plot of the data in a 3 dimensional subspace ```{r ej2, echo = TRUE} pairs(database[[1]][,c(4, 3, 9)], col = droplevels(database[[1]][, 11])) ``` The diagnosis for each cytometry is obtained as follows ```{r ej3, echo = TRUE} help("cytometry.diagnosis") # for an explanation of the abbreviations cytometry.diagnosis ``` # References