---
title: "Vignette illustrating the use of MOFA on simulated data"
author: "Britta Velten and Ricard Argelaguet"
output:
  BiocStyle::html_document:
    toc: true
package: MOFA
vignette: >
  %\VignetteIndexEntry{MOFA: How to assess model robustness and do model selection}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Simulate an example data set

To illustrate the MOFA workflow we simulate a small example data set with 3 different views. `makeExampleData` generates the simulated data, which we pass to `createMOFAobject` to obtain an untrained MOFAobject. If you work on your own data, use `createMOFAobject` in the same way to create the untrained MOFAobject (see our vignettes for CLL and scMT data).

```{r, message=FALSE}
library(MOFA)
```

By default, `makeExampleData` produces a small data set containing 3 views with 100 features each and 50 samples. These parameters can be varied using the arguments `n_views`, `n_features` and `n_samples`.

```{r}
# Fix the seed so that the simulated data are reproducible
set.seed(1234)
data <- makeExampleData()
# Wrap the simulated views in an untrained MOFAobject
MOFAobject <- createMOFAobject(data)
MOFAobject
```

# Prepare MOFA: Set the training and model options

Once the untrained MOFAobject has been created, we can specify details on data processing, model specifications and training options, such as the number of factors and the likelihood models. Default options can be obtained using the functions `getDefaultTrainOptions`, `getDefaultModelOptions` and `getDefaultDataOptions`. We describe these options in detail in our vignettes for CLL and scMT data. The model is then set up for training with `prepareMOFA`; here we call it inside the training loop in the next section, so that each run is prepared with its own seed.

```{r}
TrainOptions <- getDefaultTrainOptions()
ModelOptions <- getDefaultModelOptions(MOFAobject)
DataOptions <- getDefaultDataOptions()

# Threshold for dropping factors during training
TrainOptions$DropFactorThreshold <- 0.01
```

# Run MOFA

Once the options are set we can use `runMOFA` to train the model. Because the results can differ depending on the random initialization, we recommend running `runMOFA` multiple times (e.g. ten times; here we use a smaller number for illustration, as model training can take some time). As a next step we will compare the different fits and select the best model for downstream analyses.

```{r}
n_inits <- 3
MOFAlist <- lapply(seq_len(n_inits), function(it) {

  # Use a different seed for each random initialization
  TrainOptions$seed <- 2018 + it

  # Set up the model with the chosen options
  MOFAobject <- prepareMOFA(
    MOFAobject,
    DataOptions = DataOptions,
    ModelOptions = ModelOptions,
    TrainOptions = TrainOptions
  )

  # Train the model and return the fitted MOFAobject
  runMOFA(MOFAobject)
})
```

# Compare different random inits and select the best model

Given a list of trained models, we can use `compareModels` to get an overview of how many factors were inferred in each run and of the optimized ELBO value (a model with a larger ELBO is preferred).

```{r}
compareModels(MOFAlist)
```

With `compareFactors` we can get an overview of how robust the factors are across the different model instances.

```{r}
compareFactors(MOFAlist)
```

For downstream analyses we recommend choosing the model with the best ELBO value, as is done by `selectModel`.

```{r}
MOFAobject <- selectModel(MOFAlist, plotit = FALSE)
MOFAobject
```

# Downstream analyses

On the trained MOFAobject we can now start looking into the inferred factors, their weights etc. Here the data were generated using five factors, whose activity patterns we can recover using `plotVarianceExplained`.

```{r}
plotVarianceExplained(MOFAobject)
```

For details on downstream analyses please have a look at the vignettes on the CLL data and scMT data.

# SessionInfo

```{r}
sessionInfo()
```
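
# Appendix: what a multi-view simulation looks like

The simulation step at the start of this vignette relies on `makeExampleData`, whose internals are not shown here. To give an intuition for what "3 views with 100 features each and 50 samples, generated from five factors" can mean, the following is a minimal base-R sketch of a linear factor model. It is an assumption-laden illustration, not the implementation of `makeExampleData`: the Gaussian weights and noise, the noise level, the probability of a factor being active in a view, and the names `Z`, `W`, `sim_views` and `view_1` to `view_3` are all hypothetical choices made for this example.

```{r}
# Illustrative sketch only: NOT the code used by makeExampleData
set.seed(1234)
n_views <- 3
n_features <- 100
n_samples <- 50
n_factors <- 5

# Latent factors shared by all views: one row per sample, one column per factor
Z <- matrix(rnorm(n_samples * n_factors), nrow = n_samples, ncol = n_factors)

sim_views <- lapply(seq_len(n_views), function(m) {
  # View-specific weights: one row per feature, one column per factor
  W <- matrix(rnorm(n_features * n_factors), nrow = n_features, ncol = n_factors)

  # Hypothetical activity pattern: each factor is active in this view
  # with probability 0.7; inactive factors get all-zero weights
  active <- rbinom(n_factors, size = 1, prob = 0.7)
  W <- sweep(W, 2, active, `*`)

  # Features in rows, samples in columns, plus Gaussian noise (assumed sd = 0.5)
  noise <- matrix(rnorm(n_features * n_samples, sd = 0.5), nrow = n_features)
  W %*% t(Z) + noise
})
names(sim_views) <- paste0("view_", seq_len(n_views))

# Each view is an n_features x n_samples matrix
sapply(sim_views, dim)
```

Because the factors `Z` are shared while the weights differ per view, a factor can drive variation in some views and be absent from others. This is the kind of activity pattern that `plotVarianceExplained` summarises for a trained model.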