%\VignetteEngine{knitr::knitr} \documentclass[a4paper,9pt]{article} <>= BiocStyle::latex() @ %\VignetteIndexEntry{LACE} \usepackage[utf8]{inputenc} \usepackage{graphicx} \usepackage{placeins} \usepackage{url} \usepackage{tcolorbox} \usepackage{authblk} \begin{document} \title{Longitudinal Analysis of Cancer Evolution with LACE} \author[1]{Daniele Ramazzotti} \author[2]{Fabrizio Angaroni} \author[2,3]{Davide Maspero} \author[2]{Gianluca Ascolani} \author[4,5]{Isabella Castiglioni} \author[1]{Rocco Piazza} \author[2]{Marco Antoniotti} \author[5]{Alex Graudenzi} \affil[1]{School of Medicine and Surgery, Univ. of Milan-Bicocca, Monza, Italy.} \affil[2]{Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy.} \affil[3]{Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy.} \affil[4]{Department of Physics "Giuseppe Occhialini", Univ. of Milan-Bicocca, Milan, Italy.} \affil[5]{Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy.} \date{\today} \maketitle \begin{tcolorbox}{\bf Overview.} LACE (Longitudinal Analysis of Cancer Evolution) is an algorithmic framework that processes single-cell somatic mutation profiles from cancer samples collected at different time points and in distinct experimental settings, in order to describe the evolutionary history of the tumor. The tool can return an high resolution picture of clones' prevalence and their varietions, e.g., because of therapies. LACE can be employed to process single-cell mutational profiles as generated by calling variants from the increasingly available scRNA-seq data, such as the ones obtained by SMARTseq2 protocol. The output of the method is a longitudinal tree that best fits the input data, modelling both phylogenetic constraints and sc-RNAseq specific noise. Moreover, the package provides a suite of functions to visualize and explore the results. \vspace{1.0cm} {\em In this vignette, we give an overview of the package by presenting its main functions.} \vspace{1.0cm} \renewcommand{\arraystretch}{1.5} \end{tcolorbox} <>= library(knitr) opts_chunk$set( concordance = TRUE, background = "#f3f3ff" ) @ \newpage \tableofcontents \section{Using the LACE R package} We now present an example of longitudinal analysis of cancer evolution with LACE using single-cell data obtained from Rambow, Florian, et al. "Toward minimal residual disease-directed therapy in melanoma." Cell 174.4 (2018): 843-855. The data comprises point mutations for four time points: (1) before treatment, (2) 4 days treatment, (3) 28 days treatment and finally (4) 57 days treatment. We first load the data. <>= library("LACE") data(longitudinal_sc_variants) names(longitudinal_sc_variants) @ The input data (D) can be either a list as shown above or a SummarizedExperiment object. In the latter case, the object needs to include genomic data (0: mutation absent, 1: mutation detected or NA: missing information) in assay field. Following guidelines, the rows of such matrix are variants, while colums are single cells concatenated for all the time points. Furthermore, a rowData field needs to be specified and must be a data.frame with a column named "TimePoint" reporting the label of the experiment for each single cell; we highlight that the ordering of the labels in rowData must match the one of the columns in the assay field. We next show an example. <>= library("SummarizedExperiment",attach.required=FALSE) T1 = t(longitudinal_sc_variants[[1]]) T2 = t(longitudinal_sc_variants[[2]]) T3 = t(longitudinal_sc_variants[[3]]) T4 = t(longitudinal_sc_variants[[4]]) concat_time_point = cbind(T1,T2,T3,T4) TimePointLabes = c(rep("T1", ncol(T1)), rep("T2", ncol(T2)), rep("T3", ncol(T3)), rep("T4", ncol(T4))) longitudinal_SE = SummarizedExperiment(assays = concat_time_point, colData = data.frame(TimePoint = TimePointLabes)) print(longitudinal_SE) @ We setup the main parameter in oder to perform the inference. First of all, as the three data proint may potentially provide sequencing for an unbalanced number of cells, we weight each time point as follow $w_s=(1-\frac{n_s}{n_T})/(y-1)$ in order to account for this. In the formula, e.g., the weigth for time point $s$ ($w_s$) is calculated based on the number of cell observed in the time point ($n_s$) and the total number of cells in the threee time points ($n_T$). The denominator ($y-1$, with $y$ being the numnber of time points, i.e., 3 in our case) aims at normalizing the weights to sum to one. <>= lik_weights = c(0.2308772,0.2554386,0.2701754,0.2435088) @ The second main parameter to be defined as input is represented by the false positive and false negative error rates, i.e., alpha and beta. We can specify a different rate per time point as a list of rates. When multiple set of rates are provided, LACE performs a grid search in order to estimate the best set of error rates. <>= alpha = list() alpha[[1]] = c(0.02,0.01,0.01,0.01) alpha[[2]] = c(0.10,0.05,0.05,0.05) beta = list() beta[[1]] = c(0.10,0.05,0.05,0.05) beta[[2]] = c(0.10,0.05,0.05,0.05) head(alpha) head(beta) @ We can now perform the inference as follow. Notice that D can be either the list longitudinal sc variants or the SummarizedExperiment longitudinal SE. <>= inference = LACE(D = longitudinal_sc_variants, lik_w = lik_weights, alpha = alpha, beta = beta, keep_equivalent = FALSE, num_rs = 5, num_iter = 10, n_try_bs = 5, num_processes = NA, seed = 12345, verbose = FALSE) @ We notice that the inference resulting on the command above should be considered only as an example; the parameters num rs, num iter and n try bs representing the number of steps perfomed during the inference are downscaled to reduce execution time. We refer to the Manual for discussion on default values. We provide within the package results of inferences performed with correct parameters as RData. <>= data(inference) print(names(inference)) @ LACE returns a list of nine elements as results. Namely, B and C provide respectively the maximum likelihood longitudinal tree and cells attachments; corrected genotypes the corrected genotypes, clones prevalence, the estimated prevalence of any observed clone; relative likelihoods and joint likelihood the estimated likelihoods for each time point and the weighted likelihood; clones summary provide a summary of association of mutations to clones. In equivalent solutions, solutions (B and C) with likelihood equivalent to the best solution are returned; notice that in the example we disabled this feature by setting equivalent solutions parameter to FALSE. Finally, error rates provide the best error rates (alpha and beta) as estimated by the grid search. We can plot the inferred model using the function longitudinal.tree.plot. <>= clone_labels = c("ARPC2","PRAME","HNRNPC","COL1A2","RPL5","CCT8") longitudinal.tree = longitudinal.tree.plot(inference = inference, labels = "clones", clone_labels = clone_labels, legend_position = "topleft") @ \section{\Rcode{sessionInfo()}} <>= toLatex(sessionInfo()) @ \end{document}