--- title: "How to use TDbasedUFEadv" author: - name: Y-h. Taguchi affiliation: Department of Physics, Chuo University, Tokyo 112-8551, Japan email: tag@granular.com output: BiocStyle::html_document: toc: true bibliography: references.bib vignette: > %\VignetteIndexEntry{How to use TDbasedUFEadv} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() ``` ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, crop = NULL, comment = "#>" ) ``` ```{r setup} library(TDbasedUFEadv) library(Biobase) library(RTCGA.rnaseq) library(TDbasedUFE) library(MOFAdata) library(TDbasedUFE) library(RTCGA.clinical) ``` # Installation ``` {r, eval = FALSE} if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("TDbasedUFEadv") ``` # Integrated analyses of two omics profiles Here is a flowchart how we can make use of individual functions in [TDbasedUFE](https://doi.org/doi:10.18129/B9.bioc.TDbasedUFE) and TDbasedUFEadv. ![Relationship among functions in TDbasedUFE and TDbasedUFEadv](./flowchart.jpg) ## When features are shared. In order to make use of [TDbasedUFE](https://doi.org/doi:10.18129/B9.bioc.TDbasedUFE) for the drug repositioning, we previously proposed[@Taguchi2017] the integrated analysis of two gene expression profiles, each of which is composed of gene expression of drug treated one and disease one. At first, we try to prepare two omics profiles, expDrug and expDisease, that represent gene expression profiles of cell lines treated by various drugs and a cell line of diseases by ``` {r} Cancer_cell_lines <- list(ACC.rnaseq, BLCA.rnaseq, BRCA.rnaseq, CESC.rnaseq) Drug_and_Disease <- prepareexpDrugandDisease(Cancer_cell_lines) expDrug <- Drug_and_Disease$expDrug expDisease <- Drug_and_Disease$expDisease rm(Cancer_cell_lines) ``` expDrug is taken from RTCGA package and those associated with Drugs based upon [@Ding2016]. Those files are listed in drug_response.txt included in Clinical drug responses at http://lifeome.net/supp/drug_response/. expDisease is composed of files in BRCA.rnaseq, but not included in expDrug (For more details, see source code of prepareexpDrugandDisease). Then prepare a tensor as ```{r} Z <- prepareTensorfromMatrix( exprs(expDrug[seq_len(200), seq_len(100)]), exprs(expDisease[seq_len(200), seq_len(100)]) ) sample <- outer( colnames(expDrug)[seq_len(100)], colnames(expDisease)[seq_len(100)], function(x, y) { paste(x, y) } ) Z <- PrepareSummarizedExperimentTensor( sample = sample, feature = rownames(expDrug)[seq_len(200)], value = Z ) ``` In the above, sample are pairs of file IDs taken from expDrug and expDisease. Since full data cannot be treated because of memory restriction, we restricted the first two hundred features and the first one hundred samples, respectively (In the below, we will introduce how to deal with the full data sets). Then HOSVD is applied to a tensor as ``` {r} HOSVD <- computeHosvd(Z) ``` Here we tries to find if Cisplatin causes distinct expression (0: cell lines treated with drugs other than Cisplatin, 1: cell lines treated with Cisplatin) and those between two classes (1 vs 2) of BRCA (in this case, there are no meaning of two classes) within top one hundred samples. ``` {r} Cond <- prepareCondDrugandDisease(expDrug) cond <- list(NULL, Cond[, colnames = "Cisplatin"][seq_len(100)], rep(1:2, each = 50)) ``` Then try to select singular value vectors attributed to objects. When you try this vignettes, although you can do it in the interactive mode (see below), here we assume that you have already finished the selection. ```{r} input_all <- selectSingularValueVectorLarge(HOSVD,cond,input_all=c(2,9)) #Batch mode ``` In the case you prefer to select by yourself you can execute interactive mode. ``` input_all <- selectSingularValueVectorLarge(HOSVD,cond) ``` When you can see ``Next'', ``Prev'', and ``Select'' radio buttons by which you can performs selection as well as histogram and standard deviation optimization by which you can verify the success of selection interactively. Next we select which genes' expression is altered by Cisplatin. ```{r } index <- selectFeature(HOSVD,input_all,de=0.05) ``` You might need to specify suitable value for de which is initial value of standard deviation. Then we get the following plot. Finally, list the genes selected as those associated with distinct expression. ```{r} head(tableFeatures(Z,index)) ``` ```{r} rm(Z) rm(HOSVD) detach("package:RTCGA.rnaseq") rm(SVD) ``` The described methods were frequently used in the studies[@Taguchi2017a] [@Taguchi2018] [@Taguchi2020] by maintainers. ### Reduction of required memory using partial summation. In the case that there are large number of features, it is impossible to apply HOSVD to a full tensor (Then we have reduced the size of tensor). In this case, we apply SVD instead of HOSVD to matrix generated from a tensor as follows. In contrast to the above where only top two hundred features and top one hundred samples are included, the following one includes all features and all samples since it can save required memory because partial summation of features. ``` {r} SVD <- computeSVD(exprs(expDrug), exprs(expDisease)) Z <- t(exprs(expDrug)) %*% exprs(expDisease) sample <- outer( colnames(expDrug), colnames(expDisease), function(x, y) { paste(x, y) } ) Z <- PrepareSummarizedExperimentTensor( sample = sample, feature = rownames(expDrug), value = Z ) ``` Nest select singular value vectors attributed to drugs and cell lines then identify features associated with altered expression by treatment of Cisplatin as well as distinction between two classes. Again, it included all samples for expDrug and expDisease. ``` {r} cond <- list(NULL,Cond[,colnames="Cisplatin"],rep(1:2,each=dim(SVD$SVD$v)[1]/2)) ``` Next we select singular value vectors and optimize standard deviation as batch mode ```{r} index_all <- selectFeatureRect(SVD,cond,de=c(0.01,0.01), input_all=3) #batch mode ``` Again you need to select suitable de by trials and errors. For interactive mode, one should do ``` index_all <- selectFeatureRect(SVD,cond,de=c(0.01,0.01)) ``` but it is not possible in vignettes that does not allow interactive mode. Then you can see selected features as ```{r} head(tableFeatures(Z,index_all[[1]])) head(tableFeatures(Z,index_all[[2]])) ``` The upper one is for distinct expression between cell lines treated with Cisplatin and other cell lines and the lower one is for distinct expression between two classes of BRCA cell lines. Although they are highly coincident, not fully same ones (Row: expDrug, column:expDisease). ```{r} table(index_all[[1]]$index,index_all[[2]]$index) ``` Confusion matrix of features selected between expDrug and expDisease. The described methods were frequently used in the studies[@Taguchi2019a] by maintainers. ```{r} rm(Z) rm(SVD) ``` ## When samples are shared The above procedure can be used when two omics data that shares samples must be integrated. Prepare data set as ```{r} data("CLL_data") data("CLL_covariates") ``` (see vignettes QuickStart in [TDbasedUFE](https://doi.org/doi:10.18129/B9.bioc.TDbasedUFE) for more details about this data set). Generate tensor from matrix as in the above, but since not features but samples are shared between two matrices, the resulting Z has samples as features and features as samples, respectively. ```{r} Z <- prepareTensorfromMatrix( t(CLL_data$Drugs[seq_len(200), seq_len(50)]), t(CLL_data$Methylation[seq_len(200), seq_len(50)]) ) Z <- prepareTensorRect( sample = colnames(CLL_data$Drugs)[seq_len(50)], feature = list( Drugs = rownames(CLL_data$Drugs)[seq_len(200)], Methylatiion = rownames(CLL_data$Methylation)[seq_len(200)] ), sampleData = list(CLL_covariates$Gender[seq_len(50)]), value = Z ) ``` HOSVD was applied to Z as ```{r} HOSVD <- computeHosvd(Z) ``` ```{r} cond <- list(attr(Z,"sampleData")[[1]],NULL,NULL) ``` Condition is distinction between male and female (see QucikStart in TDbasedUFE package). Then try to find singular value vectors distinct between male and female in interactive mode. ```{r} index_all <- selectFeatureTransRect(HOSVD,cond,de=c(0.01,0.01), input_all=8) #batch mode ``` In the above, selection was supposed to be performed before executing the above, since vignettes does not allow interactive mode. In actual, you need to execute it in interactive mode ``` index_all <- selectFeatureTransRect(HOSVD,cond,de=c(0.01,0.01)) ``` and try to select iteratively. Selected features can be shown in the below. ```{r} head(tableFeaturesSquare(Z,index_all,1)) head(tableFeaturesSquare(Z,index_all,2)) ``` This method was used in the studies[@Taguchi2019] by the maintainer. ### Reduction of required memory using partial summation. As in the case where two omics profiles share features, in the case where two omics data share the samples, we can also take an alternative approach where SVD is applied to an matrix generated from a tensor by taking partial summation. ```{r} SVD <- computeSVD(t(CLL_data$Drugs), t(CLL_data$Methylation)) Z <- CLL_data$Drugs %*% t(CLL_data$Methylation) sample <- colnames(CLL_data$Methylation) Z <- prepareTensorRect( sample = sample, feature = list(rownames(CLL_data$Drugs), rownames(CLL_data$Methylation)), value = array(NA, dim(Z)), sampleData = list(CLL_covariates[, 1]) ) ``` Condition is also distinction between male (m) and female (f). ```{r} cond <- list(NULL,attr(Z,"sampleData")[[1]],attr(Z,"sampleData")[[1]]) ``` In order to apply the previous function to SVD, we exchange feature singular value vectors with sample singular value vectors. ```{r} SVD <- transSVD(SVD) ``` Then try to find which sample singular value vectors should be selected and which features are selected based upon feature singular value vectors corresponding to selected sample feature vectors. Although I do not intend to repeat whole process, we decided to select the sixth singular value vectors which are some what distinct between male and female. Since package does not allow us interactive mode, we place here batch mode. ```{r} index_all <- selectFeatureRect(SVD,cond,de=c(0.5,0.5),input_all=6) #batch mode ``` In the real usage, we can activate selectFeatureRect in interactive mode as well. ``` index_all <- selectFeatureRect(SVD,cond,de=c(0.5,0.5)) ``` Then we can list the Drugs and Methylation sites selected as being distinct between male and female. ```{r} head(tableFeaturesSquare(Z,index_all,1)) head(tableFeaturesSquare(Z,index_all,2)) ``` This method was used in many studies[@Taguchi2018a] [@Taguchi2020a] by maintainer. # Integrated analysis of multiple omics data Here is a flowchart how we can make use of individual functions in TDbasedUFE and TDbasedUFEadv. ![Relationship among functions in TDbasedUFE and TDbasedUFEadv](./flowchart2.jpg) ## When samples are shared As an alternative approach that can integrate multiple omics that share sample, we propose the method that makes use of projection provided by SVD. We prepare a tensor that is a bundle of the first ten singular value vectors generated by applying SVD to individual omics profiles. ```{r} data("CLL_data") data("CLL_covariates") Z <- prepareTensorfromList(CLL_data, 10L) Z <- PrepareSummarizedExperimentTensor( feature = character("1"), sample = array(colnames(CLL_data$Drugs), 1), value = Z, sampleData = list(CLL_covariates[, 1]) ) ``` Then HOSVD was applied to a tensor ```{r} HOSVD <- computeHosvd(Z,scale=FALSE) ``` Next we select singular value vectors attributed to samples. In order to select those distinct between male (m) and female (f), we set conditions as ```{r} cond <- list(NULL,attr(Z,"sampleData")[[1]],seq_len(4)) ``` But here in order to include TDbasedUFEadv into package, we are forced to execute function as batch mode as ```{r} input_all <- selectSingularValueVectorLarge(HOSVD, cond, input_all = c(12, 1) ) # batch mode ``` Interactive more can be activated as ``` input_all <- selectSingularValueVectorLarge(HOSVD,cond) ``` Although we do not intend to repeat how to use menu in interactive mode, please select the 12th one and the third one. Finally, we perform the following function to select features in individual omics profiles in an batch mode, since packaging does not allow interactive mode. ``` {r} HOSVD$U[[1]] <- HOSVD$U[[2]] index_all <- selectFeatureSquare(HOSVD, input_all, CLL_data, de = c(0.5, 0.1, 0.1, 1), interact = FALSE ) # Batch mode ``` In actual usage, you can activate interactive mode as ``` HOSVD$U[[1]] <- HOSVD$U[[2]] index_all <- selectFeatureSquare(HOSVD, input_all, CLL_data, de = c(0.5, 0.1, 0.1, 1) ) ``` Finally, we list the selected features for four omics profiles that share samples. ```{r} for (id in c(1:4)) { attr(Z, "feature") <- rownames(CLL_data[[id]]) print(tableFeatures(Z, index_all[[id]])) } ``` This method was used in many studies[@Taguchi2022] by maintainer. ## When features are shared Now we discuss what to do when multiple omics data share not samples but features. We prepare data set from RTCGA.rnaseq as follows, with retrieving reduced partial sets from four ones. One should notice that RTCGA is an old package from TCGA (as for 2015). I used it only for demonstration purpose. If you would like to use TCGA for your research, I recommend you to use more modern packages, e.g., curatedTCGAData in Bioconductor. ```{r} library(RTCGA.rnaseq) #it must be here, not in the first chunk Multi <- list( BLCA.rnaseq[seq_len(100), 1 + seq_len(1000)], BRCA.rnaseq[seq_len(100), 1 + seq_len(1000)], CESC.rnaseq[seq_len(100), 1 + seq_len(1000)], COAD.rnaseq[seq_len(100), 1 + seq_len(1000)] ) ``` Multi includes four objects, each of which is matrix that represent 100 samples (rows) and 1000 (features). Please note it is different from usual cases where columns and rows are features and samples, respectively. They are marge into tensor as follows ```{r} Z <- prepareTensorfromList(Multi,10L) Z <- aperm(Z,c(2,1,3)) ``` The function, prepareTeansorfromList which was used in the previous subsection where samples are shared, can be used as it is. However, the first and second modes of a tensor must be exchanged by aperm function for the latter analyses, because of the difference as mentioned in the above. Then tensor object associated with various information is generated as usual as follows and HOSVD was applied to it. ``` {r} Clinical <- list(BLCA.clinical, BRCA.clinical, CESC.clinical, COAD.clinical) Multi_sample <- list( BLCA.rnaseq[seq_len(100), 1, drop = FALSE], BRCA.rnaseq[seq_len(100), 1, drop = FALSE], CESC.rnaseq[seq_len(100), 1, drop = FALSE], COAD.rnaseq[seq_len(100), 1, drop = FALSE] ) # patient.stage_event.tnm_categories.pathologic_categories.pathologic_m ID_column_of_Multi_sample <- c(770, 1482, 773, 791) # patient.bcr_patient_barcode ID_column_of_Clinical <- c(20, 20, 12, 14) Z <- PrepareSummarizedExperimentTensor( feature = colnames(ACC.rnaseq)[1 + seq_len(1000)], sample = array("", 1), value = Z, sampleData = prepareCondTCGA( Multi_sample, Clinical, ID_column_of_Multi_sample, ID_column_of_Clinical ) ) HOSVD <- computeHosvd(Z) ``` In order to see which singular value vectors attributed to samples are used for the selection of singular value vectors attributed to features, we need to assign sample conditions. ```{r} cond<- attr(Z,"sampleData") ``` Since package does not allow us to include interactive mode, we place here batch mode as follows. Finally, selected feature are listed as follows. ``` {r} index <- selectFeatureProj(HOSVD,Multi,cond,de=1e-3,input_all=3) #Batch mode head(tableFeatures(Z,index)) ``` In actual, you can activate interactive mode as ``` par(mai=c(0.3,0.2,0.2,0.2)) index <- selectFeatureProj(HOSVD,Multi,cond,de=1e-3) ``` Although we do not intend to explain how to use menu interactively, we select the third singular value vectors as shown in above. This method was used in many studies[@Taguchi2021] by maintainer. ```{r} sessionInfo() ```