TCMC 0.99.0
Install TCMC
The TCMC package (Mukesha, 2024) will soon be available on Bioconductor.
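# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("TCMC")
# Check that the installed packages match the current Bioconductor release
BiocManager::valid()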
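After installation, one quick way to confirm that the package is available is to query its version. This is only a minimal sketch: the variable name `tcmc_installed` is introduced here for illustration, and the check reports a version only if TCMC is actually installed on your system.

```r
# Check whether TCMC can be loaded, without attaching it
tcmc_installed <- requireNamespace("TCMC", quietly = TRUE)

if (tcmc_installed) {
    # Report the installed version (this vignette was built with 0.99.0)
    message("TCMC version: ", as.character(utils::packageVersion("TCMC")))
} else {
    message("TCMC is not installed yet.")
}
```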
Citing TCMC
I hope that TCMC will be useful for research. Please use the following information to cite the package and the overall approach. Thank you!
citation("TCMC")
#> To cite package 'TCMC' in publications use:
#>
#> Mukesha D (2024). _TCMC: Compare Classification Models_. R package
#> version 0.99.0, <https://github.com/danymukesha/TCMC>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {TCMC: Compare Classification Models},
#> author = {Dany Mukesha},
#> year = {2024},
#> note = {R package version 0.99.0},
#> url = {https://github.com/danymukesha/TCMC},
#> }
Quick start to using TCMC
library(TCMC)
library(mlbench)
data(PimaIndiansDiabetes)
str(PimaIndiansDiabetes)
#> 'data.frame': 768 obs. of 9 variables:
#> $ pregnant: num 6 1 8 1 0 5 3 10 2 8 ...
#> $ glucose : num 148 85 183 89 137 116 78 115 197 125 ...
#> $ pressure: num 72 66 64 66 40 74 50 0 70 96 ...
#> $ triceps : num 35 29 0 23 35 0 32 0 45 0 ...
#> $ insulin : num 0 0 0 94 168 0 88 0 543 0 ...
#> $ mass : num 33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
#> $ pedigree: num 0.627 0.351 0.672 0.167 2.288 ...
#> $ age : num 50 31 32 21 33 30 26 29 53 54 ...
#> $ diabetes: Factor w/ 2 levels "neg","pos": 2 1 2 1 2 1 2 1 2 2 ...
# for this example, only LVQ and RF are tested
results <- model_comparer(PimaIndiansDiabetes, "diabetes", for_utest = TRUE)
# plot variable importance for a specific model
plot_importance(results$trained_models$lvq, "LVQ")
plot_importance(results$trained_models$rf, "RF")
# collect and compare the resampling results of the trained models
models_results <- resamples(results$trained_models)
summary(models_results)
#>
#> Call:
#> summary.resamples(object = models_results)
#>
#> Models: lvq, rf
#> Number of resamples: 30
#>
#> Accuracy
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> lvq 0.5967742 0.6557377 0.6854839 0.6958664 0.7213115 0.8709677 0
#> rf 0.5901639 0.7287811 0.7741935 0.7636524 0.8000397 0.8870968 0
#>
#> Kappa
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> lvq -0.01572739 0.1838918 0.2599161 0.2711357 0.3547419 0.7181818 0
#> rf 0.03785489 0.4163366 0.4809856 0.4693944 0.5515619 0.7654054 0
bwplot(models_results)
best_model <- results$performance$rf
best_model
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction pos neg
#> pos 33 20
#> neg 16 84
#>
#> Accuracy : 0.7647
#> 95% CI : (0.6894, 0.8294)
#> No Information Rate : 0.6797
#> P-Value [Acc > NIR] : 0.01348
#>
#> Kappa : 0.471
#>
#> Mcnemar's Test P-Value : 0.61708
#>
#> Sensitivity : 0.6735
#> Specificity : 0.8077
#> Pos Pred Value : 0.6226
#> Neg Pred Value : 0.8400
#> Prevalence : 0.3203
#> Detection Rate : 0.2157
#> Detection Prevalence : 0.3464
#> Balanced Accuracy : 0.7406
#>
#> 'Positive' Class : pos
#>
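The headline statistics in the output above follow directly from the four cell counts. The sketch below recomputes a few of them in base R, using the counts copied from the printed confusion matrix; the variable names are illustrative and not part of TCMC.

```r
# Confusion matrix copied from the output above
# (rows = prediction, columns = reference; filled column by column)
cm <- matrix(c(33, 16, 20, 84), nrow = 2,
             dimnames = list(Prediction = c("pos", "neg"),
                             Reference  = c("pos", "neg")))

n <- sum(cm)

# Proportion of samples on the diagonal (correct predictions)
accuracy <- sum(diag(cm)) / n

# Sensitivity and specificity, with "pos" as the positive class
sensitivity <- cm["pos", "pos"] / sum(cm[, "pos"])
specificity <- cm["neg", "neg"] / sum(cm[, "neg"])
balanced_accuracy <- (sensitivity + specificity) / 2

# Cohen's kappa: agreement beyond what the marginal totals give by chance
expected_agreement <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected_agreement) / (1 - expected_agreement)

round(c(Accuracy = accuracy, Kappa = cohen_kappa,
        Sensitivity = sensitivity, Specificity = specificity,
        BalancedAccuracy = balanced_accuracy), 4)
```

Working through the arithmetic by hand like this makes it easier to see why a model with 76% accuracy has a kappa of only about 0.47: roughly 56% agreement would already be expected from the class proportions alone.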
SummarizedExperiment
This example integrates SummarizedExperiment (Morgan, Obenchain, Hester, and Pagès, 2024) with machine learning workflows to evaluate treatment effectiveness.
Scenario description: In this example, we investigate the effectiveness of different treatments in influencing positive outcomes for a simulated clinical dataset. Each treatment corresponds to a class of drugs (e.g., TZD and DPP-4), and the outcome variable indicates whether the response to treatment was positive or negative. Using the SummarizedExperiment (Morgan, Obenchain, Hester et al., 2024) class from Bioconductor, we will preprocess the data, train machine learning models, and analyze the most impactful features and models.
Data simulation: The dataset contains simulated sugar-level measurements for 200 features across eight samples, along with metadata describing treatment classes and outcomes. A SummarizedExperiment (Morgan, Obenchain, Hester et al., 2024) object is used to organize and manage these data.
library(SummarizedExperiment)
# Simulate data
nrows <- 200 # Number of features (e.g., genes or biomarkers)
ncols <- 8 # Number of samples
sugar_level <- matrix(runif(nrows * ncols, 1, 500), nrows)
# Metadata: treatment classes and outcomes
colData <- DataFrame(Treatment_class = rep(c("TZD", "DPP-4"), 4),
                     row.names = LETTERS[1:8])
Outcome <- DataFrame(Outcome = (rep(c("neg", "pos"), 5)))
se0 <- SummarizedExperiment(assays = SimpleList(counts = sugar_level),
                            colData = colData, metadata = Outcome)
# If the input is a SummarizedExperiment, extract the assay and the metadata
# and combine them into a single data frame
if (inherits(se0, "SummarizedExperiment")) {
    data <- as.data.frame(assay(se0))
    metadata <- as.data.frame(metadata(se0))
    data_df <- cbind(metadata, data)
}
# Name the outcome column and the eight sample columns
feature_names <- c("Outcome",
    "TreatmentA", "TreatmentB", "TreatmentC", "TreatmentD",
    "TreatmentE", "TreatmentF", "TreatmentG", "TreatmentH"
)
colnames(data_df) <- feature_names
data_df$Outcome <- data_df$Outcome |> as.factor()
str(data_df)
#> 'data.frame': 200 obs. of 9 variables:
#> $ Outcome : Factor w/ 2 levels "neg","pos": 1 2 1 2 1 2 1 2 1 2 ...
#> $ TreatmentA: num 274.2 479.9 99.6 191.6 473.2 ...
#> $ TreatmentB: num 227.4 127.9 275.4 361.3 80.7 ...
#> $ TreatmentC: num 472 362 175 469 428 ...
#> $ TreatmentD: num 57.5 436.5 197.2 351.2 338.9 ...
#> $ TreatmentE: num 236.1 359.5 286.3 315.4 32.4 ...
#> $ TreatmentF: num 77.3 429.2 76.1 449.3 311.2 ...
#> $ TreatmentG: num 34.8 119.7 100.7 426 298.7 ...
#> $ TreatmentH: num 362 1.77 36.49 347.83 454.56 ...
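The simulation above draws its values with runif() and does not fix a random seed, so the numbers in the str() output will differ from run to run. A minimal sketch of making the simulated matrix reproducible is shown below; the seed value and the names `seed`, `sugar_a` and `sugar_b` are arbitrary choices for illustration, not something the package prescribes.

```r
# Arbitrary seed chosen for illustration only
seed <- 2024

# Draw the 200 x 8 matrix twice, resetting the seed before each draw
set.seed(seed)
sugar_a <- matrix(runif(200 * 8, 1, 500), nrow = 200)

set.seed(seed)
sugar_b <- matrix(runif(200 * 8, 1, 500), nrow = 200)

# With the same seed, both draws contain exactly the same values
identical(sugar_a, sugar_b)
```

Calling set.seed() once before the simulation is usually enough. Whether the model training step then gives identical results also depends on how resampling is seeded inside model_comparer(), which is not covered here.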
results <- model_comparer(data = data_df, "Outcome", for_utest = TRUE)
plot_importance(results$trained_models$lvq, "LVQ", type_plot = "enhanced")
plot_importance(results$trained_models$rf, "RF", type_plot = "basic")
models_results <- resamples(results$trained_models)
summary(models_results)
#>
#> Call:
#> summary.resamples(object = models_results)
#>
#> Models: lvq, rf
#> Number of resamples: 30
#>
#> Accuracy
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> lvq 0.3125 0.4375 0.5625 0.5375000 0.625000 0.750 0
#> rf 0.2500 0.4375 0.5000 0.4770833 0.546875 0.625 0
#>
#> Kappa
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> lvq -0.375 -0.125 0.125 0.07500000 0.25000 0.50 0
#> rf -0.500 -0.125 0.000 -0.04583333 0.09375 0.25 0
bwplot(models_results)
best_model <- results$performance$rf
best_model
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction pos neg
#> pos 12 8
#> neg 9 11
#>
#> Accuracy : 0.575
#> 95% CI : (0.4089, 0.7296)
#> No Information Rate : 0.525
#> P-Value [Acc > NIR] : 0.3184
#>
#> Kappa : 0.15
#>
#> Mcnemar's Test P-Value : 1.0000
#>
#> Sensitivity : 0.5714
#> Specificity : 0.5789
#> Pos Pred Value : 0.6000
#> Neg Pred Value : 0.5500
#> Prevalence : 0.5250
#> Detection Rate : 0.3000
#> Detection Prevalence : 0.5000
#> Balanced Accuracy : 0.5752
#>
#> 'Positive' Class : pos
#>
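The confidence interval and p-values in the output above can be approximated with base R tests. The sketch below assumes, as is my understanding of caret's confusionMatrix(), that the interval and the "Acc > NIR" p-value come from an exact binomial test and that the McNemar p-value comes from mcnemar.test(); treat it as an illustration rather than a description of the package internals.

```r
# Confusion matrix copied from the output above
# (rows = prediction, columns = reference; filled column by column)
cm <- matrix(c(12, 9, 8, 11), nrow = 2,
             dimnames = list(Prediction = c("pos", "neg"),
                             Reference  = c("pos", "neg")))

n_correct <- sum(diag(cm))
n_total <- sum(cm)

# Exact (Clopper-Pearson) 95% confidence interval for the accuracy
acc_ci <- stats::binom.test(n_correct, n_total)$conf.int

# No-information rate: proportion of the most frequent reference class
nir <- max(colSums(cm)) / n_total

# One-sided test of whether the accuracy exceeds the no-information rate
p_acc_gt_nir <- stats::binom.test(n_correct, n_total, p = nir,
                                  alternative = "greater")$p.value

# McNemar's test compares the two kinds of misclassification
p_mcnemar <- stats::mcnemar.test(cm)$p.value

list(accuracy = n_correct / n_total, ci = as.numeric(acc_ci),
     nir = nir, p_acc_gt_nir = p_acc_gt_nir, p_mcnemar = p_mcnemar)
```

An interval that contains the no-information rate, as it does here, indicates that the model is not distinguishable from always predicting the majority class. That is the expected result when the predictors are random numbers.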
By identifying the most predictive treatments and features, we can inform clinical decision-making and prioritize interventions that maximize positive outcomes.
Here is an example of how you can cite your package inside the vignette:
The PimaIndiansDiabetes data set used in the first example originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases.
Date the vignette was generated.
#> [1] "2024-12-04 14:47:07 EST"
Wall-clock time spent generating the vignette.
#> Time difference of 48.489 secs
R session information.
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R Under development (unstable) (2024-10-21 r87258)
#> os Ubuntu 24.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2024-12-04
#> pandoc 3.1.3 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-8 2024-09-12 [3] CRAN (R 4.5.0)
#> backports 1.5.0 2024-05-23 [3] CRAN (R 4.5.0)
#> bibtex 0.5.1 2023-01-26 [3] CRAN (R 4.5.0)
#> Biobase * 2.67.0 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> BiocGenerics * 0.53.3 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> BiocManager 1.30.25 2024-08-28 [2] CRAN (R 4.5.0)
#> BiocStyle * 2.35.0 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> bookdown 0.41 2024-10-16 [3] CRAN (R 4.5.0)
#> bslib 0.8.0 2024-07-29 [3] CRAN (R 4.5.0)
#> C50 0.1.8 2023-02-08 [3] CRAN (R 4.5.0)
#> cachem 1.1.0 2024-05-16 [3] CRAN (R 4.5.0)
#> caret * 6.0-94 2023-03-21 [3] CRAN (R 4.5.0)
#> class 7.3-22 2023-05-03 [4] CRAN (R 4.5.0)
#> cli 3.6.3 2024-06-21 [3] CRAN (R 4.5.0)
#> codetools 0.2-20 2024-03-31 [4] CRAN (R 4.5.0)
#> colorspace 2.1-1 2024-07-26 [3] CRAN (R 4.5.0)
#> combinat 0.0-8 2012-10-29 [3] CRAN (R 4.5.0)
#> crayon 1.5.3 2024-06-20 [3] CRAN (R 4.5.0)
#> Cubist 0.4.4 2024-07-02 [3] CRAN (R 4.5.0)
#> data.table 1.16.2 2024-10-10 [3] CRAN (R 4.5.0)
#> DelayedArray 0.33.3 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> digest 0.6.37 2024-08-19 [3] CRAN (R 4.5.0)
#> dplyr 1.1.4 2023-11-17 [3] CRAN (R 4.5.0)
#> e1071 1.7-16 2024-09-16 [3] CRAN (R 4.5.0)
#> evaluate 1.0.1 2024-10-10 [3] CRAN (R 4.5.0)
#> fansi 1.0.6 2023-12-08 [3] CRAN (R 4.5.0)
#> fastmap 1.2.0 2024-05-15 [3] CRAN (R 4.5.0)
#> forcats 1.0.0 2023-01-29 [3] CRAN (R 4.5.0)
#> foreach 1.5.2 2022-02-02 [3] CRAN (R 4.5.0)
#> Formula 1.2-5 2023-02-24 [3] CRAN (R 4.5.0)
#> future 1.34.0 2024-07-29 [3] CRAN (R 4.5.0)
#> future.apply 1.11.3 2024-10-27 [3] CRAN (R 4.5.0)
#> gbm 2.2.2 2024-06-28 [3] CRAN (R 4.5.0)
#> generics * 0.1.3 2022-07-05 [3] CRAN (R 4.5.0)
#> GenomeInfoDb * 1.43.2 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> GenomeInfoDbData 1.2.13 2024-10-23 [3] Bioconductor
#> GenomicRanges * 1.59.1 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> ggplot2 * 3.5.1 2024-04-23 [3] CRAN (R 4.5.0)
#> globals 0.16.3 2024-03-08 [3] CRAN (R 4.5.0)
#> glue 1.8.0 2024-09-30 [3] CRAN (R 4.5.0)
#> gower 1.0.1 2022-12-22 [3] CRAN (R 4.5.0)
#> gtable 0.3.6 2024-10-25 [3] CRAN (R 4.5.0)
#> hardhat 1.4.0 2024-06-02 [3] CRAN (R 4.5.0)
#> haven 2.5.4 2023-11-30 [3] CRAN (R 4.5.0)
#> highr 0.11 2024-05-26 [3] CRAN (R 4.5.0)
#> hms 1.1.3 2023-03-21 [3] CRAN (R 4.5.0)
#> htmltools 0.5.8.1 2024-04-04 [3] CRAN (R 4.5.0)
#> httpuv 1.6.15 2024-03-26 [3] CRAN (R 4.5.0)
#> httr 1.4.7 2023-08-15 [3] CRAN (R 4.5.0)
#> inum 1.0-5 2023-03-09 [3] CRAN (R 4.5.0)
#> ipred 0.9-15 2024-07-18 [3] CRAN (R 4.5.0)
#> IRanges * 2.41.2 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> iterators 1.0.14 2022-02-05 [3] CRAN (R 4.5.0)
#> jquerylib 0.1.4 2021-04-26 [3] CRAN (R 4.5.0)
#> jsonlite 1.8.9 2024-09-20 [3] CRAN (R 4.5.0)
#> klaR 1.7-3 2023-12-13 [3] CRAN (R 4.5.0)
#> knitr 1.49 2024-11-08 [3] CRAN (R 4.5.0)
#> labelled 2.13.0 2024-04-23 [3] CRAN (R 4.5.0)
#> later 1.4.1 2024-11-27 [3] CRAN (R 4.5.0)
#> lattice * 0.22-6 2024-03-20 [4] CRAN (R 4.5.0)
#> lava 1.8.0 2024-03-05 [3] CRAN (R 4.5.0)
#> libcoin 1.0-10 2023-09-27 [3] CRAN (R 4.5.0)
#> lifecycle 1.0.4 2023-11-07 [3] CRAN (R 4.5.0)
#> listenv 0.9.1 2024-01-29 [3] CRAN (R 4.5.0)
#> lubridate 1.9.3 2023-09-27 [3] CRAN (R 4.5.0)
#> magrittr 2.0.3 2022-03-30 [3] CRAN (R 4.5.0)
#> MASS 7.3-61 2024-06-13 [4] CRAN (R 4.5.0)
#> Matrix 1.7-1 2024-10-18 [4] CRAN (R 4.5.0)
#> MatrixGenerics * 1.19.0 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> matrixStats * 1.4.1 2024-09-08 [3] CRAN (R 4.5.0)
#> mime 0.12 2021-09-28 [3] CRAN (R 4.5.0)
#> miniUI 0.1.1.1 2018-05-18 [3] CRAN (R 4.5.0)
#> mlbench * 2.1-5 2024-05-02 [3] CRAN (R 4.5.0)
#> ModelMetrics 1.2.2.2 2020-03-17 [3] CRAN (R 4.5.0)
#> munsell 0.5.1 2024-04-01 [3] CRAN (R 4.5.0)
#> mvtnorm 1.3-2 2024-11-04 [3] CRAN (R 4.5.0)
#> nlme 3.1-166 2024-08-14 [4] CRAN (R 4.5.0)
#> nnet 7.3-19 2023-05-03 [4] CRAN (R 4.5.0)
#> parallelly 1.40.0 2024-12-03 [3] CRAN (R 4.5.0)
#> partykit 1.2-23 2024-12-02 [3] CRAN (R 4.5.0)
#> pillar 1.9.0 2023-03-22 [3] CRAN (R 4.5.0)
#> pkgconfig 2.0.3 2019-09-22 [3] CRAN (R 4.5.0)
#> plyr 1.8.9 2023-10-02 [3] CRAN (R 4.5.0)
#> pROC 1.18.5 2023-11-01 [3] CRAN (R 4.5.0)
#> prodlim 2024.06.25 2024-06-24 [3] CRAN (R 4.5.0)
#> promises 1.3.2 2024-11-28 [3] CRAN (R 4.5.0)
#> proxy 0.4-27 2022-06-09 [3] CRAN (R 4.5.0)
#> purrr 1.0.2 2023-08-10 [3] CRAN (R 4.5.0)
#> questionr 0.7.8 2023-01-31 [3] CRAN (R 4.5.0)
#> R6 2.5.1 2021-08-19 [3] CRAN (R 4.5.0)
#> randomForest 4.7-1.2 2024-09-22 [3] CRAN (R 4.5.0)
#> Rcpp 1.0.13-1 2024-11-02 [3] CRAN (R 4.5.0)
#> recipes 1.1.0 2024-07-04 [3] CRAN (R 4.5.0)
#> RefManageR * 1.4.0 2022-09-30 [3] CRAN (R 4.5.0)
#> reshape2 1.4.4 2020-04-09 [3] CRAN (R 4.5.0)
#> rlang 1.1.4 2024-06-04 [3] CRAN (R 4.5.0)
#> rmarkdown 2.29 2024-11-04 [3] CRAN (R 4.5.0)
#> rpart 4.1.23 2023-12-05 [4] CRAN (R 4.5.0)
#> rstudioapi 0.17.1 2024-10-22 [3] CRAN (R 4.5.0)
#> S4Arrays 1.7.1 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> S4Vectors * 0.45.2 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> sass 0.4.9 2024-03-15 [3] CRAN (R 4.5.0)
#> scales 1.3.0 2023-11-28 [3] CRAN (R 4.5.0)
#> sessioninfo * 1.2.2 2021-12-06 [3] CRAN (R 4.5.0)
#> shiny 1.9.1 2024-08-01 [3] CRAN (R 4.5.0)
#> SparseArray 1.7.2 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> stringi 1.8.4 2024-05-06 [3] CRAN (R 4.5.0)
#> stringr 1.5.1 2023-11-14 [3] CRAN (R 4.5.0)
#> SummarizedExperiment * 1.37.0 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> survival 3.7-0 2024-06-05 [4] CRAN (R 4.5.0)
#> TCMC * 0.99.0 2024-12-04 [1] Bioconductor
#> tibble 3.2.1 2023-03-20 [3] CRAN (R 4.5.0)
#> tidyselect 1.2.1 2024-03-11 [3] CRAN (R 4.5.0)
#> tidyverse 2.0.0 2023-02-22 [3] CRAN (R 4.5.0)
#> timechange 0.3.0 2024-01-18 [3] CRAN (R 4.5.0)
#> timeDate 4041.110 2024-09-22 [3] CRAN (R 4.5.0)
#> UCSC.utils 1.3.0 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> utf8 1.2.4 2023-10-22 [3] CRAN (R 4.5.0)
#> vctrs 0.6.5 2023-12-01 [3] CRAN (R 4.5.0)
#> withr 3.0.2 2024-10-28 [3] CRAN (R 4.5.0)
#> xfun 0.49 2024-10-31 [3] CRAN (R 4.5.0)
#> xml2 1.3.6 2023-12-04 [3] CRAN (R 4.5.0)
#> xtable 1.8-4 2019-04-21 [3] CRAN (R 4.5.0)
#> XVector 0.47.0 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> yaml 2.3.10 2024-07-26 [2] CRAN (R 4.5.0)
#> zlibbioc 1.53.0 2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>
#> [1] /tmp/RtmpXhJlc9/Rinst3a93b452926105
#> [2] /home/pkgbuild/packagebuilder/workers/jobs/3657/R-libs
#> [3] /home/biocbuild/bbs-3.21-bioc/R/site-library
#> [4] /home/biocbuild/bbs-3.21-bioc/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[1] F. Leisch and E. Dimitriadou. mlbench: Machine Learning Benchmark Problems. R package version 2.1-5. 2024. URL: https://CRAN.R-project.org/package=mlbench.
[2] M. Morgan, V. Obenchain, J. Hester, et al. SummarizedExperiment: A container (S4 class) for matrix-like assays. R package version 1.37.0. 2024. DOI: 10.18129/B9.bioc.SummarizedExperiment. URL: https://bioconductor.org/packages/SummarizedExperiment.
[3] D. Mukesha. TCMC: Compare Classification Models. R package version 0.99.0. 2024. URL: https://github.com/danymukesha/TCMC.