1 Basics

1.1 Install TCMC

The TCMC package (Mukesha, 2024) will soon be available on Bioconductor.

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("TCMC")
# check that the installed packages are consistent with the Bioconductor version in use
BiocManager::valid()

1.2 Citing TCMC

I hope that TCMC will be useful for research. Please use the following information to cite the package and the overall approach. Thank you!

citation("TCMC")
#> To cite package 'TCMC' in publications use:
#> 
#>   Mukesha D (2024). _TCMC: Compare Classification Models_. R package
#>   version 0.99.0, <https://github.com/danymukesha/TCMC>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {TCMC: Compare Classification Models},
#>     author = {Dany Mukesha},
#>     year = {2024},
#>     note = {R package version 0.99.0},
#>     url = {https://github.com/danymukesha/TCMC},
#>   }

2 Quick start to using TCMC

library(TCMC)
library(mlbench)
data(PimaIndiansDiabetes)
str(PimaIndiansDiabetes)
#> 'data.frame':    768 obs. of  9 variables:
#>  $ pregnant: num  6 1 8 1 0 5 3 10 2 8 ...
#>  $ glucose : num  148 85 183 89 137 116 78 115 197 125 ...
#>  $ pressure: num  72 66 64 66 40 74 50 0 70 96 ...
#>  $ triceps : num  35 29 0 23 35 0 32 0 45 0 ...
#>  $ insulin : num  0 0 0 94 168 0 88 0 543 0 ...
#>  $ mass    : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
#>  $ pedigree: num  0.627 0.351 0.672 0.167 2.288 ...
#>  $ age     : num  50 31 32 21 33 30 26 29 53 54 ...
#>  $ diabetes: Factor w/ 2 levels "neg","pos": 2 1 2 1 2 1 2 1 2 2 ...
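Several columns in the output above contain zeros (for example pressure, triceps, insulin and mass) that are not physiologically plausible and are commonly treated as missing values; mlbench also ships PimaIndiansDiabetes2, in which such values are already set to NA. The sketch below counts the zeros using only the first ten values printed by str(); the object names pima_head and pima_clean are illustrative, and whether model_comparer() accepts missing values is not stated in this vignette.

```r
# First ten values of four columns, copied from the str() output above
pima_head <- data.frame(
    pressure = c(72, 66, 64, 66, 40, 74, 50, 0, 70, 96),
    triceps = c(35, 29, 0, 23, 35, 0, 32, 0, 45, 0),
    insulin = c(0, 0, 0, 94, 168, 0, 88, 0, 543, 0),
    mass = c(33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31, 35.3, 30.5, 0)
)

# Count zero entries per column
zero_counts <- colSums(pima_head == 0)
zero_counts

# Optionally recode the zeros as missing values
pima_clean <- pima_head
pima_clean[pima_clean == 0] <- NA
colSums(is.na(pima_clean))
```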
# for this example, only LVQ and RF are tested
results <- model_comparer(PimaIndiansDiabetes, "diabetes", for_utest = TRUE)
# plot variable importance for a specific model
plot_importance(results$trained_models$lvq, "LVQ")

plot_importance(results$trained_models$rf, "RF")

# collect the resampling results of the trained models (resamples() comes from caret)
models_results <- resamples(results$trained_models)
summary(models_results)
#> 
#> Call:
#> summary.resamples(object = models_results)
#> 
#> Models: lvq, rf 
#> Number of resamples: 30 
#> 
#> Accuracy 
#>          Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> lvq 0.5967742 0.6557377 0.6854839 0.6958664 0.7213115 0.8709677    0
#> rf  0.5901639 0.7287811 0.7741935 0.7636524 0.8000397 0.8870968    0
#> 
#> Kappa 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> lvq -0.01572739 0.1838918 0.2599161 0.2711357 0.3547419 0.7181818    0
#> rf   0.03785489 0.4163366 0.4809856 0.4693944 0.5515619 0.7654054    0
bwplot(models_results)

# confusion matrix and statistics for RF, the model with the higher mean resampled accuracy
best_model <- results$performance$rf
best_model
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction pos neg
#>        pos  33  20
#>        neg  16  84
#>                                           
#>                Accuracy : 0.7647          
#>                  95% CI : (0.6894, 0.8294)
#>     No Information Rate : 0.6797          
#>     P-Value [Acc > NIR] : 0.01348         
#>                                           
#>                   Kappa : 0.471           
#>                                           
#>  Mcnemar's Test P-Value : 0.61708         
#>                                           
#>             Sensitivity : 0.6735          
#>             Specificity : 0.8077          
#>          Pos Pred Value : 0.6226          
#>          Neg Pred Value : 0.8400          
#>              Prevalence : 0.3203          
#>          Detection Rate : 0.2157          
#>    Detection Prevalence : 0.3464          
#>       Balanced Accuracy : 0.7406          
#>                                           
#>        'Positive' Class : pos             
#> 
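The statistics printed above follow directly from the four cells of the confusion matrix. The sketch below recomputes some of them in base R from the printed counts; the variable names are illustrative and are not part of TCMC or caret.

```r
# Confusion matrix for RF, copied from the output above
# (rows = predictions, columns = reference labels)
cm <- matrix(c(33, 16, 20, 84), nrow = 2,
    dimnames = list(Prediction = c("pos", "neg"), Reference = c("pos", "neg")))
n_total <- sum(cm)

# Share of correct predictions
accuracy <- sum(diag(cm)) / n_total

# True positive rate and true negative rate ("pos" is the positive class)
sensitivity <- cm["pos", "pos"] / sum(cm[, "pos"])
specificity <- cm["neg", "neg"] / sum(cm[, "neg"])
balanced_accuracy <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement
expected <- sum(rowSums(cm) * colSums(cm)) / n_total^2
kappa <- (accuracy - expected) / (1 - expected)

round(c(Accuracy = accuracy, Kappa = kappa, Sensitivity = sensitivity,
    Specificity = specificity, Balanced = balanced_accuracy), 4)
```

The rounded values agree with the Accuracy, Kappa, Sensitivity, Specificity and Balanced Accuracy lines of the printed report.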

3 Example with SummarizedExperiment

This example combines SummarizedExperiment (Morgan, Obenchain, Hester et al., 2024) with a machine learning workflow for evaluating treatment effectiveness.

  • Scenario description: In this example, we investigate the effectiveness of different treatments in influencing positive outcomes for a simulated clinical dataset. Each treatment corresponds to a class of drugs (e.g., TZD and DPP-4), and the outcome variable indicates whether the response to treatment was positive or negative. Using the SummarizedExperiment (Morgan, Obenchain, Hester et al., 2024) class from Bioconductor, we will preprocess the data, train machine learning models, and analyze the most impactful features and models.

  • Data simulation: The dataset contains simulated sugar-level measurements for 200 features across eight samples, along with metadata describing treatment classes and outcomes. A SummarizedExperiment (Morgan, Obenchain, Hester et al., 2024) object is used to organize and manage these data.

library(SummarizedExperiment)

# Simulate data
nrows <- 200 # Number of features (e.g., genes or biomarkers)
ncols <- 8  # Number of samples
sugar_level <- matrix(runif(nrows * ncols, 1, 500), nrows)

# Metadata: treatment classes and outcomes
colData <- DataFrame(Treatment_class = rep(c("TZD", "DPP-4"), 4),
        row.names = LETTERS[1:8])
# Outcome labels (10 values), stored in the metadata slot of the object
Outcome <- DataFrame(Outcome = rep(c("neg", "pos"), 5))
se0 <- SummarizedExperiment(assays = SimpleList(counts = sugar_level),
        colData = colData, metadata = Outcome)
# if the input is a SummarizedExperiment, extract the assay and the metadata
if (inherits(se0, "SummarizedExperiment")) {
    data <- as.data.frame(assay(se0))
    metadata <- as.data.frame(metadata(se0))
    # cbind() recycles the 10 outcome labels across the 200 assay rows
    data_df <- cbind(metadata, data)
}
feature_names <- c("Outcome",
    "TreatmentA", "TreatmentB", "TreatmentC", "TreatmentD", 
    "TreatmentE", "TreatmentF", "TreatmentG", "TreatmentH"
)
colnames(data_df) <- feature_names
data_df$Outcome <- data_df$Outcome |> as.factor()
str(data_df) 
#> 'data.frame':    200 obs. of  9 variables:
#>  $ Outcome   : Factor w/ 2 levels "neg","pos": 1 2 1 2 1 2 1 2 1 2 ...
#>  $ TreatmentA: num  274.2 479.9 99.6 191.6 473.2 ...
#>  $ TreatmentB: num  227.4 127.9 275.4 361.3 80.7 ...
#>  $ TreatmentC: num  472 362 175 469 428 ...
#>  $ TreatmentD: num  57.5 436.5 197.2 351.2 338.9 ...
#>  $ TreatmentE: num  236.1 359.5 286.3 315.4 32.4 ...
#>  $ TreatmentF: num  77.3 429.2 76.1 449.3 311.2 ...
#>  $ TreatmentG: num  34.8 119.7 100.7 426 298.7 ...
#>  $ TreatmentH: num  362 1.77 36.49 347.83 454.56 ...
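The data frame above has 200 rows even though only 10 outcome labels were supplied, because cbind() on data frames recycles a shorter component when the longer row count is an exact multiple of the shorter one. The base R sketch below reproduces that behaviour without SummarizedExperiment; outcome_df, features_df and combined are illustrative names.

```r
# 10 outcome labels, as in the metadata of the example
outcome_df <- data.frame(Outcome = rep(c("neg", "pos"), 5))

# 200 x 8 matrix of simulated measurements, as in the assay of the example
features_df <- as.data.frame(matrix(runif(200 * 8, 1, 500), nrow = 200))

# The 10 labels are repeated 20 times to fill all 200 rows
combined <- cbind(outcome_df, features_df)
dim(combined)
table(combined$Outcome)
```

Because the labels are assigned by recycling, each of the 200 rows is a feature paired with an alternating label, not a patient-level observation.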
results <- model_comparer(data = data_df, "Outcome", for_utest = TRUE)
plot_importance(results$trained_models$lvq, "LVQ", type_plot = "enhanced")

plot_importance(results$trained_models$rf, "RF", type_plot = "basic")


models_results <- resamples(results$trained_models)
summary(models_results)
#> 
#> Call:
#> summary.resamples(object = models_results)
#> 
#> Models: lvq, rf 
#> Number of resamples: 30 
#> 
#> Accuracy 
#>       Min. 1st Qu. Median      Mean  3rd Qu.  Max. NA's
#> lvq 0.3125  0.4375 0.5625 0.5375000 0.625000 0.750    0
#> rf  0.2500  0.4375 0.5000 0.4770833 0.546875 0.625    0
#> 
#> Kappa 
#>       Min. 1st Qu. Median        Mean 3rd Qu. Max. NA's
#> lvq -0.375  -0.125  0.125  0.07500000 0.25000 0.50    0
#> rf  -0.500  -0.125  0.000 -0.04583333 0.09375 0.25    0

bwplot(models_results)


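# confusion matrix for RF; note that LVQ has the higher mean resampled accuracy in this run
rf_performance <- results$performance$rf
rf_performance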
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction pos neg
#>        pos  12   8
#>        neg   9  11
#>                                           
#>                Accuracy : 0.575           
#>                  95% CI : (0.4089, 0.7296)
#>     No Information Rate : 0.525           
#>     P-Value [Acc > NIR] : 0.3184          
#>                                           
#>                   Kappa : 0.15            
#>                                           
#>  Mcnemar's Test P-Value : 1.0000          
#>                                           
#>             Sensitivity : 0.5714          
#>             Specificity : 0.5789          
#>          Pos Pred Value : 0.6000          
#>          Neg Pred Value : 0.5500          
#>              Prevalence : 0.5250          
#>          Detection Rate : 0.3000          
#>    Detection Prevalence : 0.5000          
#>       Balanced Accuracy : 0.5752          
#>                                           
#>        'Positive' Class : pos             
#> 
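The No Information Rate and its P-value indicate whether the accuracy exceeds what always predicting the most frequent class would achieve. The sketch below recomputes them from the printed counts in base R, assuming an exact binomial test; it is an illustration with made-up variable names, not the code that produced the report.

```r
# Confusion matrix for RF on the simulated data, copied from the output above
cm <- matrix(c(12, 9, 8, 11), nrow = 2,
    dimnames = list(Prediction = c("pos", "neg"), Reference = c("pos", "neg")))
n_total <- sum(cm)
n_correct <- sum(diag(cm))

# No Information Rate: share of the most frequent reference class
nir <- max(colSums(cm)) / n_total

# One-sided exact binomial test of accuracy against the NIR
acc_test <- binom.test(n_correct, n_total, p = nir, alternative = "greater")

# Exact 95% confidence interval for the accuracy
acc_ci <- binom.test(n_correct, n_total)$conf.int

c(Accuracy = n_correct / n_total, NIR = nir, P_value = acc_test$p.value)
acc_ci
```

A P-value well above 0.05 means the accuracy cannot be distinguished from the No Information Rate, which is what one would expect when features and labels are simulated independently.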

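In a real dataset, identifying the most predictive treatments and features can inform clinical decision-making and help prioritize interventions that maximize positive outcomes. Here the measurements are random draws that are unrelated to the outcome labels, so both models perform close to chance (RF accuracy 0.575, Kappa 0.15).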

Here is an example of how you can cite your package inside the vignette:

  • TCMC (Mukesha, 2024)

The Pima Indians Diabetes data set used in the quick start example is originally from the National Institute of Diabetes and Digestive and Kidney Diseases.

  • Source: mlbench (Leisch and Dimitriadou, 2024)

Date the vignette was generated.

#> [1] "2024-12-04 14:47:07 EST"

Wallclock time spent generating the vignette.

#> Time difference of 48.489 secs
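A timestamp and a wallclock time like the ones above can be obtained by recording Sys.time() at the start of the vignette and subtracting it at the end. The sketch below is a minimal illustration; start_time and elapsed are assumed names, and Sys.sleep() stands in for the actual analysis.

```r
# Record the time at the start of the vignette
start_time <- Sys.time()

# Placeholder for the analysis code
Sys.sleep(0.1)

# Date the document was generated
print(Sys.time())

# Wallclock time spent, as a difftime object
elapsed <- Sys.time() - start_time
print(round(elapsed, 3))
```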

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2024-10-21 r87258)
#>  os       Ubuntu 24.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2024-12-04
#>  pandoc   3.1.3 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package              * version    date (UTC) lib source
#>  abind                  1.4-8      2024-09-12 [3] CRAN (R 4.5.0)
#>  backports              1.5.0      2024-05-23 [3] CRAN (R 4.5.0)
#>  bibtex                 0.5.1      2023-01-26 [3] CRAN (R 4.5.0)
#>  Biobase              * 2.67.0     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  BiocGenerics         * 0.53.3     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  BiocManager            1.30.25    2024-08-28 [2] CRAN (R 4.5.0)
#>  BiocStyle            * 2.35.0     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  bookdown               0.41       2024-10-16 [3] CRAN (R 4.5.0)
#>  bslib                  0.8.0      2024-07-29 [3] CRAN (R 4.5.0)
#>  C50                    0.1.8      2023-02-08 [3] CRAN (R 4.5.0)
#>  cachem                 1.1.0      2024-05-16 [3] CRAN (R 4.5.0)
#>  caret                * 6.0-94     2023-03-21 [3] CRAN (R 4.5.0)
#>  class                  7.3-22     2023-05-03 [4] CRAN (R 4.5.0)
#>  cli                    3.6.3      2024-06-21 [3] CRAN (R 4.5.0)
#>  codetools              0.2-20     2024-03-31 [4] CRAN (R 4.5.0)
#>  colorspace             2.1-1      2024-07-26 [3] CRAN (R 4.5.0)
#>  combinat               0.0-8      2012-10-29 [3] CRAN (R 4.5.0)
#>  crayon                 1.5.3      2024-06-20 [3] CRAN (R 4.5.0)
#>  Cubist                 0.4.4      2024-07-02 [3] CRAN (R 4.5.0)
#>  data.table             1.16.2     2024-10-10 [3] CRAN (R 4.5.0)
#>  DelayedArray           0.33.3     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  digest                 0.6.37     2024-08-19 [3] CRAN (R 4.5.0)
#>  dplyr                  1.1.4      2023-11-17 [3] CRAN (R 4.5.0)
#>  e1071                  1.7-16     2024-09-16 [3] CRAN (R 4.5.0)
#>  evaluate               1.0.1      2024-10-10 [3] CRAN (R 4.5.0)
#>  fansi                  1.0.6      2023-12-08 [3] CRAN (R 4.5.0)
#>  fastmap                1.2.0      2024-05-15 [3] CRAN (R 4.5.0)
#>  forcats                1.0.0      2023-01-29 [3] CRAN (R 4.5.0)
#>  foreach                1.5.2      2022-02-02 [3] CRAN (R 4.5.0)
#>  Formula                1.2-5      2023-02-24 [3] CRAN (R 4.5.0)
#>  future                 1.34.0     2024-07-29 [3] CRAN (R 4.5.0)
#>  future.apply           1.11.3     2024-10-27 [3] CRAN (R 4.5.0)
#>  gbm                    2.2.2      2024-06-28 [3] CRAN (R 4.5.0)
#>  generics             * 0.1.3      2022-07-05 [3] CRAN (R 4.5.0)
#>  GenomeInfoDb         * 1.43.2     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  GenomeInfoDbData       1.2.13     2024-10-23 [3] Bioconductor
#>  GenomicRanges        * 1.59.1     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  ggplot2              * 3.5.1      2024-04-23 [3] CRAN (R 4.5.0)
#>  globals                0.16.3     2024-03-08 [3] CRAN (R 4.5.0)
#>  glue                   1.8.0      2024-09-30 [3] CRAN (R 4.5.0)
#>  gower                  1.0.1      2022-12-22 [3] CRAN (R 4.5.0)
#>  gtable                 0.3.6      2024-10-25 [3] CRAN (R 4.5.0)
#>  hardhat                1.4.0      2024-06-02 [3] CRAN (R 4.5.0)
#>  haven                  2.5.4      2023-11-30 [3] CRAN (R 4.5.0)
#>  highr                  0.11       2024-05-26 [3] CRAN (R 4.5.0)
#>  hms                    1.1.3      2023-03-21 [3] CRAN (R 4.5.0)
#>  htmltools              0.5.8.1    2024-04-04 [3] CRAN (R 4.5.0)
#>  httpuv                 1.6.15     2024-03-26 [3] CRAN (R 4.5.0)
#>  httr                   1.4.7      2023-08-15 [3] CRAN (R 4.5.0)
#>  inum                   1.0-5      2023-03-09 [3] CRAN (R 4.5.0)
#>  ipred                  0.9-15     2024-07-18 [3] CRAN (R 4.5.0)
#>  IRanges              * 2.41.2     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  iterators              1.0.14     2022-02-05 [3] CRAN (R 4.5.0)
#>  jquerylib              0.1.4      2021-04-26 [3] CRAN (R 4.5.0)
#>  jsonlite               1.8.9      2024-09-20 [3] CRAN (R 4.5.0)
#>  klaR                   1.7-3      2023-12-13 [3] CRAN (R 4.5.0)
#>  knitr                  1.49       2024-11-08 [3] CRAN (R 4.5.0)
#>  labelled               2.13.0     2024-04-23 [3] CRAN (R 4.5.0)
#>  later                  1.4.1      2024-11-27 [3] CRAN (R 4.5.0)
#>  lattice              * 0.22-6     2024-03-20 [4] CRAN (R 4.5.0)
#>  lava                   1.8.0      2024-03-05 [3] CRAN (R 4.5.0)
#>  libcoin                1.0-10     2023-09-27 [3] CRAN (R 4.5.0)
#>  lifecycle              1.0.4      2023-11-07 [3] CRAN (R 4.5.0)
#>  listenv                0.9.1      2024-01-29 [3] CRAN (R 4.5.0)
#>  lubridate              1.9.3      2023-09-27 [3] CRAN (R 4.5.0)
#>  magrittr               2.0.3      2022-03-30 [3] CRAN (R 4.5.0)
#>  MASS                   7.3-61     2024-06-13 [4] CRAN (R 4.5.0)
#>  Matrix                 1.7-1      2024-10-18 [4] CRAN (R 4.5.0)
#>  MatrixGenerics       * 1.19.0     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  matrixStats          * 1.4.1      2024-09-08 [3] CRAN (R 4.5.0)
#>  mime                   0.12       2021-09-28 [3] CRAN (R 4.5.0)
#>  miniUI                 0.1.1.1    2018-05-18 [3] CRAN (R 4.5.0)
#>  mlbench              * 2.1-5      2024-05-02 [3] CRAN (R 4.5.0)
#>  ModelMetrics           1.2.2.2    2020-03-17 [3] CRAN (R 4.5.0)
#>  munsell                0.5.1      2024-04-01 [3] CRAN (R 4.5.0)
#>  mvtnorm                1.3-2      2024-11-04 [3] CRAN (R 4.5.0)
#>  nlme                   3.1-166    2024-08-14 [4] CRAN (R 4.5.0)
#>  nnet                   7.3-19     2023-05-03 [4] CRAN (R 4.5.0)
#>  parallelly             1.40.0     2024-12-03 [3] CRAN (R 4.5.0)
#>  partykit               1.2-23     2024-12-02 [3] CRAN (R 4.5.0)
#>  pillar                 1.9.0      2023-03-22 [3] CRAN (R 4.5.0)
#>  pkgconfig              2.0.3      2019-09-22 [3] CRAN (R 4.5.0)
#>  plyr                   1.8.9      2023-10-02 [3] CRAN (R 4.5.0)
#>  pROC                   1.18.5     2023-11-01 [3] CRAN (R 4.5.0)
#>  prodlim                2024.06.25 2024-06-24 [3] CRAN (R 4.5.0)
#>  promises               1.3.2      2024-11-28 [3] CRAN (R 4.5.0)
#>  proxy                  0.4-27     2022-06-09 [3] CRAN (R 4.5.0)
#>  purrr                  1.0.2      2023-08-10 [3] CRAN (R 4.5.0)
#>  questionr              0.7.8      2023-01-31 [3] CRAN (R 4.5.0)
#>  R6                     2.5.1      2021-08-19 [3] CRAN (R 4.5.0)
#>  randomForest           4.7-1.2    2024-09-22 [3] CRAN (R 4.5.0)
#>  Rcpp                   1.0.13-1   2024-11-02 [3] CRAN (R 4.5.0)
#>  recipes                1.1.0      2024-07-04 [3] CRAN (R 4.5.0)
#>  RefManageR           * 1.4.0      2022-09-30 [3] CRAN (R 4.5.0)
#>  reshape2               1.4.4      2020-04-09 [3] CRAN (R 4.5.0)
#>  rlang                  1.1.4      2024-06-04 [3] CRAN (R 4.5.0)
#>  rmarkdown              2.29       2024-11-04 [3] CRAN (R 4.5.0)
#>  rpart                  4.1.23     2023-12-05 [4] CRAN (R 4.5.0)
#>  rstudioapi             0.17.1     2024-10-22 [3] CRAN (R 4.5.0)
#>  S4Arrays               1.7.1      2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  S4Vectors            * 0.45.2     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  sass                   0.4.9      2024-03-15 [3] CRAN (R 4.5.0)
#>  scales                 1.3.0      2023-11-28 [3] CRAN (R 4.5.0)
#>  sessioninfo          * 1.2.2      2021-12-06 [3] CRAN (R 4.5.0)
#>  shiny                  1.9.1      2024-08-01 [3] CRAN (R 4.5.0)
#>  SparseArray            1.7.2      2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  stringi                1.8.4      2024-05-06 [3] CRAN (R 4.5.0)
#>  stringr                1.5.1      2023-11-14 [3] CRAN (R 4.5.0)
#>  SummarizedExperiment * 1.37.0     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  survival               3.7-0      2024-06-05 [4] CRAN (R 4.5.0)
#>  TCMC                 * 0.99.0     2024-12-04 [1] Bioconductor
#>  tibble                 3.2.1      2023-03-20 [3] CRAN (R 4.5.0)
#>  tidyselect             1.2.1      2024-03-11 [3] CRAN (R 4.5.0)
#>  tidyverse              2.0.0      2023-02-22 [3] CRAN (R 4.5.0)
#>  timechange             0.3.0      2024-01-18 [3] CRAN (R 4.5.0)
#>  timeDate               4041.110   2024-09-22 [3] CRAN (R 4.5.0)
#>  UCSC.utils             1.3.0      2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  utf8                   1.2.4      2023-10-22 [3] CRAN (R 4.5.0)
#>  vctrs                  0.6.5      2023-12-01 [3] CRAN (R 4.5.0)
#>  withr                  3.0.2      2024-10-28 [3] CRAN (R 4.5.0)
#>  xfun                   0.49       2024-10-31 [3] CRAN (R 4.5.0)
#>  xml2                   1.3.6      2023-12-04 [3] CRAN (R 4.5.0)
#>  xtable                 1.8-4      2019-04-21 [3] CRAN (R 4.5.0)
#>  XVector                0.47.0     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#>  yaml                   2.3.10     2024-07-26 [2] CRAN (R 4.5.0)
#>  zlibbioc               1.53.0     2024-12-03 [3] Bioconductor 3.21 (R 4.5.0)
#> 
#>  [1] /tmp/RtmpXhJlc9/Rinst3a93b452926105
#>  [2] /home/pkgbuild/packagebuilder/workers/jobs/3657/R-libs
#>  [3] /home/biocbuild/bbs-3.21-bioc/R/site-library
#>  [4] /home/biocbuild/bbs-3.21-bioc/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

4 Bibliography

[1] F. Leisch and E. Dimitriadou. mlbench: Machine Learning Benchmark Problems. R package version 2.1-5. 2024. URL: https://CRAN.R-project.org/package=mlbench.

[2] M. Morgan, V. Obenchain, J. Hester, et al. SummarizedExperiment: A container (S4 class) for matrix-like assays. R package version 1.37.0. 2024. DOI: 10.18129/B9.bioc.SummarizedExperiment. URL: https://bioconductor.org/packages/SummarizedExperiment.

[3] D. Mukesha. TCMC: Compare Classification Models. R package version 0.99.0. 2024. URL: https://github.com/danymukesha/TCMC.