ISAnalytics 1.4.3
ISAnalytics can be installed quickly in different ways: from Bioconductor (via BiocManager) or from GitHub (via devtools).
There are always 2 versions of the package active:
RELEASE is the latest stable version.
DEVEL is the development version: it is the most up-to-date version, where all new features are introduced.
RELEASE version (from Bioconductor):
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ISAnalytics")
DEVEL version (from Bioconductor):
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# The following initializes usage of Bioc devel
BiocManager::install(version = "devel")
BiocManager::install("ISAnalytics")
RELEASE (from GitHub, using devtools):
if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
                         ref = "RELEASE_3_14",
                         dependencies = TRUE,
                         build_vignettes = TRUE)
DEVEL (from GitHub, using devtools):
if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
                         ref = "master",
                         dependencies = TRUE,
                         build_vignettes = TRUE)
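After installing, it can be handy to confirm which version actually ended up in the library. The snippet below is a minimal sketch that uses only base R and utils; the variable name installed is just illustrative.

```r
# Check whether ISAnalytics is available without attaching it
installed <- requireNamespace("ISAnalytics", quietly = TRUE)

if (installed) {
  # Report the installed version, e.g. to tell RELEASE from DEVEL
  message(
    "ISAnalytics version: ",
    as.character(utils::packageVersion("ISAnalytics"))
  )
} else {
  message("ISAnalytics is not installed yet")
}
```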
ISAnalytics has a verbose option that allows some functions to print additional information to the console while they’re executing.
To disable or re-enable this feature:
# DISABLE
options("ISAnalytics.verbose" = FALSE)
# ENABLE
options("ISAnalytics.verbose" = TRUE)
Some functions also produce reports in a user-friendly HTML format. To disable or enable this feature:
# DISABLE HTML REPORTS
options("ISAnalytics.reports" = FALSE)
# ENABLE HTML REPORTS
options("ISAnalytics.reports" = TRUE)
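Since both settings are ordinary R options, they can also be changed temporarily and restored afterwards. The following is a small sketch that relies only on base R's options() and getOption(); the name old_opts is illustrative.

```r
# Temporarily silence console messages and HTML reports.
# options() returns the previous values, so they can be restored later.
old_opts <- options(
  ISAnalytics.verbose = FALSE,
  ISAnalytics.reports = FALSE
)

# Inspect the values currently in effect
getOption("ISAnalytics.verbose")
getOption("ISAnalytics.reports")

# ... call ISAnalytics functions here ...

# Restore whatever was set before
options(old_opts)
```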
library(ISAnalytics)
We won’t go into too much detail here, but we’ll explain in simple terms what a “collision” is and how the function in this package deals with collisions.
We say that an integration (aka a unique combination of chromosome, integration locus and strand) is a collision if this combination is shared between different independent samples. An independent sample is a unique combination of ProjectID and SubjectID (where subjects usually represent patients). The reason behind this is that it’s highly improbable to observe the very same integration in two different subjects, so this phenomenon might be an indicator of some kind of contamination in the sequencing phase or in the PCR phase. For this reason we might want to exclude such contamination from our analysis.
ISAnalytics provides a function that processes the imported data for the removal or reassignment of these “problematic” integrations, remove_collisions().
The processing is done using the sequence count value, so the corresponding matrix is needed for this operation.
The remove_collisions() function follows several logical steps to decide whether an integration is a collision and, if it is, whether to re-assign it or remove it entirely based on different criteria.
As we said before, a collision is a triplet made of chr, integration locus and strand, which is shared between different independent samples, aka pairs made of ProjectID and SubjectID. The function uses the information stored in the association file to assess which independent samples are present and counts the number of independent samples for each integration: those that have a count > 1 are considered collisions.
chr | integration_locus | strand | seqCount | CompleteAmplificationID | SubjectID | ProjectID |
---|---|---|---|---|---|---|
1 | 123454 | + | 653 | SAMPLE1 | SUBJ01 | PJ01 |
1 | 123454 | + | 456 | SAMPLE2 | SUBJ02 | PJ01 |
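The identification step can be illustrated with a few lines of base R. This is only a sketch of the counting logic described above, not the code used inside remove_collisions(). The first two rows mirror the table; the third row and the helper column independent_sample are made up for illustration.

```r
# Toy data: rows 1-2 mirror the table above, row 3 is illustrative only
toy <- data.frame(
  chr = c("1", "1", "2"),
  integration_locus = c(123454, 123454, 5000),
  strand = c("+", "+", "-"),
  seqCount = c(653, 456, 120),
  CompleteAmplificationID = c("SAMPLE1", "SAMPLE2", "SAMPLE3"),
  SubjectID = c("SUBJ01", "SUBJ02", "SUBJ01"),
  ProjectID = c("PJ01", "PJ01", "PJ01"),
  stringsAsFactors = FALSE
)

# An independent sample is a unique ProjectID/SubjectID pair
toy$independent_sample <- paste(toy$ProjectID, toy$SubjectID, sep = "_")

# Count distinct independent samples per (chr, integration_locus, strand)
n_samples <- aggregate(
  x = toy["independent_sample"],
  by = toy[c("chr", "integration_locus", "strand")],
  FUN = function(s) length(unique(s))
)
names(n_samples)[names(n_samples) == "independent_sample"] <-
  "n_independent_samples"

# Integrations found in more than one independent sample are collisions
collisions <- n_samples[n_samples$n_independent_samples > 1, ]
print(collisions)
```

In this toy example only the integration on chromosome 1 is flagged, because it is observed in both SUBJ01 and SUBJ02.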
Once the collisions are identified, the function follows 3 steps in which it tries to re-assign the combination to a single independent sample. One of the criteria depends on a ratio factor passed as a parameter to the function (reads_ratio); the default value is 10.
If none of the criteria is sufficient to make a decision, the integration is simply removed from the matrix.
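As a rough illustration of how a threshold such as reads_ratio could drive a decision, the sketch below assumes that the comparison is made on the cumulative sequence count of each independent sample and that a sample "wins" only when it exceeds the runner-up by at least that factor. The helper pick_by_reads_ratio() is hypothetical and not part of ISAnalytics; the exact comparison performed by remove_collisions() may differ.

```r
# Hypothetical helper (NOT part of ISAnalytics): choose an independent
# sample by comparing cumulative sequence counts against a ratio threshold.
pick_by_reads_ratio <- function(seq_counts, reads_ratio = 10) {
  # seq_counts: named numeric vector, one cumulative seqCount per sample
  ordered <- sort(seq_counts, decreasing = TRUE)
  if (length(ordered) < 2) {
    return(names(ordered)[1])
  }
  # Assumption: decide only if the top sample is at least `reads_ratio`
  # times larger than the second one
  if (ordered[[1]] / ordered[[2]] >= reads_ratio) {
    names(ordered)[1]
  } else {
    NA_character_ # undecided: the integration would be removed
  }
}

# Values from the table above: the ratio is about 1.4, so no decision
pick_by_reads_ratio(c(PJ01_SUBJ01 = 653, PJ01_SUBJ02 = 456))

# Made-up values with a ratio above 10: the first sample is chosen
pick_by_reads_ratio(c(PJ01_SUBJ01 = 6530, PJ01_SUBJ02 = 456))
```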
data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
## Multi quantification matrix
no_coll <- remove_collisions(x = integration_matrices,
                             association_file = association_file,
                             report_path = NULL)
#> Identifying collisions...
#> Processing collisions...
#> Finished!
## Matrix list
separated <- separate_quant_matrices(integration_matrices)
no_coll_list <- remove_collisions(x = separated,
                                  association_file = association_file,
                                  report_path = NULL)
#> Identifying collisions...
#> Processing collisions...
#> Finished!
## Only sequence count
no_coll_single <- remove_collisions(x = separated$seqCount,
                                    association_file = association_file,
                                    quant_cols = c(seqCount = "Value"),
                                    report_path = NULL)
#> Identifying collisions...
#> Processing collisions...
#> Finished!
Important notes on the association file:
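The function accepts different inputs, namely:
A multi-quantification matrix
A named list of matrices, where the names are quantification types (see quantification_types())
A single sequence count matrix
If the option ISAnalytics.reports is active, an interactive report in HTML format will be produced at the specified path.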
If you passed the standalone sequence count matrix to remove_collisions(), you have to call the function realign_after_collisions() to realign the other matrices, passing as input the processed sequence count matrix and the named list of other matrices to realign.
NOTE: the names in the list must be quantification types.
other_realigned <- realign_after_collisions(
    sc_matrix = no_coll_single,
    other_matrices = list(fragmentEstimate = separated$fragmentEstimate)
)
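Conceptually, realignment can be thought of as keeping, in every other matrix, only the rows that survived collision processing in the sequence count matrix. The base R sketch below illustrates that idea on made-up data, assuming rows are matched on chr, integration_locus, strand and CompleteAmplificationID. It is not the implementation of realign_after_collisions(), which may also handle re-assigned integrations differently.

```r
# Toy processed sequence count matrix (collisions already handled)
sc_processed <- data.frame(
  chr = c("1", "2"),
  integration_locus = c(123454, 5000),
  strand = c("+", "-"),
  CompleteAmplificationID = c("SAMPLE1", "SAMPLE3"),
  Value = c(653, 120),
  stringsAsFactors = FALSE
)

# Toy matrix for another quantification, still holding the removed row
fe_raw <- data.frame(
  chr = c("1", "1", "2"),
  integration_locus = c(123454, 123454, 5000),
  strand = c("+", "+", "-"),
  CompleteAmplificationID = c("SAMPLE1", "SAMPLE2", "SAMPLE3"),
  Value = c(300.5, 210.2, 80.1),
  stringsAsFactors = FALSE
)

# Assumed matching key: integration coordinates plus sample identifier
keys <- c("chr", "integration_locus", "strand", "CompleteAmplificationID")

# Inner join on the key: rows absent from the processed matrix are dropped
fe_realigned <- merge(fe_raw, sc_processed[keys], by = keys)
print(fe_realigned)
```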
R session information.
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Ubuntu 20.04.3 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2022-01-16
#> pandoc 2.5 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.1.2)
#> BiocManager 1.30.16 2021-06-15 [2] CRAN (R 4.1.2)
#> BiocParallel 1.28.3 2022-01-16 [2] Bioconductor
#> BiocStyle * 2.22.0 2022-01-16 [2] Bioconductor
#> bookdown 0.24 2021-09-02 [2] CRAN (R 4.1.2)
#> bslib 0.3.1 2021-10-06 [2] CRAN (R 4.1.2)
#> cli 3.1.0 2021-10-27 [2] CRAN (R 4.1.2)
#> colorspace 2.0-2 2021-06-24 [2] CRAN (R 4.1.2)
#> crayon 1.4.2 2021-10-29 [2] CRAN (R 4.1.2)
#> data.table 1.14.2 2021-09-27 [2] CRAN (R 4.1.2)
#> DBI 1.1.2 2021-12-20 [2] CRAN (R 4.1.2)
#> digest 0.6.29 2021-12-01 [2] CRAN (R 4.1.2)
#> dplyr 1.0.7 2021-06-18 [2] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.1.2)
#> evaluate 0.14 2019-05-28 [2] CRAN (R 4.1.2)
#> fansi 1.0.2 2022-01-14 [2] CRAN (R 4.1.2)
#> fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.1.2)
#> fs 1.5.2 2021-12-08 [2] CRAN (R 4.1.2)
#> generics 0.1.1 2021-10-25 [2] CRAN (R 4.1.2)
#> ggplot2 3.3.5 2021-06-25 [2] CRAN (R 4.1.2)
#> ggrepel 0.9.1 2021-01-15 [2] CRAN (R 4.1.2)
#> glue 1.6.0 2021-12-17 [2] CRAN (R 4.1.2)
#> gtable 0.3.0 2019-03-25 [2] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [2] CRAN (R 4.1.2)
#> hms 1.1.1 2021-09-26 [2] CRAN (R 4.1.2)
#> htmltools 0.5.2 2021-08-25 [2] CRAN (R 4.1.2)
#> httr 1.4.2 2020-07-20 [2] CRAN (R 4.1.2)
#> ISAnalytics * 1.4.3 2022-01-16 [1] Bioconductor
#> jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.1.2)
#> jsonlite 1.7.2 2020-12-09 [2] CRAN (R 4.1.2)
#> knitr 1.37 2021-12-16 [2] CRAN (R 4.1.2)
#> lattice 0.20-45 2021-09-22 [2] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [2] CRAN (R 4.1.2)
#> lubridate 1.8.0 2021-10-07 [2] CRAN (R 4.1.2)
#> magrittr * 2.0.1 2020-11-17 [2] CRAN (R 4.1.2)
#> mnormt 2.0.2 2020-09-01 [2] CRAN (R 4.1.2)
#> munsell 0.5.0 2018-06-12 [2] CRAN (R 4.1.2)
#> nlme 3.1-155 2022-01-13 [2] CRAN (R 4.1.2)
#> pillar 1.6.4 2021-10-18 [2] CRAN (R 4.1.2)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.1.2)
#> plyr 1.8.6 2020-03-03 [2] CRAN (R 4.1.2)
#> psych 2.1.9 2021-09-22 [2] CRAN (R 4.1.2)
#> purrr 0.3.4 2020-04-17 [2] CRAN (R 4.1.2)
#> R6 2.5.1 2021-08-19 [2] CRAN (R 4.1.2)
#> Rcapture 1.4-3 2019-12-16 [2] CRAN (R 4.1.2)
#> Rcpp 1.0.8 2022-01-13 [2] CRAN (R 4.1.2)
#> readr 2.1.1 2021-11-30 [2] CRAN (R 4.1.2)
#> RefManageR * 1.3.0 2020-11-13 [2] CRAN (R 4.1.2)
#> rlang 0.4.12 2021-10-18 [2] CRAN (R 4.1.2)
#> rmarkdown 2.11 2021-09-14 [2] CRAN (R 4.1.2)
#> sass 0.4.0 2021-05-12 [2] CRAN (R 4.1.2)
#> scales 1.1.1 2020-05-11 [2] CRAN (R 4.1.2)
#> sessioninfo * 1.2.2 2021-12-06 [2] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [2] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [2] CRAN (R 4.1.2)
#> tibble 3.1.6 2021-11-07 [2] CRAN (R 4.1.2)
#> tidyr 1.1.4 2021-09-27 [2] CRAN (R 4.1.2)
#> tidyselect 1.1.1 2021-04-30 [2] CRAN (R 4.1.2)
#> tmvnsim 1.0-2 2016-12-15 [2] CRAN (R 4.1.2)
#> tzdb 0.2.0 2021-10-27 [2] CRAN (R 4.1.2)
#> utf8 1.2.2 2021-07-24 [2] CRAN (R 4.1.2)
#> vctrs 0.3.8 2021-04-29 [2] CRAN (R 4.1.2)
#> xfun 0.29 2021-12-14 [2] CRAN (R 4.1.2)
#> xml2 1.3.3 2021-11-30 [2] CRAN (R 4.1.2)
#> yaml 2.2.1 2020-02-01 [2] CRAN (R 4.1.2)
#> zip 2.2.0 2021-05-31 [2] CRAN (R 4.1.2)
#>
#> [1] /tmp/RtmplowxUB/Rinst3081361f546c34
#> [2] /home/biocbuild/bbs-3.14-bioc/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
This vignette was generated using BiocStyle (Oleś, 2022) with knitr (Xie, 2021) and rmarkdown (Allaire, Xie, McPherson, et al., 2021) running behind the scenes.
Citations made with RefManageR (McLean, 2017).
[1] J. Allaire, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 2.11. 2021. URL: https://github.com/rstudio/rmarkdown.
[2] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.
[3] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.22.0. 2022. URL: https://github.com/Bioconductor/BiocStyle.
[4] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.37. 2021. URL: https://yihui.org/knitr/.