This quick start covers the workflow in 4 steps, from the import of HiC data to the aggregation of HiC contacts.
Data were obtained from Drosophila melanogaster S2 cells. HiC test dataset: directly downloaded from the 4DN platform. Genomic coordinates: ChIP-seq peaks of the Beaf-32 protein in wild-type cells (GSM1278639).
For a test, please download HiC data in .hic format (Juicer).
options(timeout = 3600)
temp.dir <- file.path(tempdir(), "HIC_DATA")
dir.create(temp.dir)
Hic.url <- paste0("https://4dn-open-data-public.s3.amazonaws.com/",
    "fourfront-webprod/wfoutput/7386f953-8da9-47b0-acb2-931cba810544/",
    "4DNFIOTPSS3L.hic")
HicOutput.pth <- file.path(temp.dir, "Control_HIC.hic")
# The file does not exist yet, so do not require it when normalizing the path
HicOutput.pth <- normalizePath(HicOutput.pth, mustWork = FALSE)
# On Windows the file must be written in binary mode
if (.Platform$OS.type == "windows") {
    download.file(Hic.url, HicOutput.pth, method = 'auto',
        extra = '-k', mode = "wb")
} else {
    download.file(Hic.url, HicOutput.pth, method = 'auto', extra = '-k')
}
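A long download can be interrupted, so it may be worth checking the file before importing it. The sketch below relies only on the fact that Juicer .hic files begin with the magic string "HIC". The helper name `is_hic_file` is hypothetical and is not part of HicAggR.

```r
# Hypothetical helper (not part of HicAggR): TRUE if `path` exists and starts
# with the "HIC" magic string found at the beginning of Juicer .hic files.
is_hic_file <- function(path) {
    if (!file.exists(path) || file.size(path) < 3) {
        return(FALSE)
    }
    con <- file(path, open = "rb")
    on.exit(close(con))
    magic <- readBin(con, what = "raw", n = 3)
    identical(magic, charToRaw("HIC"))
}

# Demonstration on a tiny stand-in file that only contains the magic string
demo.pth <- tempfile(fileext = ".hic")
writeBin(c(charToRaw("HIC"), as.raw(0)), demo.pth)
demo.ok <- is_hic_file(demo.pth)
```

With the real download, the same check would be `is_hic_file(HicOutput.pth)`.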
Genomic coordinates of this kind (for example ChIP-seq peaks) can be imported into R with the rtracklayer package.
seq | start | end | strand | name | score |
---|---|---|---|---|---|
2L | 35594 | 35725 | * | Beaf32_2 | 76 |
2L | 47296 | 47470 | * | Beaf32_3 | 44 |
2L | 65770 | 65971 | * | Beaf32_5 | 520 |
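To make the layout of these coordinates concrete, the sketch below re-types the three rows shown above as a plain data.frame instead of reading them from a file. The object names `peaks.df`, `peakWidths.num` and `peaks.gnr` are illustrative. The conversion to a GRanges object is attempted only if the GenomicRanges package is installed.

```r
# The three peaks shown in the table above, as a plain data.frame
peaks.df <- data.frame(
    seq    = c("2L", "2L", "2L"),
    start  = c(35594, 47296, 65770),
    end    = c(35725, 47470, 65971),
    strand = c("*", "*", "*"),
    name   = c("Beaf32_2", "Beaf32_3", "Beaf32_5"),
    score  = c(76, 44, 520)
)

# Peak widths in base pairs (start and end are both included)
peakWidths.num <- peaks.df$end - peaks.df$start + 1

# Optional conversion to a GRanges object when GenomicRanges is available
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
    peaks.gnr <- GenomicRanges::makeGRangesFromDataFrame(
        peaks.df,
        seqnames.field = "seq",
        keep.extra.columns = TRUE
    )
}
```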
The genomic information required by the functions throughout the pipeline is a data.frame
containing chromosome names and sizes, and the binSize,
which corresponds to the resolution of the HiC matrices.
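As a minimal sketch, this information could be declared as follows. The chromosome lengths are those of the dm6 assembly for 2L and 2R, which is assumed here to be the assembly of the test dataset. The 5000 bp bin size and the column names are illustrative choices.

```r
# Chromosome names and sizes (dm6 lengths for 2L and 2R, assumed assembly)
chromSizes <- data.frame(
    seqnames   = c("2L", "2R"),
    seqlengths = c(23513712, 25286936)
)

# Bin width in base pairs; 5000 is an illustrative resolution
binSize <- 5000

# Number of bins needed to cover each chromosome at this resolution
nBins.num <- ceiling(chromSizes$seqlengths / binSize)
names(nBins.num) <- chromSizes$seqnames
```

The bin size must match a resolution that is actually stored in the HiC file, otherwise the matrices and the coordinates cannot be aligned.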
The package supports the import and normalization of HiC data.
NOTE: Since version 0.99.2, the package supports import of balanced HiC matrices in .hic, .cool/.mcool formats. It also supports the import of ‘o/e’ matrices in .hic format.
HicAggR can import HiC data stored in the main formats: .hic, .cool, .mcool, .h5. By default, the package imports the raw counts into R. It is therefore necessary to perform the balancing and observed/expected correction steps afterwards.
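What these two corrections compute can be illustrated in base R on a toy matrix. This is only a sketch under simple assumptions and not the package's implementation. Balancing is done here by repeated square-root row-sum scaling, and the expected value is taken as the mean contact at each bin distance. The function names `balance_matrix` and `over_expected` are hypothetical.

```r
# Toy symmetric matrix of raw counts for 4 bins (made-up values)
raw.mtx <- matrix(
    c(10, 4,  2, 1,
       4, 8,  3, 2,
       2, 3, 12, 5,
       1, 2,  5, 6),
    nrow = 4, byrow = TRUE
)

# Balancing: rescale rows and columns until every bin has the same coverage
balance_matrix <- function(mtx, n.iter = 200) {
    for (i in seq_len(n.iter)) {
        bias <- sqrt(rowSums(mtx) / mean(rowSums(mtx)))
        mtx <- mtx / outer(bias, bias)
    }
    mtx
}

# Observed/expected: divide each contact by the mean contact at its distance
over_expected <- function(mtx) {
    dist.num <- as.vector(abs(row(mtx) - col(mtx)))
    expected.num <- ave(as.vector(mtx), dist.num, FUN = mean)
    matrix(as.vector(mtx) / expected.num, nrow = nrow(mtx))
}

balanced.mtx <- balance_matrix(raw.mtx)
oe.mtx <- over_expected(balanced.mtx)
```

After balancing, all row sums of `balanced.mtx` are equal. After the observed/expected step, the mean of each diagonal of `oe.mtx` is 1, so values above 1 indicate more contacts than expected at that distance.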
Genomic coordinates (ChIP-seq peaks or any other feature) need to be indexed on the same reference genome as the HiC data. The genomic coordinates are then paired in GInteractions objects.
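The idea behind indexing and pairing can be sketched in base R with the three peaks listed earlier. Several assumptions are made for illustration: 5000 bp bins, 1-based bin numbering, indexing by peak midpoint, and a 20 kb minimum distance. None of the object names come from the package, and the result is a plain data.frame and not a GInteractions object.

```r
binSize <- 5000  # illustrative resolution in base pairs

peaks.df <- data.frame(
    seq   = c("2L", "2L", "2L"),
    start = c(35594, 47296, 65770),
    end   = c(35725, 47470, 65971),
    name  = c("Beaf32_2", "Beaf32_3", "Beaf32_5")
)

# Index each peak by the 1-based bin that contains its midpoint
mid.num <- (peaks.df$start + peaks.df$end) %/% 2
peaks.df$bin <- (mid.num - 1) %/% binSize + 1

# Pair every peak with every downstream peak (all peaks are on 2L here)
idx.mtx <- t(combn(nrow(peaks.df), 2))
pairs.df <- data.frame(
    first  = peaks.df$name[idx.mtx[, 1]],
    second = peaks.df$name[idx.mtx[, 2]],
    bin1   = peaks.df$bin[idx.mtx[, 1]],
    bin2   = peaks.df$bin[idx.mtx[, 2]]
)
pairs.df$distance <- (pairs.df$bin2 - pairs.df$bin1) * binSize

# Keep pairs separated by at least 20 kb (arbitrary threshold)
pairs.df <- pairs.df[pairs.df$distance >= 20000, ]
```

With real data, the pairing would be done per chromosome, since the sketch only handles peaks that share the same sequence name.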
Once data have been imported, the HiC contacts corresponding to each pair of genomic coordinates are extracted as submatrices.
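The extraction step can be pictured as cutting a small square out of the contact matrix around each pair of bins. The base R sketch below uses a random toy matrix and a hypothetical helper named `extract_submatrix`. The flank of 2 bins and the bin pairs are arbitrary, and no bounds checking is done.

```r
# Toy 20 x 20 symmetric contact matrix filled with reproducible random counts
set.seed(1)
n.bins <- 20
hic.mtx <- matrix(rpois(n.bins^2, lambda = 5), nrow = n.bins)
hic.mtx <- hic.mtx + t(hic.mtx)

# Pairs of bins whose neighbourhood should be extracted (arbitrary values)
pairs.df <- data.frame(bin1 = c(5, 8), bin2 = c(12, 15))

# Extract a (2 * flank + 1)-wide square centred on a pair of bins.
# No bounds checking: pairs must lie at least `flank` bins from the edges.
extract_submatrix <- function(mtx, bin1, bin2, flank = 2) {
    mtx[(bin1 - flank):(bin1 + flank), (bin2 - flank):(bin2 + flank)]
}

submatrices.lst <- Map(
    function(b1, b2) extract_submatrix(hic.mtx, b1, b2),
    pairs.df$bin1, pairs.df$bin2
)
```

Each element of `submatrices.lst` is a 5 x 5 matrix whose central cell is the contact between the two paired bins.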
Submatrices are aggregated as a sum, an average or a median. The aggregated matrix is then plotted as a heatmap of contact frequencies (in the example, contacts surrounding Beaf-32 sites).
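The three aggregation modes amount to summarising each cell across all submatrices. A minimal base R sketch on three made-up 3 x 3 submatrices follows. The object names are illustrative and this is not the package's own aggregation function.

```r
# Three toy 3 x 3 submatrices (made-up values)
submatrices.lst <- list(
    matrix(1:9, nrow = 3),
    matrix(c(2, 4, 6, 8, 10, 12, 14, 16, 18), nrow = 3),
    matrix(9, nrow = 3, ncol = 3)
)

# Stack into a 3 x 3 x n array, then summarise each cell across the stack
stack.arr <- array(
    unlist(submatrices.lst),
    dim = c(3, 3, length(submatrices.lst))
)
agg.sum    <- apply(stack.arr, c(1, 2), sum)
agg.mean   <- apply(stack.arr, c(1, 2), mean)
agg.median <- apply(stack.arr, c(1, 2), median)
```

The median is less sensitive than the sum or the average to a few submatrices with very high counts. A quick heatmap of any of these matrices can be drawn with base graphics, for example `image(agg.mean)`.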
sessionInfo()
#> R Under development (unstable) (2024-03-18 r86148)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] HicAggR_0.99.3
#>
#> loaded via a namespace (and not attached):
#> [1] SummarizedExperiment_1.33.3 gtable_0.3.4
#> [3] xfun_0.43 bslib_0.6.2
#> [5] ggplot2_3.5.0 rhdf5_2.47.6
#> [7] Biobase_2.63.0 lattice_0.22-6
#> [9] rhdf5filters_1.15.4 vctrs_0.6.5
#> [11] tools_4.4.0 generics_0.1.3
#> [13] parallel_4.4.0 stats4_4.4.0
#> [15] tibble_3.2.1 fansi_1.0.6
#> [17] highr_0.10 pkgconfig_2.0.3
#> [19] Matrix_1.7-0 data.table_1.15.2
#> [21] S4Vectors_0.41.5 lifecycle_1.0.4
#> [23] GenomeInfoDbData_1.2.12 farver_2.1.1
#> [25] compiler_4.4.0 stringr_1.5.1
#> [27] munsell_0.5.0 codetools_0.2-19
#> [29] InteractionSet_1.31.0 GenomeInfoDb_1.39.9
#> [31] htmltools_0.5.8 sass_0.4.9
#> [33] yaml_2.3.8 tidyr_1.3.1
#> [35] pillar_1.9.0 crayon_1.5.2
#> [37] jquerylib_0.1.4 BiocParallel_1.37.1
#> [39] DelayedArray_0.29.9 cachem_1.0.8
#> [41] abind_1.4-5 tidyselect_1.2.1
#> [43] digest_0.6.35 stringi_1.8.3
#> [45] purrr_1.0.2 dplyr_1.1.4
#> [47] labeling_0.4.3 fastmap_1.1.1
#> [49] grid_4.4.0 colorspace_2.1-0
#> [51] cli_3.6.2 SparseArray_1.3.4
#> [53] magrittr_2.0.3 S4Arrays_1.3.6
#> [55] utf8_1.2.4 withr_3.0.0
#> [57] scales_1.3.0 rmarkdown_2.26
#> [59] XVector_0.43.1 matrixStats_1.2.0
#> [61] gridExtra_2.3 png_0.1-8
#> [63] evaluate_0.23 knitr_1.45
#> [65] GenomicRanges_1.55.4 IRanges_2.37.1
#> [67] rlang_1.1.3 Rcpp_1.0.12
#> [69] glue_1.7.0 BiocGenerics_0.49.1
#> [71] reshape_0.8.9 jsonlite_1.8.8
#> [73] strawr_0.0.91 R6_2.5.1
#> [75] Rhdf5lib_1.25.1 plyr_1.8.9
#> [77] MatrixGenerics_1.15.0 zlibbioc_1.49.3