--- title: "Copy number analysis" author: "Anand Mayakonda" date: "`r Sys.Date()`" output: html_document: toc: true toc_depth: 3 toc_float: true number_sections: true self_contained: yes css: corp-styles.css highlight: pygments vignette: > %\VignetteIndexEntry{04: Copy number analysis} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} library(maftools) ``` Note: This vignette was not evaluated. # Introduction `maftools` provides a set of functions to facilitate copy number analysis using [ASCAT](https://github.com/VanLoo-lab/ascat) for tumor-normal or tumor-only WGS datasets. Although there exists [ascatNgs](https://github.com/cancerit/ascatNgs), it requires the installation of Perl and C modules to fetch the read counts across the markers. `maftools` bypass these requirements entirely within R with the C code baked in. However, `maftools` only generates the required read counts, BAF, and logR files. Downstream analyses have to be done with [ASCAT](https://github.com/VanLoo-lab/ascat). ASCAT is not available on CRAN or Bioconductor and needs to be installed from [GitHub](https://github.com/VanLoo-lab/ascat) ```{r, eval=FALSE} remotes::install_github(repo = 'VanLoo-lab/ascat/ASCAT') ``` If you use `maftools` functions for CNV analysis, please cite the ASCAT publication ------------------------------------------------------------------------ ***Van Loo P, Nordgard SH, Lingjærde OC, et al. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U S A. 2010;107(39):16910-16915. doi:10.1073/pnas.1009843107*** ------------------------------------------------------------------------ # Step-1: Get nucleotide counts for genetic markers Below command will generate two tsv files `tumor_nucleotide_counts.tsv` and `normal_nucleotide_counts.tsv` that can be used for downstream analysis. Note that the function will process ~900K SNPs from [Affymetrix Genome-Wide Human SNP 6.0 Array](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL6801). The process can be sped up linearly by increasing `nthreads` which will launch each chromosome on a separate thread. Currently `hg19` and `hg38` are supported. ```{r, eval=FALSE} #Matched normal BAM files are strongly recommended counts = maftools::gtMarkers(t_bam = "tumor.bam", n_bam = "normal.bam", build = "hg19") ``` # Step-2: Prepare input files for ASCAT with `prepAscat()` ## Tumor-Normal pair Below command takes `tumor_nucleotide_counts.tsv` and `normal_nucleotide_counts.tsv` files, filter SNPs with low coverage (default <15), estimate BAF, logR, and generates the input files for ASCAT. ```{r, eval=FALSE} library(ASCAT) ascat.bc = maftools::prepAscat(t_counts = "tumor_nucleotide_counts.tsv", n_counts = "normal_nucleotide_counts.tsv", sample_name = "tumor") # Library sizes: # Tumor: 1830168947 # Normal: 1321201848 # Library size difference: 1.385 # ------ # Counts file: tumor_nucleotide_counts.tsv # Markers: 932148 # Removed 2982 duplicated loci # Markers > 15: 928607 # ------ # Counts file: normal_nucleotide_counts.tsv # Markers: 932148 # Removed 2982 duplicated loci # Markers > 15: 928311 # ------ # Final number SNPs: 928107 # Generated following files: # tumor_nucleotide_counts.tumour.BAF.txt # tumor_nucleotide_counts.tumour.logR.txt # tumor_nucleotide_counts.normal.BAF.txt # tumor_nucleotide_counts.normal.logR.txt # ------ ``` Generated BAF and logR files can be processed with [ASCAT functions](https://www.crick.ac.uk/research/labs/peter-van-loo/software). The below code chunk shows minimal usage with ASCAT. See [here](https://github.com/VanLoo-lab/ascat/tree/master/ExampleData) for further workflow examples. ```{r, eval=FALSE} ascat.bc = ASCAT::ascat.loadData( Tumor_LogR_file = "tumor_nucleotide_counts.tumour.logR.txt", Tumor_BAF_file = "tumor_nucleotide_counts.tumour.BAF.txt", Germline_LogR_file = "tumor_nucleotide_counts.normal.logR.txt", Germline_BAF_file = "tumor_nucleotide_counts.normal.BAF.txt", chrs = c(1:22, "X", "Y"), sexchromosomes = c("X", "Y") ) ASCAT::ascat.plotRawData(ASCATobj = ascat.bc, img.prefix = "tumor") ascat.bc = ASCAT::ascat.aspcf(ascat.bc) ASCAT::ascat.plotSegmentedData(ascat.bc) ascat.output = ASCAT::ascat.runAscat(ascat.bc) ``` ## Tumor only In tumor-only mode, read counts are normalized for median depth of coverage across autosomes. ```{r, eval=FALSE} ascat.bc = maftools::prepAscat_t(t_counts = "tumor_nucleotide_counts.tsv", sample_name = "tumor_only") # Library sizes: # Tumor: 1830168947 # Counts file: tumor_nucleotide_counts.tsv # Markers: 932148 # Removed 2982 duplicated loci # Markers > 15: 928607 # Median depth of coverage (autosomes): 76 # ------ # Generated following files: # tumor_only.tumour.BAF.txt # tumor_only.tumour.logR.txt # ------ ``` The output logR and BAF files can be processed with _ASCAT without matched normal data protocol_: ```{r, eval=FALSE} ascat.bc = ASCAT::ascat.loadData( Tumor_LogR_file = "tumor_only.tumour.logR.txt", Tumor_BAF_file = "tumor_only.tumour.BAF.txt", chrs = c(1:22, "X", "Y"), sexchromosomes = c("X", "Y") ) ASCAT::ascat.plotRawData(ASCATobj = ascat.bc, img.prefix = "tumor_only") ascat.gg = ASCAT::ascat.predictGermlineGenotypes(ascat.bc) ascat.bc = ASCAT::ascat.aspcf(ascat.bc, ascat.gg=ascat.gg) ASCAT::ascat.plotSegmentedData(ascat.bc) ascat.output = ASCAT::ascat.runAscat(ascat.bc) ``` ## CBS segmentation Alternatively, tumor logR files generated by `prepAscat()`/`prepAscat_t()` can be processed with `segmentLogR()` function that performs circular binary segmentation and returns the [DNAcopy](https://bioconductor.org/packages/release/bioc/html/DNAcopy.html) object. ```{r, eval=FALSE} maftools::segmentLogR(tumor_logR = "tumor.tumour.logR.txt", sample_name = "tumor") # Analyzing: tumor # current chromosome: 1 # current chromosome: 2 # current chromosome: 3 # current chromosome: 4 # current chromosome: 5 # current chromosome: 6 # current chromosome: 7 # current chromosome: 8 # current chromosome: 9 # current chromosome: 10 # current chromosome: 11 # current chromosome: 12 # current chromosome: 13 # current chromosome: 14 # current chromosome: 15 # current chromosome: 16 # current chromosome: 17 # current chromosome: 18 # current chromosome: 19 # current chromosome: 20 # current chromosome: 21 # current chromosome: 22 # current chromosome: MT # current chromosome: X # current chromosome: Y # Segments are written to: tumor_only.tumour_cbs.seg # Segments are plotted to: tumor_only.tumour_cbs.png ``` # Processing Mosdepth output [Mosdepth](https://github.com/brentp/mosdepth) offers the fastest way to estimate coverage metrics from WGS bam files. Output generated by mosdepth can be processed with maftools function `plotMosdepth` and `plotMosdepth_t` for CNV analysis by performing segmentation and plotting. Below `mosdepth` command generates `tumor.regions.bed.gz` and `normal.regions.bed.gz` that contains depth of coverage across the genome in fixed windows. ``` mosdepth -n -b 5000 tumor tumor.bam mosdepth -n -b 5000 normal normal.bam ``` The output `{prefix}.regions.bed.gz` can be imported and analyzed with `maftools` in tumor/normal or tumor only mode. If you use the functions for CNV analysis, please cite the mosdepth publication ------------------------------------------------------------------------ ***Pedersen BS, Quinlan AR. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 2018;34(5):867-868. doi:10.1093/bioinformatics/btx699*** ------------------------------------------------------------------------ ## Tumor normal pair ```{r, eval=FALSE} plotMosdepth( t_bed = "tumor.regions.bed.gz", n_bed = "normal.regions.bed.gz", segment = TRUE, sample_name = "tumor" ) # Coverage ratio T/N: 1.821 # Running CBS segmentation: # Analyzing: tumor01 # current chromosome: 1 # current chromosome: 2 # current chromosome: 3 # current chromosome: 4 # current chromosome: 5 # current chromosome: 6 # current chromosome: 7 # current chromosome: 8 # current chromosome: 9 # current chromosome: 10 # current chromosome: 11 # current chromosome: 12 # current chromosome: 13 # current chromosome: 14 # current chromosome: 15 # current chromosome: 16 # current chromosome: 17 # current chromosome: 18 # current chromosome: 19 # current chromosome: 20 # current chromosome: 21 # current chromosome: 22 # current chromosome: X # current chromosome: Y # Segments are written to: tumor01_cbs.seg # Plotting ```

## Tumor only Above tumor sample without the germline control, normalized for median depth of coverage ```{r, eval=FALSE} plotMosdepth_t(bed = "tumor.regions.bed.gz") ```

# Session Info ```{r} sessionInfo() ```