A Nextflow pipeline for integrating short-read (SR) and long-read (LR) structural variant (SV) calls, rescuing SR-only SVs using LR data, and generating a polished, merged SV callset.
This pipeline is designed for tumor–normal WGS analyses and focuses on improving SV sensitivity by combining complementary sequencing technologies.
Merge LR and SR SVs
- Merge LR and SR VCFs using SURVIVOR
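A minimal sketch of what the merge step might look like when SURVIVOR is run by hand. The function name `merge_lr_sr` and the parameter values (1000 bp breakpoint distance, minimum support of 1 caller, type and strand agreement on, 50 bp minimum size) are illustrative assumptions, not the settings used by this pipeline. Listing the long-read VCF first means the first bit of SURVIVOR's `SUPP_VEC` refers to LR support.

```shell
# Hypothetical helper: merge a long-read and a short-read SV VCF with SURVIVOR.
# All numeric parameters below are illustrative assumptions.
merge_lr_sr() {
  local lr_vcf=$1 sr_vcf=$2 out_vcf=$3
  local list
  list=$(mktemp) || return 1

  # SURVIVOR expects a text file listing one VCF path per line.
  # Order matters: it defines the bit order of SUPP_VEC (LR first, SR second).
  printf '%s\n' "$lr_vcf" "$sr_vcf" > "$list"

  # Arguments: list, max breakpoint distance, min supporting callers,
  # require same type, require same strand, estimate distance, min SV size, output.
  SURVIVOR merge "$list" 1000 1 1 1 0 50 "$out_vcf"
  local status=$?

  rm -f "$list"
  return "$status"
}
```

A call such as `merge_lr_sr longread.vcf shortread.vcf merged.vcf` keeps calls supported by either technology, which is what the later SR-only rescue needs.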
Polish merged SVs
- Normalize and convert merged VCFs to BEDPE/VCF formats
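The pipeline performs this conversion with its own Perl scripts, which are not shown here. As a rough illustration of the idea only, the sketch below turns VCF records into 10-column BEDPE rows with `awk`. It assumes the second breakpoint is described by `CHR2` and `END` in the INFO column (as in SURVIVOR output), falls back to the record's own chromosome and position when they are absent, and leaves strands as `.`; the function name `vcf_to_bedpe` is hypothetical.

```shell
# Hypothetical helper: convert an uncompressed SV VCF to BEDPE on stdout.
# Assumption: INFO carries CHR2= and END= for the second breakpoint.
vcf_to_bedpe() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                      # skip header lines
    {
      chr2 = $1; end2 = $2             # defaults when INFO lacks CHR2/END
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^CHR2=/)      chr2 = substr(kv[i], 6)
        else if (kv[i] ~ /^END=/)  end2 = substr(kv[i], 5)
      }
      # BEDPE: chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2
      print $1, $2 - 1, $2, chr2, end2 - 1, end2, $3, $6, ".", "."
    }' "$1"
}
```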
Classify SVs
- Identify:
  - SR-only SVs
  - LR-only SVs
  - Shared SVs
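One way the three classes could be derived is from the `SUPP_VEC` INFO field that SURVIVOR writes into a merged VCF. The sketch below assumes the long-read VCF was listed first during merging, so `10` means LR-only, `01` means SR-only, and `11` means shared; the function name `classify_svs` is hypothetical and the pipeline's own classification logic may differ.

```shell
# Hypothetical helper: print "<SV ID><TAB><class>" for each record of a
# SURVIVOR-merged VCF. Assumes SUPP_VEC bit order is LR first, SR second.
classify_svs() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      class = "unknown"
      if (match($8, /SUPP_VEC=[01]+/)) {
        # "SUPP_VEC=" is 9 characters long; keep only the bit string.
        vec = substr($8, RSTART + 9, RLENGTH - 9)
        if (vec == "11")      class = "shared"
        else if (vec == "10") class = "LR-only"
        else if (vec == "01") class = "SR-only"
      }
      print $3 "\t" class
    }' "$1"
}
```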
Force-call SR-only SVs
- Generate a VCF of SR-only SVs for force calling
- Re-call SR-only SVs in normal and tumor BAMs using cuteSV
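For orientation, a force-calling run with cuteSV might look like the sketch below. It relies on cuteSV's `-Ivcf` option (force calling against a given VCF) and `--genotype`; the function name `force_call`, the work-directory naming, and the file names in the usage comment are assumptions, and the pipeline may pass additional cuteSV parameters that are not shown here.

```shell
# Hypothetical helper: re-call a fixed set of SVs in one BAM with cuteSV.
force_call() {
  local bam=$1 ref=$2 targets=$3 out=$4
  local workdir="${out%.vcf}.cutesv_work"

  # cuteSV needs an existing working directory for its temporary files.
  mkdir -p "$workdir"

  # -Ivcf restricts calling to the SVs in the target VCF (force calling);
  # --genotype asks cuteSV to emit genotypes for them.
  cuteSV "$bam" "$ref" "$out" "$workdir" -Ivcf "$targets" --genotype
}

# Example (file names are placeholders):
#   force_call tumor.bam  reference.fa sr_only.vcf tumor.forced.vcf
#   force_call normal.bam reference.fa sr_only.vcf normal.forced.vcf
```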
Somatic filtering
- Compare tumor vs normal calls
- Filter and retain high-confidence somatic SVs
- Collect supporting read evidence
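The tumor versus normal comparison is implemented by the pipeline's own filtering scripts. Purely to illustrate the kind of rule involved, the sketch below assumes a hypothetical three-column, tab-separated table (SV ID, supporting reads in tumor, supporting reads in normal) and keeps SVs with enough tumor support and little or no normal support. The function name `filter_somatic`, the table layout, and the default thresholds (at least 3 tumor reads, 0 normal reads) are all assumptions.

```shell
# Hypothetical helper: print IDs of SVs that look somatic.
# Input columns (assumed): id <TAB> tumor_support <TAB> normal_support
# Usage: filter_somatic support.tsv [min_tumor_reads] [max_normal_reads]
filter_somatic() {
  awk -F'\t' -v min_t="${2:-3}" -v max_n="${3:-0}" '
    # Keep the SV when tumor evidence is sufficient and normal evidence is absent/low.
    $2 >= min_t && $3 <= max_n { print $1 }
  ' "$1"
}
```

Stricter or looser thresholds can be passed as the second and third arguments, for example `filter_somatic support.tsv 5 1`.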
Final integration
- Merge rescued SR SVs with LR SVs
- Final polishing, sorting, compression, and indexing
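The sorting, compression, and indexing at the end of this step can be sketched with standard tools as follows. This is an assumed, simplified equivalent rather than the pipeline's actual code: header lines are kept in place, records are sorted by chromosome name and position, the result is compressed with `bgzip`, and `tabix` writes the `.tbi` index. The function name `finalize_vcf` is hypothetical.

```shell
# Hypothetical helper: sort, bgzip-compress, and tabix-index a VCF.
# Usage: finalize_vcf merged.vcf final.sr2lr.polished.vcf.gz
finalize_vcf() {
  local in_vcf=$1 out_gz=$2
  local tab
  tab=$(printf '\t')

  {
    # Header lines first, in their original order.
    grep '^#' "$in_vcf"
    # Records sorted by chromosome (bytewise) and numeric position.
    grep -v '^#' "$in_vcf" | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
  } | bgzip -c > "$out_gz"

  # Creates <out_gz>.tbi; -f overwrites an existing index.
  tabix -f -p vcf "$out_gz"
}
```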
Inputs
- Long-read SV VCF
- Short-read SV VCF
- Tumor BAM (+ BAI)
- Normal BAM (+ BAI)
- Reference genome (FASTA)
Outputs
- final.sr2lr.polished.vcf.gz
- final.sr2lr.polished.vcf.gz.tbi
Requirements
- Nextflow
- SURVIVOR
- cuteSV
- samtools
- bgzip / tabix
- Perl (custom polishing and filtering scripts)
Usage
nextflow run main.nf \
  --lrvcf_fn longread.vcf \
  --srvcf_fn shortread.vcf \
  --norm_fn normal.bam \
  --tum_fn tumor.bam \
  --reference reference.fa
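
Before launching a run, it can help to confirm that every input exists and that both BAMs have an index next to them. The check below is an optional sketch, not part of the pipeline; the function name `check_inputs` is hypothetical, and it assumes the index is named either `<file>.bam.bai` or `<file>.bai`.

```shell
# Hypothetical pre-flight check for the five pipeline inputs.
# Returns 0 when everything is present, 1 otherwise.
check_inputs() {
  local lr_vcf=$1 sr_vcf=$2 normal_bam=$3 tumor_bam=$4 reference=$5
  local missing=0 f bam

  # Every input must exist and be non-empty.
  for f in "$lr_vcf" "$sr_vcf" "$normal_bam" "$tumor_bam" "$reference"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty input: $f" >&2
      missing=1
    fi
  done

  # Each BAM needs an index (x.bam.bai or x.bai).
  for bam in "$normal_bam" "$tumor_bam"; do
    if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
      echo "missing BAM index for: $bam" >&2
      missing=1
    fi
  done

  return "$missing"
}

# Example:
#   check_inputs longread.vcf shortread.vcf normal.bam tumor.bam reference.fa \
#     && echo "inputs look complete"
```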