TronFlow alignment pipeline

The TronFlow alignment pipeline is part of a collection of computational workflows for tumor-normal pair somatic variant calling.

This pipeline aligns paired-end and single-end FASTQ files with the BWA aln and BWA mem algorithms or with BWA mem 2. For RNA-seq data, STAR is also supported; to increase sensitivity for novel splice junctions, use --star_two_pass_mode (recommended for RNA-seq variant calling). The pipeline also includes an initial read trimming step using fastp.

How to run it

Run it from GitHub as follows:

nextflow run tron-bioinformatics/tronflow-alignment -profile conda --input_files $input --output $output --algorithm aln --library paired

Otherwise download the project and run as follows:

nextflow main.nf -profile conda --input_files $input --output $output --algorithm aln --library paired

Find the help as follows:

$ nextflow run tron-bioinformatics/tronflow-alignment --help
N E X T F L O W ~ version 19.07.0
Launching `main.nf` [intergalactic_shannon] - revision: e707c77d7b
Usage:
 nextflow main.nf --input_files input_files [--reference reference.fasta]
Input:
 * input_fastq1: the path to a FASTQ file (incompatible with --input_files)
 * input_name: name of the sample (only needed if input_fastq1 is used)
 * input_files: the path to a tab-separated values file containing in each row the sample name and two paired FASTQs (incompatible with --fastq1 and --fastq2)
 when `--library paired`, or a single FASTQ file when `--library single`
 Example input file:
 name1	fastq1.1	fastq1.2
 name2	fastq2.1	fastq2.2
 * reference: path to the indexed FASTA genome reference or the star reference folder in case of using star
Optional input:
 * input_fastq2: the path to a second FASTQ file (incompatible with --input_files, incompatible with --library paired)
 * output: the folder where to publish output (default: output)
 * algorithm: determines the BWA algorithm, either `aln`, `mem`, `mem2` or `star` (default `aln`)
 * library: determines whether the sequencing library is paired or single end, either `paired` or `single` (default `paired`)
 * cpus: determines the number of CPUs for each job, with the exception of bwa sampe and samse steps which are not parallelized (default: 8)
 * memory: determines the memory required by each job (default: 32g)
 * inception: if enabled it uses an inception, only valid for BWA aln, it requires a fast file system such as flash (default: false)
 * skip_trimming: skips the read trimming step
 * star_two_pass_mode: activates STAR two-pass mode, increasing sensitivity of novel junction discovery, recommended for RNA variant calling (default: false)
 * additional_args: additional alignment arguments, only effective in BWA mem, BWA mem 2 and STAR (default: none) 
Output:
 * A BAM file ${name}.bam and its index
 * FASTP read trimming stats report in HTML format ${name}.fastp_stats.html
 * FASTP read trimming stats report in JSON format ${name}.fastp_stats.json

Input tables

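The table with FASTQ files expects three tab-separated columns without a header: the sample name and the two paired FASTQ files. With `--library single`, only the sample name and a single FASTQ file are expected. The column names in the first row below are shown for illustration only and must not appear in the file: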

Sample name	FASTQ 1	FASTQ 2
sample_1	/path/to/sample_1.1.fastq	/path/to/sample_1.2.fastq
sample_2	/path/to/sample_2.1.fastq	/path/to/sample_2.2.fastq
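
For many samples, the table can be generated rather than typed by hand. The following is a minimal sketch, not part of the pipeline, that assumes a hypothetical naming scheme of `<sample>.1.fastq` and `<sample>.2.fastq` inside one folder; the temporary folder and the empty placeholder files only stand in for real data.

```shell
#!/bin/sh
# Sketch: build a paired-end input table for --input_files.
# Assumption (hypothetical naming): mates are <sample>.1.fastq and <sample>.2.fastq.
set -eu

# Stand-in for the real FASTQ folder, filled with empty placeholder files.
fastq_dir=$(mktemp -d)
touch "$fastq_dir/sample_1.1.fastq" "$fastq_dir/sample_1.2.fastq" \
      "$fastq_dir/sample_2.1.fastq" "$fastq_dir/sample_2.2.fastq"

input_table="$fastq_dir/input_files.tsv"
: > "$input_table"

for fq1 in "$fastq_dir"/*.1.fastq; do
    fq2="${fq1%.1.fastq}.2.fastq"
    name=$(basename "$fq1" .1.fastq)
    if [ ! -f "$fq2" ]; then
        echo "missing mate for $fq1" >&2
        exit 1
    fi
    # One row per sample: name, FASTQ 1, FASTQ 2, separated by tabs, no header.
    printf '%s\t%s\t%s\n' "$name" "$fq1" "$fq2" >> "$input_table"
done

# Sanity check: every row must have exactly three tab-separated fields.
awk -F '\t' 'NF != 3 { bad = 1 } END { exit bad }' "$input_table"
cat "$input_table"
```

The resulting file can then be passed to the pipeline with `--input_files`. For single-end libraries the same loop would write only the name and one FASTQ path per row.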

Reference genome

The reference genome has to be provided in FASTA format and it requires two sets of indexes:

FAI index. Create with samtools faidx your.fasta
BWA indexes. Create with bwa index your.fasta

For bwa-mem2 a specific index is needed:

bwa-mem2 index your.fasta
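
Before launching a run it can help to confirm that the index files sit next to the FASTA. The sketch below is an illustration under stated assumptions: it checks only the FAI index and the classic BWA index files (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`), not the bwa-mem2 index, and the helper name `missing_indexes` and the temporary placeholder files are hypothetical.

```shell
#!/bin/sh
# Sketch: list index files that are missing next to a reference FASTA.
# Covers the samtools FAI index and the classic BWA index files only.
set -eu

missing_indexes() {
    fasta=$1
    for ext in fai amb ann bwt pac sa; do
        [ -f "$fasta.$ext" ] || printf '%s\n' "$fasta.$ext"
    done
}

# Demo on empty placeholder files; the .sa file is left out on purpose.
ref_dir=$(mktemp -d)
ref="$ref_dir/your.fasta"
touch "$ref" "$ref.fai" "$ref.amb" "$ref.ann" "$ref.bwt" "$ref.pac"

missing=$(missing_indexes "$ref")
if [ -n "$missing" ]; then
    echo "Missing index files:"
    echo "$missing"
fi
```

If anything is reported, rerunning `samtools faidx` or `bwa index` on the FASTA should recreate the corresponding files.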

For STAR, a reference folder prepared with STAR has to be provided. Preparing it requires the reference genome in FASTA format and the gene annotations in GTF format. Run a command as follows:

STAR --runMode genomeGenerate --genomeDir $YOUR_FOLDER --genomeFastaFiles $YOUR_FASTA --sjdbGTFfile $YOUR_GTF
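
Once the STAR folder exists, a run with STAR in two-pass mode could be assembled as sketched below. The options are the ones listed in the help text; the three paths are placeholders rather than pipeline defaults, and the `DRY_RUN` switch is a hypothetical convenience of this sketch that prints the command instead of executing it.

```shell
#!/bin/sh
# Sketch: assemble a STAR-based run of the pipeline.
# Prints the command by default; set DRY_RUN=0 to actually execute it.
set -eu

# Placeholder paths, to be replaced with real locations.
input=input_files.tsv
output=output
star_reference=/path/to/star_reference_folder

# For STAR, --reference points to the STAR reference folder, not to a FASTA file.
set -- nextflow run tron-bioinformatics/tronflow-alignment \
    -profile conda \
    --input_files "$input" \
    --output "$output" \
    --reference "$star_reference" \
    --algorithm star \
    --library paired \
    --star_two_pass_mode

cmd_line="$*"
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd_line"
else
    "$@"
fi
```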

References

Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, Epub. https://doi.org/10.1093/bioinformatics/btp698
Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560
Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PMID: 23104886; PMCID: PMC3530905.

Code Snippets

"""
# --input_files needs to be forced, otherwise it is inherited from profile in tests
fastp \
--in1 ${fastq1} \
--in2 ${fastq2} \
--out1 ${fastq1.baseName}.trimmed.fq.gz \
--out2 ${fastq2.baseName}.trimmed.fq.gz \
--json ${name}.fastp_stats.json \
--html ${name}.fastp_stats.html \
--thread ${params.cpus}

echo ${params.manifest} >> software_versions.${task.process}.txt
fastp --version 2>> software_versions.${task.process}.txt
"""

NextFlow fastp From line 21 of modules/01_fastp.nf

"""
# --input_files needs to be forced, otherwise it is inherited from profile in tests
fastp \
--in1 ${fastq1} \
--out1 ${fastq1.baseName}.trimmed.fq.gz \
--json ${name}.fastp_stats.json \
--html ${name}.fastp_stats.html \
--thread ${params.cpus}

echo ${params.manifest} >> software_versions.${task.process}.txt
fastp --version 2>> software_versions.${task.process}.txt
"""

NextFlow fastp From line 55 of modules/01_fastp.nf

"""
bwa aln -t ${task.cpus} ${params.reference} ${fastq} > ${fastq.baseName}.sai

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
"""

NextFlow BWA From line 17 of modules/02_bwa_aln.nf

"""
bwa sampe ${params.reference} ${sai1} ${sai2} ${fastq1} ${fastq2} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

NextFlow SAMtools BWA From line 42 of modules/02_bwa_aln.nf

"""
bwa samse ${params.reference} ${sai} ${fastq} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
"""

NextFlow SAMtools BWA From line 68 of modules/02_bwa_aln.nf

"""
bwa sampe ${params.reference} <( bwa aln -t ${params.cpus} ${params.reference} ${fastq1} ) \
<( bwa aln -t ${params.cpus} ${params.reference} ${fastq2} ) ${fastq1} ${fastq2} \
| samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

NextFlow SAMtools BWA From line 94 of modules/02_bwa_aln.nf

"""
bwa-mem2 mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq1} ${fastq2} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
bwa-mem2 version  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

NextFlow SAMtools Bwa-mem2 From line 18 of modules/02_bwa_mem_2.nf

"""
bwa-mem2 mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
bwa-mem2 version  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

NextFlow SAMtools Bwa-mem2 From line 44 of modules/02_bwa_mem_2.nf

"""
bwa mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq1} ${fastq2} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

NextFlow SAMtools BWA From line 18 of modules/02_bwa_mem.nf

"""
bwa mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

NextFlow SAMtools BWA From line 44 of modules/02_bwa_mem.nf

"""
STAR --genomeDir ${params.reference} ${two_pass_mode_param} ${params.additional_args} \
--readFilesCommand "gzip -d -c -f" \
--readFilesIn ${fastq1} ${fastq2} \
--outSAMmode Full \
--outSAMattributes Standard \
--outSAMunmapped None \
--outReadsUnmapped Fastx \
--outFilterMismatchNoverLmax 0.02 \
--runThreadN ${task.cpus} \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix ${name}.

mv ${name}.Aligned.sortedByCoord.out.bam ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
STAR --version >> software_versions.${task.process}.txt
"""

NextFlow STAR From line 20 of modules/02_star.nf

"""
STAR --genomeDir ${params.reference} ${two_pass_mode_param} ${params.additional_args} \
--readFilesCommand "gzip -d -c -f" \
--readFilesIn ${fastq} \
--outSAMmode Full \
--outSAMattributes Standard \
--outSAMunmapped None \
--outReadsUnmapped Fastx \
--outFilterMismatchNoverLmax 0.02 \
--runThreadN ${task.cpus} \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix ${name}.

mv ${name}.Aligned.sortedByCoord.out.bam ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
STAR --version >> software_versions.${task.process}.txt
"""

NextFlow STAR From line 58 of modules/02_star.nf

"""
samtools index -@ ${task.cpus} ${bam}

echo ${params.manifest} >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""