TronFlow alignment pipeline


The TronFlow alignment pipeline is part of a collection of computational workflows for tumor-normal pair somatic variant calling.

Find the documentation here

This pipeline aligns paired-end and single-end FASTQ files with the BWA aln and BWA mem algorithms, or with BWA-MEM2. STAR is also supported for RNA-seq data; to increase sensitivity for novel junctions, use --star_two_pass_mode (recommended for RNA-seq variant calling). The pipeline also includes an initial read trimming step using fastp.

How to run it

Run it from GitHub as follows:

nextflow run tron-bioinformatics/tronflow-alignment -profile conda --input_files $input --output $output --algorithm aln --library paired

Otherwise download the project and run as follows:

nextflow main.nf -profile conda --input_files $input --output $output --algorithm aln --library paired
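
The commands above leave $input and $output undefined. The following is a minimal sketch of a launcher script, assuming a hypothetical sample table named samples.tsv and an output folder named results. It only prints the command so that it can be inspected before launching.

```shell
#!/usr/bin/env bash
# Hypothetical paths: adjust to your own data.
input="samples.tsv"   # tab-separated sample table
output="results"      # folder where the BAM files are published

# Build the command as an array so that paths with spaces survive quoting.
cmd=(nextflow run tron-bioinformatics/tronflow-alignment
     -profile conda
     --input_files "$input"
     --output "$output"
     --algorithm aln
     --library paired)

# Print the command for inspection.
printf '%s\n' "${cmd[*]}"

# Uncomment to launch the pipeline:
# "${cmd[@]}"
```

Keeping the arguments in an array avoids word-splitting problems when a path contains spaces.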

Find the help as follows:

$ nextflow run tron-bioinformatics/tronflow-alignment --help
N E X T F L O W ~ version 19.07.0
Launching `main.nf` [intergalactic_shannon] - revision: e707c77d7b
Usage:
 nextflow main.nf --input_files input_files [--reference reference.fasta]
Input:
 * input_fastq1: the path to a FASTQ file (incompatible with --input_files)
 * input_name: name of the sample (only needed if input_fastq1 is used)
 * input_files: the path to a tab-separated values file containing in each row the sample name and two paired FASTQs (incompatible with --fastq1 and --fastq2)
 when `--library paired`, or a single FASTQ file when `--library single`
 Example input file:
 name1	fastq1.1	fastq1.2
 name2	fastq2.1	fastq2.2
 * reference: path to the indexed FASTA genome reference or the star reference folder in case of using star
Optional input:
 * input_fastq2: the path to a second FASTQ file (incompatible with --input_files, incompatible with --library paired)
 * output: the folder where to publish output (default: output)
 * algorithm: determines the BWA algorithm, either `aln`, `mem`, `mem2` or `star` (default `aln`)
 * library: determines whether the sequencing library is paired or single end, either `paired` or `single` (default `paired`)
 * cpus: determines the number of CPUs for each job, with the exception of bwa sampe and samse steps which are not parallelized (default: 8)
 * memory: determines the memory required by each job (default: 32g)
 * inception: if enabled it uses an inception, only valid for BWA aln, it requires a fast file system such as flash (default: false)
 * skip_trimming: skips the read trimming step
 * star_two_pass_mode: activates STAR two-pass mode, increasing sensitivity of novel junction discovery, recommended for RNA variant calling (default: false)
 * additional_args: additional alignment arguments, only effective in BWA mem, BWA mem 2 and STAR (default: none) 
Output:
 * A BAM file \${name}.bam and its index
 * FASTP read trimming stats report in HTML format \${name.fastp_stats.html}
 * FASTP read trimming stats report in JSON format \${name.fastp_stats.json}
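
The constraints stated in the help text can be checked before launching. The following is a sketch only: the function name check_params is hypothetical and is not part of the pipeline. It encodes three rules taken from the help above: the allowed values of --algorithm, the allowed values of --library, and that --inception is only valid with BWA aln.

```shell
#!/usr/bin/env bash
# check_params ALGORITHM LIBRARY [INCEPTION]
# Returns 0 when the combination is consistent with the help text, 1 otherwise.
check_params() {
    local algorithm="$1" library="$2" inception="${3:-false}"

    case "$algorithm" in
        aln|mem|mem2|star) ;;
        *) echo "unknown algorithm: $algorithm" >&2; return 1 ;;
    esac

    case "$library" in
        paired|single) ;;
        *) echo "unknown library: $library" >&2; return 1 ;;
    esac

    if [ "$inception" = "true" ] && [ "$algorithm" != "aln" ]; then
        echo "inception is only valid with --algorithm aln" >&2
        return 1
    fi
    return 0
}

check_params aln paired true  && echo "aln/paired/inception: ok"
check_params mem2 single      && echo "mem2/single: ok"
check_params star paired true || echo "star with inception: rejected"
```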

Input tables

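The table with FASTQ files expects three tab-separated columns without a header for paired-end libraries: sample name, first FASTQ and second FASTQ. With --library single, each row holds only the sample name and a single FASTQ. The first row below only labels the columns and must not appear in the file.

Sample name   FASTQ 1                     FASTQ 2
sample_1      /path/to/sample_1.1.fastq   /path/to/sample_1.2.fastq
sample_2      /path/to/sample_2.1.fastq   /path/to/sample_2.2.fastq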
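
A sample table can be written and checked with standard tools. The sketch below uses hypothetical sample names and paths, and check_table is an illustrative helper rather than part of the pipeline. It verifies that every row has the expected number of tab-separated fields: three for paired-end libraries, two for single-end libraries.

```shell
#!/usr/bin/env bash
# check_table FILE LIBRARY
# Succeeds when every non-empty row has 3 (paired) or 2 (single) tab-separated fields.
check_table() {
    local file="$1" library="$2" expected
    if [ "$library" = "paired" ]; then expected=3; else expected=2; fi
    awk -F'\t' -v n="$expected" '
        NF == 0 { next }   # skip empty lines
        NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
        END     { exit bad }
    ' "$file"
}

# Write a paired-end table without a header (hypothetical paths).
table="$(mktemp)"
printf '%s\t%s\t%s\n' \
    sample_1 /path/to/sample_1.1.fastq /path/to/sample_1.2.fastq \
    sample_2 /path/to/sample_2.1.fastq /path/to/sample_2.2.fastq > "$table"

check_table "$table" paired && echo "table is valid for --library paired"
```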

Reference genome

The reference genome has to be provided in FASTA format and requires two sets of indexes:

  • FAI index. Create with samtools faidx your.fasta

  • BWA indexes. Create with bwa index your.fasta
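
Before launching, it can help to confirm that the indexes sit next to the FASTA file. This is a minimal sketch, assuming the default output names of samtools faidx (.fai) and bwa index (.amb, .ann, .bwt, .pac, .sa). The helper name missing_indexes is hypothetical.

```shell
#!/usr/bin/env bash
# missing_indexes FASTA
# Prints each expected index file that does not exist; fails if any is missing.
missing_indexes() {
    local fasta="$1" ext status=0
    for ext in fai amb ann bwt pac sa; do
        if [ ! -e "${fasta}.${ext}" ]; then
            echo "${fasta}.${ext}"
            status=1
        fi
    done
    return "$status"
}

# Demonstration on an empty placeholder reference in a temporary folder.
workdir="$(mktemp -d)"
touch "$workdir/your.fasta" "$workdir/your.fasta.fai"
missing_indexes "$workdir/your.fasta" || echo "some indexes are missing"
```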

For bwa-mem2 a specific index is needed:

bwa-mem2 index your.fasta

For STAR, a reference folder prepared with STAR has to be provided. Preparing it requires the reference genome in FASTA format and the gene annotations in GTF format. Run a command as follows:

STAR --runMode genomeGenerate --genomeDir $YOUR_FOLDER --genomeFastaFiles $YOUR_FASTA --sjdbGTFfile $YOUR_GTF

References

  • Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, Epub. https://doi.org/10.1093/bioinformatics/btp698

  • Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560

  • Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019.

  • Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PMID: 23104886; PMCID: PMC3530905.

Code Snippets

fastp read trimming, paired-end library:
"""
# --input_files needs to be forced, otherwise it is inherited from profile in tests
fastp \
--in1 ${fastq1} \
--in2 ${fastq2} \
--out1 ${fastq1.baseName}.trimmed.fq.gz \
--out2 ${fastq2.baseName}.trimmed.fq.gz \
--json ${name}.fastp_stats.json \
--html ${name}.fastp_stats.html \
--thread ${params.cpus}

echo ${params.manifest} >> software_versions.${task.process}.txt
fastp --version 2>> software_versions.${task.process}.txt
"""

fastp read trimming, single-end library:
"""
# --input_files needs to be forced, otherwise it is inherited from profile in tests
fastp \
--in1 ${fastq1} \
--out1 ${fastq1.baseName}.trimmed.fq.gz \
--json ${name}.fastp_stats.json \
--html ${name}.fastp_stats.html \
--thread ${params.cpus}

echo ${params.manifest} >> software_versions.${task.process}.txt
fastp --version 2>> software_versions.${task.process}.txt
"""

BWA aln, alignment of one FASTQ file:
"""
bwa aln -t ${task.cpus} ${params.reference} ${fastq} > ${fastq.baseName}.sai

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
"""

BWA sampe, paired-end library:
"""
bwa sampe ${params.reference} ${sai1} ${sai2} ${fastq1} ${fastq2} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

BWA samse, single-end library:
"""
bwa samse ${params.reference} ${sai} ${fastq} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
"""

BWA aln and sampe combined through process substitution, paired-end library:
"""
bwa sampe ${params.reference} <( bwa aln -t ${params.cpus} ${params.reference} ${fastq1} ) \
<( bwa aln -t ${params.cpus} ${params.reference} ${fastq2} ) ${fastq1} ${fastq2} \
| samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

BWA-MEM2, paired-end library:
"""
bwa-mem2 mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq1} ${fastq2} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
bwa-mem2 version  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

BWA-MEM2, single-end library:
"""
bwa-mem2 mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
bwa-mem2 version  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

BWA mem, paired-end library:
"""
bwa mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq1} ${fastq2} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

BWA mem, single-end library:
"""
bwa mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq} | samtools view -uS - | samtools sort - > ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
echo "bwa=0.7.17"  >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

STAR, paired-end library:
"""
STAR --genomeDir ${params.reference} ${two_pass_mode_param} ${params.additional_args} \
--readFilesCommand "gzip -d -c -f" \
--readFilesIn ${fastq1} ${fastq2} \
--outSAMmode Full \
--outSAMattributes Standard \
--outSAMunmapped None \
--outReadsUnmapped Fastx \
--outFilterMismatchNoverLmax 0.02 \
--runThreadN ${task.cpus} \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix ${name}.

mv ${name}.Aligned.sortedByCoord.out.bam ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
STAR --version >> software_versions.${task.process}.txt
"""

STAR, single-end library:
"""
STAR --genomeDir ${params.reference} ${two_pass_mode_param} ${params.additional_args} \
--readFilesCommand "gzip -d -c -f" \
--readFilesIn ${fastq} \
--outSAMmode Full \
--outSAMattributes Standard \
--outSAMunmapped None \
--outReadsUnmapped Fastx \
--outFilterMismatchNoverLmax 0.02 \
--runThreadN ${task.cpus} \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix ${name}.

mv ${name}.Aligned.sortedByCoord.out.bam ${name}.bam

echo ${params.manifest} >> software_versions.${task.process}.txt
STAR --version >> software_versions.${task.process}.txt
"""

BAM indexing with samtools:
"""
samtools index -@ ${task.cpus} ${bam}

echo ${params.manifest} >> software_versions.${task.process}.txt
samtools --version >> software_versions.${task.process}.txt
"""

URL: https://github.com/TRON-Bioinformatics/tronflow-alignment
Copyright: Public Domain
License: MIT License