Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
layout: page
First follow the instructions here:
Step by step guide on how to use my pipelines
Click
here
for an introduction to Snakemake
ABOUT
This is a pipeline to map short reads to a reference assembly. It outputs the mapped reads, a qualimap report and does variant calling.
Tools used:
-
Bwamem2 - mapping
-
Samtools - processing
-
Qualimap - mapping summary
-
Freebayes - variant calling
|
|
|:--:|
|
Pipeline workflow
|
Edit config.yaml with the paths to your files
OUTDIR: /path/to/output
READS_DIR: /path/to/reads/ # don't add the reads files, just the directory where they are
ASSEMBLY: /path/to/assembly
PREFIX: <output name>
-
OUTDIR - directory where snakemake will run and where the results will be written to
-
READS_DIR - path to the directory that contains the reads
-
ASSEMBLY - path to the assembly file
-
PREFIX - prefix for the final mapped reads file
If you want the results to be written to this directory (not to a new directory), and comment out
OUTDIR: /path/to/output
in the config file
For the mapping step you should have one _1 fastq file and one _2 fastq file in
READS_DIR
. If you have several _1 and _2 fastq files from the same sample, you can combine them so you have one file for all _1 reads and one for all the _2 reads. This can be done by concatenating them using
cat
, if the original files are not compressed (
fastq
or
fq
extension), or
zcat
if the original files are compressed (
fastq.gz
or
fq.gz
extension).
Example where your files are in the same directory and are compressed:
zcat *_1.fastq.gz > <new file name>_1.fastq.gz
zcat *_2.fastq.gz > <new file name>_2.fastq.gz
RESULTS
-
dated file with an overview of the files used to run the pipeline (for documentation purposes)
-
sorted_reads directory with the file containing the mapped reads
-
results directory containing the qualimap results
-
variant_calling directory containing the variant calling VCF file and file with VCF statistics
Code Snippets
13 14 15 16 | run: with open(output[0], "w") as outfile: outfile.write(f"Files used to run {pipeline} on {running_date}\n") for key, value in config.items(): |
50 51 | shell: "bwa-mem2 index {input}" |
66 67 68 69 70 | shell: """ module load samtools bwa-mem2 mem -t {resources.cpus} {input.assembly} {input.reads} | samblaster -r | samtools view -b - > {output} """ |
83 84 | shell: "module load samtools && samtools sort -m 2G -@ {resources.cpus} -O bam {input} > {output}" |
97 98 | shell: "module load samtools && samtools index -@ {resources.cpus} {input}" |
112 113 | shell: "unset DISPLAY && qualimap bamqc -bam {input.bam} --java-mem-size=16G -nt 1 -outdir {params.outdir}" |
126 127 128 129 130 | shell: """ module load freebayes samtools vcflib/gcc/64/0.00.2019.07.10 freebayes -f {input.reference} --use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2 --bam {input.bam} | vcffilter -f 'QUAL > 20' | bgzip -c > {output} """ |
141 142 | shell: "module load bcftools && tabix -p vcf {input}" |
152 153 154 155 156 | shell: """ module load bcftools bcftools stats {input.vcf} > {output} """ |
Support
- Future updates
Related Workflows





