Short read mapping and variant calling

public public 1yr ago Version: v0.1.0 0 bookmarks

layout: page

Link to the repository

First follow the instructions here:

Step by step guide on how to use my pipelines
Click here for an introduction to Snakemake

ABOUT

This is a pipeline to map short reads to a reference assembly. It outputs the mapped reads, a qualimap report and does variant calling.

Tools used:

  • Bwamem2 - mapping

  • Samtools - processing

  • Qualimap - mapping summary

  • Freebayes - variant calling

| DAG | |:--:| | Pipeline workflow |

Edit config.yaml with the paths to your files

OUTDIR: /path/to/output
READS_DIR: /path/to/reads/ # don't add the reads files, just the directory where they are
ASSEMBLY: /path/to/assembly
PREFIX: <output name>
  • OUTDIR - directory where snakemake will run and where the results will be written to

  • READS_DIR - path to the directory that contains the reads

  • ASSEMBLY - path to the assembly file

  • PREFIX - prefix for the final mapped reads file

If you want the results to be written to this directory (not to a new directory), and comment out OUTDIR: /path/to/output in the config file

For the mapping step you should have one _1 fastq file and one _2 fastq file in READS_DIR . If you have several _1 and _2 fastq files from the same sample, you can combine them so you have one file for all _1 reads and one for all the _2 reads. This can be done by concatenating them using cat , if the original files are not compressed ( fastq or fq extension), or zcat if the original files are compressed ( fastq.gz or fq.gz extension). Example where your files are in the same directory and are compressed:

zcat *_1.fastq.gz > <new file name>_1.fastq.gz
zcat *_2.fastq.gz > <new file name>_2.fastq.gz

RESULTS

  • dated file with an overview of the files used to run the pipeline (for documentation purposes)

  • sorted_reads directory with the file containing the mapped reads

  • results directory containing the qualimap results

  • variant_calling directory containing the variant calling VCF file and file with VCF statistics

Code Snippets

13
14
15
16
run:
    with open(output[0], "w") as outfile:
        outfile.write(f"Files used to run {pipeline} on {running_date}\n")
        for key, value in config.items():     
50
51
shell:
    "bwa-mem2 index {input}"
66
67
68
69
70
shell:
    """
    module load samtools
    bwa-mem2 mem -t {resources.cpus} {input.assembly} {input.reads} | samblaster -r | samtools view -b - > {output}
    """
83
84
shell: 
    "module load samtools && samtools sort -m 2G -@ {resources.cpus} -O bam {input} > {output}"
97
98
shell:
    "module load samtools && samtools index -@ {resources.cpus} {input}"
112
113
shell: 
    "unset DISPLAY && qualimap bamqc -bam {input.bam} --java-mem-size=16G -nt 1 -outdir {params.outdir}"
126
127
128
129
130
shell:
    """
    module load freebayes samtools vcflib/gcc/64/0.00.2019.07.10
    freebayes -f {input.reference} --use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2 --bam {input.bam} | vcffilter -f 'QUAL > 20' | bgzip -c > {output}
    """
141
142
shell:
    "module load bcftools && tabix -p vcf {input}"
152
153
154
155
156
    shell:
        """
module load bcftools
bcftools stats {input.vcf} > {output}
        """
ShowHide 4 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/CarolinaPB/WUR_mapping-variant-calling
Name: wur_mapping-variant-calling
Version: v0.1.0
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...