Amplicon Sequencing Snakemake workflow


AmpSeeker is a Snakemake workflow for amplicon sequencing data analysis. The pipeline is a work in progress; it currently implements:

  • BCL to Fastq conversion
  • Genome alignment
  • Variant calling
  • Quality control
  • Coverage
  • Visualisation of reads in IGV
  • VCF to DataFrame/.xlsx
  • Allele frequency calculation
  • Principal component analysis
  • Geographic sample maps

The workflow uses Papermill and Jupyter Book together, so that users can conveniently explore the results in a local web page.
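In outline, each analysis rule executes a parameterised notebook with Papermill, and Jupyter Book then assembles the executed notebooks into a static site. A minimal sketch of that pattern run by hand, with illustrative paths (the workflow's actual notebook layout may differ):

    papermill notebooks/example.ipynb results/notebooks/example.ipynb \
        -k AmpSeq_python -p metadata_path config/metadata.tsv
    jupyter-book build --all docs/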

Usage

Please see the wiki for more information on running the workflow. If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository (https://github.com/sanjaynagi/AmpSeeker) and, if available, its DOI.
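A typical invocation, assuming a standard Snakemake deployment with conda-managed environments (the exact flags and any profile to use are covered in the wiki):

    snakemake --cores 4 --use-conda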

Testing

Test cases are in the subfolder .test. They are automatically executed via continuous integration with GitHub Actions.
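To run the bundled test case locally, something along these lines should work, though the exact command used by the CI workflow may differ:

    snakemake --cores 2 --use-conda --directory .test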

Code Snippets

The snippets below are shell commands and wrapper scripts excerpted from the workflow's Snakemake rules. Placeholders in braces ({input}, {output}, {params}, {threads}, {log}) are filled in by Snakemake at run time.

shell:
    """
    samtools faidx {input.ref} 2> {log}
    """
shell:
    """
    bwa index {input.ref} 2> {log}
    """
shell:
    """
    bwa mem -t {threads} -R {params.tag} {input.ref} {input.reads} 2> {log.align} |
    samtools sort -@{threads} -o {output} 2> {log.sort}
    """
shell:
    "samtools index {input} {output} 2> {log}"
shell:
    """
    bcftools mpileup -Ov -f {params.ref} -R {params.regions} --max-depth {params.depth} {input.bam} 2> {log.mpileup} |
    bcftools call -m -Ov 2> {log.call} | bcftools sort -Ov -o {output.calls} 2>> {log.call}
    """
shell:
    """
    bcftools mpileup -Ov -f {params.ref} --max-depth {params.depth} {input.bam} 2> {log.mpileup} |
    bcftools call -m -Ov 2> {log.call} | bcftools sort -Ov -o {output.calls} 2>> {log.call}
    """
shell:
    "snpEff download {params.ref} -dataDir {params.dir} 2> {log}"
shell:
    """
    snpEff eff {params.db} -dataDir {params.dir} -csvStats {output.csvStats} {input.calls} > {output.calls} 2> {log}
    """
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p genome_name {params.reference_name} -p reference_fasta {input.genome} -p reference_gff3 {input.gff3} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p dataset {params.dataset} -p vcf_path {input.vcf} -p colour_column {params.column} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p dataset {params.dataset} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p dataset {params.dataset} -p bed_path {input.bed} -p cohort_column {params.cohort_column} -p vcf_path {input.vcf} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p dataset {params.dataset} -p wkdir {params.wkdir} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
shell:
    "bcl-convert --bcl-input-directory {input.illumina_in_dir} --output-directory {output.reads_dir} --sample-sheet {input.sample_csv} --force 2> {log}"
shell:
    """
    # iterate over bcl-convert's fastq_list.csv (columns: RGID,RGSM,RGLB,Lane,Read1File,Read2File)
    while IFS="," read -r _ sample_id _ _ read1 read2; do

        # skip the header row
        if [[ $sample_id == "RGSM" ]]; then
           continue
        fi

        echo renaming $read1 and $read2 to ${{sample_id}}_1.fastq.gz and ${{sample_id}}_2.fastq.gz
        mv $read1 resources/reads/${{sample_id}}_1.fastq.gz
        mv $read2 resources/reads/${{sample_id}}_2.fastq.gz
    done < {input.fastq_list}
    """
shell:
    """
    # register the AmpSeq_python Jupyter kernel used by the papermill rules
    python -m ipykernel install --user --name=AmpSeq_python 2> {log}
    """
shell:
    """
    jupyter-book build --all {input.pages} --path-output results/ampseeker-results &&
    ln -sf results/ampseeker-results/_build/html/index.html AmpSeeker-results.html
    """
shell:
    """
    papermill -k AmpSeq_python {input.input_nb} {output.out_nb} -p wkdir {params.wkdir} 2> {log}
    """
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p index_read_qc {params.index_qc} -p wkdir {params.wd} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p bed_targets_path {input.targets} -p wkdir {params.wkdir} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
shell:
    """
    zcat resources/reads/*I1*.fastq.gz | fastqc stdin --outdir results/qc/index-read-qc/ 2>> {log}
    mv results/qc/index-read-qc/stdin_fastqc.html results/qc/index-read-qc/I1.html 2>> {log}
    mv results/qc/index-read-qc/stdin_fastqc.zip results/qc/index-read-qc/I1.zip 2>> {log}

    zcat resources/reads/*I2*.fastq.gz | fastqc stdin --outdir results/qc/index-read-qc/ 2>> {log}
    mv results/qc/index-read-qc/stdin_fastqc.html results/qc/index-read-qc/I2.html 2>> {log}
    mv results/qc/index-read-qc/stdin_fastqc.zip results/qc/index-read-qc/I2.zip 2>> {log}
    """
wrapper:
    "v1.25.0/bio/fastp"
shell:
    """
    # --fast-mode skips internal CIGAR parsing and mate-overlap correction for speed
    mosdepth {params.prefix} {input.bam} --fast-mode --threads {threads} 2> {log}
    """
shell:
    """
    samtools flagstat {input.bam} > {output} 2> {log}
    """
shell:
    """
    qualimap bamqc -bam {input.bam} -outdir {output.folder} 2> {log}
    """
shell:
    """
    bcftools stats {input.vcf} > {output.stats} 2> {log}
    """
wrapper:
    "v2.2.1/bio/multiqc"
wrapper:
    "v1.17.4-17-g62b55d45/bio/bgzip"
shell:
    """
    tabix {input.calls} 2> {log}
    """
shell:
    """
    # --force-samples tolerates duplicate sample names across inputs by renaming them
    bcftools merge --threads {threads} -o {output.vcf} -O v {input.vcfs} --force-samples 2> {log}
    """
shell:
    """
    bcftools merge --threads {threads} -o {output.vcf} -O v {input.vcfs} 2> {log}
    """
wrapper:
    "v1.17.4-17-g62b55d45/bio/bgzip"
shell:
    """
    tabix {input.vcfgz} 2> {log}
    """
shell:
    """
    bcftools merge -o {output.vcf} -Ov {input.vcf} 2> {log}
    """
The bio/bgzip wrapper source:
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "[email protected]"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    """
    (bgzip -c {extra} --threads {snakemake.threads} \
        {snakemake.input} > {snakemake.output}) {log}
    """
)
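For a rule with a single input and output, the wrapper above reduces to an invocation of this shape (paths and thread count illustrative):

    (bgzip -c --threads 4 input.vcf > output.vcf.gz) > log.txt 2>&1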
The bio/fastp wrapper source:
__author__ = "Sebastian Kurscheid"
__copyright__ = "Copyright 2019, Sebastian Kurscheid"
__email__ = "[email protected]"
__license__ = "MIT"

from snakemake.shell import shell
import re

extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


# Assert input
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."


# Input files
if n == 1:
    reads = "--in1 {}".format(snakemake.input.sample)
else:
    reads = "--in1 {} --in2 {}".format(*snakemake.input.sample)


# Output files
trimmed_paths = snakemake.output.get("trimmed", None)
if trimmed_paths:
    if n == 1:
        trimmed = "--out1 {}".format(snakemake.output.trimmed)
    else:
        trimmed = "--out1 {} --out2 {}".format(*snakemake.output.trimmed)

        # Output unpaired files
        unpaired = snakemake.output.get("unpaired", None)
        if unpaired:
            trimmed += f" --unpaired1 {unpaired} --unpaired2 {unpaired}"
        else:
            unpaired1 = snakemake.output.get("unpaired1", None)
            if unpaired1:
                trimmed += f" --unpaired1 {unpaired1}"
            unpaired2 = snakemake.output.get("unpaired2", None)
            if unpaired2:
                trimmed += f" --unpaired2 {unpaired2}"

        # Output merged PE reads
        merged = snakemake.output.get("merged", None)
        if merged:
            if not re.search(r"--merge\b", extra):
                raise ValueError(
                    "output.merged specified but '--merge' option missing from params.extra"
                )
            trimmed += f" --merged_out {merged}"
else:
    trimmed = ""


# Output failed reads
failed = snakemake.output.get("failed", None)
if failed:
    trimmed += f" --failed_out {failed}"


# Stats
html = "--html {}".format(snakemake.output.html)
json = "--json {}".format(snakemake.output.json)


shell(
    "(fastp --thread {snakemake.threads} "
    "{extra} "
    "{adapters} "
    "{reads} "
    "{trimmed} "
    "{json} "
    "{html} ) {log}"
)
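For a paired-end rule with trimmed, json, and html outputs, the wrapper above expands to an invocation along these lines (paths illustrative):

    fastp --thread 4 --in1 sample1_1.fastq.gz --in2 sample1_2.fastq.gz \
        --out1 sample1_trimmed_1.fastq.gz --out2 sample1_trimmed_2.fastq.gz \
        --json sample1.fastp.json --html sample1.fastp.html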
The bio/multiqc wrapper source:
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
# Set this to False if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)

if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)

output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)
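Given a set of per-sample QC outputs, the wrapper above reduces to an invocation of this shape (directory and report names illustrative):

    multiqc --force -o results/multiqc -n multiqc_report.html results/qc results/alignments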