Bacterial Genome Assembly with Snakemake Workflow


This Snakemake workflow assembles and annotates bacterial genomes from paired-end Illumina MiSeq reads. Reads are quality- and adapter-trimmed with Trim Galore, then assembled independently with SPAdes and SKESA. The two assemblies are compared with QUAST, the better one is selected by a simple metric-based score, and the winner is annotated with Prokka and assessed for completeness with BUSCO. Code lives in the respective folders, i.e. scripts, rules, and envs; the entry point of the workflow is defined in the Snakefile and the main configuration in the config.yaml file.

Usage

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI.

Step 1: Obtain a copy of this workflow

  1. Create a new GitHub repository using this workflow as a template.

  2. Clone the newly created repository to your local system, into the place where you want to perform the data analysis.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the files in the config/ folder: adjust config.yaml to configure the workflow execution, and samples.tsv to specify your sample setup.
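The exact columns of samples.tsv depend on this workflow's rules; below is a minimal sketch of the common Snakemake pattern for loading such a sample sheet with pandas. The column names (sample, fq1, fq2) and file paths are illustrative assumptions, not taken from this repository.

```python
import io
import pandas as pd

# Hypothetical sample sheet; column names and paths are invented for illustration.
samples_tsv = io.StringIO(
    "sample\tfq1\tfq2\n"
    "isolate_A\treads/A_R1.fastq.gz\treads/A_R2.fastq.gz\n"
    "isolate_B\treads/B_R1.fastq.gz\treads/B_R2.fastq.gz\n"
)

# Typical pattern inside a Snakefile: index by sample name so rules can look
# up FASTQ paths from wildcards, e.g. samples.loc[wildcards.sample, "fq1"].
samples = pd.read_csv(samples_tsv, sep="\t").set_index("sample", drop=False)

print(list(samples.index))               # sample names driving the workflow
print(samples.loc["isolate_A", "fq1"])   # path resolved for one sample
```

In a real run the StringIO stand-in would be replaced by the path configured in config.yaml.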

Step 3: Install Snakemake

Install Snakemake using conda:

conda create -c bioconda -c conda-forge -n snakemake snakemake

For installation details, see the instructions in the Snakemake documentation.

Step 4: Execute workflow

Activate the conda environment:

conda activate snakemake

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores, or run it in a cluster environment via

snakemake --use-conda --cluster qsub --jobs 100

Step 5: Investigate results

After successful execution, you can create a self-contained interactive HTML report with all results via:

snakemake --report report.html

This report can, for example, be forwarded to your collaborators.

Step 6: Commit changes

Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:

git commit -a
git push

Code Snippets

Annotation with Prokka:

shell:
    "prokka {params.prokka} --cpus {threads} --outdir {output} --prefix {wildcards.sample} {input} 2> {log}"
Assembly with SPAdes, followed by post-processing of the scaffolds with seqkit:

shell:
    """
    spades.py {params.spades} --threads {threads} -1 {input.fq1} -2 {input.fq2} -o {output[0]} 2> {log}
    seqkit seq {params.seqkit} {output[0]}/scaffolds.fasta > {output[1]}
    """
Assembly with SKESA, followed by the same seqkit post-processing:

shell:
    """
    skesa {params.skesa} --cores {threads} --reads {input.fq1},{input.fq2} --contigs_out {output[0]} 2> {log}
    seqkit seq {params.seqkit} {output[0]} > {output[1]}
    """
Comparison of the two assemblies with QUAST:

shell:
    "quast.py {params.quast} --threads {threads} -l spades,skesa -o {output} {input[0]} {input[1]} 2> {log}"
Selection of the better assembly from the QUAST report:

run:
    import pandas as pd
    from shutil import copy

    # QUAST's report.tsv has one row per metric and one column per assembly
    # (labelled spades and skesa via the -l option above).
    quast = pd.read_csv(f"{input.quast}/report.tsv", sep="\t", header=0).set_index("Assembly", drop=False)
    quast.drop('Assembly', axis='columns', inplace=True)

    score = { i : 0 for i in quast.columns.to_list() }
    # Cast to float: the mixed-type TSV columns are read as strings, and
    # string comparison would rank "9" above "10".
    number_contigs = quast.loc['# contigs'].astype(float).to_dict()
    largest_contig = quast.loc['Largest contig'].astype(float).to_dict()
    total_length = quast.loc['Total length'].astype(float).to_dict()
    n50 = quast.loc['N50'].astype(float).to_dict()
    n75 = quast.loc['N75'].astype(float).to_dict()
    predict_genes = quast.loc['# predicted genes (unique)'].astype(float).to_dict()

    # One point per winning metric: fewest contigs wins, every other metric
    # is highest-wins, and unique predicted genes carries triple weight.
    score[min(number_contigs, key=number_contigs.get)] += 1
    score[max(largest_contig, key=largest_contig.get)] += 1
    score[max(total_length, key=total_length.get)] += 1
    score[max(n50, key=n50.get)] += 1
    score[max(n75, key=n75.get)] += 1
    score[max(predict_genes, key=predict_genes.get)] += 3

    assembly = max(score, key=score.get)

    print(score)

    if assembly == 'spades':
        copy(f'{input.spades}', f'{output[0]}')
    elif assembly == 'skesa':
        copy(f'{input.skesa}', f'{output[0]}')
Completeness assessment with BUSCO:

shell:
    "busco {params.busco} --cpu {threads} -i {input} --out_path $(dirname {output}) -o $(basename {output}) 2> {log}"
Read trimming with Trim Galore:

shell:
    "trim_galore {params.trim} --basename {wildcards.sample} --cores {threads} --output_dir {output.out_dir} {input} 2> {log}"

URL: https://github.com/osvaldoreisss/miseq_bac_assembly_annot_workflow
Name: miseq_bac_assembly_annot_workflow
Version: 1
License: MIT License