A Snakemake workflow to cluster proteins using MMseqs2

A Snakemake workflow to cluster proteins using MMseqs2.

Installation

Conda and Snakedeploy (recommended)

Install dependencies with conda:

# Install snakemake, snakedeploy, and eido in a new conda environment
conda create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy eido
# Activate the environment
conda activate snakemake

Create a project directory for running the workflow:

mkdir -p mmseqs-cluster
cd mmseqs-cluster

Deploy a specific release (recommended):

# Check which releases are available (e.g. using the GitHub CLI or git;
# GitHub has retired the Subversion access that `svn ls` relied on)
gh release list --repo leightonpayne/mmseqs-cluster
git ls-remote --tags https://github.com/leightonpayne/mmseqs-cluster
# Deploy the chosen release, replacing <RELEASE> with its tag
snakedeploy deploy-workflow https://github.com/leightonpayne/mmseqs-cluster . --tag <RELEASE>

Deploy the development version (optional):

snakedeploy deploy-workflow https://github.com/leightonpayne/mmseqs-cluster . --branch master

Configuration

This workflow uses the Portable Encapsulated Projects (PEP) specification to define inputs and record metadata.

Read config/README.md for configuration instructions.
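
As a rough illustration of what a PEP looks like, the sketch below writes a minimal project config and sample table using only the Python standard library. The `pep_version` and `sample_table` keys and the `sample_name` column come from the PEP specification; the file names, the `fasta` column, and the example paths are hypothetical. config/README.md remains the authority on what this workflow actually expects.

```python
import csv
from pathlib import Path


def write_minimal_pep(directory):
    """Write a minimal PEP (project config + sample table) into `directory`."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    # `sample_name` is the default sample identifier column in the PEP spec.
    # The `fasta` column is a hypothetical attribute, not taken from this workflow.
    samples = [
        {"sample_name": "proteome_a", "fasta": "data/proteome_a.faa"},
        {"sample_name": "proteome_b", "fasta": "data/proteome_b.faa"},
    ]
    table_path = directory / "samples.csv"
    with table_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sample_name", "fasta"])
        writer.writeheader()
        writer.writerows(samples)

    # The project config points to the sample table, relative to the config file.
    config_path = directory / "project_config.yaml"
    config_path.write_text("pep_version: 2.1.0\nsample_table: samples.csv\n")
    return config_path, table_path
```

Calling `write_minimal_pep("config")` would produce `config/project_config.yaml` and `config/samples.csv`, which can then be adapted to the columns the workflow requires.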

Usage

To run the workflow, navigate to the base directory and run the command:

snakemake --cores all --use-conda

Code Snippets

Build an MMseqs2 sequence database from the input:

shell:
    "mmseqs createdb {input} {output} &> {log}"

Cluster the sequences at the configured identity and coverage thresholds:

shell:
    """
    export MMSEQS_FORCE_MERGE=1
    mkdir -p {params.tmpdir}
    mmseqs cluster {input} {output} {params.tmpdir} \
    --min-seq-id {params.min_seq_id} \
    -c {params.coverage} \
    --threads {params.threads} &> {log}
    """

Export the clustering result as a TSV file:

shell:
    """
    mmseqs createtsv {input.database} {input.database} \
    {input.cluster_database} {output} &> {log}
    """

Collect the member sequences of each cluster into a database:

shell:
    """
    export MMSEQS_FORCE_MERGE=1
    mmseqs createseqfiledb {input.database} {input.cluster_database} {output.cluster_sequences_database} &> {log}
    """

Remove duplicate sequences with a SeqKit rmdup wrapper:

wrapper:
    "https://github.com/leightonpayne/snakemake-wrappers/raw/master/seqkit/rmdup/wrapper.py"

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/leightonpayne/mmseqs-cluster
Name: mmseqs-cluster
Version: v0.0.9000
Copyright: Public Domain
License: MIT License