🍄 Qiime2 ITS classifiers for the UNITE database

Version: v9.0-v25.07.2023-qiime2-

A pipeline to build Qiime2 taxonomy classifiers for the UNITE database.

Download a pre-trained classifier here! 🎁

Running Snakemake workflow

Set up:

  • Install Mambaforge and configure Bioconda.

  • Install the version of Qiime2 you want using the recommended environment name. (For a faster install, you can replace conda with mamba.)

  • Install Snakemake into an environment, then activate that environment.

Configure:

  • Open up config/config.yaml and configure it to your liking. (For example, you may need to update the name of your Qiime2 environment.)

Run:

snakemake --cores 8 --use-conda --resources mem_mb=10000

This takes about 15 hours on my machine.
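Because a full run takes many hours, it may be worth previewing the planned jobs first. The sketch below reuses the run command with the `--dryrun` flag that also appears in the Reports section. The `preview_run` function and the `CORES` and `MEM_MB` variables are hypothetical names introduced for illustration; they are not part of the workflow.

```shell
# Hypothetical defaults mirroring the run command above; override as needed.
CORES="${CORES:-8}"
MEM_MB="${MEM_MB:-10000}"

# List the jobs Snakemake would schedule, without executing any of them.
preview_run() {
    snakemake --cores "$CORES" --use-conda --resources "mem_mb=$MEM_MB" --dryrun
}
```

Calling `preview_run` from the repository root should print the job plan; if it looks right, run the real command without `--dryrun`.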

Run on a slurm cluster:

More specifically, the University of Florida HiPerGator supercomputer, with access generously provided by the Kawahara Lab!

screen # We connect to a random login node, so we may not be able...
screen -r # to reconnect with this later on.
snakemake --jobs 12 --slurm \
 --use-envmodules --rerun-incomplete --latency-wait 10 \
 --default-resources slurm_account=kawahara slurm_partition=hpg-milan

Reports:

snakemake --report results/report.html
snakemake --forceall --dag --dryrun | dot -Tpdf > results/dag.pdf

Code Snippets

# Download and unpack the four UNITE v9.0 Qiime release archives.
shell:
    """
    mkdir -p downloads

    # Version 9 update. Get DOIs from here: https://unite.ut.ee/repository.php
    # To get URLs you can download directly, plug them into this API:
    # https://api.plutof.ut.ee/v1/public/dois/?format=api&identifier=10.15156/BIO/2483915

    # 9.0	2023-07-18	Fungi	19 051	143 384	Current	https://doi.org/10.15156/BIO/2938079
    wget -qO- https://files.plutof.ut.ee/public/orig/FB/78/FB78E30E44793FB02E5A4D3AE18EB4A6621A2FAEB7A4E94421B8F7B65D46CA4A.tgz | \
      tar xz -C downloads --strip-components 1 # sh_qiime_release_25.07.2023.tgz      # normal

    # 9.0	2023-07-18	Fungi	19 051	187 443	Current	https://doi.org/10.15156/BIO/2938080
    wget -qO- https://files.plutof.ut.ee/public/orig/37/71/3771274B094D9CA6252DF01359756B60A2FBEEF87854CC01C2577182DBB123C7.tgz | \
      tar xz -C downloads --strip-components 1 # sh_qiime_release_s_25.07.2023.tgz    # add s for 97% singletons

    # 9.0	2023-07-18	All eukaryotes	19 451	215 454	Current	https://doi.org/10.15156/BIO/2938081
    wget -qO- https://files.plutof.ut.ee/public/orig/1C/C2/1CC2477429B3A703CC1C7A896A7EFF457BB0D471877CB8D18074959DBB630D10.tgz | \
      tar xz -C downloads --strip-components 1 # sh_qiime_release_all_25.07.2023.tgz  # add all for Euks

    # 9.0	2023-07-18	All eukaryotes	19 451	307 276	Current	https://doi.org/10.15156/BIO/2938082
    wget -qO- https://files.plutof.ut.ee/public/orig/7D/0C/7D0C329980D2C644CC157A8C76BBD11E78DB8B13286C98D4FEB6ECAC79D67D6F.tgz | \
      tar xz -C downloads --strip-components 1 # sh_qiime_release_s_all_25.07.2023.tgz # and s and all for 97% Euks singletons

    """
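The download step pipes each archive straight into `tar`, so a corrupted transfer could go unnoticed. The sketch below shows one possible integrity check. It rests on an unconfirmed assumption: that the 64-character hex file name in each files.plutof.ut.ee URL is the SHA-256 digest of the archive. The helper names `url_digest` and `verify_download` are hypothetical.

```shell
# ASSUMPTION (not confirmed by the source): the hex file name in the URL
# is the SHA-256 digest of the archive it points to.

# Print the file-name stem of a URL (text before the first dot), lowercased.
url_digest() {
    local name="${1##*/}"   # drop everything up to the last slash
    name="${name%%.*}"      # drop the extension
    printf '%s\n' "$name" | tr 'A-F' 'a-f'
}

# Succeed only if the local file's SHA-256 matches the digest in its URL.
verify_download() {
    local url="$1" file="$2" expected actual
    expected="$(url_digest "$url")"
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    [ "$expected" = "$actual" ]
}
```

To use it, save the archive to a file first (for example with `wget -qO release.tgz "$url"`), run `verify_download "$url" release.tgz`, and extract only if it succeeds. If the assumption about the file names does not hold, the check will fail on every file and should be dropped.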
# Import the UNITE reference sequences (FASTA) as a Qiime2 artifact.
shell: "qiime tools import \
        --type FeatureData[Sequence] \
        --input-format MixedCaseDNAFASTAFormat \
        --input-path {input}/sh_refs_qiime_{wildcards.ver}_{wildcards.id}_{wildcards.type}{wildcards.date}_dev.fasta \
        --output-path {output}"
# Import the matching UNITE taxonomy table as a Qiime2 artifact.
shell: "qiime tools import \
        --type FeatureData[Taxonomy] \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path {input}/sh_taxonomy_qiime_{wildcards.ver}_{wildcards.id}_{wildcards.type}{wildcards.date}_dev.txt \
        --output-path {output}"
# Train a naive Bayes classifier on the imported sequences and taxonomy.
shell: "qiime feature-classifier fit-classifier-naive-bayes \
        --p-classify--chunk-size 10000 \
        --i-reference-reads    {input.ref} \
        --i-reference-taxonomy {input.tax} \
        --o-classifier {output}"
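Once training finishes, the classifier artifact would typically be passed to `qiime feature-classifier classify-sklearn` to assign taxonomy to your own sequences. The sketch below is a minimal illustration; the `classify_reads` helper and the file names in the usage note are hypothetical and do not come from this workflow.

```shell
# Classify representative sequences with a trained classifier.
# Arguments: <classifier.qza> <reads.qza> <output-taxonomy.qza>
classify_reads() {
    qiime feature-classifier classify-sklearn \
        --i-classifier "$1" \
        --i-reads "$2" \
        --o-classification "$3"
}
```

For example, `classify_reads classifier.qza rep-seqs.qza taxonomy.qza`, run inside the same Qiime2 environment used for training, since classifiers are generally tied to the scikit-learn version they were trained with.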

URL: https://github.com/colinbrislawn/unite-train
Name: unite-train
License: BSD 3-Clause "New" or "Revised" License