Tools and utilities for making use of mastiff


"That's a beautiful collection of public data you have here, it would be a shame if someone made it searchable." -- paraphrased

This repo provides examples that use sourmash to build FracMinHash sketches and then use those sketches to search ~485,000 public metagenomes in the SRA in real time with mastiff.

Quickstart - Jupyter Notebook

Click on the binder button below, and then select "Run... Run all cells", or hit the fast-forward button.

Quickstart - snakemake

Deposit sequences of interest in sequences/. Then run:

snakemake -j 1 --use-conda

and look in mastiff_out/.


Code Snippets

shell: """
    sourmash sketch dna -p k=21,noabund,scaled=1000 {input} \
         --name-from-first -o {output}
"""
Snakemake, from line 15 of main/Snakefile

shell: """
    curl -H "Content-Type: application/json" --data-binary \
          @{input} https://mastiff.sourmash.bio/search -o {output}
"""


Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/sourmash-bio/2022-search-sra-with-mastiff
Name: 2022-search-sra-with-mastiff
Version: 1
Copyright: Public Domain
License: BSD 3-Clause "New" or "Revised" License