Nextstrain build for mpox virus


This is the Nextstrain build for MPXV (mpox virus). Output from this build is visible at nextstrain.org/monkeypox. The lineages within the recent mpox outbreaks in humans are defined in a separate lineage-designation repository.

Usage

Provision input data

Input sequences and metadata can be retrieved from data.nextstrain.org.

Note that these data are generously shared by many labs around the world. If you analyze these data and plan to publish, please contact these labs first.

Within the analysis pipeline, these data are fetched from data.nextstrain.org and written to data/ with:

nextstrain build --docker . data/sequences.fasta data/metadata.tsv

Run analysis pipeline

Run the pipeline to produce the "overview" tree for /monkeypox/mpxv with:

nextstrain build --docker --cpus 1 . --configfile config/config_mpxv.yaml

Run the pipeline to produce the "outbreak" tree for /monkeypox/hmpxv1 with:

nextstrain build --docker --cpus 1 . --configfile config/config_hmpxv1.yaml

Adjust the number of CPUs to what your machine has available to speed up alignment and tree building.

Deploying

Run the Python script scripts/deploy.py to deploy the staging build to production.

This will also automatically create a dated build where each node has a unique (random) ID so it can be targeted in shared links/narratives.

python scripts/deploy.py --build-names hmpxv1 mpxv

If a dated build already exists, it is not overwritten by default. To overwrite, pass -f.

To deploy a local build to staging, use the --staging flag.

To skip deploying a dated build to production, add the --no-dated flag.
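
For example, to push a local build to staging without creating a dated production build, the flags documented above can be combined:

python scripts/deploy.py --build-names hmpxv1 mpxv --staging --no-dated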

Visualize results

View results with:

nextstrain view auspice/

Configuration

Configuration takes place in config/config.yml by default. The analysis pipeline is contained in workflow/snakemake_rules/core.smk. It can be read top to bottom: each rule specifies its file inputs and outputs and pulls its parameters from config. There is little indirection, so each rule can be reasoned about on its own.
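
As a rough sketch of that pattern (the keys below are illustrative, not the build's actual schema), a rule parameter resolves from the YAML config like this:

import yaml  # pip install pyyaml

# Stand-in for config/config_mpxv.yaml; keys here are invented for illustration.
config = yaml.safe_load("""
min_date: 2017-01-01
min_length: 100000
""")

# Inside a rule, `params: min_length=config["min_length"]` evaluates to:
print(config["min_length"])  # 100000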

Data use

We gratefully acknowledge the authors, originating and submitting laboratories of the genetic sequences and metadata for sharing their work. Please note that although data generators have generously shared data in an open fashion, that does not mean there should be free license to publish on this data. Data generators should be cited where possible and collaborations should be sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if uncertain.

Installation

Follow the standard installation instructions for Nextstrain's suite of software tools.

If you don't use the Nextstrain CLI but a custom conda environment, make sure tsv-utils and seqkit are installed, e.g. using:

conda install -c bioconda tsv-utils seqkit

Nextstrain build vs Snakemake

The above commands use the Nextstrain CLI (nextstrain build) with Docker to run Nextalign v2. Alternatively, if you install Nextalign/Nextclade v2 locally, you can run the pipeline directly with Snakemake:

snakemake -j 1 -p --configfile config/config_mpxv.yaml
snakemake -j 1 -p --configfile config/config_hmpxv1.yaml

Update colors to include new countries

Update colors_hmpxv1.tsv to group countries by region, based on the countries present in its metadata.tsv:

python3 scripts/update_colours.py --colors config/colors_hmpxv1.tsv \
 --metadata results/hmpxv1/metadata.tsv --output config/colors_hmpxv1.tsv

and similarly update colors_mpxv.tsv:

python3 scripts/update_colours.py --colors config/colors_mpxv.tsv \
 --metadata results/mpxv/metadata.tsv --output config/colors_mpxv.tsv

Update example data

Example data is used by CI. It can also be used as a small subset of real-world data.

Example data should be updated every time the metadata schema changes or a new clade/lineage emerges. To update, run:

nextstrain build --docker . update_example_data -F

Code Snippets

import argparse
import pandas as pd

# Forced colours MUST NOT appear in the ordering TSV
forced_colors = {
}

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Assign colors based on ordering",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument('--ordering', type=str, required=True, help="input ordering file")
    parser.add_argument('--color-schemes', type=str, required=True, help="input color schemes file")
    parser.add_argument('--metadata', type=str, help="if provided, restrict colors to only those found in metadata")
    parser.add_argument('--output', type=str, required=True, help="output colors tsv")
    args = parser.parse_args()

    assignment = {}
    with open(args.ordering) as f:
        for line in f.readlines():
            array = line.lstrip().rstrip().split("\t")
            if len(array) == 2:
                name = array[0]
                trait = array[1]
                if name not in assignment:
                    assignment[name] = [trait]
                else:
                    assignment[name].append(trait)

    # if metadata supplied, go through and
    # 1. remove assignments that don't exist in metadata
    # 2. remove assignments that have 'focal' set to 'False' in metadata
    if args.metadata:
        metadata = pd.read_csv(args.metadata, delimiter='\t')
        for name, trait in assignment.items():
            # Items not to exclude if not (yet) present in metadata to solve bootstrapping issue
            if name in metadata and name not in ['clade_membership', 'outbreak', 'lineage']:
                subset_present = [x for x in assignment[name] if x in metadata[name].unique()]
                assignment[name] = subset_present
            if name in metadata and 'focal' in metadata:
                focal_list = metadata.loc[metadata['focal'] == True, name].unique()
                subset_focal = [x for x in assignment[name] if x in focal_list]
                assignment[name] = subset_focal

    schemes = {}
    counter = 0
    with open(args.color_schemes) as f:
        for line in f.readlines():
            counter += 1
            array = line.lstrip().rstrip().split("\t")
            schemes[counter] = array

    with open(args.output, 'w') as f:
        for trait_name, trait_array in assignment.items():
            if len(trait_array)==0:
                print(f"No traits found for {trait_name}")
                continue
            if len(schemes) < len(trait_array):
                print(f"WARNING: insufficient colours available for trait {trait_name} - reusing colours!")
                remain = len(trait_array)
                color_array = []
                while remain > 0:
                    if remain > len(schemes):
                        color_array = [*color_array, *schemes[len(schemes)]]
                        remain -= len(schemes)
                    else:
                        color_array = [*color_array, *schemes[remain]]
                        remain = 0
            else:
                color_array = schemes[len(trait_array)]
            extra_trait_values = list(forced_colors.get(trait_name, {}).keys())
            extra_color_values = list(forced_colors.get(trait_name, {}).values())

            zipped = list(zip(trait_array+extra_trait_values, color_array+extra_color_values))
            for trait_value, color in zipped:
                f.write(trait_name + "\t" + trait_value + "\t" + color + "\n")
            f.write("\n")
import json, argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="remove time info",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--input-node-data", type=str, required=True, help="input data"
    )
    parser.add_argument(
        "--output-node-data",
        type=str,
        metavar="JSON",
        required=True,
        help="output Auspice JSON",
    )
    args = parser.parse_args()
    with open(args.input_node_data) as fh:
        data = json.load(fh)
    new_node_data = {}
    for name, node in data["nodes"].items():
        old_clade_name = node["clade_membership"]
        outbreak_name = ""
        lineage_name = ""

        # if it starts with clade -> it's a clade
        # if it starts with outbreak -> it's outbreak, need to look up clade
        # if it starts with lineage -> it's clade IIb, outbreak hMPXV-1
        if old_clade_name.startswith("clade"):
            clade_name = old_clade_name.split()[1]
        # Need to set up clade dictionary for when we have other outbreaks
        # if old_clade_name.startswith('outbreak'):
        #     outbreak_name = old_clade_name.split()[1]
        #     clade_name = clade[outbreak_name]
        elif old_clade_name.startswith("outgroup"):
            clade_name = "outgroup"
        else:
            clade_name = "IIb"
            outbreak_name = "hMPXV-1"
            lineage_name = old_clade_name

        new_node_data[name] = {
            "clade_membership": clade_name,
            "outbreak": outbreak_name,
            "lineage": lineage_name,
        }
        if "clade_annotation" in node:
            new_node_data[name]["clade_annotation"] = node["clade_annotation"]
            if node["clade_annotation"] == "A":
                new_node_data[name]["clade_annotation"] = "hMPXV-1 A"

    data["nodes"] = new_node_data
    with open(args.output_node_data, "w") as fh:
        json.dump(data, fh)
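
For a few illustrative clade_membership values (names here are examples only), the branching above produces:

# clade_membership value -> new node data
examples = {
    "clade IIa": {"clade_membership": "IIa",      "outbreak": "",        "lineage": ""},
    "outgroup":  {"clade_membership": "outgroup", "outbreak": "",        "lineage": ""},
    "B.1":       {"clade_membership": "IIb",      "outbreak": "hMPXV-1", "lineage": "B.1"},
}
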
import argparse
from datetime import datetime
from augur.io import read_metadata
import json

## Script originally from https://github.com/nextstrain/ncov/blob/master/scripts/construct-recency-from-submission-date.py

def get_recency(date_str, ref_date):
    date_submitted = datetime.strptime(date_str, '%Y-%m-%d').toordinal()
    ref_day = ref_date.toordinal()

    delta_days = ref_day - date_submitted
    if delta_days<=0:
        return 'New'
    elif delta_days<3:
        return '1-2 days ago'
    elif delta_days<8:
        return '3-7 days ago'
    elif delta_days<15:
        return 'One week ago'
    elif delta_days<31:
        return 'One month ago'
    elif delta_days < 121:
        return '1-3 months ago'
    elif delta_days < 365:
        return '3-12 months ago'
    elif delta_days < 365*4:
        return '1-3 years ago'
    elif delta_days < 365*16:
        return '3-15 years ago'
    else:
        return 'Older than 15 years'

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Assign each sequence a field that specifies when it was added",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument('--metadata', type=str, required=True, help="metadata file")
    parser.add_argument('--metadata-id-columns', nargs="+", help="names of possible metadata columns containing identifier information, ordered by priority. Only one ID column will be inferred.")
    parser.add_argument('--output', type=str, required=True, help="output json")
    args = parser.parse_args()

    meta = read_metadata(args.metadata, id_columns=args.metadata_id_columns).to_dict(orient="index")

    node_data = {'nodes':{}}
    ref_date = datetime.now()

    for strain, d in meta.items():
        if 'date_submitted' in d and d['date_submitted'] and d['date_submitted'] != "undefined":
            node_data['nodes'][strain] = {'recency': get_recency(d['date_submitted'], ref_date)}

    with open(args.output, 'wt') as fh:
        json.dump(node_data, fh)
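
A quick sanity check of the bucketing, a sketch assuming get_recency from the script above is in scope:

from datetime import datetime, timedelta

ref = datetime(2023, 1, 31)
for days in (0, 2, 7, 14, 30, 120, 300, 2*365, 10*365, 20*365):
    date_str = (ref - timedelta(days=days)).strftime('%Y-%m-%d')
    print(days, get_recency(date_str, ref))
# 0 -> New, 2 -> 1-2 days ago, ..., 7300 -> Older than 15 years
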
from collections import defaultdict
import argparse
from treetime import TreeAnc
from Bio import Phylo

if __name__=="__main__":
    parser = argparse.ArgumentParser(
        description="remove time info",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument('--alignment', type=str, required=True, help="input sequences")
    parser.add_argument('--input-tree', type=str, required=True, help="input nwk")
    parser.add_argument('--root', type=str, required=False, help="root node")
    parser.add_argument('--output', type=str, required=True, help="output nwk")
    args = parser.parse_args()

    T = Phylo.read(args.input_tree, 'newick')

    if args.root:
        T.root_with_outgroup(args.root)
    else:
        T.root_at_midpoint()

    tt = TreeAnc(tree=T, aln=args.alignment, gtr='JC69')
    tt.optimize_tree(prune_short=True)

    # make list of mutations that are phylogenetically informative (not gaps or N)
    for n in T.find_clades():
        n.relevant_mutations = set()
        for mut in n.mutations:
            if (mut[0] in 'ACGT') and (mut[2] in 'ACGT'):
                n.relevant_mutations.add(mut)

    print(f"### Checking for immediate reversions\n")
    reversions = list()
    for clade in T.find_clades():
        for child in clade.clades:
            if child.is_terminal():
                continue
            for grandchild in child.clades:
                if grandchild.is_terminal():
                    continue
                # Check if one of grandchild mutation reverts one of child
                for mut_child in child.relevant_mutations:
                    for mut_grandchild in grandchild.relevant_mutations:
                        if mut_child[1] == mut_grandchild[1] and mut_child[2] == mut_grandchild[0]:
                            reversions.append(
                                {
                                    "parent": clade,
                                    "child": child,
                                    "grandchild": grandchild,
                                    "mut_child": mut_child,
                                    "mut_grandchild": mut_grandchild
                                }
                            )
                            print(f"Below {clade}: {mut_child} in {child.name} reverted in {grandchild.name}")

    for reversion in reversions:
        # Remove reversion from grandchild
        reversion["grandchild"].relevant_mutations.remove(reversion["mut_grandchild"])
        # Remove grandchild from child
        reversion["child"].clades.remove(reversion["grandchild"])
        # If there are mutations, add grandchild as child of parent
        if reversion["grandchild"].relevant_mutations != reversion["parent"].relevant_mutations:
            reversion["parent"].clades.append(reversion["grandchild"])
        else:
            # Otherwise add grandchild clades to parent
            reversion["parent"].clades.extend(reversion["grandchild"].clades)

    # find mutations that occur multiple times in branches leading to children of a node.
    # use these mutations to group clades to merge later.
    max_iter = 5
    for ii in range(max_iter):
        print(f"###\nIteration: {ii+1}\n")
        nodes_to_merge = defaultdict(list)
        for n in T.get_nonterminals():
            shared_mutations = defaultdict(list)
            for c in n:
                for mut in c.relevant_mutations:
                    shared_mutations[mut].append(c)

            for mut in shared_mutations:
                if len(shared_mutations[mut])>1:
                    nodes_to_merge[(n,tuple(shared_mutations[mut]))].append(mut)

        if len(nodes_to_merge)==0:
            print("No more shared mutations -- breaking out of loop.")
            break

        already_touched = set()
        for (parent, children), mutations in sorted(nodes_to_merge.items(), key=lambda x:len(x[1]), reverse=True):
            if any([c in already_touched for c in children]):
                continue

            print("####\nmerging clades:\n\t", '\n\t'.join([f"{c.name} with mutations {c.relevant_mutations} and {c.count_terminals()} tips" for c in children]))
            print("shared mutations:", mutations)
            print("\n")

            parent.clades = [c for c in parent if c not in children]
            new_clade = Phylo.BaseTree.Clade(branch_length=tt.one_mutation*len(mutations))
            new_clade.relevant_mutations = set(mutations)
            for c in children:
                left_over_mutations = c.relevant_mutations.difference(mutations)
                if len(left_over_mutations):
                    c.relevant_mutations = left_over_mutations
                    c.branch_length = tt.one_mutation*len(c.relevant_mutations)
                    new_clade.clades.append(c)
                else:
                    new_clade.clades.extend(c.clades)
                already_touched.add(c)

            parent.clades.append(new_clade)

    Phylo.write(T, args.output, 'newick')
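
To illustrate just the rooting step in isolation, a minimal Biopython sketch on a toy tree (no alignment or TreeAnc involved):

from io import StringIO
from Bio import Phylo

T = Phylo.read(StringIO("((A:1,B:1):1,(C:1,D:1):1);"), "newick")

T.root_with_outgroup("A")   # what passing --root A does
# T.root_at_midpoint()      # the fallback when --root is not given
Phylo.draw_ascii(T)
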
from collections import defaultdict
import json, argparse




if __name__=="__main__":
    parser = argparse.ArgumentParser(
        description="calculate mutation context json",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument('--tree', type=str, required=True, help="tree file")
    parser.add_argument('--mutations', type=str, required=True, help="mutations")
    parser.add_argument('--output', type=str, metavar="JSON", required=True, help="output Auspice JSON")
    args = parser.parse_args()


    with open(args.mutations) as fh:
        data = json.load(fh)['nodes']


    terminal_muts = defaultdict(lambda: defaultdict(int))
    internal_muts = defaultdict(lambda: defaultdict(int))

    node_data = {}

    for name, node in data.items():
        GA_count = 0
        CT_count = 0
        total_muts = 0
        for mut in node["muts"]:
            a, pos, d = mut[0], int(mut[1:-1]), mut[-1]
            if a in 'ACGT' and d in 'ACGT':
                total_muts += 1
                if a+d == 'GA':
                    GA_count += 1
                elif a+d == 'CT':
                    CT_count += 1
        GA_CT_count = GA_count + CT_count
        if total_muts:
            node_data[name] = {"GA_CT_fraction": GA_CT_count/total_muts}
        else:
            node_data[name] = {"GA_CT_fraction": None }


        dinuc_count = 0
        if GA_CT_count:
            #node_data[name]["CT_fraction"] = CT_count/GA_CT_count
            for mut in node["muts"]:
                a, pos, d = mut[0], int(mut[1:-1]), mut[-1]
                if a in 'ACGT' and d in 'ACGT':
                    if a+d == 'GA' and node['sequence'][pos]=='A':
                        dinuc_count+=1
                    elif a+d == 'CT' and node['sequence'][pos-2]=='T':
                        dinuc_count+=1
            node_data[name]["dinuc_context_fraction"] = dinuc_count/GA_CT_count
        else:
            node_data[name]["dinuc_context_fraction"] = None
            #node_data[name]["CT_fraction"] = None

    with open(args.output, 'w') as fh:
        json.dump({"nodes":node_data}, fh)
import argparse
import json
from collections import defaultdict


def sample_date(node):
    """
    Returns the sample date in numeric form.
    In the future, we could examine the 'raw_date' attr here to decide whether to ignore
    some sequences, as 'numdate' is the inferred (timetree) date which can hide
    uncertainty in the actual sampling date.
    """
    if "raw_date" not in node: # internal node or tip with no date info
        return
    return node['numdate']



if __name__=="__main__":
    parser = argparse.ArgumentParser(
        description="remove time info",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument('--input-node-data', type=str, required=True, help="input data")
    parser.add_argument('--output-node-data', type=str, metavar="JSON", required=True, help="output Auspice JSON")
    args = parser.parse_args()


    with open(args.input_node_data) as fh:
        data = json.load(fh)

    new_node_data = {}

    for name, node in data["nodes"].items():
        try:
            new_node_data[name] = {
                "mutation_length": node["mutation_length"],
                "branch_length": node["branch_length"]
            }
            sdate = sample_date(node)
            if sdate:
                new_node_data[name]["sample_date"] = sdate
        except KeyError:
            # internal node or tip with no date info
            pass

    data["nodes"] = new_node_data
    with open(args.output_node_data, 'w') as fh:
        json.dump(data, fh)
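
For a single made-up tip, the filtering above reduces the node data to:

node = {"mutation_length": 4, "branch_length": 0.0002,
        "numdate": 2022.45, "raw_date": "2022-06-12"}

kept = {"mutation_length": node["mutation_length"],
        "branch_length": node["branch_length"]}
if "raw_date" in node:              # only tips with a known sampling date
    kept["sample_date"] = node["numdate"]
print(kept)                         # mutation_length, branch_length, sample_date
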
import pandas as pd
import argparse
from Bio import SeqIO

if __name__=="__main__":
    parser = argparse.ArgumentParser(
        description="Reverse-complement reverse-complemented sequence",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument('--metadata', type=str, required=True, help="input metadata")
    parser.add_argument('--sequences', type=str, required=True, help="input sequences")
    parser.add_argument('--output', type=str, required=True, help="output sequences")
    args = parser.parse_args()

    metadata = pd.read_csv(args.metadata, sep='\t')

    # Read in fasta file
    with open(args.sequences, 'r') as f_in:
        with open(args.output, 'w') as f_out:
            for seq in SeqIO.parse(f_in, 'fasta'):
                # Reverse-complement the sequence if metadata flags it as reversed
                try:
                    if metadata.loc[metadata['strain'] == seq.id, 'reverse'].values[0] == True:
                        seq.seq = seq.seq.reverse_complement()
                        print("Reverse-complementing sequence:", seq.id)
                except (IndexError, KeyError):
                    # strain missing from metadata, or no 'reverse' column
                    print("No reverse-complement information for:", seq.id)

                # Write sequences to file
                SeqIO.write(seq, f_out, 'fasta')
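
The same flagging logic on in-memory toy data (the script itself reads FASTA and metadata from disk):

from io import StringIO
import pandas as pd
from Bio import SeqIO

metadata = pd.DataFrame({"strain": ["s1", "s2"], "reverse": [True, False]})
fasta = StringIO(">s1\nAACG\n>s2\nGGCC\n")

for seq in SeqIO.parse(fasta, "fasta"):
    if metadata.loc[metadata["strain"] == seq.id, "reverse"].values[0]:
        seq.seq = seq.seq.reverse_complement()
    print(seq.id, seq.seq)
# s1 CGTT   (reverse-complemented)
# s2 GGCC   (left as-is)
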
import pandas as pd
import json, argparse
from augur.io import read_metadata

def replace_name_recursive(node, lookup):
    if node["name"] in lookup:
        node["name"] = lookup[node["name"]]

    if "children" in node:
        for child in node["children"]:
            replace_name_recursive(child, lookup)

if __name__=="__main__":
    parser = argparse.ArgumentParser(
        description="Swaps out the strain names in the Auspice JSON with the final strain name",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument('--input-auspice-json', type=str, required=True, help="input auspice_json")
    parser.add_argument('--metadata', type=str, required=True, help="input data")
    parser.add_argument('--metadata-id-columns', nargs="+", help="names of possible metadata columns containing identifier information, ordered by priority. Only one ID column will be inferred.")
    parser.add_argument('--display-strain-name', type=str, required=True, help="field to use as strain name in auspice")
    parser.add_argument('--output', type=str, metavar="JSON", required=True, help="output Auspice JSON")
    args = parser.parse_args()

    metadata = read_metadata(args.metadata, id_columns=args.metadata_id_columns)
    name_lookup = {}
    for ri, row in metadata.iterrows():
        strain_id = row.name
        # fall back to the existing ID when the display-name field is empty
        name_lookup[strain_id] = strain_id if pd.isna(row[args.display_strain_name]) else row[args.display_strain_name]

    with open(args.input_auspice_json, 'r') as fh:
        data = json.load(fh)

    replace_name_recursive(data['tree'], name_lookup)

    with open(args.output, 'w') as fh:
        json.dump(data, fh)
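
A toy demo of replace_name_recursive on an Auspice-like tree dict (names invented; the function is defined in the script above):

tree = {"name": "NODE_0000", "children": [
    {"name": "ACC_1"},
    {"name": "ACC_2"},
]}
lookup = {"ACC_1": "isolate-1/2022"}

replace_name_recursive(tree, lookup)
print(tree)
# {'name': 'NODE_0000', 'children': [{'name': 'isolate-1/2022'}, {'name': 'ACC_2'}]}
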
shell:
    """
    cp {input.auspice_json} {output.auspice_json}
    cp {input.root_sequence} {output.root_sequence_json}
    """
shell:
    "rm -rfv {params}"
shell:
    "rm -rfv {params}"
shell:
    """
    augur filter \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --sequences {input.sequences} \
        --include-where strain=MK783032 strain=MK783030 \
        --group-by clade lineage \
        --subsample-max-sequences 50 \
        --subsample-seed 0 \
        --output-metadata {output.metadata} \
        --output-sequences {output.sequences}
    """
shell:
    """
    augur filter \
        --sequences {input.sequences} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --exclude {input.exclude} \
        --output-sequences {output.sequences} \
        --output-metadata {output.metadata} \
        --min-date {params.min_date} \
        --min-length {params.min_length} \
        --exclude-where QC_rare_mutations=bad \
        --output-log {output.log}
    """
shell:
    """
    augur filter \
        --sequences {input.sequences} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --output-strains {output.strains} \
        {params.group_by} \
        {params.sequences_per_group} \
        {params.exclude} \
        {params.other_filters} \
        --output-log {output.log}
    """
shell:
    """
    augur filter \
        --metadata-id-columns {params.strain_id} \
        --sequences {input.sequences} \
        --metadata {input.metadata} \
        --exclude-all \
        --include {input.strains} {input.include} \
        --output-sequences {output.sequences} \
        --output-metadata {output.metadata}
    """
shell:
    """
    python3 scripts/reverse_reversed_sequences.py \
        --metadata {input.metadata} \
        --sequences {input.sequences} \
        --output {output}
    """
shell:
    """
    nextalign run \
        --jobs {threads} \
        --reference {input.reference} \
        --genemap {input.genemap} \
        --max-indel {params.max_indel} \
        --seed-spacing {params.seed_spacing} \
        --retry-reverse-complement \
        --output-fasta - \
        --output-insertions {output.insertions} \
        {input.sequences} | seqkit seq -i > {output.alignment}
    """
shell:
    """
    augur mask \
        --sequences {input.sequences} \
        --mask {input.mask} \
        --mask-from-beginning {params.from_start} \
        --mask-from-end {params.from_end} --output {output}
    """
shell:
    """
    augur tree \
        --alignment {input.alignment} \
        --exclude-sites {input.tree_mask} \
        --tree-builder-args="-redo" \
        --output {output.tree} \
        --nthreads {threads}
    """
shell:
    """
    python3 scripts/fix_tree.py \
        --alignment {input.alignment} \
        --input-tree {input.tree} \
        {params.root} \
        --output {output.tree}
    """
shell:
    """
    augur refine \
        --tree {input.tree} \
        --alignment {input.alignment} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --output-tree {output.tree} \
        --timetree \
        --root {params.root} \
        --precision 3 \
        --keep-polytomies \
        {params.clock_rate} \
        {params.clock_std_dev} \
        --output-node-data {output.node_data} \
        --coalescent {params.coalescent} \
        --date-inference {params.date_inference} \
        --date-confidence \
        --clock-filter-iqd {params.clock_filter_iqd}
    """
shell:
    """
    augur ancestral \
        --tree {input.tree} \
        --alignment {input.alignment} \
        --output-node-data {output.node_data} \
        --inference {params.inference}
    """
shell:
    """
    augur translate \
        --tree {input.tree} \
        --ancestral-sequences {input.node_data} \
        --reference-sequence {input.genemap} \
        --output {output.node_data}
    """
shell:
    """
    augur traits \
        --tree {input.tree} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --output {output.node_data} \
        --columns {params.columns} \
        --confidence \
        --sampling-bias-correction {params.sampling_bias_correction}
    """
shell:
    """
    augur clades \
        --tree {input.tree} \
        --mutations {input.nuc_muts} {input.aa_muts} \
        --clades {input.clades} \
        --output-node-data {output.node_data} 2>&1 | tee {log}
    """
shell:
    """
    python scripts/clades_renaming.py \
    --input-node-data {input} \
    --output-node-data {output.node_data}
    """
shell:
    """
    python3 scripts/mutation_context.py \
        --tree {input.tree} \
        --mutations {input.node_data} \
        --output {output.node_data}
    """
shell:
    """
    python3 scripts/remove_timeinfo.py --input-node-data {input} --output-node-data {output}
    """
shell:
    """
    python3 scripts/construct-recency-from-submission-date.py \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --output {output} 2>&1
    """
shell:
    """
    python3 scripts/assign-colors.py \
        --ordering {input.ordering} \
        --color-schemes {input.color_schemes} \
        --output {output.colors} \
        --metadata {input.metadata} 2>&1
    """
shell:
    """
    augur export v2 \
        --tree {input.tree} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --node-data {input.branch_lengths} {input.nt_muts} {input.aa_muts} {input.mutation_context} {input.clades} {input.recency}\
        --colors {input.colors} \
        --lat-longs {input.lat_longs} \
        --description {input.description} \
        --auspice-config {input.auspice_config} \
        --include-root-sequence \
        --output {output.auspice_json}
    """
shell:
    """
    python3 scripts/set_final_strain_name.py --metadata {input.metadata} \
            --metadata-id-columns {params.strain_id} \
            --input-auspice-json {input.auspice_json} \
            --display-strain-name {params.display_strain_field} \
            --output {output.auspice_json}
    cp {input.root_sequence} {output.root_sequence}
    """
shell:
    """
    curl https://mpox-lapis.genspectrum.org/v1/sample/fasta --output {output.sequences}
    """
shell:
    """
    curl https://mpox-lapis.genspectrum.org/v1/sample/details?dataFormat=csv | \
        tr -d "\r" |
        sed -E 's/("([^"]*)")?,/\\2\\t/g' > {output.metadata}
    """
shell:
    """
    nextstrain remote upload {params.deploy_url} {input}
    """
shell:
    """
    ./bin/notify-on-deploy {params.deploy_url} {params.slack_ts}
    """
shell:
    """
    curl -fsSL --compressed {params.sequences_url:q} --output {output.sequences}
    curl -fsSL --compressed {params.metadata_url:q} --output {output.metadata}
    """
shell:
    """
    gzip --decompress --keep {input.metadata}
    xz --decompress --keep {input.sequences}
    """
