Microbiome Analysis Workflow based on QIIME 2


This repository contains the analysis workflow for Debelius et al. (doi: 10.1101/2021.03.23.436606).

Installation

The analysis was done in a qiime2-2020.11 environment with RESCRIPt and q2-sidle installed.

To build this environment, install qiime2-2020.11 according to your system requirements. Then, follow the q2-sidle installation instructions. You will also need snakemake 5.3 or later:

conda install --channel bioconda snakemake=5.3

Finally, you will need to clone or download the repository (https://github.com/jwdebelius/avengers-assemble) in whatever way works best for you.
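
As a minimal sketch of the setup, assuming the QIIME 2 environment was named qiime2-2020.11 when it was created:

conda activate qiime2-2020.11   # use whatever name you gave the environment
git clone https://github.com/jwdebelius/avengers-assemble.git
cd avengers-assemble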

Input Files

Simulation

To run the simulation, you will need to download the 100nt OTU table (ID 45113, CRC32: 7066cf83) and metadata from Qiita Study 850. The OTU table should be imported into QIIME 2 using the command:

qiime tools import \
 --type 'FeatureTable[Frequency]' \
 --input-format BIOMV210Format \
 --input-path 45113_otu_table.biom \
 --output-path start-table.qza

The imported table should then be placed in the folder specified in the config file (input_dir under simulation); by default, this folder is ipynb/data/inputs/simulation.

The metadata should also be downloaded from the Qiita study (found under the sample information tab), renamed to start-metadata.txt, and moved to the simulation input folder (ipynb/data/inputs/simulation).
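
For example, a hedged sketch of staging the simulation inputs (the name of the Qiita sample-information download is hypothetical; substitute whatever the export is actually called):

mkdir -p ipynb/data/inputs/simulation
mv start-table.qza ipynb/data/inputs/simulation/
mv 850_sample_information.txt ipynb/data/inputs/simulation/start-metadata.txt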

Real Data

Real data can be downloaded from ENA study PRJEB37382, accessions ERR4704801-ERR4704848. These should be placed in the location specified in the config file; by default, they are expected in the ipynb/data/inputs/real/seqs directory. Metadata is inferred from the filenames.
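
A sketch of a bulk download is shown below. It assumes the standard ENA FTP layout for 10-character run accessions, so verify the paths against the ENA file report for PRJEB37382; because the workflow infers metadata from the filenames, the downloaded files may also need to be renamed to match the expected naming pattern.

mkdir -p ipynb/data/inputs/real/seqs
for i in $(seq 4704801 4704848); do
    acc="ERR${i}"
    # Assumed ENA convention: prefix directory (ERR470), then a 00<last digit> subdirectory
    dir="ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${acc:0:6}/00${acc: -1}/${acc}"
    wget -P ipynb/data/inputs/real/seqs "${dir}/${acc}_1.fastq.gz" "${dir}/${acc}_2.fastq.gz"
done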

Database Files

The analysis is currently based on the Silva 128 QIIME release, available here. The files should be imported into QIIME 2 and saved to the ipynb/data/reference/silva-ori/ folder. The SEPP reference tree can be downloaded from the QIIME 2 resources page. The formatted Optivag and Greengenes databases are already in the repository.
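
A hedged sketch of the Silva import is shown below; the paths inside the Silva 128 QIIME release archive and the output artifact names are assumptions, so match them to whatever the Snakefile and config expect.

mkdir -p ipynb/data/reference/silva-ori
qiime tools import \
 --type 'FeatureData[Sequence]' \
 --input-path rep_set/rep_set_all/99/99_otus.fasta \
 --output-path ipynb/data/reference/silva-ori/silva-128-seqs.qza
qiime tools import \
 --type 'FeatureData[Taxonomy]' \
 --input-format HeaderlessTSVTaxonomyFormat \
 --input-path taxonomy/taxonomy_all/99/taxonomy_7_levels.txt \
 --output-path ipynb/data/reference/silva-ori/silva-128-taxonomy.qza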

Running the workflow

You can run the workflow easily through snakemake based on the current repository setup:

snakemake 
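
A dry run first can be helpful, and the number of cores can be set explicitly (standard snakemake flags; adjust the core count to your machine):

snakemake -n
snakemake --cores 4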

Workflow steps

Simulation

  1. Reference Simulation. The Greengenes database is used to filter the reference OTU table to exclude sequences with more than 5 degenerate nucleotides, and then a reference dataset is simulated from the original data by averaging abundance across multiple individuals of the same age group.

    • filter_degenerete_reference

    • filter_ref_table

    • simulate_reference_samples

  2. Regional Simulation. Regions of the 16S gene are extracted and combined with the full-length reference table to build a regional "ASV" table and representative sequence set. These are the starting points for our three test methods.

    • extract_regions

    • generate_asvs_and_derep_tables

  3. Database Preparation. The database is prepared by filtering out reference sequences with a large number of degenerate nucleotides and database sequences which do not have class-level annotations. The filtered sequences are then used to generate per-region kmers and kmer maps.

    • filter_silva_reference

    • extract_sidle_regions

    • prepare_extracted_region

  4. Reference Reconstruction. The simulated reference table, the Greengenes 13_8 taxonomy, and the Greengenes 13_8 tree are copied from their respective locations.

    • copy_sim_reference
  5. OTU Clustering. The table is generated by merging the ASV representative sequences and feature tables from all the individual regions and then clustering them closed-reference at 99% identity against the Silva 128 reference database.

    • concatenate_single_asv_counts

    • concatenate_single_rep_seqs

    • cluster_otus

    • get_otu_tree

    • get_otu_taxonomy

  6. ASV Denoising. The table is generated by merging all ASV counts. Taxonomy comes from naive Bayesian classification of all sequences; sequences which don't have a phylum-level annotation are excluded. The tree is built by fragment insertion of the merged sequences.

    • concatenate_single_asv_counts

    • concatenate_single_rep_seqs

    • classify_taxonomy

    • get_asv_rep_seq

    • get_asv_table

    • build_insertion_tree

  7. Sidle. For Sidle, the regional sequences are aligned to the regional kmers. The alignments, regional database maps, and feature counts are used to reconstruct the table. Taxonomy is reconstructed using the database map to identify unresolved sequences. To build the phylogenetic tree, we generate consensus fragments and then insert them into the Silva 128 backbone.

    • align_sidle_regions

    • reconstruct_table_single

    • reconstruct_taxonomy

    • reconstruct_fragments_single

    • reconstructed_fragment_insertion

  8. Alpha Diversity. Within-sample (alpha) diversity is calculated over multiple rarefaction iterations. We look at observed features, Shannon diversity, Pielou's evenness, and Faith's phylogenetic diversity. The results are summarized using the Table1-CompareAlpha.ipynb notebook. This generates Table 1.

    • rarefy_iteration

    • alpha_iter

    • alpha_table

  9. Beta Diversity. Between-sample (beta) diversity is also calculated from rarefied tables. We look at three feature-based metrics (unweighted UniFrac, weighted UniFrac, and Bray-Curtis) as well as a collapsed metric (genus-level Bray-Curtis). The results are summarized in the ipynb/Table2-CompareBeta.ipynb notebook. This generates Table 2.

    • rarefy_iteration

    • genus_braycurtis

    • beta_phylogenetic

    • braycurtis

    • adonis

    • beta_table

  10. Taxonomic Comparison. We compared the taxonomy at class level using the merged, unnormalized tables through the ipynb/Figure1-Taxonomy.ipynb notebook. This generates Figure 2 and Table S3.

    • compare_taxonomy

Real Data Analysis

  1. Data Preparation. We import the sequences into QIIME 2, trim the primers for each region, denoise them using DADA2, and then trim the denoised sequences post hoc.

    • build_manifest

    • import_seqs

    • trim_data_v13

    • denoise_v13

    • trim_data_v34

    • denoise_v34

    • trim_posthoc

  2. Database Preparation. For benchmarking, the Greengenes database needs to be prepared (filter_greengenes_reference); otherwise, the Optivag database is assumed to be tidied. Then, we extract the regions and prepare them for Sidle alignment.

    • filter_greengenes_reference

    • extract_sidle_regions

    • prepare_extracted_region

  3. Reference Reconstruction. The V34 region alone was used as the reference for the simulation. We used the paired-end table, classified the taxonomy using a naive Bayesian classifier for the full-length sequences, and then filtered the table to remove anything missing a phylum-level or kingdom-level designation.

    • classify_regional_taxonomy

    • get_real_reference_data

    • get_real_representative_seqs

  4. OTU Clustering. The table is generated by merging the ASV representative sequences and feature tables from all the individual regions and then clustering them closed-reference at 99% identity.

    • concatenate_single_asv_counts

    • concatenate_single_rep_seqs

    • cluster_otus

    • get_otu_taxonomy

  5. ASV Denoising. The table is generated by merging all ASV counts. Taxonomy comes from naive Bayesian classification of all sequences; sequences which don't have a phylum-level annotation are excluded. The tree is built by fragment insertion of the merged sequences.

    • concatenate_single_asv_counts

    • concatenate_single_rep_seqs

    • classify_taxonomy

    • get_asv_rep_seq

    • get_asv_table

  6. Sidle. For Sidle, the regional sequences are aligned to the regional kmers. The alignments, regional database maps, and feature counts are used to reconstruct the table. Taxonomy is reconstructed using the database map to identify unresolved sequences.

    • align_sidle_regions

    • reconstruct_table_single

    • reconstruct_taxonomy

  7. Data Synthesis. The taxonomic annotation was checked and the tables evaluated using the ipynb/Figure2-RealData.ipynb notebook.

    • real_data_figure

Benchmarking

The Snakefile contains a section dedicated to preparing the Sidle reads and producing an output table; this generates a series of benchmarking files in the benchmark folder. The Matlab folder contains the modified .m scripts to run with SMURF.

Code Snippets

bin/adonis_single.R

suppressMessages(library("argparser"))
suppressMessages(library("vegan"))


# Sets up an argparser to make this tractable
parser <- arg_parser(
  description="Basic Adonis Functionality for age and set"
)
parser <- add_argument(parser,
  '--distance-matrix',
  help='the distance matrix'
)
parser <- add_argument(parser,
  '--metadata',
  help='Path to the metadata file'
)
parser <- add_argument(parser,
  '--output',
  help='The location for the summary file'
)

args <- parse_args(parser)

# Reads the tab-separated distance matrix ("." decimals) with the sample ids as
# the row names, and the metadata keyed on the sample id column
df <- read.delim(args$distance_matrix, row.names='X')
map <- read.delim(args$metadata, row.names='sample.id')
map <- map[row.names(df),]

# Convert the square distance matrix into a dist object for adonis2
dm  <- as.dist(df[row.names(df), row.names(df)])

age <- adonis2(dm ~ age, data=map)

res <- data.frame(
  age = c(age$R2[1], age$`Pr(>F)`[1], age$F[1]),
  row.names = c('R2', 'p_999', 'F')
  )

write.table(res, file=args$output, quote=FALSE, sep='\t', col.names = NA)

bin/simulate_samples.py

description = """
This script will build a set of synthetic samples based on resampling of the
American Gut data. We'll do this by reading in the biom table and then
randomly selecting a subset of non-ambiguous samples.

The samples will then be split into two approximately equal sized groups and
used to generate a set of normalized, simulated samples.
"""

import argparse
import os

import biom
import numpy as np
import pandas as pd

from qiime2 import Artifact


def subsample_data(metadata, table, age_split_fun, id_prefixes,
                   colors=None,
                   group_size=10, 
                   n_samples=20, seed=1234):
    """
    Generates a set of synthetic samples based on the American Gut using
    separation between age groups

    Parameters
    ----------
    metadata : DataFrame
        A map of the samples with the age encoded in `age` and the
        index as the sample identifier
    table : biom.Table
        The samples used for simulation
    age_split_fun : function
        Maps an age to an age category (or NaN, to drop the sample)
    id_prefixes : list of str
        The age category labels used to split the table and label the
        simulated samples
    colors : dict, optional
        A map of composite (age-set) labels to plotting colors
    group_size : int
        The number of samples to be combined to make the simulated sample
    n_samples : int
        The number of samples to simulate for each age group
    seed : int
        A seed for the random number generator

    Returns
    -------
    biom.Table
        All of the synthesized samples
    DataFrame
        Metadata describing the synthesized samples

    """
    # Sets a seed for consistency
    np.random.seed(seed)

    # Gets the age categories
    metadata['age_cat'] = metadata['age'].apply(age_split_fun)
    metadata.dropna(subset=['age_cat'], inplace=True)

    table.filter(metadata.index, axis='sample', inplace=True)
    table.add_metadata(metadata[['age_cat']].to_dict(orient='index'), axis='sample')

    print(id_prefixes)

    tables = [
        table.copy().filter(lambda v, id_, md: md['age_cat'] == prefix,
                            axis='sample', inplace=False)
        for prefix in id_prefixes
    ]

    synthetic = [
        tidy_names(pd.concat(axis=1, objs=[
            build_subsample(table, group_size, prefix) 
            for i in range(n_samples)]), 
        prefix)
        for prefix, table in zip(*(id_prefixes, tables))
    ]
    print(synthetic[0].head())

    all_samples = pd.concat(axis=1, sort=False, objs=synthetic)
    all_samples.fillna(0, inplace=True)

    all_biom = biom.Table(data=all_samples.values, 
                          sample_ids=all_samples.columns,
                          observation_ids=all_samples.index
                          )
    metadata = pd.DataFrame.from_dict(orient='index', data={
        id_: {'set': id_.split('.')[1],
              'age': id_.split('.')[2],
              }
        for id_ in all_biom.ids(axis='sample')
        })
    metadata.index.set_names('sample-id', inplace=True)
    metadata['composite'] = metadata.apply(lambda x: "{age}-{set}".format(**x),
                                            axis=1)
    if colors is not None:
        metadata['color'] = metadata['composite'].replace(colors)

    return all_biom, metadata

def tidy_names(table, prefix):
    table.rename(inplace=True, columns={
        int(i): f"sample.1.{prefix}.{i}" for i in table.columns
        })
    return table

def build_subsample(table, sample_size, group):
    """
    Builds a subsampled table from a biom object
    """
    sub_t = table.copy().subsample(sample_size, axis='sample', by_id=True)
    sub_t.filter(lambda v, id_, md: (v > 0).sum() > np.round(0.4 * sample_size), 
                 axis='observation', 
                 inplace=True)
    single = sub_t.copy().collapse(lambda id_, md: md['age_cat'], 
                                   axis='sample')
    single.norm(axis='sample', inplace=True)
    single.filter(lambda v, id_, md: (v > 2e-5).all(), 
                  axis='observation', 
                  inplace=True)
    single.norm(axis='sample', inplace=True)
    return pd.Series(np.round(single.data(group) * 5e5).round(0),
                     index=single.ids(axis='observation'))


parser = argparse.ArgumentParser(description=description)
parser.add_argument(
    '--keep-map',
    help=("The map of the samples to be used for simulation. This must "
          "contain  a sample identifier labeled `sample_name` and the age"
          "labeled age_cat"
          ),
    )
parser.add_argument(
    '--sample-table',
    help=("A biom table containing sample counts. The sample list should "
          "match those present in `--keep-map`"),
    )
parser.add_argument(
    '--simulated-table',
    help=('Filepath to the qiime artifact containing all the samples'),
    )
parser.add_argument(
    '--simulated-metadata',
    help=('Filepath to save the metadata describing the simulated samples'),
    )
parser.add_argument(
    '--samples-into-simulation',
    help=("The number of samples to combine into each simulated sample"),
    default=10,
    type=int,
    )
parser.add_argument(
    '--group-size',
    help=("The number of samples in each group to simulate for the set."),
    default=20,
    type=int,
    )
parser.add_argument(
    '--seed',
    help=('a random seed'),
    default=1234,
    type=int,
    )

if __name__ == '__main__':
    args = parser.parse_args()

    # Loads group
    keep_map = pd.read_csv(args.keep_map, sep='\t', dtype=str)   
    keep_map.set_index('sample_name', inplace=True)
    keep_map['age'] = keep_map['age'].astype(float)

    # Loads table
    sample_table = Artifact.load(args.sample_table).view(biom.Table)

    def cat_age(x):
        if pd.isnull(x):
            return x
        if x <= 3:
            return 'infant'
        elif (3 < x) & (x <= 12):
            return 'child'
        elif (20 < x) & (x <= 60):
            return 'adult'
        else:
            return np.nan

    colors = {'infant-1': '#1f78b4', 'infant-2': '#a6cee3', 
              'adult-1': '#e31a1c', 'adult-2': '#fb9a99'}


    all_biom, metadata = \
        subsample_data(metadata=keep_map.copy(), 
                       table=sample_table, 
                       age_split_fun=cat_age, 
                       id_prefixes=['infant', 'adult'],
                       group_size=args.samples_into_simulation,
                       n_samples=args.group_size,
                       seed=args.seed,
                       )
    all_table = Artifact.import_data('FeatureTable[Frequency]', all_biom)
    all_table.save(args.simulated_table)

    metadata.to_csv(args.simulated_metadata, sep='\t')

bin/simulate_tables_exact.py

description = """
Simulates an ASV table and dereplicated sequences for OTU clustering
and meta-analysis exploration
"""


import argparse
import hashlib

import biom
import numpy as np
import pandas as pd
import regex
import skbio
from skbio import DNA

from qiime2 import Artifact


def build_regional_table(samples, seqs, read_length):
    """
    Amplifies a regional table from the original table

    Parameters
    ----------
    samples : DataFrame
        A biom-style dataframe where the index is sequences (corresponding
        to the sequences in `seqs`) and the columns are samples

    """
    # Filters the sequences and table down to those present in the sequences
    degen_count = seqs.astype(str).apply(
        lambda x: len(regex.findall('[RYSWKMBDHVN]', str(x)))
        )
    len_check = seqs.astype(str).apply(lambda x: (len(x) >= np.absolute(read_length)))
    print(len_check.sum())
    degen_keep = degen_count.index[(degen_count <= 10) & len_check].values
    var_ids = [id_ for id_ in samples.index if (id_ in degen_keep)]
    samples = samples.loc[var_ids]

    table_seqs = seqs[var_ids].copy()

    # Expands the degenerate sequences
    exp_ = {id_: np.array([str(s) for s in seq_.expand_degenerates()])
            for id_, seq_ in table_seqs.iteritems()}

    # Generates the table by expanding the sequences, randomly sampling the
    # degenerate positions, and then combining the sequences into a table
    forward_reads = dict([])
    reverse_reads = dict([])
    joined_reads = dict([])
    for sample in samples.columns:
        expanded = pd.Series(np.hstack([
                np.random.choice(exp_[id_], size=count, replace=True)
                for id_, count in 
                samples.loc[samples[sample] > 0, sample].iteritems()])
        ).apply(lambda x: x[:read_length])
        joined_reads[sample] = expanded.value_counts()

    joined_table = pd.DataFrame.from_dict(orient='index', data=joined_reads).fillna(0)
    joined_table = joined_table.loc[joined_table.sum(axis=1) > 10]
    joined_reads = pd.Series({
        hash_seq(seq): DNA(seq) 
        for seq in joined_table.columns
    })
    joined_table.rename(columns={seq: hash_seq(seq) for seq in joined_table.columns},
                        inplace=True)
    jointable = Artifact.import_data('FeatureTable[Frequency]', joined_table, pd.DataFrame)
    joinseqs = Artifact.import_data('FeatureData[Sequence]', joined_reads, pd.Series)

    return jointable, joinseqs


def hash_seq(x):
    """
    A lazy function to hash things
    """
    return hashlib.md5(x.encode()).hexdigest()




parser = argparse.ArgumentParser(description=description)
parser.add_argument(
    '--sequences',
    help=('The extracted regional sequences'),
    )
parser.add_argument(
    '--sample-table',
    help=("A table of the samples to be simulated."),
    )
parser.add_argument(
    '--read-length',
    help=("length of the final simulated sequences. We treat these as paired"
          "end sequences"),
    type=int,
    )
parser.add_argument(
    '--table',
    help=('Path to save the joined-sequence table')
    )
parser.add_argument(
    '--reads',
    help=('Path to save the representative sequences for the joined table'),
    )
parser.add_argument(
    '--random',
    help=("A seed for randomn number generation"),
    default=None,
    )

if __name__ == '__main__':
    args = parser.parse_args()

    # Gets the regional sequences
    seqs =  Artifact.load(args.sequences).view(pd.Series)

    # Gets the table
    table = Artifact.load(args.sample_table).view(pd.DataFrame).fillna(0).astype(int).T

    # Simulates the samples
    if args.random is not None:
        np.random.seed(int(args.random))
    # fwd_table, rev_table, join_table, fwd_seqs, rev_seqs, join_seqs = \
    join_table, join_seqs = \
        build_regional_table(table, seqs, args.read_length)

    # fwd_table.save(args.fwd_table)
    # fwd_seqs.save(args.fwd_reads)
    # rev_table.save(args.rev_table)
    # rev_seqs.save(args.rev_reads)
    join_table.save(args.table)
    join_seqs.save(args.reads)
shell:
	"""
	qiime sidle filter-degenerate-sequences \
	 --i-sequences {input} \
	 --p-n-workers 2 \
	 --p-max-degen 5 \
	 --o-filtered-sequences {output}
	"""
shell:
	"""
	qiime feature-table filter-features \
	 --i-table {input.table} \
	 --m-metadata-file {input.sequences} \
	 --o-filtered-table {output}
	"""
shell:
	"""
	python bin/simulate_samples.py \
	--keep-map {input.metadata} \
	--sample-table {input.table} \
	--group-size {params.group_size} \
	--simulated-table {output.table} \
	--simulated-metadata {output.metadata}
	"""
(From line 67 of main/Snakefile)
shell:
	"""
	qiime feature-classifier extract-reads \
	 --i-sequences {input} \
	 --p-f-primer {params.fwd} \
	 --p-r-primer {params.rev} \
	 --p-min-length 50 \
	 --p-max-length 700 \
	 --p-n-jobs 5 \
	 --o-reads {output}
	"""
shell:
	"""
	python bin/simulate_tables_exact.py \
	 --sequences {input.seqs} \
	 --sample-table {input.table} \
	 --read-length {params.length} \
	 --random {params.seed} \
	 --table {output.joined_table} \
	 --reads {output.joined_repset}
	"""
(From line 113 of main/Snakefile)
run:
	import os
	import numpy as np
	import pandas as pd

	samples = input    # the full list of input fastq filepaths
	region = params[0]
	joined = params[1]

	def get_sample_descriptor(fp):
	    name = fp.split('.')[0]
	    abs_fp = os.path.abspath(fp)
	    position, study_info, sample_info = name.split("__")
	    [pool_id, rep, region, pcr, dir_] = sample_info.split('_')

	    dir_label = {'R1': 'forward', 'R2': 'reverse'}[dir_]

	    summary = pd.Series(
	        data={'%s-absolute-filepath' % dir_label: abs_fp,
	              'sample-id': '{}_{}'.format(pool_id, rep),
	              'pool': pool_id,
	              'replicate': rep,
	              'region': region,
	              'pcr_type': pcr},
	        name='_'.join(sample_info.split('_')[:-1]))
	    return summary

	manifest = pd.DataFrame([get_sample_descriptor(fp) 
							 for fp in samples if ('R1.' in fp)])
	rev_summary = pd.DataFrame([get_sample_descriptor(fp) 
								for fp in samples if ('R2.' in fp)])

	if joined:
		manifest['reverse-absolute-filepath'] = rev_summary['reverse-absolute-filepath']
	else:
		manifest.rename(columns={'forward-absolute-filepath': 'absolute-filepath'}, 
						inplace=True)
	manifest.set_index('sample-id', inplace=True)
	manifest.to_csv(str(output), sep='\t')
shell:
	"""
	qiime tools import \
	 --type {params.type} \
	 --input-path {input} \
	 --input-format {params.format} \
	 --output-path {output}
	"""
shell:
	"""
	qiime cutadapt trim-single \
	 --i-demultiplexed-sequences {input} \
	 --p-front {params.f_primer} \
	 --p-no-discard-untrimmed \
	 --o-trimmed-sequences {output}
	"""
shell:
	"""
	qiime dada2 denoise-single \
	 --i-demultiplexed-seqs {input} \
	 --p-trunc-len 280 \
	 --p-n-threads 4 \
	 --o-table {output.table} \
	 --o-representative-sequences {output.seqs} \
	 --o-denoising-stats {output.stats}
	"""
shell:
	"""
	qiime cutadapt trim-paired \
	 --i-demultiplexed-sequences {input} \
	 --p-front-f {params.f_primer} \
	 --p-front-r {params.r_primer} \
	 --p-no-discard-untrimmed \
	 --o-trimmed-sequences {output}
	"""
shell:
	"""
	qiime dada2 denoise-paired \
	 --i-demultiplexed-seqs {input} \
	 --p-trunc-len-f 280 \
	 --p-trunc-len-r 280 \
	 --p-n-threads 4 \
	 --o-table {output.table} \
	 --o-representative-sequences {output.seqs} \
	 --o-denoising-stats {output.stats}
	"""
shell:
	"""
	qiime sidle trim-dada2-posthoc \
	 --i-table {input.table}  \
	 --i-representative-sequences {input.seqs} \
	 --p-trim-length {params.length} \
	 --p-hashed-feature-ids \
	 --o-trimmed-table {output.table} \
	 --o-trimmed-representative-sequences {output.seqs}
	"""
shell:
	"""
	qiime cutadapt trim-paired \
	 --i-demultiplexed-sequences {input} \
	 --p-front-f {params.fwd} \
	 --p-front-r {params.rev} \
	 --p-discard-untrimmed \
	 --o-trimmed-sequences {output}
	"""
shell:
	"""
	qiime vsearch join-pairs \
	--i-demultiplexed-seqs {input} \
	 --o-joined-sequences {output}
	"""
shell:
	"""
	qiime quality-filter q-score \
	 --i-demux {input} \
	 --o-filtered-sequences {output.seqs}\
	 --o-filter-stats {output.stats}
	"""
shell:
	"""
	qiime deblur denoise-16S \
	  --i-demultiplexed-seqs {input} \
	  --p-trim-length {params} \
	  --o-representative-sequences {output.seqs} \
	  --o-table {output.table} \
	  --p-sample-stats \
	  --o-stats {output.stats}
	"""
shell:
	"""
	qiime sidle filter-degenerate-sequences \
	 --i-sequences {input.seqs} \
	 --p-max-degen {params.degen} \
	 --p-n-workers 4 \
	 --o-filtered-sequences {params.tmp}

	qiime taxa filter-seqs \
	 --i-sequences {params.tmp} \
	 --i-taxonomy {input.taxa} \
	 --p-include {params.include} \
	 --p-exclude {params.exclude} \
	 --p-mode "contains" \
	 --o-filtered-sequences {output.seqs} 

	rm {params.tmp}
	"""
shell:
	"""
	qiime feature-classifier extract-reads \
	 --i-sequences {input} \
	 --p-f-primer {params.fwd} \
	 --p-r-primer {params.rev} \
	 --o-reads {output}
	"""
shell:
	"""
	qiime sidle prepare-extracted-region \
	 --i-sequences {input} \
	 --p-fwd-primer {params.fwd} \
	 --p-rev-primer {params.rev} \
	 --p-region {params.region} \
	 --p-trim-length {params.trim} \
	 --p-n-workers 2 \
	 --o-collapsed-kmers {output.kmers} \
	 --o-kmer-map {output.map_}
	"""
shell:
	"""
	cp {input.table} {output.table}
	cp {input.taxa} {output.taxa}
	cp {input.tree} {output.tree}
	"""
(From line 487 of main/Snakefile)
shell:
	"""
	qiime feature-classifier classify-sklearn \
	 --i-reads {input.seqs} \
	 --i-classifier {input.classifier} \
	 --o-classification {output}
	"""
shell:
	"""
	qiime taxa filter-table \
	 --i-table {input.table} \
	 --i-taxonomy {input.taxonomy} \
	 --p-include ';c__' \
	 --p-mode contains \
	 --o-filtered-table {output}
	"""
shell:
	"""
	qiime feature-table filter-seqs \
	 --i-table {input.table} \
	 --i-data {input.seqs} \
	 --o-filtered-data {output}
	"""
run:
	from qiime2 import Artifact
	import qiime2.plugins.feature_table.actions as q2_table

	tables = [Artifact.load(fp) for fp in input]

	merged = q2_table.merge(tables, overlap_method='sum').merged_table
	merged.save(str(output))
run:
	from qiime2 import Artifact
	import qiime2.plugins.feature_table.actions as q2_table

	tables = [Artifact.load(fp) for fp in input]

	merged = q2_table.merge_seqs(tables).merged_data
	merged.save(str(output))
shell:
	"""
	qiime vsearch cluster-features-closed-reference \
	 --i-sequences {input.seqs} \
	 --i-table {input.table} \
	 --i-reference-sequences {input.ref} \
	 --p-perc-identity {params.perc_identity} \
	 --p-threads {params.threads} \
	 --o-clustered-table {output.table} \
	 --o-clustered-sequences {output.centroids} \
	 --o-unmatched-sequences {output.discard}
	"""
shell:
	"""
	cp {input} {output}
	"""
(From line 610 of main/Snakefile)
shell:
	"""
	cp {input} {output}
	"""
(From line 620 of main/Snakefile)
shell:
	"""
	qiime feature-classifier classify-sklearn \
	 --i-reads {input.seqs} \
	 --i-classifier {input.classifier} \
	 --o-classification {output}
	"""
shell:
	"""
	qiime taxa filter-table \
	 --i-table {input.table} \
	 --i-taxonomy {input.taxonomy} \
	 --p-include ";D_1__" \
	 --p-mode contains \
	 --o-filtered-table {output}
	"""
shell:
	"""
	qiime feature-table filter-seqs \
	 --i-table {input.table} \
	 --i-data {input.seqs} \
	 --o-filtered-data {output}
	"""
shell:
	"""
	qiime fragment-insertion sepp \
		--i-representative-sequences {input.seqs} \
		--i-reference-database {input.reference} \
		--o-tree {output.tree} \
		--p-threads {params.threads} \
		--o-placements {output.placements}
	"""
shell:
	"""
	qiime sidle align-regional-kmers \
	 --i-kmers {input.kmers} \
	 --i-rep-seq {input.rep_seq} \
	 --p-region {params.region} \
	 --p-max-mismatch {params.mismatch} \
	 --p-n-workers 2 \
	 --p-chunk-size 75 \
	 --o-regional-alignment {output.alignment} \
	 --verbose
	"""
run:
	from qiime2 import Artifact
	import qiime2.plugins.sidle.actions as q2_sidle

	num_regions = len(params[0])
	regions = params[0]
	nworkers = params[1]

	kmers = [Artifact.load(str(fp_)) 
			 for fp_ in input[:num_regions]]
	align = [Artifact.load(str(fp_)) 
			 for fp_ in input[num_regions:2*num_regions]]
	tables = [Artifact.load(str(fp_)) 
				for fp_ in input[2*num_regions:]]

	table, summary, db_map = q2_sidle.reconstruct_counts(
		kmer_map=kmers,
		regional_alignment=align,
		regional_table=tables,
		region=regions,
		region_normalize='average',
		n_workers=3,
		# n_workers=nworkers,
		# verbose=True,
		)

	table.save(str(output[0]))
	summary.save(str(output[1]))
	db_map.save(str(output[2]))
shell:
	"""
	qiime sidle reconstruct-taxonomy \
	 --i-reconstruction-map {input.db_map} \
	 --i-taxonomy {input.taxa} \
	 --p-database {params.database} \
	 --p-define-missing merge \
	 --o-reconstructed-taxonomy {output}
	"""
run:
	from qiime2 import Artifact
	import qiime2.plugins.sidle.actions as q2_sidle

	summary = Artifact.load(str(input[0]))
	db_map = Artifact.load(str(input[1]))
	aligned_seqs = Artifact.load(str(input[2]))
	kmer_maps = [Artifact.load(str(fp)) for fp in input[3:]]

	regions = params[0]

	representative_fragments = q2_sidle.reconstruct_fragment_rep_seqs(
		reconstruction_map=db_map,
		reconstruction_summary=summary,
		aligned_sequences=aligned_seqs,
		kmer_map=kmer_maps,
		region=regions
		).representative_fragments
	representative_fragments.save(str(output))
shell:
	"""
	qiime fragment-insertion sepp \
		--i-representative-sequences {input.fragments} \
		--i-reference-database {input.reference} \
		--p-threads 6 \
		--o-tree {output.tree} \
		--o-placements {output.placements}
	"""
shell:
	"""
	qiime feature-table rarefy \
	 --i-table {input.table} \
	 --p-sampling-depth {params.depth} \
	 --o-rarefied-table {output.table}
	"""
run:
	from qiime2 import Artifact
	import qiime2.plugins.diversity.actions as q2_diversity

	table = Artifact.load(str(input[0]))
	tree = Artifact.load(str(input[1]))
	metric = params[0]

	if metric == 'faith_pd':
		alpha = q2_diversity.alpha_phylogenetic(
			table=table,
			phylogeny=tree,
			metric=metric,
			).alpha_diversity
	else:
		alpha = q2_diversity.alpha(
			table=table,
			metric=metric,
			).alpha_diversity
	alpha.save(output[0])
shell:
	"""
	qiime diversity beta-phylogenetic \
	 --i-table {input.table} \
	 --i-phylogeny {input.tree} \
	 --p-metric {params.metric} \
	 --o-distance-matrix {output}
	"""
shell:
	"""
	qiime taxa collapse \
	 --i-table {input.table} \
	 --i-taxonomy {input.taxonomy} \
	 --p-level 6 \
	 --o-collapsed-table {output.table}

	qiime diversity beta \
	--i-table {output.table} \
	--p-metric braycurtis \
	--o-distance-matrix {output.dm}
	"""
shell:
	"""
	qiime diversity beta \
	--i-table {input.table} \
	--p-metric braycurtis \
	--o-distance-matrix {output}
	"""
shell:
	"""
	qiime tools export \
	 --input-path {input.dm} \
	 --output-path {params.dir_}

	Rscript bin/adonis_single.R \
	 -d {params.dir_}/distance-matrix.tsv \
	 -m {input.meta} \
	 -o {output}

	rm -r {params.dir_}
	"""
(41 additional code snippets are not shown here.)
