Modular Shotgun Sequence Analysis Workflow: Oecophylla - Harnessing Snakemake


Canonically pronounced "ee-co-fill-uh," Oecophylla is a Snakemake wrapper for shotgun sequence analysis.

Rather than being a single monolithic tool, Oecophylla is composed of a series of modules, each of which performs a set of related tasks on the data; examples include qc, assemble, and taxonomy. These modules can be installed independently, and are easy to add or modify if you want to try a new analysis on an existing data set.
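
To give a concrete (purely illustrative) picture of what a module looks like, the sketch below shows a hypothetical rule in the style of the workflow's rule files; the rule name, file patterns, and the qc_dir variable are assumptions about the shared configuration, not code from the repository:

# hypothetical module rule -- illustrative only; names and paths are assumptions
rule example_count_reads:
    input:
        forward = qc_dir + "{sample}/filtered/{sample}.R1.trimmed.filtered.fastq.gz"
    output:
        counts = qc_dir + "{sample}/filtered/{sample}.read_count.txt"
    shell:
        """
        # count reads in the gzipped FASTQ (4 lines per record)
        echo $(( $(zcat {input.forward} | wc -l) / 4 )) > {output.counts}
        """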

Because Oecophylla is written using the Snakemake bioinformatics workflow system, it inherits many of that tool's advantages, including reproducible execution, automatic re-running of downstream steps after upstream modifications, and cluster-enabled parallel execution. To learn more about Snakemake, please read its documentation.
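
For example, because the modules are ordinary Snakemake rules, the whole workflow can be run locally or dispatched to a cluster with standard Snakemake options (a sketch only; the Snakefile and config paths and the qsub command string are assumptions about your environment, and --cluster applies to Snakemake releases before v8):

# run locally on 8 cores
snakemake --snakefile Snakefile --configfile config.yaml --cores 8

# submit up to 64 concurrent jobs to a scheduler; {threads} is filled in per rule
snakemake --snakefile Snakefile --configfile config.yaml --jobs 64 \
    --cluster "qsub -pe smp {threads}"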

Documentation

You can find instructions on installing and using Oecophylla in the documentation on ReadTheDocs.

Code Snippets
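
The snippets below are the bodies of individual module rules, extracted from the workflow's Snakemake rule files.

Build an anvi'o contigs database from the assembled contigs and run HMM annotation on it: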

run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        dbname = os.path.basename(output.db)
        h5name = os.path.basename(output.h5)
        shell("""
                set +u; {params.env}; set -u

                anvi-gen-contigs-database -f {input} \
                -o {temp_dir}/{dbname} \
                -n '{wildcards.bin_sample} contigs db' 2> {log} 1>&2

                anvi-run-hmms -c {temp_dir}/{dbname} --num-threads {threads} 2>> {log} 1>&2

                scp {temp_dir}/{dbname} {output.db}
                scp {temp_dir}/{h5name} {output.h5}
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        out_name = os.path.basename(output.gene_calls)
        db = anvio_dir + "{s}/{s}.db".format(s=wildcards.bin_sample)
        shell("""
                set +u; {params.env}; set -u

                anvi-get-dna-sequences-for-gene-calls -c {db} -o {temp_dir}/{out_name} 2> {log} 1>&2

                scp {temp_dir}/{out_name} {output.gene_calls}
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        hits_name = os.path.basename(output.hits)
        report_name = os.path.basename(output.report)
        db = anvio_dir + "{s}/{s}.db".format(s=wildcards.bin_sample)
        shell("""
                set +u; {params.cent_env}; set -u

                centrifuge -f --threads {threads} \
                -x {params.centrifuge_db} \
                {input.fa} \
                -S {temp_dir}/{hits_name} \
                --report-file {temp_dir}/{report_name}

                scp {temp_dir}/{hits_name} {output.hits}
                scp {temp_dir}/{report_name} {output.report}

                set +u; {params.anvi_env}; set -u

                anvi-import-taxonomy -c {db} \
                -i {output.report} {output.hits} \
                -p centrifuge 2> {log} 1>&2
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        prof_dir = os.path.dirname(output.prof)
        db = anvio_dir + "{s}/{s}.db".format(s=wildcards.bin_sample)
        shell("""
                set +u; {params.env}; set -u

                scp {input.bam} {temp_dir}/{wildcards.bin_sample}_{wildcards.abund_sample}.bam
                scp {input.bai} {temp_dir}/{wildcards.bin_sample}_{wildcards.abund_sample}.bam.bai

                anvi-profile -i {temp_dir}/{wildcards.bin_sample}_{wildcards.abund_sample}.bam \
                --num-threads {threads} --write-buffer-size 1000 \
                -c {db} \
                --skip-SNV-profiling \
                --overwrite-output-destinations \
                -o {temp_dir}/out 2> {log} 1>&2

                scp {temp_dir}/out/* {prof_dir}/.
              """)

    lambda wildcards: expand(map_dir + "{bin_sample}/mapping/{bin_sample}_{abund_sample}.bam",
           abund_sample=bin_config[wildcards.bin_sample],
           bin_sample=wildcards.bin_sample)
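
Merge the per-sample anvi'o profiles with anvi-merge: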
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        merge_dir = os.path.dirname(output.prof)
        db = anvio_dir + "{s}/{s}.db".format(s=wildcards.bin_sample)
        shell("""
                set +u; {params.env}; set -u

                anvi-merge {input.profiles} \
                -o {temp_dir}/SAMPLES_MERGED \
                -c {db} \
                -W 2> {log} 1>&2

                scp -r {temp_dir}/SAMPLES_MERGED/* {merge_dir}/.
              """)
run:
    db = anvio_dir + "{s}/{s}.db".format(s=wildcards.bin_sample)
    shell("""
            set +u; {params.env}; set -u

            anvi-import-collection -p {input.prof} \
            -c {db} \
            -C "MaxBin2" \
            --contigs-mode \
            {input.bins} 2> {log} 1>&2
          """)
SnakeMake From line 184 of anvio/anvio.rule
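
Summarize the CONCOCT collection with anvi-summarize and archive the results: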
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        db = anvio_dir + "{s}/{s}.db".format(s=wildcards.bin_sample)
        shell("""
                set +u; {params.env}; set -u

                anvi-summarize -p {input.prof} \
                -c {db} \
                -o {temp_dir}/{wildcards.bin_sample}_samples-summary_CONCOCT \
                -C CONCOCT 2> {log} 1>&2

                scp {temp_dir}/{wildcards.bin_sample}_samples-summary_CONCOCT/bins_summary.txt {output.txt}
                scp {temp_dir}/{wildcards.bin_sample}_samples-summary_CONCOCT/index.html {output.report}

                tar -czvf {temp_dir}/{wildcards.bin_sample}_samples-summary_CONCOCT.tar.gz {temp_dir}/{wildcards.bin_sample}_samples-summary_CONCOCT

                scp {temp_dir}/{wildcards.bin_sample}_samples-summary_CONCOCT.tar.gz {output.tar}
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        mem_b = params.memory * 1000000000
        outdir = os.path.dirname(output[0])
        shell("""
              set +u; {params.env}; set -u

              #rm -r {outdir}

              megahit -1 {input.forward} -2 {input.reverse} \
              -m {mem_b} -t {threads} -o {temp_dir}/megahit \
              --out-prefix {wildcards.sample} \
              2> {log} 1>&2

              scp {temp_dir}/megahit/{{*.log,*.fa,*.txt}} {outdir}/.
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        outdir = os.path.dirname(output[0])
        contigs_fp = os.path.abspath(os.path.join(outdir,'contigs.fasta'))
        shell("""
              set +u; {params.env}; set -u

              spades.py --meta -t {threads} -m {params.mem} \
              -1 {input.forward} -2 {input.reverse} -o {temp_dir} \
              2> {log} 1>&2

              scp -r {temp_dir}/{{spades.log,*.fasta,assembly_graph*,*.paths}} {outdir}/.

              ln -s {contigs_fp} {outdir}/{wildcards.sample}.contigs.fa
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        outdir = os.path.dirname(output[0])
        contigs_fp = os.path.abspath(os.path.join(outdir,'contigs.fasta'))
        shell("""
              set +u; {params.env}; set -u

              spades.py -t {threads} -m {params.mem} \
              -1 {input.forward} -2 {input.reverse} -o {temp_dir} \
              2> {log} 1>&2

              scp -r {temp_dir}/{{spades.log,*.fasta,assembly_graph*,*.paths}} {outdir}/.

              ln -s {contigs_fp} {outdir}/{wildcards.sample}.contigs.fa
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        outdir = os.path.dirname(output.done)
        assemblies = ' '.join(input)
        shell("""
              set +u; {params.env}; set -u

              metaquast.py -t {threads} -o {temp_dir}/metaquast \
              {assemblies} 2> {log} 1>&2

              scp -r {temp_dir}/metaquast/* {outdir}/.

              touch {output.done}
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        outdir = os.path.dirname(output.done)
        assemblies = ' '.join(input)
        shell("""
              set +u; {params.env}; set -u

              quast.py -t {threads} -o {temp_dir}/quast \
              {assemblies} 2> {log} 1>&2

              scp -r {temp_dir}/quast/* {outdir}/.

              touch {output.done}
              """)
run:
    out_dir = os.path.dirname(output[0])
    shell("""
          set +u; {params.env}; set -u
          multiqc -f -o {out_dir} {assemble_dir}/*/quast/report.tsv 2> {log} 1>&2
          """)
run:
    f_fastqs = ' '.join(input.forward)
    r_fastqs = ' '.join(input.reverse)
    shell("""
          cat {f_fastqs} > {output.forward} 2> {log}
          cat {r_fastqs} > {output.reverse} 2> {log}
          """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        out_fp = os.path.abspath(output[0])
        for bam in input.bams:
            fn = os.path.basename(bam).split('.')[0]
            shell("""
                    set +u; {params.env}; set -u;

                    samtools view {bam} --threads {threads} -h | \
                    pileup.sh -Xmx8g out={temp_dir}/{fn}_pileup.txt 2> {log} 1>&2

                    awk '{{print $1"\t"$2}}' {temp_dir}/{fn}_pileup.txt | \
                    grep -v '^#' > {temp_dir}/{fn}_abund.txt
                  """)

        shell("""
                cd {temp_dir}
                mkdir {wildcards.bin_sample}_abund_files
                mv *.txt {wildcards.bin_sample}_abund_files/.
                tar -czvf {wildcards.bin_sample}_abund_files.tar.gz {wildcards.bin_sample}_abund_files
                scp {wildcards.bin_sample}_abund_files.tar.gz {out_fp}
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        out_base = os.path.dirname(output[0])
        if params.maxbin is None:
            params.maxbin = ''
        shell("""
                set +u; {params.env}; set -u;

                mkdir {temp_dir}/abund_files
                tar -xzvf {input.abund} -C {temp_dir}/abund_files --strip-components=1

                touch {temp_dir}/abund_list.txt

                for f in {temp_dir}/abund_files/*.txt
                do
                    echo $f >> {temp_dir}/abund_list.txt
                done

                run_MaxBin.pl -contig {input.ref} \
                -out {temp_dir}/{wildcards.bin_sample} \
                -abund_list {temp_dir}/abund_list.txt \
                {params.maxbin} -thread {threads} 2> {log} 1>&2

                rm -r {temp_dir}/abund_files

                scp {temp_dir}/* {out_base}/.
              """)
run:
    bin_dir = os.path.dirname(input[0])
    bins = ''
    for file in os.listdir(bin_dir):
        if file.endswith(".fasta"):
            bin = 'Bin_' + str(file.split('.')[-2])
            with open(os.path.join(bin_dir, file),'r') as f:
                for line in f:
                    if line.strip().startswith('>'):
                        name = line.strip()[1:]
                        bins += "{0}\t{1}\n".format(name, bin)

    with open(output[0], 'w') as out:
        out.write(bins)
SnakeMake From line 96 of bin/bin.rule
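
Sketch each sample's reads with Mash: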
run:
    output_base = os.path.splitext(output[0])[0]

    shell("""
          set +u; {params.env}; set -u

          mash sketch -r -k {params.kmer} \
          -m {params.m} -s {params.size} \
          -o {output_base} {input.forward}
          """)
run:
    from itertools import combinations
    for i, j in combinations(input, 2):
        shell('mash dist {i} {j} >> {output[0]}')
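
Collect the pairwise Mash distances and p-values into distance matrices with scikit-bio: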
run:
    from skbio.stats.distance import DissimilarityMatrix
    import pandas as pd
    import numpy as np

    mash_vec = pd.read_csv(input[0], sep = '\t', header=None)

    # get sorted list of samples
    samples = sorted(set(mash_vec[0]) | set(mash_vec[1]))

    dm = np.zeros([len(samples),len(samples)])
    pm = np.zeros([len(samples),len(samples)])

    # fill matrices with values
    for s1, s2, d, p in zip(mash_vec[0],mash_vec[1],mash_vec[2],mash_vec[3]):
        i1 = samples.index(s1)
        i2 = samples.index(s2)
        print('s1: %s, s2: %s, i1: %s, i2: %s, d: %s, p: %s' % (s1, s2, i1, i2, d, p))
        dm[i1,i2] = d
        dm[i2,i1] = d
        pm[i1,i2] = p
        pm[i2,i1] = p

    ids = [os.path.basename(x) for x in samples]
    sk_dm = DissimilarityMatrix(dm, ids=ids)
    sk_pm = DissimilarityMatrix(pm, ids=ids)

    sk_dm.write(output['dist_matrix'])
    sk_pm.write(output['p_matrix'])
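
Compute a MinHash signature for each sample with sourmash: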
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        shell("""
              set +u; {params.env}; set -u
              zcat {input.forward} {input.reverse} > {temp_dir}/{wildcards.sample}

              sourmash compute --scaled {params.scaled} \
              -k {params.kmer} -o {output.sketch} \
              {temp_dir}/{wildcards.sample}
              """)
run:
    from skbio.stats.distance import DistanceMatrix
    from skbio.io import write
    import pandas as pd
    import numpy as np

    i = ' '.join(input)

    shell("""set +u; {params.env}; set -u
             sourmash compare {i} --csv {output.sim_list}""")

    sim = pd.read_csv(output['sim_list'])

    ids = [os.path.basename(x) for x in sim.columns]

    dist = (1 - sim).values
    # DistanceMatrix requires a zero diagonal, so set it explicitly
    np.fill_diagonal(dist, 0)
    dm = DistanceMatrix(dist, ids=ids)
    dm.write(output.dist_matrix)
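
Join HUMAnN2 tables across samples and compute a reduced (maximum) table: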
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        for file in input:
            shell("cp {0} {1}/.".format(file, temp_dir))
        shell("""
              set +u; {params.env}; set -u

              humann2_join_tables --input {temp_dir} \
              --output {output.joint_prof} 2> {log} 1>&2

              humann2_reduce_table --input {output.joint_prof} \
              --output {output.max_prof} --function max \
              --sort-by level 2>> {log} 1>&2
              """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        shell("""
              set +u; {params.env}; set -u

              zcat {input.forward} {input.reverse} > {temp_dir}/input.fastq

              humann2 --input {temp_dir}/input.fastq \
              --output {temp_dir}/{wildcards.sample} \
              --output-basename {wildcards.sample} \
              --nucleotide-database {params.nt_db} \
              --protein-database {params.aa_db} \
              --taxonomic-profile {input.metaphlan_in} \
              --o-log {log} \
              --threads {threads} \
              {params.other} 2> {log} 1>&2

              scp {temp_dir}/{wildcards.sample}/{wildcards.sample}_genefamilies.tsv {output.genefamilies}
              scp {temp_dir}/{wildcards.sample}/{wildcards.sample}_pathcoverage.tsv {output.pathcoverage}
              scp {temp_dir}/{wildcards.sample}/{wildcards.sample}_pathabundance.tsv {output.pathabundance}
              """)
run:
    out_dir = os.path.join(func_dir, "humann2")
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        shell("""
              set +u; {params.env}; set -u

              # copy per-sample tables
              mkdir {temp_dir}/genef
              cp {input.genef} {temp_dir}/genef/.

              mkdir {temp_dir}/pathc
              cp {input.pathc} {temp_dir}/pathc/.

              mkdir {temp_dir}/patha
              cp {input.patha} {temp_dir}/patha/.

              # combine tables
              humann2_join_tables --input {temp_dir}/genef \
              --output {temp_dir}/genefamilies.tsv \
              --file_name genefamilies 2> {log} 1>&2

              humann2_join_tables --input {temp_dir}/pathc \
              --output {temp_dir}/pathcoverage.tsv \
              --file_name pathcoverage 2>> {log} 1>&2

              humann2_join_tables --input {temp_dir}/patha \
              --output {temp_dir}/pathabundance.tsv \
              --file_name pathabundance 2>> {log} 1>&2


              # normalize
              humann2_renorm_table \
              --input {temp_dir}/genefamilies.tsv \
              --output {temp_dir}/genefamilies_cpm.tsv \
              --units cpm -s n 2>> {log} 1>&2

              humann2_renorm_table \
              --input {temp_dir}/pathcoverage.tsv \
              --output {temp_dir}/pathcoverage_relab.tsv \
              --units relab -s n 2>> {log} 1>&2

              humann2_renorm_table \
              --input {temp_dir}/pathabundance.tsv \
              --output {temp_dir}/pathabundance_relab.tsv \
              --units relab -s n 2>> {log} 1>&2


              # stratify
              humann2_split_stratified_table \
              --input {temp_dir}/genefamilies_cpm.tsv \
              --output {temp_dir} 2>> {log} 1>&2

              humann2_split_stratified_table \
              --input {temp_dir}/pathcoverage_relab.tsv \
              --output {temp_dir} 2>> {log} 1>&2

              humann2_split_stratified_table \
              --input {temp_dir}/pathabundance_relab.tsv \
              --output {temp_dir} 2>> {log} 1>&2


              # convert to biom
              for f in {temp_dir}/*.tsv
              do
              fn=$(basename "$f")
              biom convert -i $f -o {temp_dir}/"${{fn%.*}}".biom --to-hdf5
              done


              # copy bioms to output
              cp {temp_dir}/*.biom {out_dir}/.
              """)
run:
    out_dir = os.path.dirname(output[0])
    # out_dir = os.path.join(func_dir, "{sample}/shogun")
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        shell("""
              set +u; {params.env}; set -u

              cp {input.profile} {temp_dir}/{wildcards.sample}.txt

              for level in genus species strain
              do
                shogun functional \
                --database {params.db} \
                --input {temp_dir}/{wildcards.sample}.txt \
                --output {temp_dir} \
                --level $level \
                2>> {log} 1>&2
              done

              cp {temp_dir}/{wildcards.sample}.*.txt {out_dir}/
              """)
run:
    for level in ('genus', 'species', 'strain'):
        for target in ('kegg', 'modules', 'modcov', 'pathways', 'pathcov'):
            item = '%s_%s' % (level, target)
            pandas2biom(output[item],
                        combine_profiles(zip(samples, input[item])))
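
Rename and simplify contig headers before building the mapping index: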
run:
    prepend = '{0}_{1}_contig_'.format(wildcards.bin_sample,
                                       config['params']['mapping_assembler'])

    simplify_headers(input[0], prepend=prepend,
                     output_fp=output.fasta, header_fp=output.headers)
SnakeMake From line 12 of map/map.rule
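
Build a Bowtie2 index from the contigs: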
run:
    outdir = os.path.dirname(output[0])
    shell("""
          set +u; {params.env}; set -u;

          bowtie2-build {input} {outdir}/{wildcards.bin_sample} 2> {log} 1>&2""")
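
Map reads to the indexed contigs with Bowtie2, then sort and index the resulting BAM with samtools: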
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        idx_base = os.path.join(os.path.dirname(input.idx[0]),
                                wildcards.bin_sample)
        shell("""
                set +u; {params.env}; set -u;

                bowtie2 -x {idx_base} -p {threads} --no-unal \
                -q -1 {input.forward} -2 {input.reverse} 2> {log.bowtie} | \
                samtools sort -O bam -l 0 -T {temp_dir} -o {temp_dir}/out.bam 2> {log.other}

                samtools index {temp_dir}/out.bam

                scp {temp_dir}/out.bam {output.bam}
                scp {temp_dir}/out.bam.bai {output.bai}
              """)
run:
    out_dir = os.path.dirname(output[0])
    in_fastqs = ' '.join(input.forward + input.reverse)
    shell("""
          set +u; {params.env}; set -u

          fastqc --threads {threads} --outdir {out_dir} {in_fastqs} 2> {log} 1>&2

          touch {output}
          """)
run:
    out_dir = os.path.dirname(output[0])
    shell("""
          set +u; {params.env}; set -u

          multiqc -f -o {out_dir} {raw_dir}/*/fastqc_per_file 2> {log} 1>&2
          """)
run:
    f_fastqs = ' '.join(input.forward)
    r_fastqs = ' '.join(input.reverse)
    shell("""
          cat {f_fastqs} > {output.forward} 2> {log}
          cat {r_fastqs} > {output.reverse} 2> {log}
          """)
SnakeMake From line 65 of qc/qc.rule
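
Run FastQC on the combined forward and reverse reads for each sample: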
run:
    out_dir = os.path.dirname(output[0])
    print(out_dir)
    shell("""
          set +u; {params.env}; set -u

          fastqc --threads {threads} --outdir {out_dir} {input.forward} {input.reverse} 2> {log} 1>&2
          """)
run:
    out_dir = os.path.dirname(output[0])
    shell("""
          set +u; {params.env}; set -u

          multiqc -f -o {out_dir} {raw_dir}/*/fastqc_per_sample 2> {log} 1>&2
          """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        f_fp = os.path.basename(output.forward)
        r_fp = os.path.basename(output.reverse)
        shell("""
              set +u; {params.env}; set -u

              atropos --threads {threads} {params.atropos} --report-file {log} --report-formats txt -o {temp_dir}/{f_fp} -p {temp_dir}/{r_fp} -pe1 {input.forward} -pe2 {input.reverse}

              scp {temp_dir}/{f_fp} {output.forward}
              scp {temp_dir}/{r_fp} {output.reverse}
              """)
run:
    if params.filter_db is None:
        f_fp = os.path.abspath(input.forward)
        r_fp = os.path.abspath(input.reverse)
        shell("""
              ln -s {f_fp} {output.forward}
              ln -s {r_fp} {output.reverse}

              echo 'No DB provided; sample not filtered.' > {log.bowtie}
              echo 'No DB provided; sample not filtered.' > {log.other}
              """)
    else:
        with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
            shell("""
                  set +u; {params.env}; set -u

                  bowtie2 -p {threads} -x {params.filter_db} --very-sensitive -1 {input.forward} -2 {input.reverse} 2> {log.bowtie}| \
                  samtools view -f 12 -F 256 -b -o {temp_dir}/{wildcards.sample}.unsorted.bam 2> {log.other} 

                  samtools sort -T {temp_dir}/{wildcards.sample} -@ {threads} -n \
                  -o {temp_dir}/{wildcards.sample}.bam {temp_dir}/{wildcards.sample}.unsorted.bam 2> {log.other} 

                  bedtools bamtofastq -i {temp_dir}/{wildcards.sample}.bam -fq {temp_dir}/{wildcards.sample}.R1.trimmed.filtered.fastq -fq2 {temp_dir}/{wildcards.sample}.R2.trimmed.filtered.fastq 2> {log.other}

                  pigz -p {threads} -c {temp_dir}/{wildcards.sample}.R1.trimmed.filtered.fastq > {temp_dir}/{wildcards.sample}.R1.trimmed.filtered.fastq.gz
                  pigz -p {threads} -c {temp_dir}/{wildcards.sample}.R2.trimmed.filtered.fastq > {temp_dir}/{wildcards.sample}.R2.trimmed.filtered.fastq.gz

                  cp {temp_dir}/{wildcards.sample}.R1.trimmed.filtered.fastq.gz {output.forward}
                  cp {temp_dir}/{wildcards.sample}.R2.trimmed.filtered.fastq.gz {output.reverse} 
                  """)
run:
    out_dir = os.path.dirname(output[0])
    shell("""
          set +u; {params.env}; set -u

          fastqc --threads {threads} --outdir {out_dir} {input.forward} {input.reverse} 2> {log} 1>&2
          """)
run:
    out_dir = os.path.dirname(output[0])
    shell("""
          set +u; {params.env}; set -u

          multiqc -f -s -o {out_dir} {qc_dir}/*/fastqc_per_sample {qc_dir}/logs 2> {log} 1>&2
          """)
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        shell("""
              set +u; {params.env}; set -u

              # get stem file path
              stem={output.profile}
              stem=${{stem%.profile.txt}}

              # merge input files
              zcat {input.forward} {input.reverse} > {temp_dir}/input.fastq

              # run MetaPhlAn2 to generate taxonomic profile
              metaphlan2.py {temp_dir}/input.fastq \
                --input_type fastq \
                --mpa_pkl {params.db}.pkl \
                --bowtie2db {params.db} \
                --nproc {threads} \
                --tmp_dir {temp_dir} \
                --bowtie2out {temp_dir}/map.tmp \
                -o {output.profile} \
                2> {log} 1>&2

              # keep mapping file
              if [[ "{params.map}" == "True" ]]
              then
                gzip -c {temp_dir}/map.tmp > $stem.map.txt.gz
              fi
              """)
run:
    table = combine_profiles(zip(samples, input))
    pandas2biom(output[0], table)
    name2tid = None
    if params['name2tid']:
        with open(params['name2tid'], 'r') as f:
            name2tid = dict(x.split('\t') for x in f.read().splitlines())
    for level in params['levels'].split(','):
        pandas2biom('%s/metaphlan2/combined_profile.%s.biom'
                    % (taxonomy_dir, level),
                    extract_level(table, level[0].lower(), delim='|',
                                  dic=name2tid))
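
Classify reads with Kraken, generate reports, and optionally re-estimate per-rank abundances with Bracken: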
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        shell("""
              set +u; {params.env}; set -u

              # get stem file path
              stem={output.report}
              stem=${{stem%.report.txt}}

              # run Kraken to align reads against reference genomes
              kraken {input.forward} {input.reverse} \
                --db {params.db} \
                --paired \
                --fastq-input \
                --gzip-compressed \
                --only-classified-output \
                --threads {threads} \
                1> {temp_dir}/map.tmp \
                2> {log}

              # generate hierarchical report
              kraken-report {temp_dir}/map.tmp \
                --db {params.db} \
                1> {output.report} \
                2>> {log}

              # generate lineage to count table
              kraken-mpa-report {temp_dir}/map.tmp \
                --db {params.db} \
                1> {output.profile} \
                2>> {log}

              # keep mapping file
              if [[ "{params.map}" == "True" ]]
              then
                gzip -c {temp_dir}/map.tmp > $stem.map.txt.gz
              fi

              # run Bracken to re-estimate abundance at given rank
              if [[ ! -z {params.levels} ]]
              then
                IFS=',' read -r -a levels <<< "{params.levels}"
                for level in "${{levels[@]}}"
                do
                  est_abundance.py -i {output.report} \
                    -k {params.kmers} \
                    -t 10 \
                    -l $(echo $level | head -c 1 | tr a-z A-Z) \
                    -o $stem.redist.$level.txt \
                    2>> {log} 1>&2
                  rm $stem.report_bracken.txt
                done
              fi
              """)
run:
    pandas2biom(output[0], combine_profiles(zip(samples, input)))
    for level in params['levels'].split(','):
        redists = ['%s/%s/kraken/%s.redist.%s.txt'
                   % (taxonomy_dir, sample, sample, level)
                   for sample in samples]
        pandas2biom('%s/kraken/combined_redist.%s.biom'
                    % (taxonomy_dir, level),
                    combine_bracken(zip(samples, redists)))
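
Classify reads with Centrifuge and generate a Kraken-style report: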
run:
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        shell("""
              set +u; {params.env}; set -u

              # get stem file path
              stem={output.report}
              stem=${{stem%.report.txt}}

              # run Centrifuge to align reads against reference genomes
              centrifuge \
                -1 {input.forward} \
                -2 {input.reverse} \
                -x {params.db} \
                -p {threads} \
                -S {temp_dir}/map.tmp \
                --report-file {output.profile} \
                2> {log} 1>&2

              # generate Kraken-style hierarchical report
              centrifuge-kreport {temp_dir}/map.tmp \
                -x {params.db} \
                1> {output.report} \
                2>> {log}

              # keep mapping file
              if [[ "{params.map}" == "True" ]]
              then
                gzip -c {temp_dir}/map.tmp > $stem.map.txt.gz
              fi
              """)
run:
    # this is not a typo. Centrifuge produces Kraken-style reports.
    comb, lv2tids = combine_kraken(zip(samples, input))
    pandas2biom(output[0], comb)
    for level in params['levels'].split(','):
        pandas2biom('%s/centrifuge/combined_profile.%s.biom'
                    % (taxonomy_dir, level),
                    comb[comb.index.isin(lv2tids[level[0].upper()])])
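
Classify reads with SHOGUN: align, assign taxonomy, and optionally redistribute counts to the requested ranks: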
run:
    aln2ext = {'utree': 'tsv', 'burst': 'b6', 'bowtie2': 'sam'}
    ext = aln2ext[params['aligner']]
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir:
        shell("""
              set +u; {params.env}; set -u

              # get stem file path
              stem={output.profile}
              stem=${{stem%.profile.txt}}

              # interleave paired fastq's and convert to fasta
              seqtk mergepe {input.forward} {input.reverse} | \
              seqtk seq -A > {temp_dir}/{wildcards.sample}.fna

              # map reads to reference database
              shogun align \
              --aligner {params.aligner} \
              --threads {threads} \
              --database {params.db} \
              --input {temp_dir}/{wildcards.sample}.fna \
              --output {temp_dir} \
              2> {log} 1>&2

              # build taxonomic profile based on read map
              shogun assign_taxonomy \
              --aligner {params.aligner} \
              --database {params.db} \
              --input {temp_dir}/alignment.{params.aligner}.{ext} \
              --output {output.profile} \
              2> {log} 1>&2

              # keep mapping file
              if [[ "{params.map}" == "True" ]]
              then
                gzip -c {temp_dir}/alignment.{params.aligner}.{ext} > $stem.{params.aligner}.{ext}.gz
              fi

              # redistribute reads to given taxonomic ranks
              if [[ ! -z {params.levels} ]]
              then
                IFS=',' read -r -a levels <<< "{params.levels}"
                for level in "${{levels[@]}}"
                do
                  shogun redistribute \
                  --database {params.db} \
                  --level $level \
                  --input {output.profile} \
                  --output $stem.redist.$level.txt \
                  2> {log} 1>&2
                done
              fi
              """)
run:
    pandas2biom(output[0], combine_profiles(zip(samples, input)))
    for level in params['levels'].split(','):
        redists = ['%s/%s/shogun/%s.redist.%s.txt'
                   % (taxonomy_dir, sample, sample, level)
                   for sample in samples]
        pandas2biom('%s/shogun/combined_redist.%s.biom'
                    % (taxonomy_dir, level),
                    combine_profiles(zip(samples, redists)))
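
Per-module clean-up rules that remove the corresponding output directories: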
# (rule header truncated in this snippet; by analogy with the rules below, presumably `rule clean_raw:`)
    shell:
        "rm -rf " + raw_dir

rule clean_qc:
    shell:
        "rm -rf " + qc_dir

rule clean_assemble:
    shell:
        "rm -rf " + assemble_dir

rule clean_map:
    shell:
        "rm -rf " + map_dir

rule clean_bin:
    shell:
        "rm -rf " + bin_dir

rule clean_anvio:
    shell:
        "rm -rf " + anvio_dir

rule clean:
    shell:
        "rm -rf " + raw_dir + " results"
Name: oecophylla
Version: 1
Maintainers: public
URL: https://github.com/biocore/oecophylla
Copyright: Public Domain
License: MIT License