
EukRecover

Pipeline to recover eukaryotic MAGs using CONCOCT, metaBAT2 and EukCC's merging algorithm.

Requires paired-end shotgun metagenomic reads.

Environment

EukRecover requires an environment with Snakemake and metaWRAP installed.

Quickstart

Define your samples in the file samples.csv. This file needs to have the columns project and run to identify each metagenome.
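
A minimal samples.csv could look like the following; the project and run values are placeholders, not accessions shipped with the repository:

project,run
PRJEB00001,ERR0000001
PRJEB00001,ERR0000002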

This pipeline does not support co-binning, but feel free to change it.

Clone this repository wherever you want to run the pipeline:

git clone https://github.com/openpaul/eukrecover/

You can then run the Snakemake workflow like so:

snakemake --use-singularity

The pipeline uses Docker Hub to fetch all tools, so make sure you have Singularity installed.
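
If you need to control resources or make an existing database directory visible inside the containers, a fuller invocation could look like this; --cores and --singularity-args are standard Snakemake options, and the bind path is a placeholder:

snakemake --use-singularity --cores 8 \
    --singularity-args "--bind /path/to/databases"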

Prepare databases

The pipeline will set up databases for you, but if you already have a EukCC or a BUSCO 5 database you can use them by specifying their locations in the file config/config.yaml.
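
As a rough sketch of such an entry (the key names eukcc_db and busco_db are assumptions for illustration; the actual option names are defined in config/config.yaml):

# hypothetical keys, check config/config.yaml for the real option names
eukcc_db: /path/to/eukcc2_db_ver_1.1
busco_db: /path/to/busco_downloads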

Output:

In the folder results you will find a folder MAGs, which contains a folder fa with the actual MAG FASTA files. In addition, you will find statistics for each MAG in the table QC.csv.

This table contains the following columns:

name,eukcc_compl,eukcc_cont,BUSCO_C,BUSCO_M,BUSCO_D,BUSCO_F,BUSCO_tax,N50,bp

Here eukcc_compl and eukcc_cont are the EukCC completeness and contamination estimates, BUSCO_C/M/D/F are the BUSCO complete, missing, duplicated and fragmented percentages for the lineage reported in BUSCO_tax, and N50 and bp summarise the assembly statistics of each MAG.

Code Snippets

Lines 28-57 (metaWRAP binning with CONCOCT, metaBAT2 and MaxBin2):
shell:
    """
    gunzip -c {input.fq1} > {output.fq1}
    gunzip -c {input.fq2} > {output.fq2}

    metawrap binning \
        -o {params.out} \
        -t {threads} \
        -l {config[mincontiglength]} \
        -m {resources.mem_GB} \
        -a {input.scaffolds} \
        --concoct --metabat2 --maxbin2 {output.fq1} {output.fq2}

    # rename bins with name
    cd {params.out}concoct_bins
    for f in *.fa ; do mv -- "$f" "{wildcards.project}_{wildcards.sample}_concoct_$f" || true ; done
    cd {params.out}metabat2_bins
    for f in *.fa ; do mv -- "$f" "{wildcards.project}_{wildcards.sample}_metabat2_$f" || true ; done
    cd {params.out}maxbin2_bins
    for f in *.fa ; do mv -- "$f" "{wildcards.project}_{wildcards.sample}_maxbin2_$f" || true ; done

    # remove unbinned bins
    cd {params.out}
    mkdir unbinned
    mv *_bins/*unbinned.fa unbinned/ || true

    # remove temp files
    cd {params.out}
    rm -r work_files
    """
Lines 17-31 (metaWRAP bin refinement):
shell:
    """
    #rm -r {params.out}
    #mkdir -p {params.out}
    metawrap bin_refinement -o {params.out} \
        -A {input.concoct} -B {input.metabat2} \
        -m {resources.mem_GB} \
        -t {threads}\
        -c 50 -x 10 \
        --quick  || true

    mv {params.d} {output.out}
    cd {params.out}metawrap_bins
    for f in *.fa ; do mv -- "$f" "{wildcards.project}_{wildcards.sample}_metawrap_$f" || true ; done
    """
Lines 13-27 (download and extraction of the BUSCO v5 datasets):
shell:
    """
    mkdir -p {params.d}
    cd {params.d}
    wget -r -np -R "index.html*"  https://busco-data.ezlab.org/v5/data/
    mv busco-data.ezlab.org/v5/data/* .
    rm -r busco-data.ezlab.org
    # extract all gzipped files, then remove the tarballs
    cd lineages
    find . -type f -iname "*tar.gz" -exec tar -xzf "{{}}" \; -exec rm "{{}}" \;
    cd ../placement_files
    find . -type f -iname "*tar.gz" -exec tar -xzf "{{}}" \; -exec rm "{{}}" \;
    cd ../information
    find . -type f -iname "*tar.gz" -exec tar -xzf "{{}}" \; -exec rm "{{}}" \;
    """
Lines 42-63 (BUSCO assessment of a MAG):
shell:
    """
    FNA=$(realpath {input.fa})
    DB=$(realpath {input.db})
    mkdir -p {params.d}
    if [ ! -z "$(ls -A {params.d})" ]; then
      echo "Folder not empty. Deleting"
      rm -rf {params.d}
    fi
    mkdir -p {params.d}
    cd {params.d}
    busco  --offline \
          -i $FNA \
          -m genome \
          -o out \
          --auto-lineage-euk \
          --download_path $DB \
          -c {threads} || touch out/short_summary.specific_failed.out.txt

    cp out/short_summary.specific*.out.txt short_summary.specific.txt
    rm -r out
    """
Lines 25-37 (read mapping with minimap2 and coverage estimation with cmseq):
shell:
    """
      minimap2 -ax sr -t {threads} {input.fasta} {input.fq1} {input.fq2}  | 
              samtools view -q 20 -Sb - | \
              samtools sort -@ {threads} -O bam - -o {output.bam}
              samtools index {output.bam}

      # cmseq
      breadth_depth.py \
          --combine \
          --mincov {params.mincov} \
          {output.bam} > {output.cov}
    """
Lines 21-75 (filtering bins by EukCC quality and dereplication with dRep at 99% ANI):
shell:
    """
    mkdir -p {params.out}
    rm -r  {params.out}
    mkdir -p {params.out}
    BD=$(pwd)
    # prepare drep quality file
    echo "genome,completeness,contamination" > {output.quality}
    cat {input.concoct_eukcc}  {input.metabat2_eukcc}|\
        awk '{{if($2 - 5*$3 >=50){{print $0}}}}' |\
        sort -k 2,3 -n |\
        tr '\\t' ',' >> {output.quality}
    if [[ $(wc -l <{output.quality}) -lt 2  ]];
    then
        touch {output.quality}
        touch {output.genomes}
        mkdir -p {output.input_bins} || true
        mkdir -p {output.bins} || true
        exit 0
    fi

    cat {output.quality}
    mkdir {output.input_bins}
    cd {output.input_bins}
    ln -s {input.concoct}/*.fa . || true
    ln -s {input.concoct_merged}/*.fa . || true
    ln -s {input.metabat2}/*.fa . || true
    ln -s {input.metabat2_merged}/*.fa . || true
    ls
    ls > ../genomes.txt
    cd ..
    cat quality.csv | grep -v "genome,completeness,contamination" |\
        cut -f 1 -d , > .f
    grep -f .f genomes.txt | sed -e 's/^/input_genomes\//'  > .g
    mv .g genomes.txt
    rm .f
    cd $BD

    mkdir -p {params.out}
    cd {params.out}

    cat genomes.txt
    if [[ $(wc -l <genomes.txt) -lt 2  ]];
    then
        echo "No genomes"
        cp -r input_genomes dereplicated_genomes
    else
      dRep dereplicate -p {threads} \
          . -g genomes.txt \
          -pa 0.80 -sa 0.99 -nc 0.40 \
          -cm larger \
          --genomeInfo quality.csv\
          -comp 49 -con 21
    fi
    """
Lines 93-117 (dereplication of MAGs with dRep at 95% ANI):
shell:
    """
    mkdir -p {params.out}
    rm -r  {params.out}
    mkdir -p {params.out}
    BD=$(pwd)
    cat {input.genomes} | sed -e 's/^/..\/raw\//'  > {output.genomes}

    mkdir -p {params.out}
    cd {params.out}

    N=$(wc -l genomes.txt) 
    if [[ $(wc -l <genomes.txt) -lt 2  ]];
    then
        cp -r {input.fa} dereplicated_genomes
    else
      dRep dereplicate -p {threads} \
          . -g genomes.txt \
          -pa 0.80 -sa 0.95 -nc 0.40 \
          -cm larger \
          --genomeInfo {input.quality}\
          -comp 49 -con 21
    fi
    ls dereplicated_genomes > MAGs.txt
    """
Lines 21-27 (read mapping against the assembly with minimap2 and samtools):
shell:
    """
    minimap2 -ax sr -t {threads} {input.scaffolds} {input.fq1} {input.fq2}  | 
            samtools view -q 20 -Sb - | \
            samtools sort -@ {threads} -O bam - -o {output.bam}
            samtools index {output.bam}
    """
Lines 43-52 (bin-bin read link computation with scripts/binlinks.py):
shell:
    """
    # if no bins
    if [[ $( ls {input.bindir}/*.fa | wc -l ) -lt 1 ]]; then
      touch {output.links}
    else
      python3 scripts/binlinks.py  --ANI 99 --within 1500 \
          --out {output.links} {input.bindir} {input.bam}
    fi
    """
Lines 64-69 (download of the EukCC database):
shell:
    """
    wget -O {params.tar} http://ftp.ebi.ac.uk/pub/databases/metagenomics/eukcc/eukcc2_db_ver_1.1.tar.gz
    cd {params.d}
    tar -xzvf eukcc2_db_ver_1.1.tar.gz
    """
Lines 93-116 (EukCC merging of bins, folder mode):
shell:
    """
    rm -r {params.out}workdir || true
    # if no bins
    if [[ $( ls {input.bindir} | wc -l ) -lt 1 ]]; then
      touch {output.csv}
      mkdir -p {output.bins}
      exit 0
    fi

    eukcc --debug folder \
        --improve_percent 10 \
        --n_combine 1 \
        --threads {threads} \
        --improve_ratio     5 \
        --links {input.links} \
        --min_links 100 \
        --suffix .fa \
        --db {input.db} \
        --out {params.out} \
        --prefix {wildcards.project}_{wildcards.sample}_{wildcards.binner}_merged. \
        {input.bindir}
    rm -r {params.out}/refine_workdir || true
    """
Lines 131-140 (EukCC assessment of a single MAG):
shell:
    """
    rm -r {params.out}workdir || true
    eukcc --debug single \
        --threads {threads} \
        --db {input.db} \
        --out {params.out} \
        {input.fa}
    cp {output.tsv} {output.csv}
    """
Lines 2-209 (scripts/binlinks.py):
import pysam
from Bio import SeqIO
from collections import defaultdict
import os
import argparse
import logging
import csv


def is_in(read, contig_map, within=1000):
    if read.reference_name not in contig_map.keys():
        return False
    if read.reference_start <= within or read.reference_end <= within:
        return True
    elif read.reference_start > (
        contig_map[read.reference_name] - within
    ) or read.reference_end > (contig_map[read.reference_name] - within):
        return True
    else:
        return False


def keep_read(read, contig_map, within=1000, min_ANI=98, min_cov=0):
    ani = (
        (read.query_alignment_length - read.get_tag("NM"))
        / float(read.query_alignment_length)
        * 100
    )
    cov = read.query_alignment_length / float(read.query_length) * 100

    if ani >= min_ANI and cov >= min_cov and is_in(read, contig_map, within) is True:
        return True
    else:
        return False


def contig_map(bindir, suffix=".fa"):
    m = {}
    for f in os.listdir(bindir):
        if f.endswith(suffix) is False:
            continue
        path = os.path.join(bindir, f)
        with open(path, "r") as handle:
            for record in SeqIO.parse(handle, "fasta"):
                m[record.name] = len(record.seq)
    return m


def bin_map(bindir, suffix=".fa"):
    contigs = defaultdict(str)
    contigs_per_bin = defaultdict(int)
    for f in os.listdir(bindir):
        if f.endswith(suffix) is False:
            continue
        path = os.path.join(bindir, f)
        binname = os.path.basename(f)
        with open(path, "r") as handle:
            for record in SeqIO.parse(handle, "fasta"):
                contigs[record.name] = binname
                contigs_per_bin[binname] += 1
    return contigs, contigs_per_bin


def read_pair_generator(bam):
    """
    Generate read pairs in a BAM file or within a region string.
    Reads are added to read_dict until a pair is found.
    From: https://www.biostars.org/p/306041/
    """
    read_dict = defaultdict(lambda: [None, None])
    for read in bam.fetch():
        if not read.is_paired or read.is_secondary or read.is_supplementary:
            continue
        qname = read.query_name
        if qname not in read_dict:
            if read.is_read1:
                read_dict[qname][0] = read
            else:
                read_dict[qname][1] = read
        else:
            if read.is_read1:
                yield read, read_dict[qname][1]
            else:
                yield read_dict[qname][0], read
            del read_dict[qname]


def main():
    # set arguments
    # arguments are passed to classes
    parser = argparse.ArgumentParser(
        description="Count paired-end read links between bins."
    )
    parser.add_argument("bindir", type=str, help="Run script on these bins")
    parser.add_argument(
        "bam",
        type=str,
        help="Bam with allr eads aligned against all contigs making up the bins",
    )
    parser.add_argument(
        "--out",
        "-o",
        type=str,
        required=False,
        help="Path to output table (Default: links.csv)",
        default="links.csv",
    )
    parser.add_argument(
        "--ANI", type=float, required=False, help="ANI of matching read", default=99
    )
    parser.add_argument(
        "--within",
        type=int,
        required=False,
        help="Within this many bp we need the read to map",
        default=1000,
    )
    parser.add_argument(
        "--contigs",
        "-c",
        action="store_true",
        default=False,
        help="Instead of bins print contigs",
    )
    parser.add_argument(
        "--quiet",
        "-q",
        dest="quiet",
        action="store_true",
        default=False,
        help="Silcence most output",
    )
    parser.add_argument(
        "--debug",
        "-d",
        action="store_true",
        default=False,
        help="Debug and thus ignore safety",
    )
    args = parser.parse_args()

    # define logging
    logLevel = logging.INFO
    if args.quiet:
        logLevel = logging.WARNING
    elif args.debug:
        logLevel = logging.DEBUG
    logging.basicConfig(
        format="%(asctime)s %(message)s",
        datefmt="%d-%m-%Y %H:%M:%S: ",
        level=logLevel,
    )

    bindir = args.bindir
    samfile = pysam.AlignmentFile(args.bam, "rb")

    cm = contig_map(bindir)
    bm, contigs_per_bin = bin_map(bindir)

    link_table = defaultdict(lambda: defaultdict(int))
    bin_table = defaultdict(lambda: defaultdict(int))

    # generate link table
    logging.info("Parsing Bam file. This can take a few moments")
    for read, mate in read_pair_generator(samfile):
        if keep_read(read, cm, args.within, min_ANI=args.ANI) and keep_read(
            mate, cm, args.within, min_ANI=args.ANI
        ):
            # fill in the table
            link_table[read.reference_name][mate.reference_name] += 1
            if read.reference_name != mate.reference_name:
                link_table[mate.reference_name][read.reference_name] += 1

    # generate bin table
    for contig_1, dic in link_table.items():
        for contig_2, links in dic.items():
            bin_table[bm[contig_1]][bm[contig_2]] += 1
    out_data = []
    logging.debug("Constructing output dict")
    if args.contigs:
        for contig_1, linked in link_table.items():
            for contig_2, links in linked.items():
                out_data.append(
                    {
                        "bin_1": bm[contig_1],
                        "bin_2": bm[contig_2],
                        "contig_1": contig_1,
                        "contig_2": contig_2,
                        "links": links,
                        "bin_1_contigs": contigs_per_bin[bm[contig_1]],
                        "bin_2_contigs": contigs_per_bin[bm[contig_2]],
                    }
                )
    else:
        for bin_1, dic in bin_table.items():
            for bin_2, links in dic.items():
                out_data.append({"bin_1": bin_1, "bin_2": bin_2, "links": links})
    # results
    logging.info("Writing output")
    with open(args.out, "w") as fout:
        cout = csv.DictWriter(fout, fieldnames=list(out_data[0].keys()))
        cout.writeheader()
        for row in out_data:
            cout.writerow(row)


if __name__ == "__main__":
    main()
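
The pipeline invokes this script from the rule shown earlier (Lines 43-52). Run on its own it could look like this, with placeholder paths:

python3 scripts/binlinks.py --ANI 99 --within 1500 \
    --out links.csv path/to/bins/ path/to/alignment.bam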
Lines 19-184 (scripts/create_qc_table.py):
import os
import argparse
import subprocess
import logging
import tempfile
import gzip
import glob
import csv


# backup fasta handler, so we can use readonly directories
class fa_class:
    def __init__(self, seq, name, long_name):
        self.seq = seq
        self.name = name
        self.long_name = long_name

    def __str__(self):
        return self.seq

    def __len__(self):
        return len(self.seq)


def Fasta(path):
    """
    Iterator for fasta files
    """
    entry = False

    with open(path) as fin:
        for line in fin:
            if line.startswith(">"):
                if entry is not False:
                    entry.seq = "".join(entry.seq)
                    yield entry
                # define new entry
                long_name = line.strip()[1:]
                name = long_name.split()[0]
                entry = fa_class([], name, long_name)
            else:
                entry.seq.append(line.strip())
        # yield last one
        entry.seq = "".join(entry.seq)
        yield entry


def N50(l, x=0.5):
    l = sorted(l)
    l.reverse()
    t = sum(l) * x
    for i, x in enumerate(l):
        if i == 0:
            cs = x
        else:
            cs = cs + x
        if cs >= t:
            return x


def genome_stats(path):
    lengths = [len(x) for x in Fasta(path)]
    return {"N50": N50(lengths), "bp": sum(lengths), "contigs": len(lengths)}


def eukcc_score(path):
    with open(path) as fin:
        for row in csv.DictReader(fin, delimiter="\t"):
            return row


def busco_score(path):
    d = {
        "BUSCO_C": None,
        "BUSCO_M": None,
        "BUSCO_D": None,
        "BUSCO_S": None,
        "BUSCO_F": None,
    }

    with open(path) as fin:
        for line in fin:
            if line.strip().startswith("C:"):
                elem = line.strip().replace("[", ",").replace("]", "").split(",")
                for e in elem:
                    x = e.split(":")
                    d["BUSCO_{}".format(x[0])] = x[1].replace("%", "")
                # this is the last info we need
                break
            if line.strip().startswith("# The lineage dataset is: "):
                d["BUSCO_lineage"] = line.strip().split()[5]
    return d


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--output", help="path for the output table", default="qc.csv", type=str
    )
    parser.add_argument("--mags", help="path to mag folder", type=str)
    parser.add_argument("--qc_dir", help="path to qc folder", type=str)
    parser.add_argument(
        "--rerun",
        action="store_true",
        help="rerun even if output exists",
        default=False,
    )
    parser.add_argument(
        "--quiet", action="store_true", help="supress information", default=False
    )
    parser.add_argument(
        "--debug", action="store_true", help="Make it more verbose", default=False
    )

    args = parser.parse_args()

    # define logging
    logLevel = logging.INFO
    if args.quiet:
        logLevel = logging.WARNING
    elif args.debug:
        logLevel = logging.DEBUG
    logging.basicConfig(
        format="%(asctime)s %(message)s", datefmt="%d/%m/%Y %H:%M:%S: ", level=logLevel
    )

    # find mags
    mags = glob.glob("{}/*.fa".format(args.mags))

    # for each mag learn eukcc and BUSCO values
    # also determine N50
    #
    fields = [
        "fasta",
        "completeness",
        "contamination",
        "BUSCO_C",
        "BUSCO_M",
        "BUSCO_D",
        "BUSCO_S",
        "BUSCO_F",
        "BUSCO_n",
        "BUSCO_lineage",
        "N50",
        "bp",
        "contigs",
    ]

    with open(args.output, "w") as outfile:
        cout = csv.DictWriter(outfile, fieldnames=fields, extrasaction="ignore")
        cout.writeheader()

        for mag in mags:
            name = os.path.basename(mag).replace(".fa", "")
            # learn eukcc score
            eukcc_p = os.path.join(args.qc_dir, "eukcc", name, "eukcc.csv")
            busco_p = os.path.join(
                args.qc_dir, "busco", name, "short_summary.specific.txt"
            )
            stat = genome_stats(mag)
            eukcc = eukcc_score(eukcc_p)
            stat = {**stat, **eukcc}
            stat["fasta"] = os.path.basename(mag)
            busco = busco_score(busco_p)
            stat = {**stat, **busco}
            cout.writerow(stat)
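
The final QC rule (Lines 200-206) calls this script on the MAG folder. A standalone call could look like the following; the paths are illustrative, and the script expects, for each MAG name.fa, a qc_dir/eukcc/name/eukcc.csv and a qc_dir/busco/name/short_summary.specific.txt:

python3 scripts/create_qc_table.py \
    --mags results/MAGs/fa \
    --qc_dir results/qc \
    --output QC.csv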
Lines 3-49 (scripts/rename_multifasta_prefix.py):
import argparse
import sys
import random
import re
bases = {"R": ["A", "G"],
         "Y": ["C", "T"],
         "K": ["G", "T"],
         "M": ["A", "C"],
         "S": ["C", "G"],
         "W": ["A", "T"],
         "B": ["C", "G", "T"],
         "D": ["A", "G", "T"],
         "H": ["A", "C", "T"],
         "V": ["A", "C", "G"],
         "N": ["A", "G", "C", "T"]
         }

dna = re.compile("[^AGCT]{1}")

vbases = ["A","C","G","T"]

def ren_fasta(args):
    n = 0
    for line in open(args.fasta_file, "r"):
        if line[0] == ">":
            name = line.strip("\n").replace(">","")
            n += 1
            if args.clean and args.prefix is not None:
                print(">%s_%i" % (args.prefix, n))
            elif args.clean and args.prefix is None:
                print(">%s" % (name.split()[0]))
            else:
                print(">%s_%i\t%s" % (args.prefix, n, name))
        else:
            print(line.strip("\n"))

if __name__ == '__main__':
        parser = argparse.ArgumentParser(description='Rename multifasta file')
        parser.add_argument('-f', dest='fasta_file', help='Input FASTA file')
        parser.add_argument('-p', dest='prefix', help='Header prefix', default=None)
        parser.add_argument('--clean', help="Retain only new header (default: False)", default=False, action="store_true")
        if len(sys.argv) == 1:
            parser.print_help()
            sys.exit()
        else:
            args = parser.parse_args()
            ren_fasta(args) 
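
The renaming rule (Lines 156-161) pipes each MAG through this script. A standalone call could look like this, with a placeholder MAG name and input file:

python3 scripts/rename_multifasta_prefix.py --clean \
    -p PRJEB00001_ERR0000001_metawrap_bin.1 \
    -f bin.1.fa > PRJEB00001_ERR0000001_metawrap_bin.1.fa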
Lines 108-112 (copy input files into the output folder):
shell:
    """
    mkdir -p {output}
    cp {input} {output}
    """
SnakeMake From line 108 of main/Snakefile
Lines 119-122 (touch an output flag file):
shell:
    """
    touch {output}
    """
SnakeMake From line 119 of main/Snakefile
Lines 133-149 (collect dereplicated genomes and merge quality tables):
run:
    import shutil
    import glob
    import os
    import csv
    os.makedirs(output.fa, exist_ok=True)
    with open(output.genomes, "w") as fout:
      for folder in input.dirs:
        for fa in glob.glob("{}/*.fa".format(folder)):
            shutil.copy(fa, output.fa)
            fout.write("{}\n".format(os.path.basename(fa)))
    with open(output.quality, "w") as fout:
        cout = csv.DictWriter(fout, fieldnames=['genome','completeness', 'contamination'])
        cout.writeheader()
        for f in input.quality:
            with open(f) as fin:
                for row in csv.DictReader(fin):
                    cout.writerow(row)
SnakeMake From line 133 of main/Snakefile
Lines 156-161 (rename contigs of each MAG with scripts/rename_multifasta_prefix.py):
shell:
    """
    python3 scripts/rename_multifasta_prefix.py --clean \
        -p {wildcards.MAG} \
        -f {input} > {output}
    """
SnakeMake From line 156 of main/Snakefile
Lines 200-206 (build the final QC table with scripts/create_qc_table.py):
shell:
    """
    python3 scripts/create_qc_table.py \
        --mags {input.raw} \
        --qc_dir {wildcards.data}/qc \
        --output {output.csv}
    """
SnakeMake From line 200 of main/Snakefile
