Benchmarking adapter and quality trimming tools

Original implementation by Brian Bushnell (2014): http://seqanswers.com/forums/showthread.php?t=42776

The following tools are compared:

  • bbduk

  • cutadapt

  • fastp

  • trimmomatic

Fake adapters, "gruseq"

The fake TruSeq adapters, "gruseq", were provided by Brian Bushnell and downloaded from: http://seqanswers.com/forums/attachment.php?attachmentid=2993&d=1398383571

Test data

A single sample is downloaded from SRA. Feel free to replace it with whatever you want.

Running

Run the benchmarking workflow with snakemake --use-conda --jobs 10

Code Snippets

__author__ = "Fredrik Boulund"
__date__ = "2019"

from sys import argv, exit
from pathlib import Path
import argparse

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt


parser = argparse.ArgumentParser(description=f"{__doc__}. {__author__} (c) {__date__}")
parser.add_argument("benchmarks", metavar="FILE", nargs="+", help="Benchmark output(s)")
parser.add_argument("--output", default="plot.pdf", 
        help="Plot output. Will produce a png variant as well")

if len(argv) < 2:
    parser.print_help()
    exit()

args = parser.parse_args()

dfs = []
for benchmark in args.benchmarks:
    tool = Path(benchmark).stem.split(".", maxsplit=1)[0]
    df = pd.read_csv(benchmark, sep="\t")
    df["Tool"] = tool
    dfs.append(df)

table = pd.concat(dfs)


fig, (ax1, ax2) = plt.subplots(1,2, figsize=(10,5))

ax1.set_title("Average time")
ax1.set_ylabel("seconds")
# Select the column before averaging so non-numeric columns (e.g. "h:m:s") are never averaged.
table\
    .groupby("Tool")["s"]\
    .mean()\
    .plot(kind="bar", ax=ax1)

ax2.set_title("Average max_vms")
ax2.set_ylabel("Megabytes")
# Same pattern as above: pick the column first, then take the per-tool mean.
table\
    .groupby("Tool")["max_vms"]\
    .mean()\
    .plot(kind="bar", ax=ax2)


fig.savefig(args.output, bbox_inches="tight")
# Swap only the file extension, so a ".pdf" elsewhere in the path is left alone.
fig.savefig(Path(args.output).with_suffix(".png"), bbox_inches="tight")
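As a rough illustration of what the script above aggregates, the following sketch computes the per-tool mean runtime using only the standard library. It assumes Snakemake-style tab-separated benchmark files with an `s` (seconds) column. The file names such as `bbduk.1.tsv`, the helper `mean_seconds_per_tool`, and all numbers are hypothetical and not taken from the workflow.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path
from statistics import mean


def tool_name(benchmark_path):
    """Derive the tool name the same way the plotting script does."""
    return Path(benchmark_path).stem.split(".", maxsplit=1)[0]


def mean_seconds_per_tool(benchmarks):
    """Average the 's' column per tool.

    `benchmarks` maps a (hypothetical) file name to that file's text content.
    """
    seconds = defaultdict(list)
    for name, text in benchmarks.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            seconds[tool_name(name)].append(float(row["s"]))
    return {tool: mean(values) for tool, values in seconds.items()}


# Made-up numbers, for illustration only.
example = {
    "benchmarks/bbduk.1.tsv": "s\tmax_vms\n10.0\t2000.0\n",
    "benchmarks/bbduk.2.tsv": "s\tmax_vms\n14.0\t2100.0\n",
    "benchmarks/fastp.1.tsv": "s\tmax_vms\n20.0\t900.0\n",
}
print(mean_seconds_per_tool(example))
```

Everything before the first dot in the file name is treated as the tool name, so repeated runs of the same tool are pooled into one average.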
shell:
    """
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR921/004/SRR9218144/SRR9218144.fastq.gz
    """
shell:
    """
    addadapters.sh \
        in={input.fastq} \
        out={output} \
        qout=33 \
        ref={input.adapters} \
        right \
        int=f \
        2> {log}
    """
shell:
    """
    cutadapt \
        --cores {threads} \
        --minimum-length 10 \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATATGATACTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATAACTGCGTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATAGGTCCATGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATAGCTAATTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATATATCGCTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATACAATTGTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATAATCTGATGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATATAGGCTTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATACTGATCTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATAGTCAGGTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATACCAGTATGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATAAGGCGTTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATATCGATTATTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATATCGGAACGTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATATGCGATCTTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATAAACGAAACTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATACGAACATATGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATACGCTTTACTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATACGCCAAGGTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATACGGGACCTTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATAACGTACGTTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATACTCGCCTGTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATATAGCTGTGTGAGACGTGCAACGAGGAGCAGGC \
        --anywhere CTGACCTTCTCATATACGAGCTTAGAATCGATATGGAAGGGTGAGACGTGCAACGAGGAGCAGGC \
        {input.fastq} \
        > {output} \
        2> {log}
    """
shell:
    """
    trimmomatic \
        SE \
        -phred33 \
        -threads {threads} \
        {input.fastq} \
        {output} \
        ILLUMINACLIP:gruseq.fa:2:28:10 \
        MINLEN:10 \
        2> {log}
    """
shell:
    """
    bbduk.sh \
        in={input.fastq} \
        out={output} \
        ref={input.adapters} \
        ktrim=r \
        mink=12 \
        hdist=1 \
        minlen=10 \
        threads={threads} \
        2> {log}
    """
shell:
    """
    fastp \
        --in1 {input.fastq} \
        --out1 {output.fq} \
        --adapter_fasta {input.adapters} \
        --thread {threads} \
        --html {output.html} \
        --json {output.json} \
        --length_required 10 \
        2> {log}
    """
shell:
    """
    addadapters.sh \
        in={input} \
        grade \
        2> {output}
    """
shell:
    """
    scripts/plot_benchmarks.py \
        --output {output.benchmarks} \
        {input.benchmarks}
    scripts/plot_grades.py \
        --output {output.grades} \
        {input.grades}
    """
URL: https://github.com/boulund/adapter_benchmark
Name: adapter_benchmark
Version: 1
Copyright: Public Domain
License: MIT License