Snakemake workflow implementing Accel Amplicon (Swift) trimming guidelines.
This workflow performs read trimming on Accel Amplicon Panel data, following the trimming guidelines recommended by Swift Biosciences.
Authors
Patrik Smeds (@smeds)
Requirements
Samples must be paired-end Illumina sequencing reads.
Usage
Step 1: Install workflow
If you simply want to use this workflow, download and extract the latest release. If you intend to modify and further develop this workflow, fork this repository. Please consider providing any generally applicable modifications via a pull request.
In any case, if you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository and, once available, its DOI.
Step 2: Configure workflow
Configure the workflow according to your needs by editing `config.yaml`, the sample sheet `samples.tsv`, and the unit sheet `units.tsv`.
config.yaml
The following entries are expected in the config file:
```yaml
illuminaclip_file: /path/to/illumina.fa

accel_panels:
  panel1:
    5p_primer_file: /path/to/5p_primers.fa
    3p_primer_file: /path/to/3p_primers.fa
  panel2:
    5p_primer_file: /path/to/5p_primers.fa
    3p_primer_file: /path/to/3p_primers.fa
```
samples.tsv
Example of a `samples.tsv` file (columns need to be tab-separated):
```
sample	panel
sample1	panel1
sample2	panel2
```
The panel column must contain a panel name that exists under the `accel_panels` entry in `config.yaml`.
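That panel-name constraint can be checked up front. Below is a minimal sketch using only the standard library; the function name `validate_panels` is hypothetical and not part of the workflow itself:

```python
import csv
import io

def validate_panels(samples_tsv, config):
    """Check that every panel named in the sample sheet exists in config['accel_panels']."""
    known_panels = set(config["accel_panels"])
    reader = csv.DictReader(io.StringIO(samples_tsv), delimiter="\t")
    missing = {row["panel"] for row in reader if row["panel"] not in known_panels}
    if missing:
        raise ValueError(f"Unknown panel(s) in samples.tsv: {sorted(missing)}")

# Example mirroring the sample sheet above
config = {"accel_panels": {"panel1": {}, "panel2": {}}}
samples = "sample\tpanel\nsample1\tpanel1\nsample2\tpanel2\n"
validate_panels(samples, config)  # passes silently
```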
units.tsv
Example of a `units.tsv` file (columns need to be tab-separated):
```
sample	unit	fq1	fq2
sample1	lane1	/path/to/sample1.R1.fastq	/path/to/sample1.R2.fastq
sample2	lane1	/path/to/sample2.R1.fastq	/path/to/sample2.R2.fastq
```
There can be multiple lanes or replicates for the same sample.
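To illustrate how multiple lanes per sample combine, here is a small sketch using only the standard library; the helper `units_by_sample` is hypothetical, not part of the workflow:

```python
import csv
import io
from collections import defaultdict

def units_by_sample(units_tsv):
    """Group FASTQ pairs by sample, one entry per lane/replicate."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(units_tsv), delimiter="\t"):
        groups[row["sample"]].append((row["unit"], row["fq1"], row["fq2"]))
    return dict(groups)

# Two lanes for the same sample, as allowed by the unit sheet
units = (
    "sample\tunit\tfq1\tfq2\n"
    "sample1\tlane1\t/path/to/s1.L1.R1.fastq\t/path/to/s1.L1.R2.fastq\n"
    "sample1\tlane2\t/path/to/s1.L2.R1.fastq\t/path/to/s1.L2.R2.fastq\n"
)
print(len(units_by_sample(units)["sample1"]))  # 2
```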
Step 3: Execute workflow
Test your configuration by performing a dry-run via
```shell
snakemake -n
```
Execute the workflow locally via
```shell
snakemake --cores $N
```
using `$N` cores, or run it in a cluster environment via
```shell
snakemake --cluster qsub --jobs 100
```
or
```shell
snakemake --drmaa --jobs 100
```
See the Snakemake documentation for further details.
Code Snippets
```python
run:
    if input[0].endswith("gz"):
        shell("zcat {input} > {output}")
    else:
        shell("cat {input} > {output}")
```
```python
run:
    import subprocess, os
    lines = int(float(subprocess.run(
        "wc -l " + str(input[0]) + " | awk '{print($1/4)}'",
        stdout=subprocess.PIPE, shell=True).stdout.decode("utf-8").rstrip("\n")))
    storage.store(wildcards.sample + "." + wildcards.unit + "." + wildcards.read + ".var", str(lines))
    shell("echo 'reads: '" + str(lines) + "'' > " + output[0])
```
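The shell pipeline above counts FASTQ records as the line count divided by four. The same count in pure Python, as a sketch (the helper `count_fastq_reads` is hypothetical):

```python
import tempfile

def count_fastq_reads(path):
    """A FASTQ record spans exactly four lines, so reads = lines // 4."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4

# Demo with a two-read FASTQ written to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as fq:
    fq.write("@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
print(count_fastq_reads(fq.name))  # 2
```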
```python
run:
    import math
    num_reads = int(storage.fetch(wildcards.sample + "." + wildcards.unit + "." + wildcards.read + ".var"))
    num_split = _accel_get_num_splits(config)
    lines_per_file = 4 * math.ceil(num_reads / num_split)
    shell(
        'cat {input[0]} | awk \'BEGIN{{ file = 0; '
        'filename = sprintf("{params.output_prefix}%.04d{params.output_suffix}", file) }}'
        '{{ print > filename}} NR % {lines_per_file} == 0 '
        '{{ close(filename); file = file + 1; '
        'filename = sprintf("{params.output_prefix}%.04d{params.output_suffix}", file)}}\'')
    num_files_generated = 4 * math.floor(num_reads / lines_per_file)
    while num_files_generated < num_split:
        shell("touch {params.output_prefix}%04d{params.output_suffix}" % num_split)
        num_split -= 1
```
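The chunking arithmetic in this rule can be checked in isolation: with `lines_per_file = 4 * ceil(num_reads / num_split)`, 10 reads split three ways gives 16 lines (4 reads) per full chunk. A worked example (the numbers here are illustrative, not from the workflow):

```python
import math

num_reads = 10   # reads counted by the previous rule
num_split = 3    # desired number of chunks
lines_per_file = 4 * math.ceil(num_reads / num_split)
print(lines_per_file)                              # 16 lines per chunk, i.e. 4 reads
print(math.ceil(num_reads * 4 / lines_per_file))   # 3 chunks actually needed
```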
```python
wrapper:
    "0.17.4/bio/trimmomatic/pe"
```
```python
wrapper:
    "0.17.4/bio/cutadapt/pe"
```
```python
wrapper:
    "0.17.4/bio/cutadapt/pe"
```
```python
wrapper:
    "0.17.4/bio/cutadapt/pe"
```
```python
wrapper:
    "0.17.4/bio/cutadapt/pe"
```
```python
run:
    with open(output.qc, "w") as out:
        for qc_file in input.qc:
            with open(qc_file, "r") as qc_input:
                out.write(qc_input.read())
```
```python
run:
    with open(output.qc, "w") as out:
        for qc_file in input.qc:
            with open(qc_file, "r") as qc_input:
                out.write(qc_input.read())
```
```python
run:
    with open(output.qc, "w") as out:
        for qc_file in input.qc:
            with open(qc_file, "r") as qc_input:
                out.write(qc_input.read())
```
```python
shell:
    "mv {input} {output}"
```
```python
shell:
    "zcat {input} | gzip > {output}"
```
```python
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"

from snakemake.shell import shell

n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cutadapt"
    " {snakemake.params}"
    " -o {snakemake.output.fastq1}"
    " -p {snakemake.output.fastq2}"
    " {snakemake.input}"
    " > {snakemake.output.qc} {log}")
```
```python
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"

from snakemake.shell import shell

# Default to an empty string so "extra" stays optional
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

trimmer = " ".join(snakemake.params.trimmer)

shell("trimmomatic PE {extra} "
      "{snakemake.input.r1} {snakemake.input.r2} "
      "{snakemake.output.r1} {snakemake.output.r1_unpaired} "
      "{snakemake.output.r2} {snakemake.output.r2_unpaired} "
      "{trimmer} "
      "{log}")
```