Snakemake workflow implementing Accel Amplicon (Swift) trimming guidelines.


This workflow performs read trimming on Accel Amplicon Panel data, following the trimming guidelines recommended by Swift Biosciences.

Authors

Patrik Smeds (@smeds)

Requirements

Samples must be paired-end Illumina sequencing data.

Usage

Step 1: Install workflow

If you simply want to use this workflow, download and extract the latest release. If you intend to modify and further develop this workflow, fork this repository. Please consider providing any generally applicable modifications via a pull request.

In any case, if you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository and, once available, its DOI.

Step 2: Configure workflow

Configure the workflow according to your needs by editing config.yaml, the sample sheet samples.tsv, and the unit sheet units.tsv.

config.yaml

The following entries are expected in the config file:

illuminaclip_file: /path/to/illumina.fa
accel_panels:
  panel1:
    5p_primer_file: /path/to/5p_primers.fa
    3p_primer_file: /path/to/3p_primers.fa
  panel2:
    5p_primer_file: /path/to/5p_primers.fa
    3p_primer_file: /path/to/3p_primers.fa

samples.tsv

Example of a samples.tsv file (columns must be tab separated):

sample panel
sample1 panel1
sample2 panel2

The panel column must contain a panel name defined under the accel_panels entry in config.yaml.

units.tsv

Example of a units.tsv file (columns must be tab separated):

sample unit fq1 fq2
sample1 lane1 /path/to/sample1.R1.fastq /path/to/sample1.R2.fastq
sample2 lane1 /path/to/sample2.R1.fastq /path/to/sample2.R2.fastq

There can be multiple lanes or replicates for the same sample; list each one as its own row, as in the example below.
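For example, a sample sequenced on two lanes would appear as two rows sharing the same sample name (paths and unit names here are illustrative):

sample unit fq1 fq2
sample1 lane1 /path/to/sample1.L001.R1.fastq /path/to/sample1.L001.R2.fastq
sample1 lane2 /path/to/sample1.L002.R1.fastq /path/to/sample1.L002.R2.fastq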

Step 3: Execute workflow

Test your configuration by performing a dry-run via

snakemake -n

Execute the workflow locally via

snakemake --cores $N

using $N cores or run it in a cluster environment via

snakemake --cluster qsub --jobs 100

or

snakemake --drmaa --jobs 100

See the Snakemake documentation for further details.

Code Snippets

run:
   # Ensure downstream rules always see an uncompressed fastq file
   if input[0].endswith("gz"):
      shell("zcat {input} > {output}")
   else:
      shell("cat {input} > {output}")
run:
  import subprocess
  # Count reads in the fastq file: line count divided by 4
  lines = int(float(subprocess.run(
      "wc -l " + str(input[0]) + " | awk '{print($1/4)}'",
      stdout=subprocess.PIPE, shell=True).stdout.decode("utf-8").rstrip("\n")))
  # Persist the read count so the splitting rule can fetch it later
  storage.store(wildcards.sample + "." + wildcards.unit + "." + wildcards.read + ".var", str(lines))
  shell("echo 'reads: " + str(lines) + "' > " + output[0])
run:
  import math
  # Fetch the read count stored by the counting rule above
  num_reads = int(storage.fetch(wildcards.sample + "." + wildcards.unit + "." + wildcards.read + ".var"))
  num_split = _accel_get_num_splits(config)
  # Each read spans 4 fastq lines; round up so at most num_split chunks are written
  lines_per_file = 4 * math.ceil(num_reads / num_split)
  # Split the fastq into zero-padded, numbered chunk files
  shell('cat {input[0]} | awk \'BEGIN{{ file = 0; filename = sprintf("{params.output_prefix}%.04d{params.output_suffix}", file) }}{{ print > filename}} NR % {lines_per_file} == 0 {{ close(filename); file = file + 1; filename = sprintf("{params.output_prefix}%.04d{params.output_suffix}",file)}}\'')
  # awk writes ceil(total_lines / lines_per_file) non-empty chunks
  num_files_generated = math.ceil(4 * num_reads / lines_per_file)
  # Touch empty placeholders so the expected number of chunk files always exists
  while num_files_generated < num_split:
     shell("touch {params.output_prefix}%04d{params.output_suffix}" % (num_split - 1))
     num_split -= 1
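A quick check of the splitting arithmetic with illustrative numbers:

import math

num_reads, num_split = 100, 3
lines_per_file = 4 * math.ceil(num_reads / num_split)  # 4 * 34 = 136 lines per chunk
num_files = math.ceil(4 * num_reads / lines_per_file)  # ceil(400 / 136) = 3 chunks
assert num_files <= num_split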
wrapper:
    "0.17.4/bio/trimmomatic/pe"
128
129
wrapper:
    "0.17.4/bio/cutadapt/pe"
The same cutadapt wrapper is invoked by three further primer- and quality-trimming passes.
run:
    # Merge the per-chunk cutadapt QC reports into a single QC file
    with open(output.qc, "w") as out:
        for qc_file in input.qc:
            with open(qc_file, "r") as qc_input:
                out.write(qc_input.read())
The same QC-report merging block is used after each of the three trimming stages.
shell: "mv {input} {output}"
shell: "zcat {input} | gzip > {output}"
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from snakemake.shell import shell


n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# cutadapt writes its report to stdout, which is redirected into the qc output file
shell(
    "cutadapt"
    " {snakemake.params}"
    " -o {snakemake.output.fastq1}"
    " -p {snakemake.output.fastq2}"
    " {snakemake.input}"
    " > {snakemake.output.qc} {log}")
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
trimmer = " ".join(snakemake.params.trimmer)

shell("trimmomatic PE {snakemake.params.extra} "
      "{snakemake.input.r1} {snakemake.input.r2} "
      "{snakemake.output.r1} {snakemake.output.r1_unpaired} "
      "{snakemake.output.r2} {snakemake.output.r2_unpaired} "
      "{trimmer} "
      "{log}")


URL: https://github.com/clinical-genomics-uppsala/accel_amplicon_trimming
Name: accel_amplicon_trimming
Version: 1
License: MIT License
