Illumina Quality Control and Filtering Workflow for High-Quality Data Processing and Taxonomic Classification

Public, 1yr ago, Version 1

Workflow for Illumina Quality Control and Filtering

Multiple paired datasets are merged into a single paired dataset.
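One way to picture the merge is plain concatenation of the compressed files, with the forward and reverse files appended in the same order. The sketch below is an illustration only, not the workflow's own implementation; the function name `merge_paired` and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def merge_paired(datasets, out_fwd, out_rev):
    """Concatenate several (forward, reverse) gzip FASTQ pairs into one pair.

    Files are appended in the same order for both mates, so read N of the
    merged forward file still pairs with read N of the merged reverse file.
    """
    with open(out_fwd, "wb") as fwd, open(out_rev, "wb") as rev:
        for fwd_path, rev_path in datasets:
            # Concatenated gzip members form a valid gzip stream, so the
            # compressed bytes can be copied without decompressing them.
            with open(fwd_path, "rb") as src:
                shutil.copyfileobj(src, fwd)
            with open(rev_path, "rb") as src:
                shutil.copyfileobj(src, rev)
    return Path(out_fwd), Path(out_rev)
```

Keeping the two mates in lockstep matters because downstream paired-end tools match reads by position in the two files.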

  • FastQC on raw data files
  • fastp for read quality trimming
  • BBduk for phiX and (optional) rRNA filtering
  • Kraken2 for taxonomic classification of reads (optional)
  • BBmap for (contamination) filtering using given references (optional)
  • FastQC on filtered (merged) data
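The step list above can be sketched as a small planning function that shows which steps always run and which are optional. This is a reading aid under stated assumptions: the parameter names (`filter_rrna`, `run_kraken2`, `filter_references`) are hypothetical, and the order simply follows the list, which may not match the exact wiring of the CWL workflow.

```python
def plan_steps(filter_rrna=False, run_kraken2=False, filter_references=()):
    """Return the ordered step names for one run, following the list above."""
    steps = ["FastQC (raw reads)", "fastp (quality trimming)"]
    # phiX filtering is always listed; rRNA filtering is optional.
    targets = "phiX + rRNA" if filter_rrna else "phiX"
    steps.append(f"BBduk ({targets} filtering)")
    if run_kraken2:
        steps.append("Kraken2 (taxonomic classification)")
    if filter_references:
        # Only listed when reference sequences are supplied.
        steps.append("BBmap (reference filtering)")
    steps.append("FastQC (filtered reads)")
    return steps
```

For example, `plan_steps()` yields only the four mandatory steps, while passing a reference list adds the BBmap step before the final FastQC run.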

Code Snippets

baseCommand: [pigz, -c]

arguments:
  - valueFrom: $(inputs.inputfile)
CWL From line 16 of bash/pigz.cwl
baseCommand: [ bbduk.sh ]
baseCommand: [bbmap.sh]

arguments:
  - "-Xmx$(inputs.memory)M"
  - "printunmappedcount"
  - "overwrite=true"
  - "bloom=t"
  - "statsfile=$(inputs.identifier)_BBMap_stats.txt"
  - "covstats=$(inputs.identifier)_BBMap_covstats.txt"
  - |
    ${
      if (inputs.output_mapped){
        return 'outm1='+inputs.identifier+'_filtered_1.fq.gz \
                outm2='+inputs.identifier+'_filtered_2.fq.gz';
      } else {
        return 'outu1='+inputs.identifier+'_filtered_1.fq.gz \
                outu2='+inputs.identifier+'_filtered_2.fq.gz';
      }
    }
  # - "fast"
  # - "minratio=0.9"
  # - "maxindel=3"
  # - "bwr=0.16"
  # - "bw=12"
  # - "minhits=2"
  # - "qtrim=r"
  # - "trimq=10"
  # - "untrim"
  # - "idtag"
  # - "kfilter=25"
  # - "maxsites=1"
  # - "k=14"
  # - "nodisk=t"
  # - "out=$(inputs.identifier)_BBMap.sam"
  # - "rpkm=$(inputs.identifier).rpkm"
- entryname: script.sh
  entry: |-
    #!/bin/bash
    echo -e "\
    #!/usr/bin/python3\n\
    import sys\n\
    headers = set()\n\
    c = 0\n\
    for line in sys.stdin:\n\
      splitline = line.split()\n\
      if line[0] == '>':    \n\
        if splitline[0] in headers:\n\
          c += 1\n\
          print(splitline[0]+'.x'+str(c)+' '+' '.join(splitline[1:]))\n\
        else:\n\
          print(line.strip())\n\
        headers.add(splitline[0])\n\
      else:\n\
        print(line.strip())" > ./dup.py
    out_name=$1
    shift

    # Decompress first when the inputs are gzip-compressed.
    if file "$@" | grep gzip; then
      zcat "$@" | python3 ./dup.py | gzip > "$out_name"
    else
      cat "$@" | python3 ./dup.py | gzip > "$out_name"
    fi
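The embedded `dup.py` script renames repeated FASTA headers so that every sequence identifier in the merged reference is unique. A standalone sketch of that logic follows; the function name `rename_duplicate_headers` is hypothetical, and unlike the original it does not leave a trailing space when a duplicated header has no description. As in the original, a single counter is shared across all identifiers.

```python
def rename_duplicate_headers(lines):
    """Yield FASTA lines, suffixing repeated header IDs with '.x<counter>'."""
    seen = set()
    duplicates = 0  # one counter shared by all IDs, as in the original script
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line.split()
            seq_id = fields[0]
            if seq_id in seen:
                duplicates += 1
                fields[0] = f"{seq_id}.x{duplicates}"
                line = " ".join(fields)
            seen.add(seq_id)
        yield line
```

Reading from standard input and writing to standard output, as the script does, would look like `for out in rename_duplicate_headers(sys.stdin): print(out)`.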
baseCommand: [ fastp ]
baseCommand: [ fastqc ]
- entryname: script.sh
  entry: |-
    #!/bin/bash
    outname=$1
    longreads=$2
    shift;shift;
    filtlong $longreads $@ 2> >(tee -a $outname.filtlong.log>&2) | gzip > $outname.fastq.gz
baseCommand: [ kraken2 ]
baseCommand: [ ktImportTaxonomy ]
- entryname: script.sh
  entry: |-
    #!/bin/bash
    #   $1 = mapped/unmapped (-F -f)
    # 1 $2 = ref
    # 2 $3 = fastq
    # 3 $4 = preset (map-ont)
    # 4 $5 = threads
    # 5 $6 = identifier

    minimap2 -a -t $5 -x $4 $2 $3 | samtools fastq -@ $5 -n $1 4 | pigz -p $5 > $6_filtered.fastq.gz
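In the command above, `$1` is either `-f` or `-F`, and the following `4` is the SAM flag bit for "read unmapped". `samtools fastq -f 4` keeps only reads with that bit set (unmapped), while `-F 4` keeps only reads without it (mapped). A small sketch of that selection, with the hypothetical helper name `keep_read`:

```python
UNMAPPED = 0x4  # SAM flag bit: read is unmapped


def keep_read(flag, mode):
    """Mimic the effect of `samtools fastq <mode> 4` on one SAM flag value.

    mode "-f": keep reads that have the unmapped bit set.
    mode "-F": keep reads that do not have the unmapped bit set.
    """
    is_unmapped = bool(flag & UNMAPPED)
    if mode == "-f":
        return is_unmapped
    if mode == "-F":
        return not is_unmapped
    raise ValueError(f"unsupported mode: {mode!r}")
```

So passing `-f` produces a file of reads that did not align to the reference, and `-F` produces the reads that did.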
baseCommand: [ NanoPlot ]

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://gitlab.com/m-unlock/cwl/-/blob/master/cwl/workflows/workflow_illumina_quality.cwl
Name: workflow-for-illumina-quality-control-and-filterin
Version: Version 1
Copyright: Public Domain
License: None