Illumina Quality Control and Filtering Workflow for High-Quality Data Processing and Taxonomic Classification

Version 1

Workflow for Illumina Quality Control and Filtering

Multiple paired-end datasets are merged into a single paired-end dataset.
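The page does not say which tool performs the merge. A common approach, assumed here, is to concatenate all forward files into one file and all reverse files into another, in the same order, so mate pairs stay aligned. This minimal Python sketch assumes gzipped FASTQ input; `merge_pairs` and the `_1`/`_2` output names are hypothetical and not taken from the workflow.

```python
import shutil
from pathlib import Path


def merge_pairs(forward, reverse, out_prefix):
    """Concatenate gzipped FASTQ files into one forward and one reverse file.

    Hypothetical helper: forward[i] and reverse[i] must be the two mate files of
    the same dataset, so reads stay paired after merging.
    """
    if len(forward) != len(reverse):
        raise ValueError("every forward file needs a matching reverse file")
    out1 = Path(f"{out_prefix}_1.fq.gz")
    out2 = Path(f"{out_prefix}_2.fq.gz")
    # Back-to-back gzip members form a valid gzip file, so raw bytes can be appended.
    for inputs, out in ((forward, out1), (reverse, out2)):
        with open(out, "wb") as dst:
            for path in inputs:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, dst)
    return out1, out2
```

For example, `merge_pairs(["a_1.fq.gz", "b_1.fq.gz"], ["a_2.fq.gz", "b_2.fq.gz"], "merged")` would write `merged_1.fq.gz` and `merged_2.fq.gz`. Copying compressed bytes avoids a decompress/recompress round trip.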

FastQC on raw data files
fastp for read quality trimming
BBDuk for phiX and (optional) rRNA filtering
Kraken2 for taxonomic classification of reads (optional)
BBMap for contamination filtering against given references (optional)
FastQC on filtered (merged) data

Code Snippets

baseCommand: [pigz, -c]

arguments:
  - valueFrom: $(inputs.inputfile)

CWL From line 16 of bash/pigz.cwl
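`pigz -c` gzip-compresses its input with several threads and writes the result to standard output; the output is ordinary gzip that any gzip reader can decompress. The Python sketch below uses the standard-library `gzip` module as a single-threaded stand-in (it is not pigz itself), and `gzip_stream` is a hypothetical name.

```python
import gzip
import io
import shutil


def gzip_stream(src, dst):
    """Compress everything read from binary stream src into binary stream dst.

    Single-threaded stand-in for `pigz -c input > output`.
    """
    # Closing the GzipFile writes the gzip trailer but leaves dst open.
    with gzip.GzipFile(fileobj=dst, mode="wb") as gz:
        shutil.copyfileobj(src, gz)


out = io.BytesIO()
gzip_stream(io.BytesIO(b"@r1\nACGT\n+\nIIII\n"), out)
print(gzip.decompress(out.getvalue()))  # b'@r1\nACGT\n+\nIIII\n'
```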

baseCommand: [ bbduk.sh ]

CWL BBMap From line 23 of bbmap/bbduk_filter.cwl
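BBDuk filters contaminants such as phiX by k-mer matching: a read is flagged when it shares k-mers with the reference, and by default a single shared k-mer is enough. The Python sketch below shows the idea only and is not BBDuk's implementation; the toy reference and the small `k=5` are assumptions for readability (k=31 is a commonly used setting for phiX removal).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: list[str], k: int) -> set[str]:
    """Index reference k-mers on both strands, so reverse-strand reads also match."""
    index: set[str] = set()
    for ref in references:
        index |= kmers(ref, k) | kmers(revcomp(ref), k)
    return index


def is_contaminant(read: str, index: set[str], k: int, min_hits: int = 1) -> bool:
    """True when the read shares at least min_hits distinct k-mers with the reference."""
    hits = sum(1 for km in kmers(read, k) if km in index)
    return hits >= min_hits


k = 5
index = build_index(["ACGTACGGTTCA"], k)  # toy stand-in for the phiX genome
reads = ["TTACGGTTAA", "GGGGGGGGGG", revcomp("ACGGTTCA")]
kept = [r for r in reads if not is_contaminant(r, index, k)]
print(kept)  # ['GGGGGGGGGG']
```

Indexing both strands matters because a read from the reverse strand shares no forward k-mers with the reference, as the third read shows.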

baseCommand: [bbmap.sh]

arguments:
  - "-Xmx$(inputs.memory)M"
  - "printunmappedcount"
  - "overwrite=true"
  - "bloom=t"
  - "statsfile=$(inputs.identifier)_BBMap_stats.txt"
  - "covstats=$(inputs.identifier)_BBMap_covstats.txt"
  # Return an array so that each output option is passed as its own argument.
  # outm1/outm2 write reads that mapped to the references; outu1/outu2 write reads that did not.
  - |
    ${
      var prefix = inputs.output_mapped ? 'outm' : 'outu';
      return [prefix + '1=' + inputs.identifier + '_filtered_1.fq.gz',
              prefix + '2=' + inputs.identifier + '_filtered_2.fq.gz'];
    }
  # - "fast"
  # - "minratio=0.9"
  # - "maxindel=3"
  # - "bwr=0.16"
  # - "bw=12"
  # - "minhits=2"
  # - "qtrim=r"
  # - "trimq=10"
  # - "untrim"
  # - "idtag"
  # - "kfilter=25"
  # - "maxsites=1"
  # - "k=14"
  # - "nodisk=t"
  # - "out=$(inputs.identifier)_BBMap.sam"
  # - "rpkm=$(inputs.identifier).rpkm"

CWL BBMap From line 90 of bbmap/bbmap_filter-reads.cwl
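The inline JavaScript chooses between BBMap's `outm1`/`outm2` (reads that mapped to the references) and `outu1`/`outu2` (reads that did not map), depending on `output_mapped`. A Python rendering of the same decision, returning one list item per command-line argument, makes the logic easy to check; `bbmap_output_args` is a hypothetical name.

```python
def bbmap_output_args(identifier: str, output_mapped: bool) -> list[str]:
    """Mirror the CWL expression: keep mapped reads (outm*) or unmapped reads (outu*)."""
    prefix = "outm" if output_mapped else "outu"
    return [
        f"{prefix}1={identifier}_filtered_1.fq.gz",
        f"{prefix}2={identifier}_filtered_2.fq.gz",
    ]


# Contamination removal keeps what did NOT map to the contaminant references.
print(bbmap_output_args("sample1", output_mapped=False))
# ['outu1=sample1_filtered_1.fq.gz', 'outu2=sample1_filtered_2.fq.gz']
```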

- entryname: script.sh
  entry: |-
    #!/bin/bash
    # Write a small Python filter that renames repeated FASTA header IDs
    # (e.g. the first repeated ">seq1" becomes ">seq1.x1") so every sequence name is unique.
    echo -e "\
    #!/usr/bin/env python3
    import sys\n\
    headers = set()\n\
    c = 0\n\
    for line in sys.stdin:\n\
      splitline = line.split()\n\
      if line[0] == '>':    \n\
        if splitline[0] in headers:\n\
          c += 1\n\
          print(splitline[0]+'.x'+str(c)+' '+' '.join(splitline[1:]))\n\
        else:\n\
          print(line.strip())\n\
        headers.add(splitline[0])\n\
      else:\n\
        print(line.strip())" > ./dup.py
    out_name=$1
    shift

    # Inputs are expected to be either all gzipped or all plain FASTA.
    if file -b "$@" | grep -q gzip; then
      zcat "$@" | python3 ./dup.py | gzip > "$out_name"
    else
      cat "$@" | python3 ./dup.py | gzip > "$out_name"
    fi

CWL From line 22 of bbmap/prepare_fasta_db.cwl
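The embedded `dup.py` makes FASTA header IDs unique, presumably so the combined file can serve as a BBMap reference: an ID seen before gets a `.x<N>` suffix, where N is a running count of all duplicates in the file, not per ID. The same logic as a testable Python function (the name `dedup_headers` is hypothetical):

```python
def dedup_headers(lines):
    """Yield FASTA lines, renaming repeated header IDs exactly as dup.py does.

    The counter is shared by all IDs, and a renamed header with no description
    keeps a trailing space, matching the original script's output.
    """
    headers = set()
    c = 0
    for line in lines:
        splitline = line.split()
        if line.startswith(">"):
            if splitline[0] in headers:
                c += 1
                yield splitline[0] + ".x" + str(c) + " " + " ".join(splitline[1:])
            else:
                yield line.strip()
            headers.add(splitline[0])
        else:
            yield line.strip()


fasta = [">a first\n", "ACGT\n", ">b\n", "GG\n", ">a second\n", "TT\n", ">b\n", "CC\n"]
print(list(dedup_headers(fasta)))
```

Here the second `>a` becomes `>a.x1 second` and the second `>b` becomes `>b.x2 `, because the counter runs across the whole file.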

baseCommand: [ fastp ]

CWL fastp From line 137 of fastp/fastp.cwl
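One of fastp's trimming modes, `--cut_right`, slides a window from the 5' end and, at the first window whose mean quality falls below a threshold, drops that window and everything to its right. The Python sketch below is a simplified model of that behaviour on Phred+33 qualities, not fastp's code; the window of 4 and threshold of 20 are illustrative values, and the page does not show which fastp options this workflow sets.

```python
def phred33(qual: str) -> list[int]:
    """Decode a FASTQ quality string (Phred+33) into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def cut_right(seq: str, qual: str, window: int = 4, min_mean: float = 20.0) -> tuple[str, str]:
    """Trim from the first low-quality window to the 3' end.

    Reads shorter than the window are returned unchanged.
    """
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return seq[:start], qual[:start]
    return seq, qual


# 'I' = Q40, '#' = Q2: the read degrades towards its 3' end.
print(cut_right("ACGTACGTAC", "IIIIIII###"))  # ('ACGTAC', 'IIIIII')
```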

baseCommand: [ fastqc ]

CWL FastQC From line 6 of fastqc/fastqc.cwl
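Among FastQC's reports is the per-base sequence quality plot, which summarises the quality scores at each read position. The minimal Python sketch below computes only the per-position mean from Phred+33 quality strings (FastQC also shows medians and quartiles); `per_base_mean_quality` is a hypothetical name.

```python
def per_base_mean_quality(quals: list[str]) -> list[float]:
    """Mean Phred+33 score at each position, over all reads long enough to cover it."""
    totals: list[int] = []
    counts: list[int] = []
    for qual in quals:
        for pos, ch in enumerate(qual):
            if pos == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - 33
            counts[pos] += 1
    return [t / n for t, n in zip(totals, counts)]


print(per_base_mean_quality(["II#", "I5"]))  # [40.0, 30.0, 2.0]
```

Running it before and after trimming mirrors the workflow's two FastQC passes: the late positions should improve once low-quality tails are removed.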

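- entryname: script.sh
  entry: |-
    #!/bin/bash
    # $1 = output name prefix, $2 = long-read FASTQ, remaining arguments = Filtlong options
    outname=$1
    longreads=$2
    shift 2
    # Filtlong writes reads to stdout and its log to stderr; tee also appends the log to a file.
    filtlong "$longreads" "$@" 2> >(tee -a "$outname.filtlong.log" >&2) | gzip > "$outname.fastq.gz"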

CWL Filtlong From line 23 of filtlong/filtlong.cwl
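Filtlong filters long reads by length and quality; options such as `--min_length` and `--target_bases` drop short reads and keep the best reads until a base budget is reached. The Python sketch below is a simplified stand-in: it ranks reads by mean Phred score only, which is an assumption, since Filtlong's own scoring also weighs read length. The function names are hypothetical.

```python
def mean_quality(qual: str) -> float:
    """Mean Phred+33 score of a read."""
    return sum(ord(ch) - 33 for ch in qual) / len(qual)


def select_reads(reads, min_length: int, target_bases: int) -> list[str]:
    """Keep the best reads until at least target_bases bases are retained.

    reads: (read_id, sequence, phred33_quality) tuples. Reads shorter than
    min_length (or empty) are dropped first.
    """
    candidates = [r for r in reads if len(r[1]) >= max(min_length, 1)]
    # Highest mean quality first; list.sort is stable, so ties keep input order.
    candidates.sort(key=lambda r: mean_quality(r[2]), reverse=True)
    kept, total = [], 0
    for read_id, seq, _qual in candidates:
        if total >= target_bases:
            break
        kept.append(read_id)
        total += len(seq)
    return kept


reads = [
    ("r1", "A" * 10, "I" * 10),  # Q40
    ("r2", "A" * 3, "III"),      # too short
    ("r3", "A" * 8, "5" * 8),    # Q20
    ("r4", "A" * 12, "+" * 12),  # Q10
]
print(select_reads(reads, min_length=5, target_bases=15))  # ['r1', 'r3']
```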

baseCommand: [ kraken2 ]

CWL kraken2 From line 6 of kraken2/kraken2.cwl
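With `--report`, Kraken2 writes a tab-separated summary whose standard six columns are: percentage of fragments in the clade, fragments in the clade, fragments assigned directly to the taxon, rank code, NCBI taxonomy ID, and the indented scientific name. A minimal Python parser for that layout, assuming the default six-column report (not the extended `--report-minimizer-data` format) and two spaces of indentation per level:

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float      # % of fragments in the clade rooted at this taxon
    clade_reads: int    # fragments in the clade
    direct_reads: int   # fragments assigned directly to this taxon
    rank: str           # rank code, e.g. U, R, D, G, S
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name without indentation
    depth: int          # indentation level (assumed two spaces per level)


def parse_kraken2_report(lines) -> list[ReportRow]:
    """Parse a default six-column Kraken2 report into ReportRow tuples."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        pct, clade, direct, rank, taxid, name = line.rstrip("\n").split("\t")
        bare = name.lstrip(" ")
        depth = (len(name) - len(bare)) // 2
        rows.append(ReportRow(float(pct), int(clade), int(direct), rank, int(taxid), bare, depth))
    return rows


# Toy report; intermediate ranks between root and species are omitted.
report = [
    " 50.00\t50\t50\tU\t0\tunclassified\n",
    " 50.00\t50\t10\tR\t1\troot\n",
    " 40.00\t40\t40\tS\t562\t    Escherichia coli\n",
]
for row in parse_kraken2_report(report):
    print(row.depth, row.taxid, row.name, row.clade_reads)
```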

baseCommand: [ ktImportTaxonomy ]

CWL Krona From line 15 of krona/krona.cwl
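`ktImportTaxonomy` builds its interactive chart from taxonomy IDs, aggregating records per taxon. As an illustration of that aggregation (not Krona's code), the Python sketch below tallies reads per taxonomy ID from Kraken2's per-read output, whose tab-separated columns begin with the C/U status, read ID and taxonomy ID; it assumes the output was written without `--use-names`, so the third column is a bare integer.

```python
from collections import Counter


def count_taxa(lines) -> Counter:
    """Count reads per taxonomy ID in Kraken2 per-read output (taxid 0 = unclassified)."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        counts[int(fields[2])] += 1
    return counts


output = [
    "C\tread1\t562\t150|150\t562:116 |:| 562:116\n",
    "C\tread2\t562\t150|150\t562:116 |:| 562:116\n",
    "U\tread3\t0\t150|150\t0:116 |:| 0:116\n",
]
print(count_taxa(output).most_common())  # [(562, 2), (0, 1)]
```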

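- entryname: script.sh
  entry: |-
    #!/bin/bash
    # $1 = -F (keep mapped reads) or -f (keep unmapped reads); tests SAM flag 4
    # $2 = reference
    # $3 = FASTQ reads
    # $4 = minimap2 preset (e.g. map-ont)
    # $5 = threads
    # $6 = output identifier

    if [ "$1" = "-F" ]; then
      # Giving -F replaces samtools fastq's default exclusion of 0x900, so add
      # secondary/supplementary (0x900) back to 0x4 to avoid duplicate reads.
      flag_opt="-F"; flag_val="0x904"
    else
      flag_opt="$1"; flag_val="4"
    fi

    minimap2 -a -t "$5" -x "$4" "$2" "$3" \
      | samtools fastq -@ "$5" -n "$flag_opt" "$flag_val" - \
      | pigz -p "$5" > "$6"_filtered.fastq.gz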
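In the minimap2 script, the first argument decides whether `samtools fastq` keeps reads with SAM flag bit 0x4 (segment unmapped) set (`-f`) or not set (`-F`): `-f` requires all given bits to be set, `-F` rejects a record if any given bit is set. The bit test behind those options, sketched in Python with a hypothetical helper `keep_record`:

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def keep_record(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """samtools-style filter: all `require` bits set (-f) and no `exclude` bits set (-F)."""
    return (flag & require) == require and (flag & exclude) == 0


flags = [0, 4, 16, 77]  # mapped forward, unmapped, mapped reverse, unmapped paired read 1
print([f for f in flags if keep_record(f, require=UNMAPPED)])  # -f 4 -> [4, 77]
print([f for f in flags if keep_record(f, exclude=UNMAPPED)])  # -F 4 -> [0, 16]
```

The same test with `exclude=0x900` (secondary 0x100 plus supplementary 0x800) drops extra alignments of a read that minimap2 may report alongside its primary alignment.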