mgi-ncov19 Snakemake COVID-19 Analysis Pipeline with Classification and De Novo Assembly


This Snakemake pipeline performs COVID-19 virus (SARS-CoV-2) read classification, de novo assembly, coverage assessment, and variant calling.

The pipeline is modeled on https://github.com/BGI-IORI/nCoV_Meta (preprint: https://doi.org/10.1101/2020.03.16.993584).

Differences between mgi-ncov19-snakemake and nCoV_Meta:

  1. Low-complexity reads are removed with fastp (BGI: PRINSEQ).

  2. Kraken 2 is used for read classification (BGI: Kraken 1).

  3. SOAPnuke v2 is used (BGI: SOAPnuke v1); switching to SOAPnuke v1 would be preferable.

  4. The alignment and variant calling steps are not yet finished.

updated: 2020-04-14

Usage:

0. Install Conda and Snakemake

1. Clone workflow

git clone git@github.com:huyue87/mgi-ncov19-snakemake

2. Execute workflow

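# 2.1 link the input files (paired-end raw reads) into the input directory
#     (replace /path/to with the directory that holds your reads)
cd mgi-ncov19-snakemake/input
ln -s /path/to/Sample_{1,2}.fq.gz .
# 2.2 run de novo assembly and generate SAM files
cd ..
snakemake --use-conda -n   # dry run: list the jobs without executing them
snakemake --use-conda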
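
Before starting a run, it can help to confirm that every sample in the input directory has both mates. The following is a minimal Python sketch, assuming the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming implied by the `ln -s` command above. The function name `find_read_pairs` is hypothetical and is not part of the pipeline.

```python
from pathlib import Path


def find_read_pairs(input_dir):
    """Return ({sample: (read1, read2)}, [samples missing a mate]).

    Assumes files are named <sample>_1.fq.gz and <sample>_2.fq.gz.
    """
    input_dir = Path(input_dir)
    suffix_len = len("_1.fq.gz")
    # Collect sample names from every file that looks like a mate file.
    samples = {path.name[:-suffix_len] for path in input_dir.glob("*_[12].fq.gz")}

    pairs, unpaired = {}, []
    for sample in sorted(samples):
        read1 = input_dir / f"{sample}_1.fq.gz"
        read2 = input_dir / f"{sample}_2.fq.gz"
        # exists() follows symlinks, so a dangling link counts as missing.
        if read1.exists() and read2.exists():
            pairs[sample] = (read1, read2)
        else:
            unpaired.append(sample)
    return pairs, unpaired
```

Calling `find_read_pairs("input")` from the repository root returns the complete pairs and a list of samples where one mate is absent or its symlink is broken.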

Code Snippets

# Classify read pairs with Kraken 2, then compress the FASTQ files it wrote.
# The pigz step appends to the log so the Kraken 2 messages are not overwritten.
shell:
    """
    kraken2 \
        --db {params.db} \
        --threads {threads} \
        --output {output.kraken} \
        --report {output.kreport} \
        --classified-out {params.classified} \
        --unclassified-out {params.unclassified} \
        --paired \
        {input.read1} {input.read2} \
        2> {log.stderr}
    pigz \
        --processes {threads} \
        --verbose \
        --force \
        {params.fq_to_compress} \
        2>> {log.stderr}
    """
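
The `--report` file written by the kraken2 step can be inspected with a few lines of Python. This is a minimal sketch, assuming the default six-column, tab-separated Kraken 2 report layout (percentage, clade read count, direct read count, rank code, NCBI taxonomy ID, indented name). The helper names are hypothetical, and the numbers in the usage note are invented for illustration.

```python
def parse_kraken2_report(lines):
    """Parse Kraken 2 report lines (default layout) into a list of dicts."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rows.append({
            "percent": float(fields[0]),      # % of fragments in this clade
            "clade_reads": int(fields[1]),    # fragments in the clade
            "taxon_reads": int(fields[2]),    # fragments assigned directly
            "rank": fields[3].strip(),        # rank code, e.g. U, R, S
            "taxid": int(fields[4]),          # NCBI taxonomy ID
            "name": fields[5].strip(),        # name with indentation removed
        })
    return rows


def clade_reads_for(rows, taxid):
    """Return the clade read count for a taxonomy ID, or 0 if it is absent."""
    for row in rows:
        if row["taxid"] == taxid:
            return row["clade_reads"]
    return 0
```

For example, `clade_reads_for(parse_kraken2_report(open(path)), 2697049)` looks up NCBI taxonomy ID 2697049 (SARS-CoV-2); whether that taxon appears at all depends on how the Kraken 2 database was built.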
# Quality-trim, adapter-trim, and remove low-complexity reads with fastp.
shell:
    """
    fastp \
        -q 20 -u 20 -n 1 -l 50 \
        -i {input.read1} \
        -I {input.read2} \
        -o {output.read1} \
        -O {output.read2} \
        -j {output.json} \
        -h {output.html} \
        --adapter_sequence AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA \
        --adapter_sequence_r2 AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG \
        --detect_adapter_for_pe \
        --disable_trim_poly_g \
        --thread {threads} \
        --low_complexity_filter \
        --complexity_threshold 7 \
        > {log.stdout} \
        2> {log.stderr}
    """
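
To make `--complexity_threshold 7` concrete: fastp's documentation defines complexity as the percentage of bases that differ from the next base, and reads scoring below the threshold are discarded. The sketch below reimplements that definition in Python for illustration only. It is not fastp's code, and the handling of reads shorter than two bases is an assumption.

```python
def complexity_percent(seq):
    """Percentage of positions whose base differs from the following base."""
    if len(seq) < 2:
        return 0.0  # assumption: too short to measure, treated as lowest complexity
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return 100.0 * changes / (len(seq) - 1)


def passes_complexity_filter(seq, threshold=7):
    """True if the read would be kept at the given threshold (in percent)."""
    return complexity_percent(seq) >= threshold
```

Under this definition a 30-base read with a single base change scores 1/29, about 3.4%, and would be removed at a threshold of 7, while a homopolymer run scores 0%.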
# Filter reads by adapter content, base quality, and N content with SOAPnuke.
shell:
    """
    SOAPnuke filter \
        -f AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA \
        -r AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG \
        -l 20 -q 0.2 -n 0.02 -5 0 -Q 2 -G 2 \
        --fq1 {input.read1} \
        --fq2 {input.read2} \
        --cleanFq1 {params.clean1} \
        --cleanFq2 {params.clean2} \
        --outDir {params.outdir} \
        -T {threads} \
        > {log.stdout} \
        2> {log.stderr}
    """
# Assemble the cleaned read pairs de novo with SPAdes.
shell:
    """
    spades.py \
        -1 {input.read1} \
        -2 {input.read2} \
        -o {params.outdir} \
        -t {threads} \
        > {log.stdout} \
        2> {log.stderr}
    """
# Align each mate separately against the reference with bwa aln (4 threads, hard-coded).
shell:
    """
    bwa aln -t 4 \
        {params.db} {input.read1} > {output.bwa1} \
        2> {log.stderr1}

    bwa aln -t 4 \
        {params.db} {input.read2} > {output.bwa2} \
        2> {log.stderr2}
    """
# Combine the two bwa aln results into paired-end alignments in SAM format.
shell:
    """
    bwa sampe {params.db} \
        {input.bwa1} {input.bwa2} {input.read1} {input.read2} \
        > {output.sam} \
        2> {log.stderr}
    """
# Filter the SAM records with the bundled Perl script (thresholds -m 0.95 and -s 0.90).
shell:
    """
    perl scripts/BWA_sam_Filter_identity_cvg.pl \
        -i {input} \
        -o {output} \
        -m 0.95 \
        -s 0.90 \
        > {log.stdout} \
        2> {log.stderr}
    """
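
The internals of `scripts/BWA_sam_Filter_identity_cvg.pl` are not shown here. Judging only from the script name, one plausible reading is that it keeps alignments above a minimum identity and a minimum read coverage. The Python sketch below illustrates that idea under loud assumptions: that `-m` is a minimum identity and `-s` a minimum aligned fraction of the read, that identity is derived from the `NM` tag, and that coverage is derived from the CIGAR string. The real script may compute both differently.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def identity_and_coverage(sam_line):
    """Return (identity, coverage) for one mapped SAM record, else None."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Alignment columns: matches/mismatches plus inserted and deleted bases.
    aligned_cols = sum(n for n, op in ops if op in "MID=X")
    # Read bases that take part in the alignment.
    query_aligned = sum(n for n, op in ops if op in "MI=X")
    # Full read length, counting soft- and hard-clipped bases.
    read_len = sum(n for n, op in ops if op in "MIS=XH")
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
    if nm is None or aligned_cols == 0 or read_len == 0:
        return None
    return (aligned_cols - nm) / aligned_cols, query_aligned / read_len


def keep_record(sam_line, min_identity=0.95, min_coverage=0.90):
    """Assumed filter rule: keep records meeting both thresholds."""
    result = identity_and_coverage(sam_line)
    return (result is not None
            and result[0] >= min_identity
            and result[1] >= min_coverage)
```

With these definitions, a 100-base read aligned as `95M5S` with `NM:i:2` has identity 93/95 and coverage 0.95, so it would be kept at the thresholds used above.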
# Convert the filtered SAM to BAM, fix mate information, sort, index,
# and remove duplicates with Picard.
shell:
    """
    samtools view -bt {params.db}.fai {input} > {output.L178}

    samtools sort -n {output.L178} | samtools fixmate - {output.L179}

    samtools flagstat {output.L179} > {output.L180}

    samtools sort {output.L179} -o {output.L181} --reference {params.db}

    samtools index {output.L181}

    java -jar bin/picard.jar \
        MarkDuplicates AS=TRUE \
        VALIDATION_STRINGENCY=LENIENT \
        MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 \
        REMOVE_DUPLICATES=TRUE INPUT={output.L181} \
        OUTPUT={output.L183a} \
        METRICS_FILE={output.L183b} \
        > {log.stdout} \
        2> {log.stderr}
    """
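
The `samtools flagstat` summary written to `{output.L180}` gives a quick check of how many reads survived to this point. The following is a minimal Python sketch for reading it, assuming the default text layout in which each line starts with `<QC-passed> + <QC-failed> <description>`. The exact set of lines varies between samtools versions, and the helper names are hypothetical.

```python
import re

FLAGSTAT_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Return a list of (qc_passed, qc_failed, description) tuples."""
    rows = []
    for line in text.splitlines():
        match = FLAGSTAT_RE.match(line.strip())
        if match:
            rows.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows


def qc_passed_count(rows, prefix):
    """QC-passed count of the first row whose description starts with prefix."""
    for passed, _failed, description in rows:
        if description.startswith(prefix):
            return passed
    raise KeyError(prefix)


def mapped_fraction(rows):
    """Fraction of QC-passed reads that are mapped (0.0 if there are none)."""
    total = qc_passed_count(rows, "in total")
    mapped = qc_passed_count(rows, "mapped")
    return mapped / total if total else 0.0
```

A typical call is `mapped_fraction(parse_flagstat(open(path).read()))`, where `path` is the flagstat file produced by the rule.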

URL: https://github.com/ctmrbio/mgi-ncov19-snakemake
Name: mgi-ncov19-snakemake
Version: 1
Copyright: Public Domain
License: None