ZARP: An automated workflow for processing of RNA-seq data


ZARP (Zavolan-Lab Automated RNA-Seq Pipeline)

...is a generic RNA-Seq analysis workflow that allows users to process and analyze Illumina short-read sequencing libraries with minimal effort. The workflow relies on publicly available bioinformatics tools and currently handles single-end or paired-end stranded bulk RNA-seq data. The workflow is developed in Snakemake, a widely used workflow management system in the bioinformatics community.

In its current implementation, ZARP analyzes reads (pre-processing, alignment, quantification) with state-of-the-art tools to give meaningful initial insights into the quality and composition of an RNA-Seq library. This reduces hands-on time for bioinformaticians and lets experimentalists assess their data rapidly. Additional reports summarise the results of the individual steps and provide useful visualisations.
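The stages named above (pre-processing, alignment, quantification, reporting) form a dependency graph that Snakemake resolves from each rule's input and output files. The minimal Python sketch below mimics that ordering with the standard library's `graphlib` (Python 3.9+). The step names are hypothetical labels chosen for illustration, not ZARP's actual rule names, and the real workflow contains many more steps.

```python
from graphlib import TopologicalSorter

# Hypothetical step names (not ZARP's rule names). Each key maps a step to the
# set of steps whose outputs it consumes, which is the shape graphlib expects.
STEPS = {
    "preprocess_reads": set(),
    "fastqc": {"preprocess_reads"},
    "align_star": {"preprocess_reads"},
    "quantify_salmon": {"preprocess_reads"},
    "quantify_kallisto": {"preprocess_reads"},
    "bam_index": {"align_star"},
    "tin_scores": {"bam_index"},
    "multiqc_report": {"fastqc", "tin_scores", "quantify_salmon", "quantify_kallisto"},
}


def execution_order(steps):
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for step in execution_order(STEPS):
        print(step)
```

Snakemake derives these edges itself by matching file names, so a real workflow never spells out such a mapping; the sketch only makes the ordering constraint explicit.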

Requirements

The workflow has been tested on:

  • CentOS 7.5
  • Debian 10
  • Ubuntu 16.04, 18.04

NOTE: Currently, we only support Linux execution.

Code Snippets

shell:
    "(cat {input.reads} > {output.reads}) \
    1> {log.stdout} 2> {log.stderr} "
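The snippet above simply concatenates read files. Assuming the inputs are gzip-compressed FASTQ files (the snippet does not show the file extension), this is still safe: the gzip format allows several compressed members in one file, and decompressing the result yields the records of all inputs in order. A small Python check of that property, with a hypothetical helper name:

```python
import gzip


def concat_gzip_members(parts):
    """Join gzip-compressed chunks byte-for-byte, as `cat a.gz b.gz` does."""
    return b"".join(parts)


if __name__ == "__main__":
    r1 = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
    r2 = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")
    merged = concat_gzip_members([r1, r2])
    # A multi-member gzip stream decompresses to the concatenated records.
    print(gzip.decompress(merged).decode())
```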
shell:
    "(mkdir -p {output.outdir}; \
    fastqc --outdir {output.outdir} \
    --threads {threads} \
    {params.additional_params} \
    {input.reads}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(mkdir -p {output.outdir}; \
    fastqc --outdir {output.outdir} \
    --threads {threads} \
    {params.additional_params} \
    {input.reads}) \
    1> {log.stdout} 2> {log.stderr}"
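Among other things, FastQC summarises base quality per read position. As a rough illustration only (this is not FastQC's implementation), the mean Phred score per position can be computed from FASTQ quality strings as below, assuming the Phred+33 encoding used by current Illumina data. The function name is hypothetical.

```python
def mean_quality_per_position(qualities, offset=33):
    """Mean Phred score at each read position (Phred+33 encoding assumed).

    Reads may differ in length; each position is averaged over the reads
    that actually reach it.
    """
    sums, counts = [], []
    for qual in qualities:
        for pos, char in enumerate(qual):
            if pos == len(sums):
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(mean_quality_per_position(["IIII", "II##"]))  # [40.0, 40.0, 21.0, 21.0]
```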
shell:
    "(mkdir -p {params.output_dir}; \
    chmod -R 777 {params.output_dir}; \
    STAR \
    --runMode genomeGenerate \
    --sjdbOverhang {params.sjdbOverhang} \
    --genomeDir {params.output_dir} \
    --genomeFastaFiles {input.genome} \
    --runThreadN {threads} \
    --outFileNamePrefix {params.outFileNamePrefix} \
    --sjdbGTFfile {input.gtf} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
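The STAR manual states that the `--sjdbOverhang` value passed above should ideally equal the maximum read length minus 1, and notes that the default of 100 works well in most cases. A small helper to derive the value from an uncompressed FASTQ string might look like this. Both function names are hypothetical and not part of ZARP.

```python
def read_lengths_from_fastq(fastq_text):
    """Yield the sequence length of every record in an uncompressed FASTQ string."""
    lines = fastq_text.splitlines()
    # Each FASTQ record spans four lines; the sequence is the second one.
    for i in range(1, len(lines), 4):
        yield len(lines[i])


def sjdb_overhang(read_lengths):
    """STAR's recommended --sjdbOverhang value: max(ReadLength) - 1."""
    lengths = list(read_lengths)
    if not lengths:
        raise ValueError("at least one read length is required")
    return max(lengths) - 1


if __name__ == "__main__":
    fastq = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n"
    print(sjdb_overhang(read_lengths_from_fastq(fastq)))  # 7
```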
shell:
    "(sort \
    -k1,1 -k4,4n -k5,5nr {input.gtf} > {output.gtf} \
    ) 1> {log.stdout} 2> {log.stderr}"
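The `sort` call above orders GTF records by sequence name, then by start coordinate (ascending, numeric) and end coordinate (descending, numeric). For features sharing a start position, the longest one, typically the gene, therefore comes first. The Python sketch below reproduces that ordering. It assumes C-locale (byte-wise) comparison of sequence names and keeps `#` header lines at the top; the function name is hypothetical.

```python
def sort_gtf_lines(lines):
    """Order GTF lines like `sort -k1,1 -k4,4n -k5,5nr` under LC_ALL=C.

    Header/comment lines starting with '#' are kept first, in input order.
    """
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        fields = line.rstrip("\n").split("\t")
        # seqname ascending (byte order), start ascending, end descending
        return (fields[0], int(fields[3]), -int(fields[4]))

    return header + sorted(records, key=key)


if __name__ == "__main__":
    gtf = [
        "chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";",
        "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene_id \"g1\";",
        "#!genome-build example",
    ]
    print("\n".join(sort_gtf_lines(gtf)))
```

One difference remains: Python's sort is stable and keeps ties in input order, whereas GNU `sort` breaks remaining ties with a whole-line comparison unless `-s` is given.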
shell:
    "(gffread \
    -w {output.transcriptome} \
    -g {input.genome} \
    {params.additional_params} \
    {input.gtf}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(cat {input.transcriptome} {input.genome} \
    1> {output.genome_transcriptome}) \
    2> {log.stderr}"
shell:
    "(salmon index \
    --transcripts {input.genome_transcriptome} \
    --decoys {input.chr_names} \
    --index {output.index} \
    --kmerLen {params.kmerLen} \
    --threads {threads} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
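The Salmon index above is built on a combined "gentrome" file: transcript sequences followed by genome sequences, which act as decoys. As we understand the Salmon documentation, the decoy sequences must come after the transcripts, and the `--decoys` file lists their names one per line. A minimal Python sketch of preparing both inputs follows. The function names are hypothetical. Unlike a bare `cat`, the sketch also guarantees a newline between the two files.

```python
def decoy_names(genome_fasta):
    """Return the ID (header text up to the first whitespace) of each FASTA record."""
    names = []
    for line in genome_fasta.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def gentrome(transcriptome_fasta, genome_fasta):
    """Transcript sequences first, genome (decoy) sequences last."""
    return transcriptome_fasta.rstrip("\n") + "\n" + genome_fasta


if __name__ == "__main__":
    genome = ">chr1 dna:chromosome\nACGT\n>chr2\nGG\n"
    print("\n".join(decoy_names(genome)))
    print(gentrome(">tx1\nAC", genome), end="")
```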
shell:
    "(mkdir -p {params.output_dir}; \
    chmod -R 777 {params.output_dir}; \
    kallisto index \
    {params.additional_params} \
    -i {output.index} \
    {input.transcriptome}) \
    1> {log.stdout}  2> {log.stderr}"
shell:
    "(gtf2bed12 \
    --gtf {input.gtf} \
    --bed12 {output.bed12} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
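A BED12 record stores a transcript as a single line, with its exons as blocks, in 0-based, half-open coordinates; GTF uses 1-based, inclusive coordinates. The sketch below converts the exons of one transcript. It is not the gtf2bed12 implementation. For simplicity it sets thickStart/thickEnd to the transcript bounds, because no CDS information is used, and the function name is hypothetical.

```python
def exons_to_bed12(chrom, name, strand, exons, score=0):
    """Build one BED12 line from 1-based, inclusive GTF exon (start, end) pairs."""
    # GTF start s (1-based) becomes s - 1 (0-based); the inclusive end stays the same.
    blocks = sorted((start - 1, end) for start, end in exons)
    tx_start = blocks[0][0]
    tx_end = max(end for _, end in blocks)
    sizes = ",".join(str(end - start) for start, end in blocks)
    starts = ",".join(str(start - tx_start) for start, _ in blocks)  # relative to chromStart
    fields = [chrom, tx_start, tx_end, name, score, strand,
              tx_start, tx_end, "0", len(blocks), sizes, starts]
    return "\t".join(str(field) for field in fields)


if __name__ == "__main__":
    print(exons_to_bed12("chr1", "tx1", "+", [(301, 400), (101, 200)]))
```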
shell:
    "(samtools sort \
    -o {output.bam} \
    -@ {threads} \
    {params.additional_params} \
    {input.bam}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(samtools index \
    {params.additional_params} \
    {input.bam} {output.bai};) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(calculate-tin.py \
    -i {input.bam} \
    -r {input.transcripts_bed12} \
    --names {params.sample} \
    -p {threads} \
    {params.additional_params} \
    > {output.TIN_score};) 2> {log.stderr}"
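The transcript integrity number (TIN), as defined for RSeQC, measures how evenly reads cover a transcript. With coverage c_i at k sampled positions, P_i = c_i / sum(c_j), H = -sum(P_i * ln P_i), and TIN = 100 * exp(H) / k, so perfectly uniform coverage scores 100. The minimal Python sketch below applies that formula to one transcript's coverage vector. Position sampling, read filtering and the parallelisation performed by calculate-tin.py are omitted, and the function name is hypothetical.

```python
import math


def tin(coverage):
    """TIN = 100 * exp(H) / k for a list of per-position read coverages."""
    k = len(coverage)
    total = sum(coverage)
    if k == 0 or total == 0:
        return 0.0
    entropy = 0.0
    for c in coverage:
        if c > 0:  # positions without coverage contribute 0 * log(0) = 0
            p = c / total
            entropy -= p * math.log(p)
    return 100.0 * math.exp(entropy) / k


if __name__ == "__main__":
    print(tin([5, 5, 5, 5]))   # uniform coverage: 100 (up to rounding)
    print(tin([10, 0, 0, 0]))  # all reads at one position: 25.0
```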
shell:
    "(salmon quantmerge \
    --quants {params.salmon_in} \
    --genes \
    --names {params.sample_name_list} \
    --column {params.salmon_merge_on} \
    --output {output.salmon_out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(salmon quantmerge \
    --quants {params.salmon_in} \
    --names {params.sample_name_list} \
    --column {params.salmon_merge_on} \
    --output {output.salmon_out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
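The two `salmon quantmerge` calls above, at gene and transcript level, combine per-sample quantification tables into one matrix, keeping one chosen column (set by `params.salmon_merge_on`) per sample. The sketch below illustrates the idea on quant.sf-style tables with the columns Name, Length, EffectiveLength, TPM and NumReads. It is not Salmon's code; filling missing entries with `NA` and the function name are assumptions of this sketch.

```python
import csv
import io


def merge_quant_tables(tables, column):
    """Merge {sample: tab-separated table text} on `Name`, one column per sample."""
    merged = {}
    for sample, text in tables.items():
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            merged.setdefault(row["Name"], {})[sample] = row[column]
    samples = list(tables)
    lines = ["\t".join(["Name"] + samples)]
    for name in sorted(merged):
        values = [merged[name].get(sample, "NA") for sample in samples]
        lines.append("\t".join([name] + values))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    s1 = header + "tx1\t100\t80\t5.0\t10\ntx2\t200\t180\t0\t0\n"
    s2 = header + "tx1\t100\t80\t7.5\t12\n"
    print(merge_quant_tables({"s1": s1, "s2": s2}, "TPM"), end="")
```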
shell:
    "(merge_kallisto.R \
    --input {params.tables} \
    --names {params.sample_name_list} \
    --txOut FALSE \
    --anno {input.gtf} \
    --output {params.dir_out} \
    {params.additional_params} ) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(merge_kallisto.R \
    --input {params.tables} \
    --names {params.sample_name_list} \
    --output {params.dir_out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(zpca-tpm  \
    --tpm {input.tpm} \
    --out {output.out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(zpca-tpm  \
    --tpm {input.tpm} \
    --out {output.out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
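The zpca-tpm steps run a principal component analysis on TPM matrices. How the tool transforms the data is not shown here. A common approach, sketched below with NumPy, is to log-transform the values as log2(TPM + 1), centre each gene across samples, and take the singular value decomposition. The function name and the choice of transform are assumptions.

```python
import numpy as np


def pca_tpm(tpm, n_components=2):
    """PCA of samples from a genes x samples TPM matrix.

    Returns (sample scores, explained-variance ratio per component).
    """
    x = np.log2(np.asarray(tpm, dtype=float) + 1.0).T  # samples x genes
    x = x - x.mean(axis=0)                              # centre each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]     # sample coordinates
    variance = s ** 2
    ratio = variance[:n_components] / variance.sum()
    return scores, ratio


if __name__ == "__main__":
    # Three genes x four samples; only the first gene separates two groups.
    scores, ratio = pca_tpm([[1, 1, 15, 15], [3, 3, 3, 3], [7, 7, 7, 7]])
    print(scores[:, 0], ratio)
```

The sign of each component is arbitrary in an SVD, so only relative positions of samples along a component are meaningful.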
shell:
    "(mkdir -p {params.out_dir}; \
    chmod -R 777 {params.out_dir}; \
    STAR \
    --runMode inputAlignmentsFromBAM \
    --runThreadN {threads} \
    --inputBAMfile {input.bam} \
    --outWigType bedGraph \
    --outFileNamePrefix {params.prefix} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(cp {input.plus} {output.plus}; \
    cp {input.minus} {output.minus};) \
    1>{log.stdout} 2>{log.stderr}"
shell:
    "(mkdir -p {output.temp_dir}; \
    alfa -a {input.gtf} \
    -g {params.genome_index} \
    --chr_len {input.chr_len} \
    --temp_dir {output.temp_dir} \
    -p {threads} \
    -o {params.out_dir} \
    {params.additional_params}) \
    &> {log}"
shell:
    "(mkdir -p {output.temp_dir};\
    cd {params.out_dir}; \
    alfa \
    -g {params.genome_index} \
    --bedgraph {params.plus} {params.minus} {params.name} \
    -s {params.alfa_orientation} \
    --temp_dir {params.temp_dir} \
    {params.additional_params}) \
    &> {log}"
shell:
    "(python {input.script} \
    --config {output.multiqc_config} \
    --intro-text '{params.multiqc_intro_text}' \
    --custom-logo '{params.logo_path}' \
    --url '{params.url}' \
    --author-name '{params.author_name}' \
    --author-email '{params.author_email}' \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
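The script invoked above writes a MultiQC configuration file that customises the report. A minimal sketch of such a file follows, using MultiQC's documented report options `intro_text`, `custom_logo`, `custom_logo_url` and `report_header_info`. Mapping `--url` to `custom_logo_url` and the header labels are assumptions, the actual script's keys and layout may differ, and the function names are hypothetical. The file is written as JSON, which for content like this is also valid YAML.

```python
import json


def multiqc_config(intro_text, logo_path, url, author_name, author_email):
    """Build a MultiQC config mapping from the values passed to the script."""
    return {
        "intro_text": intro_text,
        "custom_logo": logo_path,
        "custom_logo_url": url,  # link target when the logo is clicked
        "report_header_info": [
            {"Contact name": author_name},
            {"Contact e-mail": author_email},
        ],
    }


def write_config(path, config):
    """Write the config as JSON (a YAML-compatible subset for this content)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
```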
shell:
    "(multiqc \
    --outdir {output.multiqc_report} \
    --config {input.multiqc_config} \
    {params.additional_params} \
    {params.results_dir} \
    {params.log_dir};) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(sortBed \
    -i {input.bg} \
    {params.additional_params} \
    > {output.sorted_bg};) 2> {log.stderr}"
shell:
    "(bedGraphToBigWig \
    {params.additional_params} \
    {input.sorted_bg} \
    {input.chr_sizes} \
    {output.bigWig};) \
    1> {log.stdout} 2> {log.stderr}"
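bedGraphToBigWig expects its input sorted by chromosome and then by start position (its usage message suggests `sort -k1,1 -k2,2n`), which is why sortBed runs first. It also rejects chromosomes that are missing from the chrom sizes file. The Python sketch below performs both preparations. Dropping records on unknown sequences is an illustrative choice, not something the ZARP snippets are known to do, and the function name is hypothetical.

```python
def prepare_bedgraph(bedgraph_lines, chrom_sizes_lines):
    """Sort bedGraph records by (chrom, start) in byte order and drop records
    on sequences that are absent from the chrom sizes file."""
    sizes = {}
    for line in chrom_sizes_lines:
        if line.strip():
            chrom, size = line.split()[:2]
            sizes[chrom] = int(size)
    records = []
    for line in bedgraph_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank, track/browser and comment lines
        chrom, start, end, value = line.split()[:4]
        if chrom in sizes:
            records.append((chrom, int(start), int(end), value))
    records.sort(key=lambda record: (record[0], record[1]))
    return ["\t".join(map(str, record)) for record in records]


if __name__ == "__main__":
    bedgraph = ["chr2\t10\t20\t1.5", "chr1\t50\t60\t2", "chr1\t5\t10\t3", "chrUn\t0\t5\t1"]
    print("\n".join(prepare_bedgraph(bedgraph, ["chr1\t1000", "chr2\t500"])))
```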

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/zavolanlab/zarp
Name: zarp-an-automated-workflow-for-processing-of-rna-s
Version: Version 1
Copyright: Public Domain
License: Boost Software License 1.0