Light-weight Snakemake workflow for preprocessing and statistical analysis of RNA-seq data


ARMOR (Automated Reproducible MOdular RNA-seq) is a Snakemake workflow aimed at performing a typical RNA-seq analysis in a reproducible, automated, and partially contained manner. It is implemented so that alternative or similar analyses can be added or removed.

ARMOR consists of a Snakefile, a conda environment file (envs/environment.yaml), a configuration file (config.yaml), and a set of R scripts that perform quality control, preprocessing, and differential expression analysis of RNA-seq data. The output can be combined with the iSEE R package to generate a Shiny application for browsing and sharing the results.

By default, the pipeline performs all the steps shown in the diagram below. However, you can turn off any combination of the light-colored steps (e.g., STAR alignment or DRIMSeq analysis) in the config.yaml file.
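The on/off switches can be pictured as boolean flags that decide which optional steps get requested. The minimal Python sketch below uses hypothetical flag names (`run_STAR`, `run_DRIMSeq`) and step labels; the real keys are defined in the shipped config.yaml and may differ.

```python
# Sketch only: flag names and step labels are hypothetical, not ARMOR's actual config keys.

CORE_STEPS = ["FastQC", "Salmon", "edgeR", "MultiQC"]
OPTIONAL_STEPS = {
    "run_STAR": ["STAR", "bigWig"],
    "run_DRIMSeq": ["DRIMSeq"],
}


def requested_steps(config: dict) -> list[str]:
    """Return the steps to run; optional steps default to on, like the default pipeline."""
    steps = list(CORE_STEPS)
    for flag, extra_steps in OPTIONAL_STEPS.items():
        if config.get(flag, True):
            steps.extend(extra_steps)
    return steps


# e.g. requested_steps({"run_STAR": False}) leaves out "STAR" and "bigWig"
```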

Advanced use: If you prefer other software for one of the outlined steps (e.g., DESeq2 over edgeR, or kallisto over Salmon), you can use it, provided you have your own script(s) and change some lines within the Snakefile. If you think your "custom rule" might be of use to a broader audience, let us know by opening an issue.
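As an illustration of swapping in another tool, a custom quantification rule would need a shell command like the one this hypothetical Python helper assembles for kallisto. The function name and defaults are assumptions; `-i`, `-o`, `-t`, `--single`, `-l`, and `-s` are `kallisto quant` options, but check `kallisto quant --help` for your version.

```python
import shlex


def kallisto_quant_cmd(index, outdir, fastqs, threads=1, frag_len=None, frag_sd=None):
    """Build a kallisto quant command line for one sample (hypothetical helper)."""
    cmd = ["kallisto", "quant", "-i", index, "-o", outdir, "-t", str(threads)]
    if len(fastqs) == 1:
        # Single-end mode needs the fragment length mean and standard deviation.
        if frag_len is None or frag_sd is None:
            raise ValueError("single-end kallisto needs frag_len and frag_sd")
        cmd += ["--single", "-l", str(frag_len), "-s", str(frag_sd)]
    elif len(fastqs) != 2:
        raise ValueError("expected one or two FASTQ files")
    cmd += list(fastqs)
    # shlex.join quotes any argument containing shell-special characters.
    return shlex.join(cmd)
```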

Using the ARMOR workflow

Assuming that snakemake and conda are installed (and your system has the necessary libraries to compile R packages), you can use the following commands on a test dataset:

git clone https://github.com/csoneson/ARMOR.git
cd ARMOR && snakemake --use-conda

To use the ARMOR workflow on your own data, follow the steps outlined in the wiki.

Workflow graph

Figure: workflow DAG
Blue circles are rules run in R; orange circles are rules that call external software as shell commands. Dashed lines and light-colored circles mark optional rules, which are controlled in config.yaml.
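Snakemake builds this graph by matching each rule's inputs to other rules' outputs and then runs rules in dependency order. As a rough analogy (not Snakemake's own implementation), the standard-library `graphlib` can order a hand-written dependency map; the rule names below are hypothetical simplifications of the diagram.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified map of rule -> rules whose outputs it consumes.
DEPS = {
    "salmon_index": set(),
    "trim_reads": set(),
    "salmon_quant": {"salmon_index", "trim_reads"},
    "star_index": set(),
    "star_align": {"star_index", "trim_reads"},
    "build_se": {"salmon_quant"},
    "edgeR": {"build_se"},
    "DRIMSeq": {"build_se"},
}


def execution_order(deps: dict[str, set[str]], skip: tuple[str, ...] = ()) -> list[str]:
    """Return a valid run order, leaving out any (optional) rules listed in `skip`."""
    skipped = set(skip)
    kept = {rule: needs - skipped for rule, needs in deps.items() if rule not in skipped}
    return list(TopologicalSorter(kept).static_order())


# e.g. execution_order(DEPS, skip=("star_index", "star_align", "DRIMSeq"))
```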

Contributors

Current contributors include:

Code Snippets

shell:
	'''{params.Rbin} CMD BATCH --no-restore --no-save "--args outtxt='{output}' ncores='{params.ncores}' annotation='{params.flag}' organism='{params.organism}'" {input.script} {log}'''
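The R scripts receive their parameters as `key='value'` pairs after `--args`. On the R side such pairs are commonly read with `commandArgs(trailingOnly = TRUE)` and evaluated as assignments; whether each ARMOR script does exactly that is not shown here. The Python sketch below (hypothetical helper name) mirrors only the parsing step, relying on shell-style quoting rules to strip the single quotes.

```python
import shlex


def parse_batch_args(argstring: str) -> dict[str, str]:
    """Parse "key='value' key2='value2'" as passed after --args (sketch)."""
    params = {}
    for token in shlex.split(argstring):  # removes the quotes, keeps quoted spaces
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        params[key] = value
    return params
```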
shell:
	'''{params.Rbin} CMD BATCH --no-restore --no-save "--args Routdir='{params.Routdir}' outtxt='{params.outtxt}'" {params.script} {log}'''
From line 136 of master/Snakefile
shell:
	"echo -n 'ARMOR version ' && cat version; "
	"salmon --version; trim_galore --version; "
	"echo -n 'cutadapt ' && cutadapt --version; "
	"fastqc --version; STAR --version; samtools --version; multiqc --version; "
	"bedtools --version"
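The rule above simply concatenates each tool's `--version` output. A hedged Python sketch of the same idea that also tolerates tools missing from PATH; the function names are hypothetical, and it records only the first line of each tool's output.

```python
import shutil
import subprocess
from typing import Optional


def tool_version(tool: str, flag: str = "--version") -> Optional[str]:
    """First line of `<tool> --version`, or None if the tool is not on PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, flag], capture_output=True, text=True, check=False)
    # Some tools print their version on stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


def collect_versions(tools: list[str]) -> dict[str, Optional[str]]:
    """Map each tool name to its reported version (None when not installed)."""
    return {tool: tool_version(tool) for tool in tools}


# e.g. collect_versions(["salmon", "STAR", "samtools", "multiqc"])
```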
shell:
	"""
	echo 'Salmon version:\n' > {log}; salmon --version >> {log};
	if [ {params.anno} == "Gencode" ]; then
	  salmon index -t {input.txome} -i {params.salmonoutdir} --gencode {params.salmonextraparams}
	else
	  salmon index -t {input.txome} -i {params.salmonoutdir} {params.salmonextraparams}
	fi
	"""
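The `if` branch exists because GENCODE transcript FASTA headers pack several `|`-separated fields into one name; Salmon's `--gencode` flag splits each header at the first `|` so only the transcript ID is kept. A minimal Python sketch of the same branching, with a hypothetical function name:

```python
import shlex


def salmon_index_cmd(txome: str, index_dir: str, annotation: str, extra: str = "") -> str:
    """Build a salmon index command, adding --gencode for GENCODE transcriptomes."""
    cmd = ["salmon", "index", "-t", txome, "-i", index_dir]
    if annotation == "Gencode":
        # Split '|'-delimited GENCODE headers and keep only the transcript ID.
        cmd.append("--gencode")
    cmd += shlex.split(extra)  # user-supplied extra parameters, if any
    return shlex.join(cmd)
```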
shell:
	'''{params.Rbin} CMD BATCH --no-restore --no-save "--args transcriptfasta='{input.txome}' salmonidx='{input.salmonidx}' gtf='{input.gtf}' annotation='{params.flag}' organism='{params.organism}' release='{params.release}' build='{params.build}' output='{output}'" {input.script} {log}'''
From line 207 of master/Snakefile
shell:
	"echo 'STAR version:\n' > {log}; STAR --version >> {log}; "
	"STAR --runMode genomeGenerate --runThreadN {threads} --genomeDir {params.STARindex} "
	"--genomeFastaFiles {input.genome} --sjdbGTFfile {input.gtf} --sjdbOverhang {params.readlength} "
	"{params.starextraparams}"
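The STAR manual recommends setting `--sjdbOverhang` to the maximum read length minus 1, noting that the default of 100 works about as well in most cases. Whether ARMOR's `readlength` parameter already accounts for the subtraction is not stated here. If you want to derive the value from your data, a minimal Python sketch (hypothetical helper names) can scan the sequence lines of a FASTQ:

```python
from itertools import islice
from typing import Iterable


def max_read_length(fastq_lines: Iterable[str], n_reads: int = 100_000) -> int:
    """Longest sequence among the first n_reads records of a 4-line FASTQ."""
    longest = 0
    for i, line in enumerate(islice(fastq_lines, 4 * n_reads)):
        if i % 4 == 1:  # line 2 of each record holds the sequence
            longest = max(longest, len(line.rstrip("\n")))
    return longest


def sjdb_overhang(read_length: int) -> int:
    """ReadLength - 1, as recommended in the STAR manual."""
    return max(read_length - 1, 1)


# Usage, e.g.:
# import gzip
# with gzip.open("sample_R1.fastq.gz", "rt") as fh:
#     print(sjdb_overhang(max_read_length(fh)))
```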
shell:
	"echo 'FastQC version:\n' > {log}; fastqc --version >> {log}; "
	"fastqc -o {params.FastQC} -t {threads} {input.fastq}"
shell:
	"echo 'FastQC version:\n' > {log}; fastqc --version >> {log}; "
	"fastqc -o {params.FastQC} -t {threads} {input.fastq}"
shell:
	"echo 'MultiQC version:\n' > {log}; multiqc --version >> {log}; "
	"multiqc {params.inputdirs} -f -o {params.MultiQCdir}"
shell:
	"echo 'TrimGalore! version:\n' > {log}; trim_galore --version >> {log}; "
	"trim_galore -q 20 --phred33 --length 20 -o {params.FASTQtrimmeddir} --path_to_cutadapt cutadapt {input.fastq}"
shell:
	"echo 'TrimGalore! version:\n' > {log}; trim_galore --version >> {log}; "
	"trim_galore -q 20 --phred33 --length 20 -o {params.FASTQtrimmeddir} --path_to_cutadapt cutadapt "
	"--paired {input.fastq1} {input.fastq2}"
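Downstream rules need to know what Trim Galore writes into `{params.FASTQtrimmeddir}`. Assuming Trim Galore's default naming (single-end files get a `_trimmed.fq` suffix, paired files `_val_1.fq` and `_val_2.fq`, with `.gz` kept when the input was gzipped), a hypothetical Python helper could predict the output paths. Verify the suffixes against the Trim Galore version you use.

```python
import os
import re

# Matches .fastq, .fq, .fastq.gz or .fq.gz at the end of a filename.
_FASTQ_EXT = re.compile(r"\.(?:fastq|fq)(\.gz)?$")


def _split(path: str) -> tuple[str, str]:
    """Return (basename without FASTQ extension, '.gz' or '')."""
    name = os.path.basename(path)
    match = _FASTQ_EXT.search(name)
    if match is None:
        raise ValueError(f"not a FASTQ filename: {path}")
    return name[: match.start()], match.group(1) or ""


def trimgalore_outputs(fastqs: list[str], outdir: str) -> list[str]:
    """Expected Trim Galore output paths under the assumed default naming."""
    if len(fastqs) == 1:
        stem, gz = _split(fastqs[0])
        return [os.path.join(outdir, f"{stem}_trimmed.fq{gz}")]
    if len(fastqs) == 2:
        paths = []
        for mate, fq in enumerate(fastqs, start=1):
            stem, gz = _split(fq)
            paths.append(os.path.join(outdir, f"{stem}_val_{mate}.fq{gz}"))
        return paths
    raise ValueError("expected one or two FASTQ files")
```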
shell:
	"echo 'Salmon version:\n' > {log}; salmon --version >> {log}; "
	"salmon quant -i {params.salmonindex} -l A -r {input.fastq} "
	"-o {params.salmondir}/{wildcards.sample} -p {threads} {params.salmonextraparams}"
shell:
	"echo 'Salmon version:\n' > {log}; salmon --version >> {log}; "
	"salmon quant -i {params.salmonindex} -l A -1 {input.fastq1} -2 {input.fastq2} "
	"-o {params.salmondir}/{wildcards.sample} -p {threads} {params.salmonextraparams}"
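The two Salmon rules differ only in how reads are passed: `-r` for single-end input and `-1`/`-2` for paired-end mates, while `-l A` asks Salmon to infer the library type automatically. A Python sketch of that switch (hypothetical function name):

```python
import shlex


def salmon_quant_cmd(index: str, outdir: str, fastqs: list[str], threads: int = 1, extra: str = "") -> str:
    """Build a salmon quant command for single-end (1 file) or paired-end (2 files) reads."""
    cmd = ["salmon", "quant", "-i", index, "-l", "A"]
    if len(fastqs) == 1:
        cmd += ["-r", fastqs[0]]
    elif len(fastqs) == 2:
        cmd += ["-1", fastqs[0], "-2", fastqs[1]]
    else:
        raise ValueError("expected one or two FASTQ files")
    cmd += ["-o", outdir, "-p", str(threads)]
    cmd += shlex.split(extra)
    return shlex.join(cmd)
```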
shell:
	"echo 'STAR version:\n' > {log}; STAR --version >> {log}; "
	"STAR --genomeDir {params.STARindex} --readFilesIn {input.fastq} "
	"--runThreadN {threads} --outFileNamePrefix {params.STARdir}/{wildcards.sample}/{wildcards.sample}_ "
	"--outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c "
	"{params.starextraparams}"
shell:
	"echo 'STAR version:\n' > {log}; STAR --version >> {log}; "
	"STAR --genomeDir {params.STARindex} --readFilesIn {input.fastq1} {input.fastq2} "
	"--runThreadN {threads} --outFileNamePrefix {params.STARdir}/{wildcards.sample}/{wildcards.sample}_ "
	"--outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c "
	"{params.starextraparams}"
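With `--outFileNamePrefix` set to `{params.STARdir}/{wildcards.sample}/{wildcards.sample}_` and `--outSAMtype BAM SortedByCoordinate`, STAR appends `Aligned.sortedByCoord.out.bam` to the prefix, which is the name the bigWig rule builds on. (`--readFilesCommand gunzip -c` also means the input FASTQs are expected to be gzipped.) A small Python sketch of the path arithmetic, using a hypothetical helper name:

```python
import os


def star_sorted_bam(star_dir: str, sample: str) -> str:
    """Path of STAR's coordinate-sorted BAM for one sample, given the prefix used above."""
    prefix = os.path.join(star_dir, sample, f"{sample}_")
    return prefix + "Aligned.sortedByCoord.out.bam"
```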
shell:
	"echo 'samtools version:\n' > {log}; samtools --version >> {log}; "
	"samtools index {input.bam}"
shell:
	"echo 'bedtools version:\n' > {log}; bedtools --version >> {log}; "
	"bedtools genomecov -split -ibam {input.bam} -bg | LC_COLLATE=C sort -k1,1 -k2,2n > "
	"{params.STARbigwigdir}/{wildcards.sample}_Aligned.sortedByCoord.out.bedGraph; "
	"bedGraphToBigWig {params.STARbigwigdir}/{wildcards.sample}_Aligned.sortedByCoord.out.bedGraph "
	"{input.chrl} {output}; rm -f {params.STARbigwigdir}/{wildcards.sample}_Aligned.sortedByCoord.out.bedGraph"
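`bedGraphToBigWig` expects the bedGraph sorted by chromosome name in byte order and then by start position, which is what `LC_COLLATE=C sort -k1,1 -k2,2n` produces. A Python sketch of that ordering (hypothetical helper; GNU sort's last-resort tie-breaking on the whole line is not reproduced):

```python
def sort_bedgraph(lines: list[str]) -> list[str]:
    """Sort bedGraph lines by chromosome (byte order), then by numeric start."""
    def key(line: str) -> tuple[bytes, int]:
        chrom, start = line.split("\t", 2)[:2]
        # Comparing bytes mimics LC_COLLATE=C; int() mimics the numeric -k2,2n key.
        return chrom.encode(), int(start)
    return sorted(lines, key=key)
```

Note that byte order places `chr10` between `chr1` and `chr2`, unlike a natural sort.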
shell:
	'''{params.Rbin} CMD BATCH --no-restore --no-save "--args salmondir='{params.salmondir}' json='{input.json}' metafile='{input.metatxt}' outrds='{output}' annotation='{params.flag}' organism='{params.organism}'" {input.script} {log}'''
From line 539 of master/Snakefile
shell:
    '''{params.Rbin} CMD BATCH --no-restore --no-save "--args metafile='{params.metatxt}' design='{params.design}' contrast='{params.contrast}' outFile='{output}' gtf='{params.gtf}' genome='{params.genome}' fastqdir='{params.fastqdir}' fqsuffix='{params.fqsuffix}' fqext1='{params.fqext1}' fqext2='{params.fqext2}' txome='{params.txome}' run_camera='{params.run_camera}' organism='{params.organism}' {params.genesets} annotation='{params.annotation}'" {input.script} {log};
    cat {output}
    '''
From line 582 of master/Snakefile
shell:
	'''{params.Rbin} CMD BATCH --no-restore --no-save "--args se='{input.rds}' organism='{params.organism}' design='{params.design}' contrast='{params.contrast}' {params.genesets} rmdtemplate='{input.template}' outputdir='{params.directory}' outputfile='edgeR_dge.html'" {input.script} {log}'''
From line 613 of master/Snakefile
shell:
	'''{params.Rbin} CMD BATCH --no-restore --no-save "--args se='{input.rds}' design='{params.design}' contrast='{params.contrast}' ncores='{params.ncores}' rmdtemplate='{input.template}' outputdir='{params.directory}' outputfile='DRIMSeq_dtu.html'" {input.script} {log}'''
From line 644 of master/Snakefile
shell:
	'''{params.Rbin} CMD BATCH --no-restore --no-save "--args se='{input.rds}' gtffile='{input.gtf}' rmdtemplate='{input.template}' outputfile='prepare_shiny.html' {params.p}" {input.script} {log}'''
From line 682 of master/Snakefile

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/csoneson/ARMOR
Name: armor
Version: v1.0
Copyright: Public Domain
License: MIT License