Light-weight Snakemake workflow for preprocessing and statistical analysis of RNA-seq data
ARMOR (Automated Reproducible MOdular RNA-seq) is a Snakemake workflow aimed at performing a typical RNA-seq analysis in a reproducible, automated, and partially contained manner. It is implemented so that alternative or similar analyses can be added or removed.
ARMOR consists of a Snakefile, a conda environment file (envs/environment.yaml), a configuration file (config.yaml) and a set of R scripts that together perform quality control, preprocessing and differential expression analysis of RNA-seq data. The output can be combined with the iSEE R package to generate a shiny application for browsing and sharing the results.
By default, the pipeline performs all the steps shown in the diagram below. However, you can turn off any combination of the light-colored steps (e.g. STAR alignment or DRIMSeq analysis) in the config.yaml file.
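The Python sketch below illustrates how such on/off switches gate optional steps: every optional step runs unless its flag is explicitly false. This is not ARMOR code; in ARMOR, the Snakefile itself reads config.yaml. The step lists and the flag names run_trimming, run_STAR and run_DRIMSeq are hypothetical stand-ins, and the real config.yaml keys may differ.

```python
"""Sketch: select which ARMOR-style steps run, given on/off flags from a config.

All step and flag names are hypothetical stand-ins for the real config.yaml keys.
"""
from __future__ import annotations

# Steps assumed to always run (illustrative list, not ARMOR's exact rule set).
CORE_STEPS = ["FastQC", "Salmon", "edgeR", "MultiQC"]

# Optional (light-colored) steps, each mapped to the hypothetical flag gating it.
OPTIONAL_STEPS = {
    "TrimGalore": "run_trimming",
    "STAR": "run_STAR",
    "DRIMSeq": "run_DRIMSeq",
}


def steps_to_run(config: dict) -> list[str]:
    """Return the core steps plus every optional step whose flag is not false.

    A missing flag counts as enabled, mirroring "all steps run by default".
    """
    enabled = [
        step for step, flag in OPTIONAL_STEPS.items() if config.get(flag, True)
    ]
    return CORE_STEPS + enabled
```

For example, steps_to_run({"run_STAR": False, "run_DRIMSeq": False}) keeps the core steps and adds only TrimGalore. Defaulting missing flags to True matches the "everything on by default" behavior described above.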
Advanced use: If you prefer other software for one of the outlined steps (e.g. DESeq2 instead of edgeR, or kallisto instead of Salmon), you can use it, provided you have your own script(s) and adapt the corresponding rules in the Snakefile. If you think your "custom rule" might be useful to a broader audience, let us know by opening an issue.
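Most R-based rules in the Snakefile (see the code snippets below) call R CMD BATCH and pass parameters as key='value' pairs inside a single --args string. The R script can then presumably read them with commandArgs(trailingOnly = TRUE). A custom script could follow the same convention. The Python sketch below assembles such a command. The script name run_DESeq2.R, the log path and the argument names are hypothetical. The sketch does not escape quotes or spaces, so values containing them would need extra handling.

```python
"""Sketch: build an ARMOR-style R CMD BATCH command for a custom R script."""
from __future__ import annotations


def r_batch_command(rbin: str, script: str, log: str, args: dict[str, str]) -> str:
    """Return a shell command following the pattern of ARMOR's R rules.

    Each entry in args becomes key='value' inside one double-quoted --args
    string. Values are inserted verbatim: quotes or spaces are NOT escaped.
    """
    arg_str = " ".join(f"{key}='{value}'" for key, value in args.items())
    return (
        f'{rbin} CMD BATCH --no-restore --no-save "--args {arg_str}" '
        f"{script} {log}"
    )


# Hypothetical usage for a DESeq2 rule (all paths and names are illustrative).
cmd = r_batch_command(
    "R",
    "scripts/run_DESeq2.R",
    "logs/run_DESeq2.Rout",
    {"se": "output/se.rds", "design": "~group"},
)
print(cmd)
```

In an actual Snakefile, the equivalent string would normally be written directly in the rule's shell directive, with Snakemake filling in the {params...}, {input...} and {log} placeholders.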
Using the ARMOR workflow
Assuming that snakemake and conda are installed (and your system has the necessary libraries to compile R packages), you can use the following commands on a test dataset:
git clone https://github.com/csoneson/ARMOR.git
cd ARMOR && snakemake --use-conda
To use the ARMOR workflow on your own data, follow the steps outlined in the wiki.
Workflow graph
Blue circles are rules run in R; orange circles are rules that call external software as shell commands. Dashed lines and light-colored circles mark optional rules, controlled in config.yaml.
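Snakemake derives the execution order from a rule graph like this one: a rule runs only after the rules producing its inputs have finished. The Python sketch below mimics that ordering with the standard-library graphlib module (Python 3.9+). The rule names and edges are a simplified, hypothetical reading of the Snakefile excerpts below, not ARMOR's actual DAG. Optional steps are shown as always present.

```python
"""Sketch: order a simplified ARMOR-like rule graph, roughly as Snakemake's DAG does."""
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Rule -> rules whose outputs it needs. Names and edges are illustrative only.
RULE_DEPS = {
    "salmon_index": set(),
    "star_index": set(),
    "fastqc": set(),
    "trim_galore": set(),
    "salmon_quant": {"salmon_index", "trim_galore"},
    "star_align": {"star_index", "trim_galore"},
    "samtools_index": {"star_align"},
    "bigwig": {"samtools_index"},
    "linked_txome": {"salmon_index"},
    "summarized_experiment": {"salmon_quant", "linked_txome"},
    "edger": {"summarized_experiment"},
    "drimseq": {"summarized_experiment"},
    "shiny": {"edger", "drimseq"},
    "multiqc": {"fastqc", "salmon_quant", "star_align"},
}


def run_order(deps: dict) -> list:
    """Return one valid execution order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(deps).static_order())


print(run_order(RULE_DEPS))
```

Several valid orders usually exist. Snakemake additionally runs independent rules in parallel, up to the available cores.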
Contributors
Current contributors include:
Code Snippets
shell: '''{params.Rbin} CMD BATCH --no-restore --no-save "--args outtxt='{output}' ncores='{params.ncores}' annotation='{params.flag}' organism='{params.organism}'" {input.script} {log}'''
shell: '''{params.Rbin} CMD BATCH --no-restore --no-save "--args Routdir='{params.Routdir}' outtxt='{params.outtxt}'" {params.script} {log}'''
shell:
    "echo -n 'ARMOR version ' && cat version; "
    "salmon --version; trim_galore --version; "
    "echo -n 'cutadapt ' && cutadapt --version; "
    "fastqc --version; STAR --version; samtools --version; multiqc --version; "
    "bedtools --version"
shell:
    """
    if [ {params.anno} == "Gencode" ]; then
      echo 'Salmon version:\n' > {log}; salmon --version >> {log};
      salmon index -t {input.txome} -i {params.salmonoutdir} --gencode {params.salmonextraparams}
    else
      echo 'Salmon version:\n' > {log}; salmon --version >> {log};
      salmon index -t {input.txome} -i {params.salmonoutdir} {params.salmonextraparams}
    fi
    """
shell: '''{params.Rbin} CMD BATCH --no-restore --no-save "--args transcriptfasta='{input.txome}' salmonidx='{input.salmonidx}' gtf='{input.gtf}' annotation='{params.flag}' organism='{params.organism}' release='{params.release}' build='{params.build}' output='{output}'" {input.script} {log}'''
shell:
    "echo 'STAR version:\n' > {log}; STAR --version >> {log}; "
    "STAR --runMode genomeGenerate --runThreadN {threads} --genomeDir {params.STARindex} "
    "--genomeFastaFiles {input.genome} --sjdbGTFfile {input.gtf} --sjdbOverhang {params.readlength} "
    "{params.starextraparams}"
shell:
    "echo 'FastQC version:\n' > {log}; fastqc --version >> {log}; "
    "fastqc -o {params.FastQC} -t {threads} {input.fastq}"
shell:
    "echo 'FastQC version:\n' > {log}; fastqc --version >> {log}; "
    "fastqc -o {params.FastQC} -t {threads} {input.fastq}"
shell:
    "echo 'MultiQC version:\n' > {log}; multiqc --version >> {log}; "
    "multiqc {params.inputdirs} -f -o {params.MultiQCdir}"
shell:
    "echo 'TrimGalore! version:\n' > {log}; trim_galore --version >> {log}; "
    "trim_galore -q 20 --phred33 --length 20 -o {params.FASTQtrimmeddir} --path_to_cutadapt cutadapt {input.fastq}"
shell:
    "echo 'TrimGalore! version:\n' > {log}; trim_galore --version >> {log}; "
    "trim_galore -q 20 --phred33 --length 20 -o {params.FASTQtrimmeddir} --path_to_cutadapt cutadapt "
    "--paired {input.fastq1} {input.fastq2}"
shell:
    "echo 'Salmon version:\n' > {log}; salmon --version >> {log}; "
    "salmon quant -i {params.salmonindex} -l A -r {input.fastq} "
    "-o {params.salmondir}/{wildcards.sample} -p {threads} {params.salmonextraparams}"
shell:
    "echo 'Salmon version:\n' > {log}; salmon --version >> {log}; "
    "salmon quant -i {params.salmonindex} -l A -1 {input.fastq1} -2 {input.fastq2} "
    "-o {params.salmondir}/{wildcards.sample} -p {threads} {params.salmonextraparams}"
shell:
    "echo 'STAR version:\n' > {log}; STAR --version >> {log}; "
    "STAR --genomeDir {params.STARindex} --readFilesIn {input.fastq} "
    "--runThreadN {threads} --outFileNamePrefix {params.STARdir}/{wildcards.sample}/{wildcards.sample}_ "
    "--outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c "
    "{params.starextraparams}"
shell:
    "echo 'STAR version:\n' > {log}; STAR --version >> {log}; "
    "STAR --genomeDir {params.STARindex} --readFilesIn {input.fastq1} {input.fastq2} "
    "--runThreadN {threads} --outFileNamePrefix {params.STARdir}/{wildcards.sample}/{wildcards.sample}_ "
    "--outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c "
    "{params.starextraparams}"
shell:
    "echo 'samtools version:\n' > {log}; samtools --version >> {log}; "
    "samtools index {input.bam}"
shell:
    "echo 'bedtools version:\n' > {log}; bedtools --version >> {log}; "
    "bedtools genomecov -split -ibam {input.bam} -bg | LC_COLLATE=C sort -k1,1 -k2,2n > "
    "{params.STARbigwigdir}/{wildcards.sample}_Aligned.sortedByCoord.out.bedGraph; "
    "bedGraphToBigWig {params.STARbigwigdir}/{wildcards.sample}_Aligned.sortedByCoord.out.bedGraph "
    "{input.chrl} {output}; rm -f {params.STARbigwigdir}/{wildcards.sample}_Aligned.sortedByCoord.out.bedGraph"
shell: '''{params.Rbin} CMD BATCH --no-restore --no-save "--args salmondir='{params.salmondir}' json='{input.json}' metafile='{input.metatxt}' outrds='{output}' annotation='{params.flag}' organism='{params.organism}'" {input.script} {log}'''
shell:
    '''{params.Rbin} CMD BATCH --no-restore --no-save "--args metafile='{params.metatxt}' design='{params.design}' contrast='{params.contrast}' outFile='{output}' gtf='{params.gtf}' genome='{params.genome}' fastqdir='{params.fastqdir}' fqsuffix='{params.fqsuffix}' fqext1='{params.fqext1}' fqext2='{params.fqext2}' txome='{params.txome}' run_camera='{params.run_camera}' organism='{params.organism}' {params.genesets} annotation='{params.annotation}'" {input.script} {log};
    cat {output}
    '''
shell: '''{params.Rbin} CMD BATCH --no-restore --no-save "--args se='{input.rds}' organism='{params.organism}' design='{params.design}' contrast='{params.contrast}' {params.genesets} rmdtemplate='{input.template}' outputdir='{params.directory}' outputfile='edgeR_dge.html'" {input.script} {log}'''
shell: '''{params.Rbin} CMD BATCH --no-restore --no-save "--args se='{input.rds}' design='{params.design}' contrast='{params.contrast}' ncores='{params.ncores}' rmdtemplate='{input.template}' outputdir='{params.directory}' outputfile='DRIMSeq_dtu.html'" {input.script} {log}'''
shell: '''{params.Rbin} CMD BATCH --no-restore --no-save "--args se='{input.rds}' gtffile='{input.gtf}' rmdtemplate='{input.template}' outputfile='prepare_shiny.html' {params.p}" {input.script} {log}'''
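In the bigWig rule above, the bedtools genomecov output is piped through LC_COLLATE=C sort -k1,1 -k2,2n before bedGraphToBigWig, which expects its input sorted by chromosome and then by start position. The Python sketch below reproduces that ordering as an illustration; it is not part of ARMOR, and the example records are made up. Chromosome names compare byte by byte, as in the C locale, and start coordinates compare numerically rather than as text.

```python
"""Sketch: mimic `LC_COLLATE=C sort -k1,1 -k2,2n` ordering of bedGraph records."""
from __future__ import annotations


def sort_bedgraph(lines: list[str]) -> list[str]:
    """Sort tab-separated bedGraph lines by chromosome, then numeric start.

    Encoding the chromosome name to bytes gives C-locale (byte-wise) order,
    so "chr10" sorts before "chr2". Python's sort is stable, whereas GNU sort
    breaks remaining ties by comparing whole lines unless -s is given.
    """

    def key(line: str) -> tuple[bytes, int]:
        chrom, start = line.split("\t")[:2]
        return chrom.encode("utf-8"), int(start)

    return sorted(lines, key=key)


# Made-up example records: chrom, start, end, coverage.
records = [
    "chr2\t100\t200\t1.0",
    "chr10\t50\t60\t2.0",
    "chr1\t300\t400\t3.0",
    "chr1\t20\t30\t4.0",
]
for record in sort_bedgraph(records):
    print(record)
```

Fixing the collation explicitly matters because a locale-aware sort may order chromosome names differently from the byte order that bedGraphToBigWig checks.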