High throughput Next Generation Sequencing (NGS) data analysis using Python 3 Snakemake

public public 1yr ago Version: 4 0 bookmarks

Alex Soupir

Snakemake is a Bioinformatics tool for managing a workflow. This tool proves valuable when analyzing a large amount of data with multiple tools. This script was made as a learning tool for workflow manager. There is also Nextflow to manage large analysis workflows. Here, Snakemake was used to run everything that is usually run on Linux with RNA-Seq Analyses (here is my long winded version of an RNA-seq analysis on Mice p53 gene mutation).

The Snakemake file was developed for analyzing sequencing data from a recent publication - Dietary walnut altered gene expressions related to tumor growth, survival, and metastasis in breast cancer patients: a pilot clinical trial . The raw sequences were downloaded from the Sequence Read Archive Run Selector using sra-tools.

  • The genome and the gtf files were downloaded and an index was created of the genome.

  • The minikraken was downloaded and extracted.

  • Homo sapiens rRNA sequences were downloaded from NCBI.

  • PhiX sequences were downloaded from Illumina.

Conda Tools Used:

  • FastQC

  • Trimmomatic

  • STAR

  • featureCounts (conda Subread)

  • Bowtie2

  • bwa

  • Samtools

  • Bam2Fastx

  • MultiQC

Running

To run Snakemake , a big memory cluster node was used. To run, in the same folder as the snakefile I used snakemake -j 80 which tells Snakemake to use 80 cores. Snakemake if not given a file name searches current directory for a file named snakefile .

The output of running Snakemake is a QC folder with results for all of the steps as well as MultiQC which makes nice HTML pages to summarize the results.

MultiQC doesn't have the ability to identify and create summaries for microbial contamination with KrakenUniq.

Code Snippets

35
36
shell:
	"~/miniconda2/bin/fastqc {input} --threads {threads} -O {params}"
50
51
shell:
	"~/miniconda2/bin/trimmomatic PE -threads {threads} {input.read1} {input.read2} {output.fp} {output.fu} {output.rp} {output.ru} ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:2:keepBothReads SLIDINGWINDOW:4:20 TRAILING:3 MINLEN:36 2>{log}"
62
63
shell:
	"~/miniconda2/bin/fastqc {input} --threads {threads} -O {params}"
77
78
79
80
shell:
	"""
	~/miniconda2/bin/STAR --runThreadN {threads} --genomeDir genomeIndex --readFilesIn {input.trimmed1} {input.trimmed2} --outFilterIntronMotifs RemoveNoncanonical --outFileNamePrefix {params.prefix} --outSAMtype BAM SortedByCoordinate  --outReadsUnmapped Fastx
	"""
87
88
89
90
91
92
93
	shell:
		"~/miniconda2/bin/featureCounts -a genome/gencode.v32.annotation.gtf -o {output.counts} {input.bam} -T {threads}

rule PhiXContamination:
	input:
		trimmed1="Trimmed/{id}_forward_paired.fastq.gz",
		trimmed2="Trimmed/{id}_reverse_paired.fastq.gz"
101
102
103
104
shell:
	"""
	~/miniconda2/bin/bowtie2 -p {threads} -x {params.bowtieGenome} -1 {input.trimmed1} -2 {input.trimmed2} -S {params.prefix}.sam &> {params.prefix}.out
	"""
116
117
118
119
shell:
	"""
	~/miniconda2/bin/krakenuniq --preload --db minikraken_20171019_8GB --threads {threads} --paired --report-file {output.report} --fastq-input {input.unmapped1} {input.unmapped2} > {output.out}
	"""
130
131
132
133
shell: 
	"""
	~/miniconda2/bin/bwa mem -t {threads} {params} {input.trimmed1} {input.trimmed2} > {output.rnaSam}
	"""
143
144
145
146
147
148
shell:
	"""
	~/miniconda2/bin/samtools view -@ {threads} -bS -o {output.rnaBam} {input.rnaSam}
	~/miniconda2/bin/samtools flagstat -@ {threads} {output.rnaBam} > {output.rnaOut}
	~/miniconda2/bin/samtools view -@ {threads} -u -f 12 -F 256 {output.rnaBam} > {output.rnaUnm}
	"""
157
158
159
160
shell:
	"""
	~/miniconda2/bin/bamToFastq -i {input} -fq {output.fwd} -fq2 {output.rvs}
	"""
SnakeMake From line 157 of master/Snakefile
174
175
176
177
178
179
180
shell:
	"""
	cp {input.star} {output.starout}
	cp {input.rrna} {output.rrnaout}
	cp {input.micr} {output.microut}
	cp {input.phix} {output.phixout}
	"""
SnakeMake From line 174 of master/Snakefile
185
186
shell:
	"~/miniconda2/bin/multiqc QC/."
ShowHide 3 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/ACSoupir/SnakeMake
Name: snakemake
Version: 4
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Other Versions:
Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...