Snakemake Pipeline for Paired-End RNA-Seq Data Analysis

public public 1yr ago 0 bookmarks

snakemake_PE_RNAseq

This pepeline is inspired by crazyhottommy . test generate workflow plot

snakemake --dag 2> /dev/null | dot -T png > workflow.png

the flow is as following

test2

run

## dry run first
snakemake -np

if no errors, preceed below.

snakemake -j -np 99 --cluster '/Snakefile-sbatch.py'

Code Snippets

 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
library(ballgown)
library(magrittr)
args <- commandArgs(trailingOnly = TRUE)

sampleids = args[1:length(args)-1]
output = args[length(args)]
print(sampleids)
## generate the  expression table for each sample
bg <-  ballgown(samples = sampleids , meas='FPKM')

print('process sample data')

print(sampleNames(bg))
gene_expression = ballgown::gexpr(bg)
gene_table = tibble::as.tibble(gene_expression)
gene_table$geneid = rownames(gene_expression)

geneID = tibble::tibble(genename = ballgown::geneNames(bg) ,
                        geneid =ballgown::geneIDs(bg) ) %>%
  dplyr::filter(! genename =='.' ) %>% dplyr::distinct()

gene_table <- dplyr::inner_join(gene_table ,geneID )
readr::write_csv(gene_table, path = output)
56
57
58
59
60
61
	shell:
		"""
		cutadapt -e 0.12 -a CTGTCTCTTATACACATCT -A CTGTCTCTTATACACATCT -j 6 -m 30 \
        -o {output[0]} -p {output[1]} \
        {input.r1} {input.r2} > {log}
        """
70
71
72
73
74
shell:
    """
    module load fastqc
    fastqc -o 02_fqc -f fastq --noextract {input}  2> {log}
    """
87
88
89
90
91
92
93
94
95
96
shell:
	"""
	{hisat} -p {threads} \
	--dta \
	-x {STARINDEX} \
	-1 {input[0]} \
	-2 {input[1]} \
	-S {output} \
	&> {log}
	"""
111
112
113
114
115
shell:
	"""
	module load samtools
	samtools sort  -@ {threads} -T {output}.tmp -o {output} {input} 2> {log}
	"""
124
125
126
127
128
shell:
    """
    module load samtools
    samtools index {input} 2> {log}
    """
139
140
141
142
143
shell:
	"""
	# -p for paried-end, counting fragments rather reads
	{stringtie} -e -B -p {threads} -G {gtf} -o {output[0]} {input}
	"""
SnakeMake From line 139 of master/Snakefile
155
156
157
158
159
shell:
	"""
# no window smoothing is done, for paired-end, bamCoverage will extend the length to the fragement length of the paired reads
bamCoverage -b {input[0]}  --binSize 100 --effectiveGenomeSize 2864785220 --skipNonCoveredRegions --normalizeUsing RPKM -p {threads}  -o {output[0]} 2> {log}
	"""
175
176
177
178
shell:
    """
    multiqc 02_fqc 00_log -o 07_multiQC -d -f -v -n multiQC_log 2> {log}
    """
184
185
186
187
shell: 
	"""
    Rscript ballgown.R {input} {output}
	""" 
SnakeMake From line 184 of master/Snakefile
ShowHide 4 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/yuxuth/snake_hisat_cutadapt
Name: snake_hisat_cutadapt
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...