Paired-end CUT and RUN data processing workflow designed for execution on the BigPurple HPC. Primary software used: FastQC, Fastp, Bowtie 2, SEACR, MultiQC

public public 1yr ago 0 bookmarks

CUT-RUN

This pipeline was written for execution on the NYU big purple server. This readme describes how to execute the snake make workflow for paired-end CUT&RUN data pre-processing (fastq -> peak calling), Utilizing bowtie2 for alignmen

Code Snippets

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
import subprocess, pandas as pd

sample_file = "samples_info.tab"
sample = pd.read_table(sample_file)['Sample']
replicate = pd.read_table(sample_file)['Replicate']
condition = pd.read_table(sample_file)['Condition']
Antibody = pd.read_table(sample_file)['Antibody']

sample_ids = []
for i in range(len(sample)):
	sample_ids.append('%s_%s_%s' % (sample[i], condition[i], replicate[i]))
sample_ids = pd.unique(sample_ids).tolist()

sample_ids_file = []
for i in range(len(sample)):
	sample_ids_file.append('%s_%s_%s_%s' % (sample[i], condition[i], replicate[i], Antibody[i]))

total_fragments = []
peak_fragments = []
def FRP(sample):
	peak_id = sample.split('_')[0:-1]
	peak_id = "_".join(peak_id)
	print(sample)
	cmd_1 = ['bedtools', 'bamtobed', '-bedpe', '-i', 'alignment/%s.bam' % (sample)] 
	cmd_2 = ['bedtools', 'intersect', '-a', 'stdin', '-b', 'peaks/%s.stringent.bed' % (peak_id)]
	cmd_3 = ['wc', '-l']
	step_1 = subprocess.Popen(cmd_1, stdout=subprocess.PIPE)
	step_2 = subprocess.Popen(cmd_2, stdin = step_1.stdout, stdout=subprocess.PIPE)
	step_3 = subprocess.Popen(cmd_3, stdin = step_2.stdout, stdout=subprocess.PIPE)
	frag_peaks = step_3.stdout.read()
	frag_peaks.strip()
	peak_fragments.append(float(frag_peaks))

	cmd_4 = ['samtools', 'view', 'alignment/%s.bam' % (sample)]
	step_1 = subprocess.Popen(cmd_4, stdout=subprocess.PIPE)
	step_2 = subprocess.Popen(cmd_3, stdin = step_1.stdout, stdout=subprocess.PIPE)

	frag = step_2.stdout.read()
	frag.strip()
	frag = float(frag)
	total_fragments.append(frag/2)

for samples in sample_ids_file:
	FRP(samples)

with open('FRP.txt', 'w') as out:
	out.write('Sample\tTotal_fragments\tFragments_in_peaks\tFRP\n')
	for i in range(len(sample_ids_file)):
		out.write('%s\t%s\t%s\t%s\n' % (sample_ids_file[i], total_fragments[i], peak_fragments[i], peak_fragments[i]/total_fragments[i]))
	out.close()	
51
52
shell: 
	'fastqc {input.fastq} -o {params}'
61
62
shell: 
	'fastqc {input.fastq} -o {params}'
78
79
shell:
	'fastp -w {threads} {params} -i {input.R1} -I {input.R2} -o {output.R1} -O {output.R2} --html {output.html} --json {output.json} 2> {log}'
92
93
94
95
96
97
98
	shell:
		'bowtie2 {params} -x %s --threads {threads} -1 {input.R1} -2 {input.R2} 2> {log} | samtools view -bh -q 3 > alignment/{wildcards.sample}.bam' % (genome)

rule align_spike:
	input:
		R1='trim/{sample}_trimmed_R1.fastq.gz',
		R2='trim/{sample}_trimmed_R2.fastq.gz'
106
107
108
109
110
111
	shell:
		'bowtie2 {params} -x %s --threads {threads} -1 {input.R1} -2 {input.R2} 2> {log} | samtools view -bh -q 3 > alignment/{wildcards.sample}_ecoli.bam' % (spike_genome)

rule sort:
	input:
		'alignment/{sample}.bam'
115
116
shell:
	'samtools sort -@ {threads} {input} > {output}'
124
125
shell:
	'samtools index -@ {threads} {input} > {output}'
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
	shell:
		"""
		depth=`samtools view alignment/{wildcards.sample}_ecoli.bam | wc -l`
		depth=$((depth/2))
		echo $depth
		scale_fac=`echo "10000 / $depth" | bc -l`
		echo $scale_fac
		bedtools bamtobed -bedpe -i alignment/{wildcards.sample}.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > alignment/bed/{wildcards.sample}.bed
		"""
		'bedtools genomecov -bg -i alignment/bed/{wildcards.sample}.bed -scale $scale_fac -g %s > alignment/bed/{wildcards.sample}.bedgraph' % (chr_lens)

rule SEACR:
	input:
		exp='alignment/bed/{sample}_Antibody.bedgraph',
		con='alignment/bed/{sample}_Control.bedgraph'
153
154
shell:
	'bash SEACR_1.3.sh {input.exp} {input.con} {params} peaks/{wildcards.sample}'
SnakeMake From line 153 of main/Snakefile
161
162
163
164
shell:
	"""
	samtools view {input} | awk -F'\t' 'function abs(x){{return ((x < 0.0) ? -x : x)}} {{print abs($9)}}' | sort | uniq -c | awk -v OFS="\t" '{{print $2, $1/2}}' > {output}
	"""
171
172
script:
	'FRP.py'
SnakeMake From line 171 of main/Snakefile
ShowHide 8 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/mgildea87/CUT-RUN
Name: cut-run
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...