Paired-end CUT and RUN data processing workflow designed for execution on the BigPurple HPC. Primary software used: FastQC, Fastp, Bowtie 2, SEACR, MultiQC

public 1yr ago 0 bookmarks

View Workflow

Help improve this workflow!

This workflow has been published but could be further improved with some additional meta data:

Keyword(s) in categories input, output, operation, topic

You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .

CUT-RUN

This pipeline was written for execution on the NYU big purple server. This readme describes how to execute the snake make workflow for paired-end CUT&RUN data pre-processing (fastq -> peak calling), Utilizing bowtie2 for alignmen

Code Snippets

import subprocess, pandas as pd

sample_file = "samples_info.tab"
sample = pd.read_table(sample_file)['Sample']
replicate = pd.read_table(sample_file)['Replicate']
condition = pd.read_table(sample_file)['Condition']
Antibody = pd.read_table(sample_file)['Antibody']

sample_ids = []
for i in range(len(sample)):
	sample_ids.append('%s_%s_%s' % (sample[i], condition[i], replicate[i]))
sample_ids = pd.unique(sample_ids).tolist()

sample_ids_file = []
for i in range(len(sample)):
	sample_ids_file.append('%s_%s_%s_%s' % (sample[i], condition[i], replicate[i], Antibody[i]))

total_fragments = []
peak_fragments = []
def FRP(sample):
	peak_id = sample.split('_')[0:-1]
	peak_id = "_".join(peak_id)
	print(sample)
	cmd_1 = ['bedtools', 'bamtobed', '-bedpe', '-i', 'alignment/%s.bam' % (sample)] 
	cmd_2 = ['bedtools', 'intersect', '-a', 'stdin', '-b', 'peaks/%s.stringent.bed' % (peak_id)]
	cmd_3 = ['wc', '-l']
	step_1 = subprocess.Popen(cmd_1, stdout=subprocess.PIPE)
	step_2 = subprocess.Popen(cmd_2, stdin = step_1.stdout, stdout=subprocess.PIPE)
	step_3 = subprocess.Popen(cmd_3, stdin = step_2.stdout, stdout=subprocess.PIPE)
	frag_peaks = step_3.stdout.read()
	frag_peaks.strip()
	peak_fragments.append(float(frag_peaks))

	cmd_4 = ['samtools', 'view', 'alignment/%s.bam' % (sample)]
	step_1 = subprocess.Popen(cmd_4, stdout=subprocess.PIPE)
	step_2 = subprocess.Popen(cmd_3, stdin = step_1.stdout, stdout=subprocess.PIPE)

	frag = step_2.stdout.read()
	frag.strip()
	frag = float(frag)
	total_fragments.append(frag/2)

for samples in sample_ids_file:
	FRP(samples)

with open('FRP.txt', 'w') as out:
	out.write('Sample\tTotal_fragments\tFragments_in_peaks\tFRP\n')
	for i in range(len(sample_ids_file)):
		out.write('%s\t%s\t%s\t%s\n' % (sample_ids_file[i], total_fragments[i], peak_fragments[i], peak_fragments[i]/total_fragments[i]))
	out.close()	

Python SAMtools Pandas BEDTools From line 1 of main/FRP.py

shell: 
	'fastqc {input.fastq} -o {params}'

SnakeMake FastQC From line 51 of main/Snakefile

shell: 
	'fastqc {input.fastq} -o {params}'

SnakeMake FastQC From line 61 of main/Snakefile

shell:
	'fastp -w {threads} {params} -i {input.R1} -I {input.R2} -o {output.R1} -O {output.R2} --html {output.html} --json {output.json} 2> {log}'

SnakeMake fastp From line 78 of main/Snakefile

	shell:
		'bowtie2 {params} -x %s --threads {threads} -1 {input.R1} -2 {input.R2} 2> {log} | samtools view -bh -q 3 > alignment/{wildcards.sample}.bam' % (genome)

rule align_spike:
	input:
		R1='trim/{sample}_trimmed_R1.fastq.gz',
		R2='trim/{sample}_trimmed_R2.fastq.gz'

SnakeMake SAMtools Bowtie 2 From line 92 of main/Snakefile

	shell:
		'bowtie2 {params} -x %s --threads {threads} -1 {input.R1} -2 {input.R2} 2> {log} | samtools view -bh -q 3 > alignment/{wildcards.sample}_ecoli.bam' % (spike_genome)

rule sort:
	input:
		'alignment/{sample}.bam'

SnakeMake SAMtools Bowtie 2 From line 106 of main/Snakefile

shell:
	'samtools sort -@ {threads} {input} > {output}'

SnakeMake SAMtools From line 115 of main/Snakefile

shell:
	'samtools index -@ {threads} {input} > {output}'

SnakeMake SAMtools From line 124 of main/Snakefile

	shell:
		"""
		depth=`samtools view alignment/{wildcards.sample}_ecoli.bam | wc -l`
		depth=$((depth/2))
		echo $depth
		scale_fac=`echo "10000 / $depth" | bc -l`
		echo $scale_fac
		bedtools bamtobed -bedpe -i alignment/{wildcards.sample}.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > alignment/bed/{wildcards.sample}.bed
		"""
		'bedtools genomecov -bg -i alignment/bed/{wildcards.sample}.bed -scale $scale_fac -g %s > alignment/bed/{wildcards.sample}.bedgraph' % (chr_lens)

rule SEACR:
	input:
		exp='alignment/bed/{sample}_Antibody.bedgraph',
		con='alignment/bed/{sample}_Control.bedgraph'

SnakeMake BEDTools From line 134 of main/Snakefile

shell:
	'bash SEACR_1.3.sh {input.exp} {input.con} {params} peaks/{wildcards.sample}'

SnakeMake From line 153 of main/Snakefile

shell:
	"""
	samtools view {input} | awk -F'\t' 'function abs(x){{return ((x < 0.0) ? -x : x)}} {{print abs($9)}}' | sort | uniq -c | awk -v OFS="\t" '{{print $2, $1/2}}' > {output}
	"""