ENCODE pipeline for histone marks developed for the psychENCODE project
Summary
The psychip pipeline is an improved version of the ENCODE pipeline for histone marks, developed for the psychENCODE project.
The original specification of the ENCODE pipeline can be found here: https://goo.gl/KqHjKH
This version uses the Snakemake workflow system (https://bitbucket.org/snakemake/snakemake/wiki/Home).
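As a rough sketch of how a pipeline like this is usually launched with Snakemake (the Snakefile name, config file name, and core count below are assumptions for illustration, not taken from this repository):

    # Hypothetical invocation; check the repository for the actual file names and options
    cd psychip_snakemake

    # dry run: list the jobs Snakemake would execute without running anything
    snakemake --snakefile Snakefile --configfile config.yaml -n

    # real run on 16 cores
    snakemake --snakefile Snakefile --configfile config.yaml --cores 16

The dry run (-n) is a cheap way to confirm that the configuration and input files resolve before committing compute time.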
Code Snippets
shell: "bwa mem -M -t {threads} {config[genome]} {input} | gzip -c > {output}"

shell: "bwa mem -M -t {threads} {config[genome]} {input} | gzip -c > {output}"

shell:
    """
    zcat {input} | \
    awk 'BEGIN{{FS="\t"; OFS="\t"}}
        !/^@/ && $6!="*" {{
            cigar=$6; gsub("[0-9]+D","",cigar);
            n=split(cigar, vals, "[A-Z]");
            s=0; for(i=1;i<=n;i++) s=s+vals[i];
            seqlen=length($10);
            if(s!=seqlen) print $1"\t";
        }}' | sort | uniq > {output}
    """

run:
    with open("mapped_reads/" + wildcards.sample + ".badcigar") as fp:
        peek = [x for i,x in enumerate(fp) if i<10 and x.rstrip()]
    cigar_len = len(peek)
    shell("""if [ {cigar_len} -gt 0 ]
        then
            zcat {input.sam} | grep -vF -f {input.cigar} | \
            samtools view -@ {threads} -Su - | \
            samtools sort -@ {threads} -T {params.prefix} -o {output} -

shell:
    """
    samtools view -@ {threads} -F1804 -f2 -q 20 -h \
        -u {input} | \
    samtools sort -@ {threads} -m {params.mem} -T {params.prefix} -n -o {output.tmp} -
    samtools fixmate -r {output.tmp} {output.clean}
    samtools view -@ {threads} -F1804 -f2 -u {output.clean} |\
    samtools sort -@ {threads} -m {params.mem} -T {params.prefix}_2 -o {output.flt} -
    """

shell:
    """
    java -Xmx4G -jar {config[path]}/picard-tools-1.141/picard.jar MarkDuplicates \
        INPUT={input} OUTPUT={output.temp} \
        METRICS_FILE={output.qc} ASSUME_SORTED=true \
        VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=false
    """

shell:
    """
    samtools view -@ {threads} -h {input} |\
    grep -vF -f {config[sponges]} - |\
    samtools view -@ {threads} -b -o {output.flt} -
    samtools sort -@ {threads} -m {params.mem} -T {params.prefix} -n -o {output.sponge_temp} {output.flt}
    samtools fixmate -r {output.sponge_temp} {output.flt}
    """

shell:
    """
    samtools view -@ {threads} -F1804 -f2 -b -u {input} |\
    samtools sort -@ {threads} -m {params.mem} -T {params.prefix}_2 -o {output.final_bam} -
    #samtools view -@ {threads} -F1804 -f2 -b -o {output.final_bam} {input}
    samtools index {output.final_bam} {output.final_bai}
    samtools sort -@ {threads} -m {params.mem} -T {params.prefix} -o {output.tmp_sort} -n {input}
    """

shell:
    """
    java -Xmx4G -jar {config[path]}/picard-tools-1.141/picard.jar MarkDuplicates \
        INPUT={input} OUTPUT={output.flt} \
        METRICS_FILE={output.qc} ASSUME_SORTED=true \
        VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=false
    """

shell:
    """
    samtools view -@ {threads} -F1804 -f2 -b -T {params.prefix} -o {output.final_bam} {input}
    samtools index {output.final_bam} {output.final_bai}
    """

shell:
    """
    samtools sort -@ {threads} -T {params.prefix} -o {output.tmp_sort} -n {input.final_bam}
    bedtools bamtobed -bedpe -mate1 -i {output.tmp_sort} |\
    gzip -c > {output.bedpe}
    """

shell:
    """zcat {input} | \
    awk 'BEGIN{{OFS="\t"; FS="\t"}}
        {{ chrom=$1; beg=$2; end=$6;
           if($2>$5){{beg=$5}}
           if($3>$6){{end=$3}}
           print chrom,beg,end }}' - | {config[sort]} --parallel={threads} -S 2G -k1,1 -k2,2n | \
    gzip -c > {output.bed}"""

shell:
    """
    samtools view -@ {threads} -Su {input.sam} |\
    samtools sort -@ {threads} -T {params.prefix} -o {output} -
    """

shell:
    """
    samtools view -@ {threads} -F 1804 -q 20 \
        -u {input} -o {output.flt}
    """

shell:
    """
    java -Xmx4G -jar {config[path]}/picard-tools-1.141/picard.jar MarkDuplicates \
        INPUT={input} OUTPUT={output.temp} \
        METRICS_FILE={output.qc} ASSUME_SORTED=true \
        VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=false
    """

shell:
    """
    samtools view -@ {threads} -h {input} |\
    grep -vF -f {config[sponges]} - |\
    samtools view -@ {threads} -b -o {output.flt} -
    """

shell:
    """
    samtools view -@ {threads} -F1804 -b -o {output.final_bam} {input}
    samtools index {output.final_bam} {output.final_bai}
    samtools sort -@ {threads} -n -T {params.prefix} -o {output.tmp_sort} {input}
    """

shell:
    """
    java -Xmx4G -jar {config[path]}/picard-tools-1.141/picard.jar MarkDuplicates \
        INPUT={input} OUTPUT={output.flt} \
        METRICS_FILE={output.qc} ASSUME_SORTED=true \
        VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=false
    """

shell:
    """
    samtools view -@ {threads} -F1804 -b -o {output.final_bam} {input}
    samtools index {output.final_bam} {output.final_bai}
    samtools sort -@ {threads} -n -T {params.prefix} -o {output.tmp_sort} {input}
    """

shell:
    """
    bedtools bamtobed -i {input} | cut -f1-3 | \
    {config[sort]} --parallel={threads} -k1,1 -k2,2n | \
    gzip -c > {output.bed}
    """

shell: "samtools flagstat {input.final} > {output.final_mapstats}"

shell: "samtools flagstat {input.raw} > {output.raw_mapstats}"

shell:
    """
    samtools sort -n -@ {threads} -T {params.prefix} -o {output.tmp_sort_bam} {input.flt}
    echo 1 | \
    awk '{{print "#Total\tDistinct\tOne\tTwo\tNRF\tPBC1\tPBC2"}}' \
        > {output.lib_complexity}
    bedtools bamtobed -bedpe -i {output.tmp_sort_bam} | \
    awk 'BEGIN{{OFS="\t"}}{{print $1,$2,$4,$6,$9,$10}}' | \
    grep -v 'chrM' | sort | uniq -c | \
    awk 'BEGIN{{mt=0;m0=0;m1=0;m2=0; OFS="\t"}}
        ($1==1){{m1=m1+1}}
        ($1==2){{m2=m2+1}}
        {{m0=m0+1}}
        {{mt=mt+$1}}
        END{{print mt,m0,m1,m2,m0/mt,m1/m0,m1/m2}}' \
        >> {output.lib_complexity}
    """

shell: "samtools flagstat {input.final} > {output.final_mapstats}"

shell: "samtools flagstat {input.raw} > {output.raw_mapstats}"

shell:
    """
    samtools sort -n -@ {threads} -T {params.prefix} -o {output.tmp_sort_bam} {input.flt}
    echo 1 | \
    awk '{{print "#Total\tDistinct\tOne\tTwo\tNRF\tPBC1\tPBC2"}}' \
        > {output.lib_complexity}
    bedtools bamtobed -i {output.tmp_sort_bam} | \
    awk 'BEGIN{{OFS="\t"}}{{print $1,$2,$3,$6}}' | \
    grep -v 'chrM' | sort | uniq -c | \
    awk 'BEGIN{{mt=0;m0=0;m1=0;m2=0; OFS="\t"}}
        ($1==1){{m1=m1+1}}
        ($1==2){{m2=m2+1}}
        {{m0=m0+1}}
        {{mt=mt+$1}}
        END{{if(m2>0) {{print mt,m0,m1,m2,m0/mt,m1/m0,m1/m2}} else {{print mt,m0,m1,m2,m0/mt,m1/m0,"-INF"}}}}' \
        >> {output.lib_complexity}
    """

shell:
    """
    bamToBed -i {input} | awk 'BEGIN{{OFS="\\t"}}{{$4="N";$5="1000";print $0}}' |\
    tee {output.temp_tagAlign} |\
    gzip -c > {output.tagAlign}
    """

shell:
    """
    grep -v 'chrM' {input.temp_tagAlign} |\
    shuf -n 15000000 | awk 'BEGIN{{OFS="\\t"}}{{$4="N";$5="1000";print $0}}' > {output.subsample}
    Rscript {config[path]}/run_spp_nodups.R -c={output.subsample} -p={threads} -filtchr=chrM -savp={output.cc_plot} -out={output.cc_score} > /dev/null 2>&1
    sed -r 's/,[^\\t]+//g' {output.cc_score} > {output.temp_cc_score}
    cp {output.temp_cc_score} {output.cc_score}
    """

shell:
    """
    bamToBed -i {input} | awk 'BEGIN{{OFS="\\t"}}{{$4="N";$5="1000";print $0}}' |\
    tee {output.temp_tagAlign} |\
    gzip -c > {output.tagAlign}
    """

shell:
    """
    grep -v 'chrM' {input.temp_tagAlign} |\
    shuf -n 15000000 > {output.subsample}
    Rscript {config[path]}/run_spp_nodups.R -c={output.subsample} -p={threads} -filtchr=chrM -savp={output.cc_plot} -out={output.cc_score} > /dev/null 2>&1
    sed -r 's/,[^\\t]+//g' {output.cc_score} > {output.temp_cc_score}
    cp {output.temp_cc_score} {output.cc_score}
    """

shell:
    """
    # If the input is too big, '--random-source' can fail; that is why only the first 40000000 lines are used as the seed
    zcat {input} | shuf --random-source=<(zcat {input} | head -40000000) > {output.shuf}
    {config[split]} -d -nl/2 --additional-suffix=\".bed\" {output.shuf} bed_files/psr_{wildcards.sample}.
    {config[sort]} --parallel={threads} -S 2G \
        -k1,1 -k2,2n {output.temp_psr1} | gzip -c > {output.psr1}
    {config[sort]} --parallel={threads} -S 2G \
        -k1,1 -k2,2n {output.temp_psr2} | gzip -c > {output.psr2}
    """

shell:
    """
    zcat {input} | {config[sort]} --parallel={threads} -k1,1 -k2,2n | gzip -c > {output}
    """

shell:
    """
    zcat {input.input1} | {config[sort]} --parallel={threads} -k1,1 -k2,2n | gzip -c > {output.rep1}
    zcat {input.input2} | {config[sort]} --parallel={threads} -k1,1 -k2,2n | gzip -c > {output.rep2}
    """

shell:
    """
    zcat {input} | {config[sort]} --parallel={threads} -k1,1 -k2,2n | gzip -c > {output}
    """

run:
    import os, sys, time, subprocess
    # Calling narrowPeaks
    shell("""
    fraglen=`cat {input.cc_scores} | cut -f3`
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    sum_len = 0
    n_files = 0
    for files in {input.cc_scores}:
        for file in files:
            n_files += 1
            with open(file, 'r') as f:
                sum_len += int(f.readlines()[0].split("\t")[2])
    fraglen = int(sum_len/n_files)
    # Calling narrowPeaks
    shell("""
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    sum_len = 0
    n_files = 0
    for files in {input.cc_scores}:
        for file in files:
            n_files += 1
            with open(file, 'r') as f:
                sum_len += int(f.readlines()[0].split("\t")[2])
    fraglen = int(sum_len/n_files)
    # Calling narrowPeaks
    shell("""
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    sum_len = 0
    n_files = 0
    for files in {input.cc_scores}:
        for file in files:
            n_files += 1
            with open(file, 'r') as f:
                sum_len += int(f.readlines()[0].split("\t")[2])
    fraglen = int(sum_len/n_files)
    # Calling narrowPeaks
    shell("""
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    # Calling narrowPeaks
    shell("""
    fraglen=`cat {input.cc_scores} | cut -f3`
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    sum_len = 0
    n_files = 0
    for files in {input.cc_scores}:
        for file in files:
            n_files += 1
            with open(file, 'r') as f:
                sum_len += int(f.readlines()[0].split("\t")[2])
    fraglen = int(sum_len/n_files)
    # Calling narrowPeaks
    shell("""
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    # Calling narrowPeaks
    shell("""
    fraglen=`cat {input.cc_scores} | cut -f3`
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    sum_len = 0
    n_files = 0
    for files in {input.cc_scores}:
        for file in files:
            n_files += 1
            with open(file, 'r') as f:
                sum_len += int(f.readlines()[0].split("\t")[2])
    fraglen = int(sum_len/n_files)
    # Calling narrowPeaks
    shell("""
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    # Calling narrowPeaks
    shell("""
    fraglen=`cat {input.cc_scores} | cut -f3`
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    sum_len = 0
    n_files = 0
    for files in {input.cc_scores}:
        for file in files:
            n_files += 1
            with open(file, 'r') as f:
                sum_len += int(f.readlines()[0].split("\t")[2])
    fraglen = int(sum_len/n_files)
    # Calling narrowPeaks
    shell("""
    {config[macs2]} callpeak \

shell:
    """
    bedtools intersect \
        -a {input.narrowPeak} -b {input.narrowPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.narrowPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.narrow}
    sort -k1,1 -k2,2n -o finalPeaks/tmp_{params.prefix}_narrow_bb {output.narrow}
    {config[path]}/bedToBigBed -type='bed6+4' -as={config[path]}/narrowPeak.as finalPeaks/tmp_{params.prefix}_narrow_bb {config[chrom]} {output.narrowPeak_bb}
    rm -f finalPeaks/tmp_{params.prefix}_narrow_bb

    bedtools intersect \
        -a {input.broadPeak} -b {input.broadPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.broadPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.broad}
    sort -k1,1 -k2,2n -o finalPeaks/tmp_{params.prefix}_broad_bb {output.broad}
    {config[path]}/bedToBigBed -type='bed6+4' -as={config[path]}/broadPeak.as finalPeaks/tmp_{params.prefix}_broad_bb {config[chrom]} {output.broadPeak_bb}
    rm -f finalPeaks/tmp_{params.prefix}_broad_bb

    bedtools intersect \
        -a {input.gappedPeak} -b {input.gappedPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.gappedPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.gapped}
    sort -k1,1 -k2,2n -o finalPeaks/tmp_{params.prefix}_gapped_bb {output.gapped}
    {config[path]}/bedToBigBed -type='bed6+4' -as={config[path]}/gappedPeak.as finalPeaks/tmp_{params.prefix}_gapped_bb {config[chrom]} {output.gappedPeak_bb}
    rm -f finalPeaks/tmp_{params.prefix}_gapped_bb
    sleep 120
    """

shell:
    """
    bedtools intersect \
        -a {input.narrowPeak} -b {input.narrowPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.narrowPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.narrow}
    sort -k1,1 -k2,2n -o finalPeaks/tmp_{params.prefix}_narrow_bb {output.narrow}
    {config[path]}/bedToBigBed -type='bed6+4' -as={config[path]}/narrowPeak.as finalPeaks/tmp_{params.prefix}_narrow_bb {config[chrom]} {output.narrowPeak_bb}
    rm -f finalPeaks/tmp_{params.prefix}_narrow_bb

    bedtools intersect \
        -a {input.broadPeak} -b {input.broadPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.broadPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.broad}
    sort -k1,1 -k2,2n -o finalPeaks/tmp_{params.prefix}_broad_bb {output.broad}
    {config[path]}/bedToBigBed -type='bed6+4' -as={config[path]}/broadPeak.as finalPeaks/tmp_{params.prefix}_broad_bb {config[chrom]} {output.broadPeak_bb}
    rm -f finalPeaks/tmp_{params.prefix}_broad_bb

    bedtools intersect \
        -a {input.gappedPeak} -b {input.gappedPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.gappedPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.gapped}
    sort -k1,1 -k2,2n -o finalPeaks/tmp_{params.prefix}_gapped_bb {output.gapped}
    {config[path]}/bedToBigBed -type='bed6+4' -as={config[path]}/gappedPeak.as finalPeaks/tmp_{params.prefix}_gapped_bb {config[chrom]} {output.gappedPeak_bb}
    rm -f finalPeaks/tmp_{params.prefix}_gapped_bb
    sleep 120
    """

shell:
    """
    bedtools intersect \
        -a {input.narrowPeak} -b {input.narrowPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.narrowPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.narrow}
    sort -k1,1 -k2,2n -o finalPeaks/tmp_{params.prefix}_narrow_bb {output.narrow}
    {config[path]}/bedToBigBed -type='bed6+4' -as={config[path]}/narrowPeak.as finalPeaks/tmp_{params.prefix}_narrow_bb {config[chrom]} {output.narrowPeak_bb}
    rm -f finalPeaks/tmp_{params.prefix}_narrow_bb

    bedtools intersect \
        -a {input.broadPeak} -b {input.broadPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.broadPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.broad}
    sort -k1,1 -k2,2n -o finalPeaks/tmp_{params.prefix}_broad_bb {output.broad}
    {config[path]}/bedToBigBed -type='bed6+4' -as={config[path]}/broadPeak.as finalPeaks/tmp_{params.prefix}_broad_bb {config[chrom]} {output.broadPeak_bb}
    rm -f finalPeaks/tmp_{params.prefix}_broad_bb

    bedtools intersect \
        -a {input.gappedPeak} -b {input.gappedPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.gappedPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.gapped}
    sort -k1,1 -k2,2n -o finalPeaks/tmp_{params.prefix}_gapped_bb {output.gapped}
    {config[path]}/bedToBigBed -type='bed6+4' -as={config[path]}/gappedPeak.as finalPeaks/tmp_{params.prefix}_gapped_bb {config[chrom]} {output.gappedPeak_bb}
    rm -f finalPeaks/tmp_{params.prefix}_gapped_bb
    sleep 120
    """

shell:
    """
    bwa mem -M -t {threads} {config[genome]} {input} | gzip -c > {output.sam}
    samtools view -@ {threads} -Su {output.sam} | samtools sort -@ {threads} -T {params.prefix} -o {output.bam}
    """

shell:
    """
    samtools view -@ {threads} -F 1804 -q 20 \
        -b {input} -o {output.flt}
    """

shell:
    """
    java -Xmx4G -jar {config[path]}/picard-tools-1.141/picard.jar MarkDuplicates \
        INPUT={input} OUTPUT={output.temp} \
        METRICS_FILE={output.qc} ASSUME_SORTED=true \
        VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=false
    """

shell:
    """
    samtools view -@ {threads} -h {input} |\
    grep -vF -f {config[sponges]} - |\
    samtools view -@ {threads} -b -o {output.flt} -
    """

shell:
    """
    samtools view -@ {threads} -F1804 -b -o {output.final_bam} {input}
    samtools index {output.final_bam} {output.final_bai}
    samtools sort -@ {threads} -n -T {params.prefix} -o {output.tmp_sort} {input}
    """

shell:
    """
    java -Xmx4G -jar {config[path]}/picard-tools-1.141/picard.jar MarkDuplicates \
        INPUT={input} OUTPUT={output.flt} \
        METRICS_FILE={output.qc} ASSUME_SORTED=true \
        VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=false
    """

shell:
    """
    samtools view -@ {threads} -F1804 -b -o {output.final_bam} {input}
    samtools index {output.final_bam} {output.final_bai}
    samtools sort -@ {threads} -n -T {params.prefix} -o {output.tmp_sort} {input}
    """

shell:
    """
    bedtools bamtobed -i {input} | \
    awk 'BEGIN{{OFS="\t"}}{{$4="N";$5="1000";print $0}}' | \
    {config[sort]} --parallel={threads} -k1,1 -k2,2n | \
    gzip -nc > {output.tagAlign}
    """

shell: "samtools flagstat {input.final} > {output.final_mapstats}"

shell: "samtools flagstat {input.raw} > {output.raw_mapstats}"

shell:
    """
    samtools sort -n -@ {threads} -T {params.prefix} -o {output.tmp_sort_bam} {input.flt}
    echo 1 | \
    awk '{{print "#TotalReadPairs\tDistinctReadPairs\tOneReadPair\tTwoReadPairs\tNRF=Distinct/Total\tPBC1=OnePair/Distinct\tPBC2=OnePair/TwoPair"}}' \
        > {output.lib_complexity}
    bedtools bamtobed -i {output.tmp_sort_bam} | \
    awk 'BEGIN{{OFS="\t"}}{{print $1,$2,$3,$6}}' | \
    grep -v 'chrM' | sort | uniq -c | \
    awk 'BEGIN{{mt=0;m0=0;m1=0;m2=0; OFS="\t"}}
        ($1==1){{m1=m1+1}}
        ($1==2){{m2=m2+1}}
        {{m0=m0+1}}
        {{mt=mt+$1}}
        END{{if(m2>0) {{print mt,m0,m1,m2,m0/mt,m1/m0,m1/m2}} else {{print mt,m0,m1,m2,m0/mt,m1/m0,"-INF"}}}}' \
        >> {output.lib_complexity}
    """

shell:
    """
    NREADS=15000000
    zcat {input.tagAlign} | grep -v 'chrM' |\
    shuf -n $NREADS | gzip -nc > {output.subsample}
    Rscript {config[path]}/run_spp_nodups.R -c={output.subsample} -p={threads} -filtchr=chrM -savp={output.cc_plot} -out={output.cc_score} > /dev/null 2>&1
    sed -r 's/,[^\\t]+//g' {output.cc_score} > {output.cc_score}.tmp
    mv {output.cc_score}.tmp {output.cc_score}
    sleep 120
    """

shell:
    """
    # If the input is too big, '--random-source' can fail; that is why only the first 40000000 lines are used as the seed
    zcat {input} | shuf --random-source=<(zcat {input} | head -40000000) > {output.shuf}
    {config[split]} -d -nl/2 --additional-suffix=\".tagAlign\" {output.shuf} bed_files/psr_{wildcards.sample}.
    {config[sort]} --parallel={threads} -S 2G \
        -k1,1 -k2,2n {output.temp_psr1} | gzip -c > {output.psr1}
    {config[sort]} --parallel={threads} -S 2G \
        -k1,1 -k2,2n {output.temp_psr2} | gzip -c > {output.psr2}
    """

shell:
    """
    zcat {input} | {config[sort]} --parallel={threads} -k1,1 -k2,2n | gzip -c > {output}
    """

shell:
    """
    zcat {input} | {config[sort]} --parallel={threads} -k1,1 -k2,2n | gzip -c > {output}
    """

shell:
    """
    fraglen=`cat {input.cc_scores} | cut -f3`
    {config[macs2]} callpeak \
        -t {input.ip} -c {input.control} \
        -f BED -n peaks/{params.prefix} \
        -g {config[macs_g]} -p 1e-2 --nomodel --shift 0 --extsize $fraglen \
        --keep-dup all -B --SPMR

    sort -k 8gr,8gr {output.narrowPeak} |\
    awk 'BEGIN{{OFS="\t"}}{{$4="Peak_"NR ; print $0}}' | \
    gzip -nc > {output.narrowPeak_compressed}
    rm -f peaks/{params.prefix}_peaks.xls peaks/{params.prefix}_summits.bed
    touch {output.checkpoint}
    """

shell:
    """
    fraglen=`cat {input.cc_scores} | cut -f3`
    {config[macs2]} callpeak \
        -t {input.ip_psr1} -c {input.control} \
        -f BED -n peaks/{params.prefix} \
        -g {config[macs_g]} -p 1e-2 --nomodel --shift 0 --extsize $fraglen \
        --keep-dup all -B --SPMR
    sort -k 8gr,8gr {output.narrowPeak_psr1} |\
    awk 'BEGIN{{OFS="\t"}}{{$4="Peak_"NR ; print $0}}' | \
    gzip -nc > {output.narrowPeak_psr1_compressed}
    rm -f peaks/{params.prefix}_peaks.xls peaks/{params.prefix}_summits.bed

    {config[macs2]} callpeak \
        -t {input.ip_psr2} -c {input.control} \
        -f BED -n peaks/{params.prefix_2} \
        -g {config[macs_g]} -p 1e-2 --nomodel --shift 0 --extsize $fraglen \
        --keep-dup all -B --SPMR
    sort -k 8gr,8gr {output.narrowPeak_psr2} |\
    awk 'BEGIN{{OFS="\t"}}{{$4="Peak_"NR ; print $0}}' | \
    gzip -nc > {output.narrowPeak_psr2_compressed}
    rm -f peaks/{params.prefix_2}_peaks.xls peaks/{params.prefix_2}_summits.bed
    touch {output.checkpoint}
    """

shell:
    """
    fraglen=`cat {input.cc_scores} | cut -f3`
    {config[macs2]} callpeak \
        -t {input.ip} -c {input.control} \
        -f BED -n peaks/{params.prefix} \
        -g {config[macs_g]} -p 1e-2 --broad --nomodel --shift 0 --extsize $fraglen \
        --keep-dup all

    sort -k 8gr,8gr {output.broadPeak} | \
    awk 'BEGIN{{OFS="\t"}}{{$4="Peak_"NR ; print $0}}' | \
    gzip -nc > {output.broadPeak_compressed}

    sort -k 14gr,14gr {output.broadPeak} | \
    awk 'BEGIN{{OFS="\t"}}{{$4="Peak_"NR ; print $0}}' | \
    gzip -nc > {output.gappedPeak_compressed}

    rm -f peaks/{params.prefix}_peaks.xls peaks/{params.prefix}_summits.bed
    """

shell:
    """
    {config[macs2]} bdgcmp \
        -t {input.treat} -c {input.control_lambda} \
        --outdir peaks/signals -o {params.prefix}_FE.bdg -m FE
    bedtools slop -i {output.fe_bdg} -g {config[chrom]} -b 0 |\
    awk '{{ if($3!=-1) print $0}}' | \
    {config[path]}/bedClip stdin {config[chrom]} {output.fc_signal_bdg}
    {config[path]}/bedGraphToBigWig {output.fc_signal_bdg} {config[chrom]} {output.fc_signal}

    chipReads=$(zcat {input.ip} | wc -l | awk '{{printf "%f", $1/1000000}}');
    controlReads=$(zcat {input.control} | wc -l | awk '{{printf "%f", $1/1000000}}');
    sval=$(echo "${{chipReads}} ${{controlReads}}" | awk '$1>$2{{printf "%f",$2}} $1<=$2{{printf "%f",$1}}');

    macs2 bdgcmp -t {input.treat} -c {input.control_lambda} --outdir peaks/signals -o {params.prefix}_ppois.bdg -m ppois -S ${{sval}}
    slopBed -i {output.ppois} -g {config[chrom]} -b 0 | \
    awk '{{if ($3 != -1) print $0}}' | \
    bedClip stdin {config[chrom]} {output.pval_bdg}
    bedGraphToBigWig {output.pval_bdg} {config[chrom]} {output.pval}
    """

Tools: SnakeMake, BEDTools, macs2, bedGraphToBigWig, ucsc-bedclip. From line 145 of TF/5_peak_calling_macs_tf_single

shell:
    """
    fraglen=`cat {input.cc_scores} | cut -f3`
    {config[macs2]} callpeak \
        -t {input.ip_psr1} -c {input.control} \
        -f BED -n peaks/{params.prefix} \
        -g {config[macs_g]} -p 1e-2 --broad --nomodel --shift 0 --extsize $fraglen \
        --keep-dup all
    sort -k 8gr,8gr {output.broadPeak_psr1} | \
    awk 'BEGIN{{OFS="\t"}}{{$4="Peak_"NR ; print $0}}' | \
    gzip -nc > {output.broadPeak_psr1_compressed}
    sort -k 14gr,14gr {output.broadPeak_psr1} | \
    awk 'BEGIN{{OFS="\t"}}{{$4="Peak_"NR ; print $0}}' | \
    gzip -nc > {output.gappedPeak_psr1_compressed}
    rm -f peaks/{params.prefix}_peaks.xls peaks/{params.prefix}_summits.bed

    {config[macs2]} callpeak \
        -t {input.ip_psr2} -c {input.control} \
        -f BED -n peaks/{params.prefix_2} \
        -g {config[macs_g]} -p 1e-2 --broad --nomodel --shift 0 --extsize $fraglen \
        --keep-dup all
    sort -k 8gr,8gr {output.broadPeak_psr2} | \
    awk 'BEGIN{{OFS="\t"}}{{$4="Peak_"NR ; print $0}}' | \
    gzip -nc > {output.broadPeak_psr2_compressed}
    sort -k 14gr,14gr {output.broadPeak_psr2} | \
    awk 'BEGIN{{OFS="\t"}}{{$4="Peak_"NR ; print $0}}' | \
    gzip -nc > {output.gappedPeak_psr2_compressed}
    rm -f peaks/{params.prefix_2}_peaks.xls peaks/{params.prefix_2}_summits.bed
    """

run:
    import os, sys, time, subprocess
    sum_len = 0
    n_files = 0
    for files in {input.cc_scores}:
        for file in files:
            n_files += 1
            with open(file, 'r') as f:
                sum_len += int(f.readlines()[0].split("\t")[2])
    fraglen = int(sum_len/n_files)
    shell("""
    {config[macs2]} callpeak \

run:
    import os, sys, time, subprocess
    sum_len = 0
    n_files = 0
    for files in {input.cc_scores}:
        for file in files:
            n_files += 1
            with open(file, 'r') as f:
                sum_len += int(f.readlines()[0].split("\t")[2])
    fraglen = int(sum_len/n_files)
    shell("""
    {config[macs2]} callpeak \

shell:
    """
    {config[macs2]} bdgcmp \
        -t {input.treat} -c {input.control_lambda} \
        --outdir peaks/signals -o {params.prefix}_FE.bdg -m FE
    bedtools slop -i {output.fe_bdg} -g {config[chrom]} -b 0 |\
    awk '{{ if($3!=-1) print $0}}' | \
    {config[path]}/bedClip stdin {config[chrom]} {output.fc_signal_bdg}
    {config[path]}/bedGraphToBigWig {output.fc_signal_bdg} {config[chrom]} {output.fc_signal}

    chipReads=$(zcat {input.ip} | wc -l | awk '{{printf "%f", $1/1000000}}');
    controlReads=$(zcat {input.control} | wc -l | awk '{{printf "%f", $1/1000000}}');
    sval=$(echo "${{chipReads}} ${{controlReads}}" | awk '$1>$2{{printf "%f",$2}} $1<=$2{{printf "%f",$1}}');

    macs2 bdgcmp -t {input.treat} -c {input.control_lambda} --outdir peaks/signals -o {params.prefix}_ppois.bdg -m ppois -S ${{sval}}
    slopBed -i {output.ppois} -g {config[chrom]} -b 0 | \
    awk '{{if ($3 != -1) print $0}}' | \
    bedClip stdin {config[chrom]} {output.pval_bdg}
    bedGraphToBigWig {output.pval_bdg} {config[chrom]} {output.pval}
    """

Tools: SnakeMake, BEDTools, macs2, bedGraphToBigWig, ucsc-bedclip. From line 347 of TF/5_peak_calling_macs_tf_single

shell:
    """
    fraglen=`cat {input.cc_scores} | cut -f3`

    #REP
    Rscript {config[path]}/run_spp.R -c={input.ip} -i={input.control} -p={threads} -npeak=300000 -odir=peaks_spp/{params.prefix} -speak=$fraglen -savr -savp -rf -out=peaks_spp/{params.prefix}/{params.prefix}.tagAlign.ccscores
    zcat {output.narrowPeak_raw} | \
    awk 'BEGIN{{OFS="\t"}}{{ if ($2<0) $2=0; print $1,int($2),int($3),$4,$5,$6,$7,$8,$9,$10;}}' | \
    gzip -f -nc > {output.narrowPeak_unfiltered}
    bedtools intersect -v -a <(zcat -f {output.narrowPeak_unfiltered}) -b <(zcat -f {config[path]}/blacklist_hg19.bed.gz) | \
    awk 'BEGIN{{OFS="\t"}} {{if ($5>1000) $5=1000; print $0}}' | \
    grep -P 'chr[\dXY]+[ \t]' | \
    gzip -nc > {output.narrowPeak_final}

    #SELF-PSEUDOREP1
    Rscript {config[path]}/run_spp.R -c={input.ip_psr1} -i={input.control} -p={threads} -npeak=300000 -odir=peaks_spp/psr00_{params.prefix} -speak=$fraglen -savr -savp -rf -out=peaks_spp/psr00_{params.prefix}/psr_{params.prefix}.00.tagAlign.ccscores
    zcat {output.narrowPeak_raw_0} | \
    awk 'BEGIN{{OFS="\t"}}{{ if ($2<0) $2=0; print $1,int($2),int($3),$4,$5,$6,$7,$8,$9,$10;}}' | \
    gzip -f -nc > {output.narrowPeak_unfiltered_0}
    bedtools intersect -v -a <(zcat -f {output.narrowPeak_unfiltered_0}) -b <(zcat -f {config[path]}/blacklist_hg19.bed.gz) | \
    awk 'BEGIN{{OFS="\t"}} {{if ($5>1000) $5=1000; print $0}}' | \
    grep -P 'chr[\dXY]+[ \t]' | \
    gzip -nc > {output.narrowPeak_final_0}

    #SELF-PSEUDOREP2
    Rscript {config[path]}/run_spp.R -c={input.ip_psr2} -i={input.control} -p={threads} -npeak=300000 -odir=peaks_spp/psr01_{params.prefix} -speak=$fraglen -savr -savp -rf -out=peaks_spp/psr01_{params.prefix}/psr_{params.prefix}.01.tagAlign.ccscores
    zcat {output.narrowPeak_raw_1} | \
    awk 'BEGIN{{OFS="\t"}}{{ if ($2<0) $2=0; print $1,int($2),int($3),$4,$5,$6,$7,$8,$9,$10;}}' | \
    gzip -f -nc > {output.narrowPeak_unfiltered_1}
    bedtools intersect -v -a <(zcat -f {output.narrowPeak_unfiltered_1}) -b <(zcat -f {config[path]}/blacklist_hg19.bed.gz) | \
    awk 'BEGIN{{OFS="\t"}} {{if ($5>1000) $5=1000; print $0}}' | \
    grep -P 'chr[\dXY]+[ \t]' | \
    gzip -nc > {output.narrowPeak_final_1}
    """

shell:
    """
    idr --samples {input.narrowPeak_psr1} {input.narrowPeak_psr2} --peak-list {input.narrowPeak} --input-file-type narrowPeak --output-file idr/{params.prefix} --rank signal.value --soft-idr-threshold {params.threshold} --plot --use-best-multisummit-IDR

    IDR_THRESH_TRANSFORMED=`awk -v p={params.threshold} 'BEGIN{{print -log(p)/log(10)}}'`

    awk 'BEGIN{{OFS="\t"}} $12>='"$IDR_THRESH_TRANSFORMED"' {{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}}' idr/{params.prefix}.overlapped-peaks.txt | sort | uniq | sort -k7n,7n | gzip -nc > {output.idr_raw}
    NPEAKS_IDR=`zcat {output.idr_raw} | wc -l`

    bedtools intersect -v -a {output.idr_raw} -b {config[path]}/blacklist_hg19.bed.gz | grep -P 'chr[\dXY]+[ \t]' | awk 'BEGIN{{OFS="\t"}} {{if ($5>1000) $5=1000; print $0}}' | gzip -nc > {output.idr_final}
    sleep 120
    """

shell:
    """
    bedtools intersect \
        -a {input.narrowPeak} -b {input.narrowPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.narrowPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.narrow}
    sleep 120
    """

shell:
    """
    bedtools intersect \
        -a {input.broadPeak} -b {input.broadPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.broadPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.broad}

    bedtools intersect \
        -a {input.gappedPeak} -b {input.gappedPeak_psr1} -f 0.50 -F 0.50 -e -u |\
    bedtools intersect \
        -a stdin -b {input.gappedPeak_psr2} -f 0.50 -F 0.50 -e -u > {output.gapped}
    sleep 120
    """

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/weng-lab/psychip_snakemake
Name: psychip_snakemake
Version: 1
Downloaded: 0
Copyright: Public Domain
License: GNU General Public License v3.0
Keywords:
- Future updates