ChIP-seq Pipeline

public public 1yr ago Version: Version 1 0 bookmarks

ChIP-Seq pipeline

Here we provide the tools to perform paired end or single read ChIP-Seq analysis including raw data quality control, read mapping, peak calling, differential binding analysis and functional annotation. As input files you may use either zipped fastq-files (.fastq.gz) or mapped read data (.bam files). In case of paired end reads, corresponding fastq files should be named using .R1.fastq.gz and .R2.fastq.gz suffixes.

Pipeline Workflow

All analysis steps are illustrated in the pipeline flowchart . Specify the desired analysis details for your data in the essential.vars.groovy file (see below) and run the pipeline chipseq.pipeline.groovy as described here . A markdown file ChIPreport.Rmd will be generated in the output reports folder after running the pipeline. Subsequently, the ChIPreport.Rmd file can be converted to a final html report using the knitr R-package.

The pipelines includes

  • raw data quality control with FastQC, BamQC and MultiQC
  • mapping reads or read pairs to the reference genome using bowtie2 (default) or bowtie1
  • filter out multimapping reads from bowtie2 output with samtools (optional)
  • identify and remove duplicate reads with Picard MarkDuplicates (optional)
  • generation of bigWig tracks for visualisation of alignment with deeptools bamCoverage. For single end design, reads are extended to the average fragment size
  • characterization of insert size using Picard CollectInsertSizeMetrics (for paired end libraries only)
  • characterize library complexity by PCR Bottleneck Coefficient using the GenomicAlignments R-package (for single read libraries only)
  • characterize phantom peaks by cross correlation analysis using the spp R-package (for single read libraries only)
  • peak calling of IP samples vs. corresponding input controls using MACS2
  • peak annotation using the ChIPseeker R-package (optional)
  • differential binding analysis using the diffbind R-package (optional). For this, input peak files must be given in NGSpipe2go/tools/diffbind/targets_diffbind.txt and contrasts of interest in NGSpipe2go/tools/diffbind/contrasts_diffbind.txt (see below)

Code Snippets

24
25
26
27
28
29
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/BlackList_Filter/BlackList_Filter.R $BLACKLIST_FILTER_FLAGS;
""","blacklist_filter"
37
38
39
40
41
42
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    bowtie $BOWTIE_FLAGS ${bowtie1_vars.ref} $BOWTIE_INPUT | samtools view $SAMTOOLS_VIEW_FLAGS - | samtools sort $SAMTOOLS_SORT_FLAGS -T \${TMP}/\$(basename $output.prefix)_bowtie1_sort - > $output
""","bowtie1"
37
38
39
40
41
42
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    bowtie2 $BOWTIE2_FLAGS $BOWTIE2_INPUT | samtools view $SAMTOOLS_VIEW_FLAGS - | samtools sort $SAMTOOLS_SORT_FLAGS -T \${TMP}/\$(basename $output.prefix) - > $output;
""","bowtie2"
41
42
43
44
45
46
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/diffbind/diffbind2.R $DIFFBIND_FLAGS
""","diffbind2"
47
48
49
50
51
52
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/diffbind/diffbind3.R $DIFFBIND_FLAGS
""","diffbind3"
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    samtools view $FILBOWTIE2_FLAGS -bhu ${input} | samtools sort -@ $filbowtie2unique_vars.samtools_threads -T \${TMP}/\$(basename $output.prefix) -o ${input.prefix}_mmremoved.bam - &&
    if [[ "${dupremoval_vars.remove_pcr_dups}" == "true" ]]; then
        echo "Removing PCR duplicates";
        bam dedup --in ${input.prefix}_mmremoved.bam --out ${output} --log ${LOGS}/filbowtie2unique/\$(basename ${output.prefix})_dupmetrics.log --noPhoneHome --force --rmDups;
        rm ${input.prefix}_mmremoved.bam;
    else
        echo "Not removing the PCR duplicates";
        mv ${input.prefix}_mmremoved.bam ${output};
    fi;

""","filbowtie2unique"
26
27
28
29
30
31
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/GO_Enrichment/GREAT.R $GREAT_FLAGS
""","GREAT"
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    touch $output;
    if [ ! -e ${ipstrength_vars.targets} ]; then
        echo "Targets file ${ipstrength_vars.targets} doesn't exist" >> $output &&
        exit 0;
    fi;

    BAM=\$(basename $input) &&           
    extension="\${BAM#*.}" &&
    BAM="\${BAM%%.*}" &&
    grep "^\$BAM" ${ipstrength_vars.targets} | while read -r TARGET; do
        IP=\$(       echo $TARGET | tr '\t' ' ' | cut -f1 -d" ").\$extension &&
        IPname=\$(   echo $TARGET | tr '\t' ' ' | cut -f2 -d" ") &&
        INPUT=\$(    echo $TARGET | tr '\t' ' ' | cut -f3 -d" ").\$extension &&
        INPUTname=\$(echo $TARGET | tr '\t' ' ' | cut -f4 -d" ");

        if [ "\$BAM" != "\$INPUT" ]; then
            if [ "\$INPUT" != "none.\$extension" ]; then
              echo "\${IPname} vs \${INPUTname}" >> $output ;
              Rscript ${PIPELINE_ROOT}/tools/ENCODEqc/IPstrength.R ${ipstrength_vars.mapped}/\$IP \$IPname ${ipstrength_vars.mapped}/\$INPUT \$INPUTname $subdir\${IPname}.vs.\${INPUTname}_ipstrength ${ipstrength_vars.bsgenome};
              if [ \$? -ne 0 ]; then rm $output; fi;
              find . -maxdepth 1 -name "$subdir\${IPname}.vs.\${INPUTname}_ipstrength*" -exec sh -c 'mv "\$1" "$output.dir/\${1#./$subdir}"' _ {} \\;;
            else
              echo "\${IPname} skipped because no input" >> $output ;
          fi;
        fi;
    done
""","ipstrength"
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    touch $output;
    if [ ! -e ${macs2_vars.targets} ]; then
        echo "Targets file ${macs2_vars.targets} doesn't exist" >> $output &&
        exit 0;
    fi;

    BAM=\$(basename $input) &&           
    extension="\${BAM#*.}" &&
    BAM="\${BAM%%.*}" &&
    grep "^\$BAM" ${macs2_vars.targets} | while read -r TARGET; do
        IP=\$(       echo $TARGET | tr '\t' ' ' | cut -f1 -d" ").\$extension &&
        IPname=\$(   echo $TARGET | tr '\t' ' ' | cut -f2 -d" ") &&
        INPUT=\$(    echo $TARGET | tr '\t' ' ' | cut -f3 -d" ").\$extension &&
        INPUTname=\$(echo $TARGET | tr '\t' ' ' | cut -f4 -d" ");

        if [ "\$BAM" != "\$INPUT" ]; then
            if [ "\$INPUT" != "none.\$extension" ]; then
                NAMEoutput="\${IPname}.vs.\${INPUTname}" &&
                INPUTflag="-c ${macs2_vars.mapped}/\$INPUT";
            else
                NAMEoutput="\${IPname}" &&
                INPUTflag="";
            fi &&   
            echo "\$NAMEoutput" >> $output &&
            macs2 callpeak -t ${macs2_vars.mapped}/\$IP \$INPUTflag -n $subdir\${NAMEoutput}_macs2 $MACS2_FLAGS ;
            if [ \$? -ne 0 ]; then rm $output; fi ;
            find . -maxdepth 1 -name "$subdir\${NAMEoutput}_macs2*" -exec sh -c 'mv "\$1" "$output.dir/\${1#./$subdir}"' _ {} \\;;
        fi;
    done
""","macs2"
24
25
26
27
28
29
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/BlackList_Filter/make_greylist.R $MAKE_GREYLIST_FLAGS;
""","make_greylist"
15
16
17
18
19
20
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/ENCODEqc/PBC.R $input && mv ${input.prefix}_PBC.csv $output.dir
""","pbc"
Bpipe From line 15 of ChIPseq/pbc.groovy
26
27
28
29
30
31
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/Peak_Annotation/Peak_Annotation.R $PEAK_ANNOTATION_FLAGS;
""","peak_annotation"
23
24
25
26
27
28
29
30
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/ENCODEqc/phantompeak.R $input \$(basename $input.prefix) $PHANTOMPEAK_FLAGS &&

    mv \$(basename $input.prefix)_phantompeak.* $output.dir
""","phantompeak"
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
       exec """
           ${PREAMBLE} &&

           cp ${PIPELINE_ROOT}/tools/reports/shiny_chipseq_reporting_tool/ChIP.shinyrep.helpers.R ${REPORTS} &&
           cp ${PIPELINE_ROOT}/tools/reports/shiny_chipseq_reporting_tool/styles.css ${REPORTS}              &&

           if [ -e "${REPORTS}/ChIPreport.Rmd" ]; then
               echo 'ChIPreport.Rmd already exists. Older copy will be kept and not overwritten';
           else
               cp ${PIPELINE_ROOT}/tools/reports/shiny_chipseq_reporting_tool/ChIPreport.Rmd ${REPORTS};
           fi &&

           PROJECT=\$(basename ${shinyReports_vars.project})                              &&
           sed -i "2,2s/SHINYREPS_PROJECT/\${PROJECT}/" ${REPORTS}/ChIPreport.Rmd &&

           echo "SHINYREPS_PROJECT=${shinyReports_vars.project}" >  $output &&
           echo "SHINYREPS_LOG=${shinyReports_vars.log}"         >> $output &&
           echo "SHINYREPS_QC=${shinyReports_vars.qc}"           >> $output &&
           echo "SHINYREPS_RES=${shinyReports_vars.res}"         >> $output &&
           echo "SHINYREPS_TARGET=${shinyReports_vars.target}"   >> $output &&
           echo "SHINYREPS_PAIRED=${shinyReports_vars.paired}"      >> $output &&
           echo "SHINYREPS_RUN_CUTADAPT=${shinyReports_vars.run_cutadapt}"  >> $output &&
           echo "SHINYREPS_RUN_PEAK_ANNOTATION=${shinyReports_vars.run_peakanno}"  >> $output &&
           echo "SHINYREPS_RUN_DIFFBIND=${shinyReports_vars.run_diffbind}"  >> $output &&
           echo "SHINYREPS_RUN_ENRICHMENT=${shinyReports_vars.run_enrich}"  >> $output &&
	    echo "SHINYREPS_CUTADAPT_STATS=${shinyReports_vars.cutadapt_stats}" >> $output &&
	    echo "SHINYREPS_SAMTOOLSCOV_OUT=${shinyReports_vars.samtoolscov_out}" >> $output &&
	    echo "SHINYREPS_BOWTIE_LOG=${shinyReports_vars.bowtie_log}"      >> $output &&
           echo "SHINYREPS_BAMINDEX_LOG=${shinyReports_vars.bamindex_log}"  >> $output &&
           echo "SHINYREPS_MARKDUPS_LOG=${shinyReports_vars.markdups_log}"  >> $output &&
           echo "SHINYREPS_EXTEND_LOG=${shinyReports_vars.extend_log}"      >> $output &&
           echo "SHINYREPS_FASTQC=${shinyReports_vars.fastqc}"   >> $output &&
           echo "SHINYREPS_FASTQC_LOG=${shinyReports_vars.fastqc_log}"   >> $output &&
           echo "SHINYREPS_FASTQC_SUMMARIZED=${shinyReports_vars.fastqc_summarized}" >> $output &&
           echo "SHINYREPS_FASTQSCREEN_OUT=${shinyReports_vars.fastqscreen_out}" >> $output &&
           echo "SHINYREPS_FASTQSCREEN_LOG=${shinyReports_vars.fastqscreen_log}" >> $output &&
           echo "SHINYREPS_FASTQSCREEN_PERC=${shinyReports_vars.fastqscreen_perc}" >> $output &&
           echo "SHINYREPS_INSERTSIZE=${shinyReports_vars.insertsize}" >> $output &&
           echo "SHINYREPS_IPSTRENGTH=${shinyReports_vars.ipstrength}"      >> $output &&
           echo "SHINYREPS_IPSTRENGTH_LOG=${shinyReports_vars.ipstrength_log}"      >> $output &&
           echo "SHINYREPS_PBC=${shinyReports_vars.pbc}"         >> $output &&
           echo "SHINYREPS_PHANTOMPEAK=${shinyReports_vars.phantompeak}"    >> $output &&
           echo "SHINYREPS_PHANTOM_LOG=${shinyReports_vars.phantom_log}"    >> $output &&
           echo "SHINYREPS_MACS2=${shinyReports_vars.macs2}"     >> $output &&
           echo "SHINYREPS_MACS2_LOG=${shinyReports_vars.macs2_log}"         >> $output &&
           echo "SHINYREPS_BLACKLIST_FILTER=${shinyReports_vars.blacklist_filter}" >> $output &&
           echo "SHINYREPS_UPSETPLOT=${shinyReports_vars.upset}" >> $output &&
           echo "SHINYREPS_PREFIX=${shinyReports_vars.prefix}"   >> $output &&
           echo "SHINYREPS_PLOTS_COLUMN=${shinyReports_vars.plots_column}" >> $output &&
           echo "SHINYREPS_PEAK_ANNOTATION=${shinyReports_vars.peak_annotation}" >> $output &&
           echo "SHINYREPS_DB=${shinyReports_vars.db}"   >> $output &&
           echo "SHINYREPS_GREAT=${shinyReports_vars.great}" >> $output &&
           echo "SHINYREPS_DIFFBIND=${shinyReports_vars.diffbind}" >> $output           &&
           echo "SHINYREPS_TRACKHUB_DONE=${shinyReports_vars.trackhub_done}" >> $output &&
           echo "SHINYREPS_TOOL_VERSIONS=${shinyReports_vars.tool_versions}" >> $output
       ""","shinyReports"
25
26
27
28
29
30
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/upsetPlot/upsetPlot.R $UPSET_FLAGS;
""","upsetPlot"
20
21
22
23
24
25
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    bamCoverage $BAMCOVERAGE_FLAGS --bam $input -o ${output};
""","bamCoverage"
15
16
17
18
19
20
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    samtools index $input
""","BAMindexer"
16
17
18
19
20
21
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    bamqc $BAMQC_FLAGS -o $output.dir $input
""","BamQC"
Bpipe From line 16 of NGS/bamqc.groovy
50
51
52
53
54
55
56
57
58
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    cutadapt $CUTADAPT_FLAGS $CUTADAPT_FLAGS_PAIRED $CUTADAPT_FLAGS_TOOLONG --too-short-output=\${TMP}/${SAMPLENAME_BASE}.cutadapt_discarded.fastq.gz --output=\${TMP}/${SAMPLENAME_BASE}.cutadapt.fastq.gz $input1 $CUTADAPT_INPUT_SECOND 1> ${CUTADAPT_STATSDIR}/${SAMPLENAME_BASE_PRUNED}.cutadapt.log &&

    mv -t ${CUTADAPT_DISCARDED_DIR}/ \${TMP}/${SAMPLENAME_BASE_PRUNED}*cutadapt_discarded*.fastq.gz &&
    mv -t $output.dir \${TMP}/${SAMPLENAME_BASE_PRUNED}*cutadapt.fastq.gz
""","Cutadapt"
17
18
19
20
21
22
23
24
25
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    CHRSIZES="\${TMP}/\$(basename ${input.prefix}).extend.chrsizes"  &&
    samtools idxstats ${input} | cut -f1-2 > "\${CHRSIZES}" &&
    bedtools bamtobed -split -i $input | bedtools slop -g "\${CHRSIZES}" -l 0 -r ${extend_vars.fraglen} -s | bedtools bedtobam -ubam -g "\${CHRSIZES}" | samtools sort $SAMTOOLS_SORT_FLAGS -T \${TMP}/\$(basename $output.prefix) - > $output &&
    samtools index $output
""","extend"
18
19
20
21
22
23
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    fastqc --extract $FASTQC_FLAGS -o $output.dir $inputs
""","FastQC"
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    if [ ! -e "$output.prefix" ]; then
        mkdir $output.prefix;
    fi &&
    fastqreference=${FastqScreen_vars.conf};
    references=(\${fastqreference//,/ });
    for i in "\${!references[@]}"; do
        reference=(\${references[i]//::/ });
        echo -e "DATABASE\t\${reference[0]}\t\${reference[1]}" >> $output.prefix/fastqscreen.conf;
    done;
    fastq_screen $FASTQSCREEN_FLAGS --conf $output.prefix/fastqscreen.conf --outdir $output.prefix $inputs;
    touch $outputs
""","FastqScreen"
14
15
16
17
18
19
20
21
exec """
     ${TOOL_ENV} &&
     ${PREAMBLE} &&

    chroms=\$(cut -f1 ${FilterChr_vars.file}) && 
    samtools view -@ ${FilterChr_vars.threads} -b $input \$chroms > $output &&
    samtools index $output
""","FilterChr"
20
21
22
23
24
25
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    java ${InsertSize_vars.java_flags} -jar \${PICARD} CollectInsertSizeMetrics $INSERTSIZE_FLAGS INPUT=$input OUTPUT=$output HISTOGRAM_FILE=${output.prefix}_hist.pdf
""","InsertSize"
19
20
21
22
23
24
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    java ${MarkDups_vars.java_flags} -jar \${PICARD} MarkDuplicates $MARKDUPS_FLAGS INPUT=$input OUTPUT=$output METRICS_FILE=${input.prefix}_dupmarked_dupmetrics.tsv
""","MarkDups"
16
17
18
19
20
21
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    multiqc $ESSENTIAL_PROJECT $MultiQC_FLAGS -o $output.dir
""","MULTIQC"
19
20
21
22
23
24
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    java ${RmDups_vars.java_flags} -jar \${PICARD} MarkDuplicates $RMDUPS_FLAGS INPUT=$input OUTPUT=$output METRICS_FILE=${input.prefix}_duprm_dupmetrics.tsv TMP_DIR=\${TMP}
""","RmDups"
Bpipe From line 19 of NGS/rmdups.groovy
16
17
18
19
20
21
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    samtools coverage $SAMTOOLSCOV_FLAGS -o $output $input
""","samtoolscov"
26
27
28
29
30
31
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/trackhub/Configure_Trackhub.R $TRACKHUB_FLAGS
""","trackhub_config"
17
18
19
20
21
22
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    Rscript ${PIPELINE_ROOT}/tools/trackhub/Make_Trackhub.R $TRACKHUB_FLAGS
""","trackhub"
ShowHide 23 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://gitlab.rlp.net/imbforge/NGSpipe2go/-/tree/master/pipelines/ChIPseq
Name: chip-seq
Version: Version 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Other Versions:
Downloaded: 0
Copyright: Public Domain
License: GNU General Public License v3.0
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...