Analysis pipeline for CUT&RUN and CUT&Tag experiments, including QC, spike-in support, IgG controls, peak calling and downstream analysis.


Introduction

nf-core/cutandrun is a best-practice bioinformatics analysis pipeline for the CUT&RUN, CUT&Tag and TIPseq experimental protocols, which were developed to study protein-DNA interactions and epigenomic profiling.

CUT&RUN

Meers, M. P., Bryson, T. D., Henikoff, J. G., & Henikoff, S. (2019). Improved CUT&RUN chromatin profiling tools. eLife, 8. https://doi.org/10.7554/eLife.46314

CUT&Tag

Kaya-Okur, H. S., Wu, S. J., Codomo, C. A., Pledger, E. S., Bryson, T. D., Henikoff, J. G., Ahmad, K., & Henikoff, S. (2019). CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nature Communications, 10(1), 1930. https://doi.org/10.1038/s41467-019-09982-5

TIPseq

Bartlett, D. A., Dileep, V., Handa, T., Ohkawa, Y., Kimura, H., Henikoff, S., & Gilbert, D. M. (2021). High-throughput single-cell epigenomic profiling by targeted insertion of promoters (TIP-seq). Journal of Cell Biology, 220(12), e202103078. https://doi.org/10.1083/jcb.202103078

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable, reproducible manner. It uses containerisation and package management, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules.
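For example, a specific release of the pipeline can be pinned and a container engine chosen at run time. The commands below are a minimal sketch following standard Nextflow and nf-core conventions (the revision and profile are illustrative):

# Fetch a pinned release of the pipeline
nextflow pull nf-core/cutandrun -r 3.1

# Print the pipeline help using Singularity containers;
# swap the profile for docker, podman, conda, etc.
nextflow run nf-core/cutandrun -r 3.1 -profile singularity --help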

The pipeline has been developed with continuous integration (CI) and test-driven development (TDD) at its core. nf-core code and module linting, as well as a battery of over 100 unit and integration tests, run on every pull request to the main repository and on each release of the pipeline. On official release, automated CI tests run the pipeline on a full-sized dataset on AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible default resource allocations for real-world datasets, and allows results to be stored persistently for benchmarking between pipeline releases and against other analysis sources. The results from the full-sized test can be viewed on the nf-core website.

pipeline_diagram

Pipeline summary

  1. Check input files

  2. Merge re-sequenced FastQ files ( cat )

  3. Read QC ( FastQC )

  4. Adapter and quality trimming ( Trim Galore! )

  5. Alignment to both target and spike-in genomes ( Bowtie 2 )

  6. Filter on quality, sort and index alignments ( samtools )

  7. Duplicate read marking ( picard )

  8. Create bedGraph files ( bedtools )

  9. Create bigWig coverage files ( bedGraphToBigWig )

  10. Peak calling ( SEACR , MACS2 )

  11. Consensus peak merging and reporting ( bedtools )

  12. Library complexity ( preseq )

  13. Fragment-based quality control ( deepTools )

  14. Peak-based quality control ( bedtools , custom python)

  15. Heatmap peak analysis ( deepTools )

  16. Genome browser session ( IGV )

  17. Present all QC in web-based report ( MultiQC )
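The alignment, coverage and peak-calling steps (5, 6, 8, 9 and 10 above) correspond roughly to the shell commands below. This is a simplified sketch with illustrative file names only; in the pipeline each tool runs in its own containerised Nextflow process, with spike-in calibration, duplicate handling and QC in between:

# Steps 5-6: align to the target genome, filter on mapping quality, sort and index
bowtie2 -x target_index -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
    | samtools view -b -q 20 - \
    | samtools sort -o sample.target.bam -
samtools index sample.target.bam

# Steps 8-9: fragment coverage as bedGraph, then bigWig
bedtools genomecov -ibam sample.target.bam -bg > sample.bedGraph
bedGraphToBigWig sample.bedGraph genome.sizes sample.bigWig

# Step 10: peak calling against the IgG control with SEACR
SEACR_1.3.sh sample.bedGraph igg_control.bedGraph non stringent sample_seacr_peaks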

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
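For example, the bundled test profile can be run with Docker before pointing the pipeline at real data (<OUTDIR> is a placeholder for a writable results directory):

nextflow run nf-core/cutandrun -profile test,docker --outdir <OUTDIR>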

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv :

group,replicate,fastq_1,fastq_2,control
h3k27me3,1,h3k27me3_rep1_r1.fastq.gz,h3k27me3_rep1_r2.fastq.gz,igg_ctrl
h3k27me3,2,h3k27me3_rep2_r1.fastq.gz,h3k27me3_rep2_r2.fastq.gz,igg_ctrl
igg_ctrl,1,igg_rep1_r1.fastq.gz,igg_rep1_r2.fastq.gz,
igg_ctrl,2,igg_rep2_r1.fastq.gz,igg_rep2_r2.fastq.gz,

Each row represents a pair of FastQ files (paired-end data).

Now, you can run a typical CUT&RUN/CUT&Tag/TIPseq analysis using:

nextflow run nf-core/cutandrun \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
    --peakcaller 'seacr,macs2' \
    --genome GRCh38 \
    --outdir <OUTDIR>

Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see the docs.
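For example, the same parameters can be supplied through a -params-file instead of on the command line. The snippet below is a minimal sketch; the file name and parameter values are illustrative:

# Write the pipeline parameters to a YAML file
cat > params.yaml << 'EOF'
input: 'samplesheet.csv'
peakcaller: 'seacr,macs2'
genome: 'GRCh38'
outdir: './results'
EOF

# Run the pipeline, reading parameters from the file
nextflow run nf-core/cutandrun -profile docker -params-file params.yaml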


Pipeline output

To see the results of a test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/cutandrun was originally written by Chris Cheshire ( @chris-cheshire ) and Charlotte West ( @charlotte-west ) from the Luscombe Lab at The Francis Crick Institute, London, UK.

The pipeline structure and parts of the downstream analysis were adapted from the original CUT&Tag analysis protocol from the Henikoff Lab . The removal of duplicates arising from linear amplification (also known as T7 duplicates) in the TIPseq protocol was implemented as described in the original TIPseq paper .

We thank Harshil Patel ( @drpatelh ) and everyone in the Luscombe Lab ( @luslab ) for their extensive assistance in the development of this pipeline.


Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines .

For further information or help, don't hesitate to get in touch on the Slack #cutandrun channel (you can join with this invite).

Citations

If you use nf-core/cutandrun for your analysis, please cite it using the following doi: 10.5281/zenodo.5653535

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x .

Code Snippets

"""
#!/usr/bin/env python

import yaml
import platform
from textwrap import dedent

def _make_versions_html(versions):
    html = [
        dedent(
            '''\\
            <style>
            #nf-core-versions tbody:nth-child(even) {
                background-color: #f2f2f2;
            }
            </style>
            <table class="table" style="width:100%" id="nf-core-versions">
                <thead>
                    <tr>
                        <th> Process Name </th>
                        <th> Software </th>
                        <th> Version  </th>
                    </tr>
                </thead>
            '''
        )
    ]
    for process, tmp_versions in sorted(versions.items()):
        html.append("<tbody>")
        for i, (tool, version) in enumerate(sorted(tmp_versions.items())):
            html.append(
                dedent(
                    f'''\\
                    <tr>
                        <td><samp>{process if (i == 0) else ''}</samp></td>
                        <td><samp>{tool}</samp></td>
                        <td><samp>{version}</samp></td>
                    </tr>
                    '''
                )
            )
        html.append("</tbody>")
    html.append("</table>")
    return "\\n".join(html)

def _make_versions_unique_html(versions):
    unique_versions = []

    for process, tmp_versions in sorted(versions.items()):
        for i, (tool, version) in enumerate(sorted(tmp_versions.items())):
            tool_version = tool + "=" + version
            if tool_version not in unique_versions:
                unique_versions.append(tool_version)

    unique_versions.sort()

    html = [
        dedent(
            '''\\
            <style>
            #nf-core-versions-unique tbody:nth-child(even) {
                background-color: #f2f2f2;
            }
            </style>
            <table class="table" style="width:100%" id="nf-core-versions-unique">
                <thead>
                    <tr>
                        <th> Software </th>
                        <th> Version  </th>
                    </tr>
                </thead>
            '''
        )
    ]

    for tool_version in unique_versions:
        tool_version_split = tool_version.split('=')
        html.append("<tbody>")
        html.append(
            dedent(
                f'''\\
                <tr>
                    <td><samp>{tool_version_split[0]}</samp></td>
                    <td><samp>{tool_version_split[1]}</samp></td>
                </tr>
                '''
            )
        )
        html.append("</tbody>")
    html.append("</table>")
    return "\\n".join(html)

module_versions = {}
module_versions["${task.process}"] = {
    'python': platform.python_version(),
    'yaml': yaml.__version__
}

with open("$versions") as f:
    workflow_versions = yaml.load(f, Loader=yaml.BaseLoader) | module_versions

workflow_versions["Workflow"] = {
    "Nextflow": "$workflow.nextflow.version",
    "$workflow.manifest.name": "$workflow.manifest.version"
}

versions_mqc = {
    'parent_id': 'software_versions',
    'parent_name': 'Software Versions',
    'parent_description': 'Details software versions used in the pipeline run',
    'id': 'software-versions-by-process',
    'section_name': '${workflow.manifest.name} software versions by process',
    'section_href': 'https://github.com/${workflow.manifest.name}',
    'plot_type': 'html',
    'description': 'are collected at run time from the software output.',
    'data': _make_versions_html(workflow_versions)
}

versions_mqc_unique = {
    'parent_id': 'software_versions',
    'parent_name': 'Software Versions',
    'parent_description': 'Details software versions used in the pipeline run',
    'id': 'software-versions-unique',
    'section_name': '${workflow.manifest.name} Software Versions',
    'section_href': 'https://github.com/${workflow.manifest.name}',
    'plot_type': 'html',
    'description': 'are collected at run time from the software output.',
    'data': _make_versions_unique_html(workflow_versions)
}

with open("software_versions.yml", 'w') as f:
    yaml.dump(workflow_versions, f, default_flow_style=False)

with open("software_versions_mqc.yml", 'w') as f:
    yaml.dump(versions_mqc, f, default_flow_style=False)

with open("software_versions_unique_mqc.yml", 'w') as f:
    yaml.dump(versions_mqc_unique, f, default_flow_style=False)

with open('local_versions.yml', 'w') as f:
    yaml.dump(module_versions, f, default_flow_style=False)
"""
"""
bedtools \\
    sort \\
    -i $intervals \\
    $sizes \\
    $args \\
    > ${prefix}.${extension}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
bamCoverage \
--bam $input \
$args \
--scaleFactor ${scale} \
--numberOfProcessors ${task.cpus} \
--outFileName ${prefix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    deeptools: \$(bamCoverage --version | sed -e "s/bamCoverage //g")
END_VERSIONS
"""
"""
samtools \\
    view \\
    --threads ${task.cpus-1} \\
    ${reference} \\
    ${blacklist} \\
    $args \\
    $input \\
    $args2 \\
    > ${prefix}.${file_type}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.bam
touch ${prefix}.cram

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
NextFlow From line 50 of view/main.nf
"""
[ ! -f  ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --gzip \\
    $c_r1 \\
    $tpc_r1 \\
    ${prefix}.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trim_galore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
"""
[ ! -f  ${meta.id}_1.fastq.gz ] && ln -s ${reads[0]} ${meta.id}_1.fastq.gz
[ ! -f  ${meta.id}_2.fastq.gz ] && ln -s ${reads[1]} ${meta.id}_2.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --paired \\
    --gzip \\
    $c_r1 \\
    $c_r2 \\
    $tpc_r1 \\
    $tpc_r2 \\
    ${meta.id}_1.fastq.gz \\
    ${meta.id}_2.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trim_galore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS

mv ${meta.id}_1_val_1.fq.gz ${prefix_1}.fastq.gz
mv ${meta.id}_2_val_2.fq.gz ${prefix_2}.fastq.gz

[ ! -f  ${meta.id}_1_val_1_fastqc.html ] || mv ${meta.id}_1_val_1_fastqc.html ${meta.id}_1${suffix}_fastqc.html
[ ! -f  ${meta.id}_2_val_2_fastqc.html ] || mv ${meta.id}_2_val_2_fastqc.html ${meta.id}_2${suffix}_fastqc.html

[ ! -f  ${meta.id}_1_val_1_fastqc.zip ] || mv ${meta.id}_1_val_1_fastqc.zip ${meta.id}_1${suffix}_fastqc.zip
[ ! -f  ${meta.id}_2_val_2_fastqc.zip ] || mv ${meta.id}_2_val_2_fastqc.zip ${meta.id}_2${suffix}_fastqc.zip
"""
"""
gtf2bed \\
    $args \\
    $gtf \\
    > ${gtf.baseName}.bed
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    perl: \$(echo \$(perl --version 2>&1) | sed 's/.*v\\(.*\\)) built.*/\\1/')
END_VERSIONS
"""
"""
awk $args $command $input $command2 > ${prefix}.awk.${ext}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    awk: \$(awk -Wversion 2>/dev/null | head -n 1 | awk '{split(\$0,a,","); print a[1];}' | egrep -o "([0-9]{1,}\\.)+[0-9]{1,}")
END_VERSIONS
"""
NextFlow From line 27 of linux/awk.nf
"""
awk $args -f $script $input > ${prefix}.awk.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    awk: \$(awk -Wversion 2>/dev/null | head -n 1 | awk '{split(\$0,a,","); print a[1];}' | egrep -o "([0-9]{1,}\\.)+[0-9]{1,}")
END_VERSIONS
"""
"""
cut $args $input $command > ${prefix}.cut.${ext}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cut: \$(cut --version | head -n 1 | awk '{print \$4;}')
END_VERSIONS
"""
NextFlow From line 26 of linux/cut.nf
"""
sort -T '.' $args $input_files > ${prefix}.sort.${ext}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sort: \$(sort --version | head -n 1 | awk '{print \$4;}')
END_VERSIONS
"""
NextFlow From line 28 of linux/sort.nf
"""
multiqc -f $args $custom_config .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
cat ${bed} | wc -l | awk -v OFS='\t' '{ print "Peak Count", \$1 }' | cat $peak_counts_header - > ${prefix}_mqc.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
READS_IN_PEAKS=\$(bedtools intersect -a ${fragments_bed} -b ${peaks_bed} -bed -c -f $min_frip_overlap |  awk -F '\t' '{sum += \$NF} END {print sum * 2}')
grep -m 1 'mapped (' ${flagstat} | awk -v a="\$READS_IN_PEAKS" -v OFS='\t' '{print "Peak FRiP Score", a/\$1}' | cat $frip_score_header - > ${prefix}_mqc.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
find_unique_reads.py \\
    --bed_path $input \\
    --output_path "${prefix}_unique_alignments.txt" \\
    --metrics_path "${prefix}_metrics.txt" \\
    --header_path $mqc_header \\
    --mqc_path "${prefix}_mqc.tsv"
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | grep -E -o \"([0-9]{1,}\\.)+[0-9]{1,}\")
END_VERSIONS
"""
"""
calc_frag_hist.py \\
    --frag_path "*len.txt" \\
    --output frag_len_hist.txt

if [ -f "frag_len_hist.txt" ]; then
    cat $frag_len_header_multiqc frag_len_hist.txt > frag_len_mqc.yml
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | grep -E -o \"([0-9]{1,}\\.)+[0-9]{1,}\")
    numpy: \$(python -c 'import numpy; print(numpy.__version__)')
    pandas: \$(python -c 'import pandas; print(pandas.__version__)')
    seaborn: \$(python -c 'import seaborn; print(seaborn.__version__)')
END_VERSIONS
"""
"""
echo "$output" > exp_files.txt
find -L * -iname "*.gtf" -exec echo -e {}"\\t0,48,73" \\; > gtf.igv.txt
find -L * -iname "*.gff" -exec echo -e {}"\\t0,48,73" \\; > gff.igv.txt
cat *.txt > igv_files.txt
igv_files_to_session.py igv_session.xml igv_files.txt $genome $gtf_bed --path_prefix './'

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | grep -E -o \"([0-9]{1,}\\.)+[0-9]{1,}\")
END_VERSIONS
"""
"""
peak_reproducibility.py \\
    --sample_id $meta.id \\
    --intersect $bed \\
    --threads ${task.cpus} \\
    --outpath .

cat $peak_reprod_header_multiqc *peak_repro.tsv > ${prefix}_mqc.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | grep -E -o \"([0-9]{1,}\\.)+[0-9]{1,}\")
    dask: \$(python -c 'import dask; print(dask.__version__)')
    numpy: \$(python -c 'import numpy; print(numpy.__version__)')
    pandas: \$(python -c 'import pandas; print(pandas.__version__)')
END_VERSIONS
"""
"""
plot_consensus_peaks.py \\
    --peaks "*.peaks.bed" \\
    --outpath .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | grep -E -o \"([0-9]{1,}\\.)+[0-9]{1,}\")
    numpy: \$(python -c 'import numpy; print(numpy.__version__)')
    pandas: \$(python -c 'import pandas; print(pandas.__version__)')
    upsetplot: \$(python -c 'import upsetplot; print(upsetplot.__version__)')
END_VERSIONS
"""
"""
check_samplesheet.py $samplesheet samplesheet.valid.csv $params.use_control

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | grep -E -o \"([0-9]{1,}\\.)+[0-9]{1,}\")
END_VERSIONS
"""
"""
samtools view $args -@ $task.cpus $bam | $args2 > ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
bedtools \\
    bamtobed \\
    $args \\
    -i $bam \\
    > ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
touch ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
bedtools \\
    complement \\
    -i $bed \\
    -g $sizes \\
    $args \\
    > ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
bedtools \\
    genomecov \\
    -ibam $intervals \\
    $args \\
    > ${prefix}.${extension}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
bedtools \\
    genomecov \\
    -i $intervals \\
    -g $sizes \\
    $args \\
    > ${prefix}.${extension}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
touch  ${prefix}.${extension}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
bedtools \\
    intersect \\
    -a $intervals1 \\
    -b $intervals2 \\
    $args \\
    $sizes \\
    > ${prefix}.${extension}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
touch ${prefix}.${extension}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
bedtools \\
    merge \\
    -i $bed \\
    $args \\
    > ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
touch ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\\.rev.1.bt2\$//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\\.rev.1.bt2l\$//"`
[ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1

bowtie2 \\
    -x \$INDEX \\
    $reads_args \\
    --threads $task.cpus \\
    $unaligned \\
    $args \\
    2> ${prefix}.bowtie2.log \\
    | samtools $samtools_command $args2 --threads $task.cpus -o ${prefix}.${extension} -

if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
    mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
fi

if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
    mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
"""
"""
touch ${prefix}.${extension}
touch ${prefix}.bowtie2.log
touch ${prefix}.unmapped_1.fastq.gz
touch ${prefix}.unmapped_2.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
"""
NextFlow From line 80 of align/main.nf
"""
mkdir bowtie2
bowtie2-build $args --threads $task.cpus $fasta bowtie2/${fasta.baseName}
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
END_VERSIONS
"""
"""
mkdir bowtie2
touch bowtie2/${fasta.baseName}.{1..4}.bt2
touch bowtie2/${fasta.baseName}.rev.{1,2}.bt2

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
END_VERSIONS
"""
"""
cat ${readList.join(' ')} > ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
NextFlow From line 26 of fastq/main.nf
"""
cat ${read1.join(' ')} > ${prefix}_1.merged.fastq.gz
cat ${read2.join(' ')} > ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
NextFlow From line 40 of fastq/main.nf
"""
touch ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
NextFlow From line 57 of fastq/main.nf
"""
touch ${prefix}_1.merged.fastq.gz
touch ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
NextFlow From line 68 of fastq/main.nf
"""
samtools faidx $fasta
cut -f 1,2 ${fasta}.fai > ${fasta}.sizes

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    getchromsizes: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${fasta}.fai
touch ${fasta}.sizes

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    getchromsizes: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
computeMatrix \\
    $args \\
    --regionsFileName $bed \\
    --scoreFileName $bigwig \\
    --outFileName ${prefix}.computeMatrix.mat.gz \\
    --outFileNameMatrix ${prefix}.computeMatrix.vals.mat.tab \\
    --numberOfProcessors $task.cpus

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    deeptools: \$(computeMatrix --version | sed -e "s/computeMatrix //g")
END_VERSIONS
"""
"""
multiBamSummary bins \\
    $args \\
    $label \\
    --bamfiles ${bams.join(' ')} \\
    --numberOfProcessors $task.cpus \\
    --outFileName all_bam.bamSummary.npz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    deeptools: \$(multiBamSummary --version | sed -e "s/multiBamSummary //g")
END_VERSIONS
"""
"""
plotCorrelation \\
    $args \\
    --corData $matrix \\
    --corMethod $resolved_method \\
    --whatToPlot $resolved_plot_type \\
    --plotFile ${prefix}.plotCorrelation.pdf \\
    --outFileCorMatrix ${prefix}.plotCorrelation.mat.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    deeptools: \$(plotCorrelation --version | sed -e "s/plotCorrelation //g")
END_VERSIONS
"""
"""
plotFingerprint \\
    $args \\
    $extend \\
    --bamfiles ${bams.join(' ')} \\
    --plotFile ${prefix}.plotFingerprint.pdf \\
    --outRawCounts ${prefix}.plotFingerprint.raw.txt \\
    --outQualityMetrics ${prefix}.plotFingerprint.qcmetrics.txt \\
    --numberOfProcessors $task.cpus

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    deeptools: \$(plotFingerprint --version | sed -e "s/plotFingerprint //g")
END_VERSIONS
"""
"""
plotHeatmap \\
    $args \\
    --matrixFile $matrix \\
    --outFileName ${prefix}.plotHeatmap.pdf \\
    --outFileNameMatrix ${prefix}.plotHeatmap.mat.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    deeptools: \$(plotHeatmap --version | sed -e "s/plotHeatmap //g")
END_VERSIONS
"""
"""
plotPCA \\
    $args \\
    --corData $matrix \\
    --plotFile ${prefix}.plotPCA.pdf \\
    --outFileNameData ${prefix}.plotPCA.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    deeptools: \$(plotPCA --version | sed -e "s/plotPCA //g")
END_VERSIONS
"""
"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done

fastqc \\
    $args \\
    --threads $task.cpus \\
    $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
# Not calling gunzip itself because it creates files
# with the original group ownership rather than the
# default one for that user / the work directory
gzip \\
    -cd \\
    $args \\
    $archive \\
    > $gunzip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 23 of gunzip/main.nf
"""
touch $gunzip
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 41 of gunzip/main.nf
"""
macs2 \\
    callpeak \\
    ${args_list.join(' ')} \\
    --gsize $macs2_gsize \\
    --format $format \\
    --name $prefix \\
    --treatment $ipbam \\
    $control

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    macs2: \$(macs2 --version | sed -e "s/macs2 //g")
END_VERSIONS
"""
"""
picard \\
    -Xmx${avail_mem}M \\
    MarkDuplicates \\
    $args \\
    --INPUT $bam \\
    --OUTPUT ${prefix}.bam \\
    --REFERENCE_SEQUENCE $fasta \\
    --METRICS_FILE ${prefix}.MarkDuplicates.metrics.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    picard: \$(echo \$(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
END_VERSIONS
"""
"""
touch ${prefix}.bam
touch ${prefix}.bam.bai
touch ${prefix}.MarkDuplicates.metrics.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    picard: \$(echo \$(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
END_VERSIONS
"""
"""
preseq \\
    lc_extrap \\
    $args \\
    $paired_end \\
    -output ${prefix}.lc_extrap.txt \\
    $bam
cp .command.err ${prefix}.command.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    preseq: \$(echo \$(preseq 2>&1) | sed 's/^.*Version: //; s/Usage:.*\$//')
END_VERSIONS
"""
"""
samtools \\
    faidx \\
    $fasta \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
${fastacmd}
touch ${fasta}.fai

cat <<-END_VERSIONS > versions.yml

"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
NextFlow From line 40 of faidx/main.nf
"""
samtools \\
    flagstat \\
    --threads ${task.cpus} \\
    $bam \\
    > ${prefix}.flagstat

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.flagstat

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
samtools \\
    idxstats \\
    --threads ${task.cpus-1} \\
    $bam \\
    > ${prefix}.idxstats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.idxstats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
samtools \\
    index \\
    -@ ${task.cpus-1} \\
    $args \\
    $input

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${input}.bai
touch ${input}.crai
touch ${input}.csi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
NextFlow From line 38 of index/main.nf
"""
samtools sort \\
    $args \\
    -@ $task.cpus \\
    -o ${prefix}.bam \\
    -T $prefix \\
    $bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
NextFlow From line 41 of sort/main.nf
"""
samtools \\
    stats \\
    --threads ${task.cpus} \\
    ${reference} \\
    ${input} \\
    > ${prefix}.stats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.stats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
NextFlow From line 41 of stats/main.nf
"""
samtools \\
    view \\
    --threads ${task.cpus-1} \\
    ${reference} \\
    ${readnames} \\
    $args \\
    -o ${prefix}.${file_type} \\
    $input \\
    $args2

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.bam
touch ${prefix}.cram

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
NextFlow From line 57 of view/main.nf
"""
SEACR_1.3.sh \\
    $bedgraph \\
    $function_switch \\
    $args \\
    $prefix

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    seacr: $VERSION
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
END_VERSIONS
"""
"""
bgzip  --threads ${task.cpus} -c $args $input > ${prefix}.${input.getExtension()}.gz
tabix $args2 ${prefix}.${input.getExtension()}.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.${input.getExtension()}.gz
touch ${prefix}.${input.getExtension()}.gz.tbi
touch ${prefix}.${input.getExtension()}.gz.csi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//')
END_VERSIONS
"""
"""
bedClip \\
    $bedgraph \\
    $sizes \\
    ${prefix}.bedGraph

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""
"""
bedGraphToBigWig \\
    $bedgraph \\
    $sizes \\
    ${prefix}.bigWig

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""
"""
touch ${prefix}.bigWig

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""
"""
mkdir $prefix

## Ensures --strip-components only applied when top level of tar contents is a directory
## If just files or multiple directories, place all in prefix
if [[ \$(tar -taf ${archive} | grep -o -P "^.*?\\/" | uniq | wc -l) -eq 1 ]]; then
    tar \\
        -C $prefix --strip-components 1 \\
        -xavf \\
        $args \\
        $archive \\
        $args2
else
    tar \\
        -C $prefix \\
        -xavf \\
        $args \\
        $archive \\
        $args2
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    untar: \$(echo \$(tar --version 2>&1) | sed 's/^.*(GNU tar) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 25 of untar/main.nf
"""
mkdir $prefix
touch ${prefix}/file.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    untar: \$(echo \$(tar --version 2>&1) | sed 's/^.*(GNU tar) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 54 of untar/main.nf