# snakemake_hic

This is a template repository for running HiC-Pro under Snakemake. The steps currently implemented are:

- Download and index the genome
- Trim raw data using AdapterRemoval
- Run FastQC on raw and trimmed data
- Prepare files for HiC-Pro:
  - Organise genome annotations
  - Define restriction fragments
  - Update `hicpro-config.txt` based on `config.yml`
## Setup

- HiC-Pro must be installed and visible to this workflow. Please ensure you have a working installation on your system. This workflow is built around v3.0.0.
- Data must be placed in `data/raw/fastq`
  - Each sample/replicate needs to go in a separate folder. Use the test dataset as a guide.
  - Delete the test data once you have placed yours in the correct folder.
- Edit `config/samples.tsv` to reflect the sample names and file names.
  - Data is assumed to be paired, so only one file needs to be listed per directory.
- Edit `config/config.yml`
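The expected layout can be sketched as below. The sample name, file names, and the `samples.tsv` column headings here are hypothetical placeholders, not taken from the repository, so check the bundled test dataset for the exact format:

```shell
# Hypothetical example: one directory per sample/replicate under data/raw/fastq.
DEMO=$(mktemp -d)
mkdir -p "$DEMO/data/raw/fastq/sample1" "$DEMO/config"

# Paired-end data: both files live in the sample's directory...
touch "$DEMO/data/raw/fastq/sample1/sample1_R1.fastq.gz" \
      "$DEMO/data/raw/fastq/sample1/sample1_R2.fastq.gz"

# ...but only one file per directory is listed in samples.tsv
# (column names here are assumptions)
printf "sample\tfile\nsample1\tsample1_R1.fastq.gz\n" > "$DEMO/config/samples.tsv"

ls "$DEMO/data/raw/fastq/sample1"
```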
## Testing

Once all edits are performed and the data is uploaded, run:

```bash
snakemake -n
```

This is a dry run to check that everything should work as expected. If the dry run is successful, create the conda environments:

```bash
snakemake --use-conda --conda-create-envs-only --cores 1
```

This may take a while. Snakemake creates these environments in series, so only one core is required.

Once complete, create the rulegraph to check everything is as expected:

```bash
snakemake --rulegraph > rules/rulegraph.dot
dot -Tpdf rules/rulegraph.dot > rules/rulegraph.pdf
```
## Execution

The following will set the workflow running using 12 cores:

```bash
snakemake \
    --use-conda \
    --notemp \
    --cores 12
```
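If you prefer to scale the core count to the machine instead of hard-coding 12, a small wrapper along these lines may help. The `nproc` call and the choice to leave two cores free are assumptions, not part of the workflow:

```shell
# Hypothetical wrapper: use all but two of the available cores, with a floor of one.
TOTAL=$(nproc)
CORES=$(( TOTAL > 2 ? TOTAL - 2 : 1 ))
echo "Running snakemake with ${CORES} cores"

# The real invocation (commented out here, as it needs the workflow in place):
# snakemake --use-conda --notemp --cores "${CORES}"
```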
## Code Snippets
```
shell:
    """
    pigz -p {threads} -c {input.valid_pairs} > {output.valid_pairs}
    """
```

```
shell:
    """
    pigz -p {threads} -c {input.matrix} > {output.matrix}
    """
```

```
shell:
    """
    ## The generic copy all. Clearly this will repeat every time though
    # cp {params.in_path}/* {params.out_path}

    ## Copy the specific files
    cp {input.stat} {output.stat}
    """
```

```
shell:
    """
    convert \
        -density {params.density} \
        {input.pdf}[0] \
        -quality {params.quality} \
        {output.png}
    """
```

```
shell:
    """
    Rscript --vanilla \
        {input.script} \
        {params.bin} &> {log}
    """
```
```
shell:
    """
    # Run the python script
    python {params.script} \
        -r {params.restriction_site} \
        -o {output.rs} \
        {input}
    """
```

```
shell:
    """
    awk '{{print($3 - $2)}}' {input} | \
        sort -n | \
        uniq -c > {output}
    """
```
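The `awk | sort | uniq -c` pipeline above tabulates restriction-fragment lengths from a BED file. On a toy input it behaves like this (the file name and coordinates are made up for illustration):

```shell
# Toy BED of three fragments with lengths 100, 200 and 100
DEMO=$(mktemp -d)
printf "chr1\t0\t100\nchr1\t100\t300\nchr1\t300\t400\n" > "$DEMO/frags.bed"

# Length of each fragment (end - start), sorted, then counted per unique length
awk '{print($3 - $2)}' "$DEMO/frags.bed" | sort -n | uniq -c
# -> two fragments of 100 bp, one of 200 bp
```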
```
shell:
    """
    pigz -p {threads} -c {input} > {output}
    """
```

```
shell:
    """
    Rscript --vanilla \
        scripts/write_hicpro_config.R \
        {params.idx_root} \
        {input.chr_sizes} \
        {input.rs} \
        {params.template} \
        {output}
    """
```
```
shell:
    """
    ## Remove any existing data as leaving this here causes HicPro to
    ## make an interactive request. Piping `yes` into HicPro may be the
    ## source of some current problems
    if [[ -d {params.outdir} ]]; then
        rm -rf {params.outdir}
    fi

    ## Run HiC-Pro
    HiC-Pro \
        -s mapping \
        -c {input.config} \
        -i {params.indir} \
        -o {params.outdir}
    """
```

```
shell:
    """
    HiC-Pro \
        -s quality_checks \
        -c {input.config} \
        -i {params.indir} \
        -o {params.outdir}
    """
```

```
shell:
    """
    HiC-Pro \
        -s proc_hic \
        -c {input.config} \
        -i {params.indir} \
        -o {params.outdir}
    """
```

```
shell:
    """
    HiC-Pro \
        -s merge_persample \
        -c {input.config} \
        -i {params.indir} \
        -o {params.outdir}
    """
```

```
shell:
    """
    HiC-Pro \
        -s build_contact_maps \
        -c {input.config} \
        -i {params.indir} \
        -o {params.outdir}
    """
```
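The five rules above invoke HiC-Pro one stage at a time via `-s`, and the order matters. The loop below just prints that sequence; the real invocation is left as a comment since it needs a working HiC-Pro install and config:

```shell
# Stepwise HiC-Pro execution order used by this workflow
STEPS="mapping quality_checks proc_hic merge_persample build_contact_maps"
for s in $STEPS; do
    # HiC-Pro -s "$s" -c hicpro-config.txt -i "$INDIR" -o "$OUTDIR"
    echo "HiC-Pro step: $s"
done
```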
```
shell:
    """
    git clone https://github.com/Rassa-Gvm/MaxHiC.git scripts/MaxHiC
    rm -rf scripts/MaxHiC/Sample_Inputs
    """
```

```
shell:
    """
    ## Given the problems with the raw output from HiC-Pro, we should
    ## delete any *ord.bed* files that exist. They seem to have been
    ## excluded from HiC-Pro v3
    if compgen -G "{params.input_path}/*ord.bed" > /dev/null; then
        echo -e "Deleting unnecessary symlink"
        rm {params.input_path}/*ord.bed
    fi

    python {input.maxhic_exe} \
        -t {threads} \
        {params.input_path} \
        {params.output_path} &> {log}

    ## Compress the output files
    pigz -p {threads} {params.output_path}/*txt
    """
```
```
shell:
    """
    Rscript --vanilla \
        scripts/merge_matrices.R \
        {params.bin} \
        {params.in_path} \
        {params.out_path} \
        {params.samples} &> {log}
    """
```

```
shell:
    """
    pigz -p {threads} -c {input.mat} > {output.mat}
    pigz -p {threads} -c {input.bed} > {output.bed}
    """
```
```
shell:
    """
    # Write to a separate temp directory for each run to avoid I/O clashes
    TEMPDIR=$(mktemp -d -t fqcXXXXXXXXXX)
    fastqc \
        {params} \
        -t {threads} \
        --outdir $TEMPDIR \
        {input} &> {log}

    # Move the files
    mv $TEMPDIR/*html $(dirname {output.html})
    mv $TEMPDIR/*zip $(dirname {output.zip})

    # Clean up the temp directory
    rm -rf $TEMPDIR
    """
```

```
shell:
    """
    # Write to a separate temp directory for each run to avoid I/O clashes
    TEMPDIR=$(mktemp -d -t fqcXXXXXXXXXX)
    fastqc \
        {params} \
        -t {threads} \
        --outdir $TEMPDIR \
        {input} &> {log}

    # Move the files
    mv $TEMPDIR/*html $(dirname {output.html})
    mv $TEMPDIR/*zip $(dirname {output.zip})

    # Clean up the temp directory
    rm -rf $TEMPDIR
    """
```
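The temp-directory pattern in the two FastQC rules (write to a private `mktemp` directory, move the results out, clean up) can be exercised on its own with placeholder files standing in for the FastQC reports:

```shell
# Stand-ins for FastQC outputs (file names are placeholders)
TEMPDIR=$(mktemp -d -t fqcXXXXXXXXXX)
touch "$TEMPDIR/report_fastqc.html" "$TEMPDIR/report_fastqc.zip"

# Move the files to their final location, then clean up the temp directory
DEST=$(mktemp -d)
mv "$TEMPDIR"/*html "$DEST"
mv "$TEMPDIR"/*zip "$DEST"
rm -rf "$TEMPDIR"

ls "$DEST"
```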
```
shell:
    """
    wget \
        -O "{output}" \
        -o {log} \
        {params.ftp}
    """
```

```
shell:
    """
    gunzip -c {input} > {output}
    """
```

```
shell:
    """
    bowtie2-build \
        --threads {threads} \
        -f {input} \
        {params.prefix} &> {log}
    """
```

```
shell:
    """
    # Download the assembly report
    TEMPDIR=$(mktemp -d -t chrXXXXXXXXXX)
    REPORT="assembly_report.txt"
    curl {params.ftp} > $TEMPDIR/$REPORT 2> {log}

    # Extract the chrom_sizes
    egrep 'assembled-molecule' "$TEMPDIR/$REPORT" | \
        awk '{{print "chr"$2"\t"$3}}' > {output}

    rm -rf $TEMPDIR
    """
```
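The `egrep | awk` step above assumes a report in which, for each `assembled-molecule` line, the second and third whitespace-separated fields are the chromosome name and its length. A toy report with that shape illustrates the output; real assembly reports have more columns, so verify the field indices against whatever `params.ftp` points at:

```shell
# Toy assembly report matching the field layout the awk command expects
DEMO=$(mktemp -d)
printf "assembled-molecule 1 248956422\nunplaced-scaffold KI270302.1 2274\n" \
    > "$DEMO/assembly_report.txt"

# Keep only assembled molecules and emit "chr<name>\t<length>"
egrep 'assembled-molecule' "$DEMO/assembly_report.txt" | \
    awk '{print "chr"$2"\t"$3}'
```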
```
shell:
    """
    AdapterRemoval \
        --adapter1 {params.adapter1} \
        --adapter2 {params.adapter2} \
        --file1 {input.r1} \
        --file2 {input.r2} \
        --threads {threads} \
        --gzip \
        --maxns {params.maxns} \
        --trimqualities \
        --minquality {params.minqual} \
        --minlength {params.minlength} \
        --output1 {output.t1} \
        --output2 {output.t2} \
        --discarded /dev/null \
        --singleton /dev/null \
        --settings {output.log} &> {log}
    """
```

```
shell:
    """
    echo -e "Version: 1.0\n\nRestoreWorkspace: Default\nSaveWorkspace: Default\nAlwaysSaveHistory: Default\n\nEnableCodeIndexing: Yes\nUseSpacesForTab: Yes\nNumSpacesForTab: 2\nEncoding: UTF-8\n\nRnwWeave: knitr\nLaTeX: pdfLaTeX\n\nAutoAppendNewline: Yes\nStripTrailingWhitespace: Yes" > {output}
    """
```

```
shell:
    """
    git add analysis/*
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    git add docs/*
    git commit -m 'Updated all'
    """
```

```
shell:
    """
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    """
```

```
shell:
    """
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    """
```

```
shell:
    """
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    """
```

```
shell:
    """
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    """
```
## Support

- Future updates