Snakemake-based HiC-Pro Workflow for Hi-C Data Analysis


snakemake_hic

This is a template repository for running HiC-Pro under Snakemake. The steps currently implemented are:

  1. Download and index the genome

  2. Trim raw data using AdapterRemoval

  3. Run FastQC on raw and trimmed data

  4. Prepare Files for HiC-Pro

    • Organise genome annotations

    • Define restriction fragments

    • Update hicpro-config.txt based on config.yml

Setup

HiC-Pro must be installed and visible to this workflow (i.e. the HiC-Pro executable must be on your PATH). Please ensure you have a working installation on your system; this workflow is built around v3.0.0.

  • Data must be placed in data/raw/fastq

    • Each sample/replicate needs to go in a separate folder. Use the test dataset as a guide.

    • Delete the test data once you have placed yours in the correct folder

  • Edit config/samples.tsv to reflect the sample names and file names.

    • Data is assumed to be paired, so only one file needs to be listed per directory
  • Edit config/config.yml
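
The setup steps above can be sketched as a few shell commands. The sample and file names below are hypothetical, and the exact column layout of samples.tsv is an assumption; use the bundled test dataset as the authoritative guide.

```shell
# Hypothetical sample/replicate name and file name -- consult the test
# dataset shipped with the repository for the real samples.tsv layout
mkdir -p data/raw/fastq/sample1_rep1 config
touch data/raw/fastq/sample1_rep1/sample1_rep1_R1.fastq.gz

# Data is assumed to be paired, so only one file is listed per directory
printf "sample1_rep1\tsample1_rep1_R1.fastq.gz\n" > config/samples.tsv
```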

Testing

Once all edits are performed and data is uploaded, run:

snakemake -n

This performs a dry run to check that everything will work as expected. If the dry run succeeds, create the conda environments:

snakemake --use-conda --conda-create-envs-only --cores 1

This may take a while. Snakemake creates these in series so only one core is required.

Once complete, create the rulegraph to check everything is as expected.

snakemake --rulegraph > rules/rulegraph.dot
dot -Tpdf rules/rulegraph.dot > rules/rulegraph.pdf

Execution

The following will set the workflow running using 12 cores:

snakemake \
  --use-conda \
  --notemp \
  --cores 12

Code Snippets

shell:
    """
    pigz -p {threads} -c {input.valid_pairs} > {output.valid_pairs}
    """
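
In this snippet, pigz is a parallel drop-in for gzip: -p sets the thread count and -c streams the compressed copy to stdout, leaving the input file untouched. The same pattern with plain gzip, on a made-up input file, in case pigz is unavailable:

```shell
# Made-up stand-in for a valid-pairs file
printf "chr1\t100\tchr2\t200\n" > valid_pairs.txt
# pigz -p N -c would do the same thing, in parallel
gzip -c valid_pairs.txt > valid_pairs.txt.gz
# Round-trips to the original content; the input file is left in place
gunzip -c valid_pairs.txt.gz
```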
shell:
    """
    pigz -p {threads} -c {input.matrix} > {output.matrix}
    """
shell:
    """
    ## The generic copy all. Clearly this will repeat every time though
    # cp {params.in_path}/* {params.out_path}

    ## Copy the specific files
    cp {input.stat} {output.stat}
    """
shell:
    """
    convert \
        -density {params.density} \
        {input.pdf}[0] \
        -quality {params.quality} \
        {output.png}
    """
shell:
    """
    Rscript --vanilla \
      {input.script} \
      {params.bin} &> {log}
    """
shell:
    """
    # Run the python script
    python {params.script} \
      -r {params.restriction_site} \
      -o {output.rs} \
      {input}
    """
shell:
    """
    awk '{{print($3 - $2)}}' {input} | \
      sort -n | \
      uniq -c > {output}
    """
shell:
    """
    pigz -p {threads} -c {input} > {output}
    """
shell:
    """
    Rscript --vanilla \
      scripts/write_hicpro_config.R \
      {params.idx_root} \
      {input.chr_sizes} \
      {input.rs} \
      {params.template} \
      {output}
    """
shell:
    """
    ## Remove any existing data as leaving this here causes HicPro to
    ## make an interactive request. Piping `yes` into HicPro may be the
    ## source of some current problems
    if [[ -d {params.outdir} ]]; then
      rm -rf {params.outdir}
    fi

    ## Run HiC-pro
    HiC-Pro \
      -s mapping \
      -c {input.config} \
      -i {params.indir} \
      -o {params.outdir} 
    """
shell:
    """
    HiC-Pro \
      -s quality_checks \
      -c {input.config} \
      -i {params.indir} \
      -o {params.outdir} 
    """
shell:
    """
    HiC-Pro \
      -s proc_hic \
      -c {input.config} \
      -i {params.indir} \
      -o {params.outdir} 
    """
shell:
    """
    HiC-Pro \
      -s merge_persample \
      -c {input.config} \
      -i {params.indir} \
      -o {params.outdir} 
    """
shell:
    """
    HiC-Pro \
      -s build_contact_maps \
      -c {input.config} \
      -i {params.indir} \
      -o {params.outdir} 
    """
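
The five HiC-Pro invocations above use the stepwise (-s) interface, running one stage at a time in the order shown; a trivial loop simply listing that sequence:

```shell
# Stepwise HiC-Pro order as implemented by the rules above
for step in mapping quality_checks proc_hic merge_persample build_contact_maps; do
  echo "$step"
done
```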
shell:
    """
    git clone https://github.com/Rassa-Gvm/MaxHiC.git scripts/MaxHiC
    rm -rf scripts/MaxHiC/Sample_Inputs
    """
shell:
    """
    ## Given the problems with the raw output from HiC-Pro, we should
    ## delete any *ord.bed* files that exist. They seem to have been 
    ## excluded from HiC-Pro v3
    if compgen -G "{params.input_path}/*ord.bed" > /dev/null; then
      echo -e "Deleting unnecessary symlink"
      rm {params.input_path}/*ord.bed
    fi

    python {input.maxhic_exe} \
      -t {threads} \
      {params.input_path} \
      {params.output_path} &> {log}

    ## Compress the output files
    pigz -p {threads} {params.output_path}/*txt
    """
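
The compgen -G guard above succeeds only when the glob matches at least one existing file, so rm is never invoked on a non-existent pattern. A standalone demo of the same idiom (bash-specific; the directory and file names are made up):

```shell
# Made-up stand-in for a HiC-Pro output directory
mkdir -p demo_hicpro_out
touch demo_hicpro_out/fragments_ord.bed
# compgen -G returns success only if the glob expands to something
if compgen -G "demo_hicpro_out/*ord.bed" > /dev/null; then
  echo "Deleting unnecessary symlink"
  rm demo_hicpro_out/*ord.bed
fi
```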
shell:
    """
    Rscript --vanilla \
        scripts/merge_matrices.R \
        {params.bin} \
        {params.in_path} \
        {params.out_path} \
        {params.samples} &> {log}
    """
shell:
    """
    pigz -p {threads} -c {input.mat} > {output.mat}
    pigz -p {threads} -c {input.bed} > {output.bed}
    """
shell:
    """
    # Write to a separate temp directory for each run to avoid I/O clashes
    TEMPDIR=$(mktemp -d -t fqcXXXXXXXXXX)
    fastqc \
      {params} \
      -t {threads} \
      --outdir $TEMPDIR \
      {input} &> {log}

    # Move the files
    mv $TEMPDIR/*html $(dirname {output.html})
    mv $TEMPDIR/*zip $(dirname {output.zip})

    # Clean up the temp directory
    rm -rf $TEMPDIR
    """
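
The per-run temp directory is what prevents parallel FastQC jobs from clobbering each other's output. The pattern in isolation, with a made-up file name standing in for FastQC's report:

```shell
TEMPDIR=$(mktemp -d -t fqcXXXXXXXXXX)   # unique directory per invocation
touch "$TEMPDIR/demo_fastqc.html"       # stand-in for FastQC output
mv "$TEMPDIR"/*html .                   # move results to their final home
rm -rf "$TEMPDIR"                       # clean up the temp directory
```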
shell:
    """
    # Write to a separate temp directory for each run to avoid I/O clashes
    TEMPDIR=$(mktemp -d -t fqcXXXXXXXXXX)
    fastqc \
      {params} \
      -t {threads} \
      --outdir $TEMPDIR \
      {input} &> {log}

    # Move the files
    mv $TEMPDIR/*html $(dirname {output.html})
    mv $TEMPDIR/*zip $(dirname {output.zip})

    # Clean up the temp directory
    rm -rf $TEMPDIR
    """
shell:
    """
    wget \
      -O "{output}" \
      -o {log} \
      {params.ftp}
    """
shell:
    """
    gunzip -c {input} > {output}
    """
shell:
    """
    bowtie2-build \
      --threads {threads} \
      -f {input} \
      {params.prefix} &> {log}
    """
shell:
    """
    # Download the assembly report
    TEMPDIR=$(mktemp -d -t chrXXXXXXXXXX)
    REPORT="assembly_report.txt"
    curl {params.ftp} > $TEMPDIR/$REPORT 2> {log}

    # Extract the chrom_sizes
    egrep 'assembled-molecule' "$TEMPDIR/$REPORT" | \
      awk '{{print "chr"$2"\t"$3}}' > {output}

    rm -rf $TEMPDIR
    """
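
The egrep/awk pair above keeps only assembled molecules from the assembly report and emits UCSC-style chromosome names alongside their lengths. A demo on a single fabricated report line whose columns are arranged to match the awk fields:

```shell
# Fabricated report line: field 2 is the chromosome name, field 3 its length
printf "id\t1\t248956422\tassembled-molecule\n" > demo_report.txt
egrep 'assembled-molecule' demo_report.txt | awk '{print "chr"$2"\t"$3}'
```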
shell:
    """
    AdapterRemoval \
        --adapter1 {params.adapter1} \
        --adapter2 {params.adapter2} \
        --file1 {input.r1} \
        --file2 {input.r2} \
        --threads {threads} \
        --gzip \
        --maxns {params.maxns} \
        --trimqualities \
        --minquality {params.minqual} \
        --minlength {params.minlength} \
        --output1 {output.t1} \
        --output2 {output.t2} \
        --discarded /dev/null \
        --singleton /dev/null \
        --settings {output.log} &> {log}
    """
shell:
    """
    echo -e "Version: 1.0\n\nRestoreWorkspace: Default\nSaveWorkspace: Default\nAlwaysSaveHistory: Default\n\nEnableCodeIndexing: Yes\nUseSpacesForTab: Yes\nNumSpacesForTab: 2\nEncoding: UTF-8\n\nRnwWeave: knitr\nLaTeX: pdfLaTeX\n\nAutoAppendNewline: Yes\nStripTrailingWhitespace: Yes" > {output}
    """
shell:
    """
    git add analysis/*
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    git add docs/*
    git commit -m 'Updated all'
    """
shell:
    """
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    """
shell:
    """
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    """
shell:
    """
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    """
shell:
    """
    R -e "workflowr::wflow_build('{input.rmd}')" 2>&1 > {log}
    """

URL: https://github.com/steveped/snakemake_hic
Name: snakemake_hic
Version: 1
Copyright: Public Domain
License: None