Trio Genome Assembly Snakemake Workflow

Input

  • Child reads
    • HiFi reads in .bam or .fastq.gz
  • Parent reads
    • HiFi reads in .bam or .fastq.gz
    • Paired-end reads in .bam, .cram, or (.R1.fastq.gz + .R2.fastq.gz)
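Parent short reads can arrive either as a single BAM/CRAM file or as an R1/R2 FASTQ pair. They therefore have to be grouped into units before they are turned into yak databases. The sketch below is written in Python, the language Snakefiles are built on, and shows one way to do the grouping. `group_paired_end` is a hypothetical helper, not code from this workflow. It assumes both mates share the same name up to the `.R1.fastq.gz`/`.R2.fastq.gz` suffix.

```python
# Hypothetical helper (not part of this workflow): group parent short-read
# inputs into units. Suffixes follow the Input section; the pairing-by-stem
# convention is an assumption.
ALIGNED_SUFFIXES = (".bam", ".cram")
R1_SUFFIX = ".R1.fastq.gz"
R2_SUFFIX = ".R2.fastq.gz"


def group_paired_end(paths):
    """Return a list of units: (bam_or_cram,) on its own, or (r1, r2)
    FASTQ pairs whose names match apart from the R1/R2 suffix."""
    units, r1, r2 = [], {}, {}
    for p in map(str, paths):
        if p.endswith(ALIGNED_SUFFIXES):
            units.append((p,))
        elif p.endswith(R1_SUFFIX):
            r1[p[: -len(R1_SUFFIX)]] = p
        elif p.endswith(R2_SUFFIX):
            r2[p[: -len(R2_SUFFIX)]] = p
        else:
            raise ValueError(f"unsupported paired-end input: {p}")
    # Stems present in only one of the two dicts have no mate.
    unmatched = sorted(r1.keys() ^ r2.keys())
    if unmatched:
        raise ValueError(f"R1/R2 files without a mate for: {unmatched}")
    # Sort pairs by shared stem so the output order is deterministic.
    units.extend((r1[stem], r2[stem]) for stem in sorted(r1))
    return units


print(group_paired_end(["dad.R2.fastq.gz", "mom.cram", "dad.R1.fastq.gz"]))
# [('mom.cram',), ('dad.R1.fastq.gz', 'dad.R2.fastq.gz')]
```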

Directory structure within basedir

.
├── cluster_logs            # slurm stderr/stdout logs
├── config.yaml             # customized copy of example_config.yaml
├── output_trio_assembly    # output directory
│   └── <trio_id>
│       ├── fasta
│       ├── hifiasm
│       ├── logs
│       └── yak
└── workflow_trio_assembly  # clone of this repo
    ├── calN50
    └── rules
        └── envs
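Here is a minimal sketch of how the per-trio output paths in the tree can be derived with `pathlib`. `trio_output_dirs` is a hypothetical helper. Only the directory names come from the tree above.

```python
from pathlib import Path

# Per-trio subdirectories, as listed in the directory tree.
SUBDIRS = ("fasta", "hifiasm", "logs", "yak")


def trio_output_dirs(basedir, trio_id, output_dir="output_trio_assembly"):
    """Map each subdirectory name to its path under
    <basedir>/<output_dir>/<trio_id> (hypothetical helper)."""
    root = Path(basedir) / output_dir / trio_id
    return {name: root / name for name in SUBDIRS}
```

For instance, `trio_output_dirs(".", "trio1")["hifiasm"] / "child.asm.key.txt"` would be where the hifiasm rule writes its haplotype key for a sample called `child`. Both names are made up for illustration.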

Run the pipeline

# clone workflow from GitHub into a directory named `workflow_trio_assembly`
git clone --recursive git@github.com:juniper-lake/trio-assembly-snakemake.git workflow_trio_assembly
# create a directory to store cluster logs
mkdir cluster_logs
# create `config.yaml` based on `example_config.yaml`.
cp workflow_trio_assembly/example_config.yaml config.yaml
# adjust paths in the Snakefile and sbatch script as necessary
# create a conda environment to run the snakemake workflow
conda create --prefix ./conda_env --channel bioconda --channel conda-forge lockfile==0.12.2 python=3 snakemake mamba
# activate conda environment. **This conda env must be activated each time you run the workflow.**
conda activate ./conda_env
# run the workflow by submitting the sbatch script with <trio_id> as its argument
sbatch workflow_trio_assembly/run_snakemake.sh <trio_id>

To Do

  • Find out whether yak requires R1 and R2 as separate files for paired-end reads

    • If not, remove the sort step and the step that splits reads into separate files

Code Snippets

# convert reads from BAM to FASTA
shell: "(samtools fasta -@ 3 {input} > {output}) > {log} 2>&1"
# convert gzipped FASTQ reads to FASTA
shell: "(seqtk seq -A {input} > {output}) > {log} 2>&1"
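Both rules above only change the read format. For plain four-line FASTQ records, the FASTQ-to-FASTA step of `seqtk seq -A` comes down to three things: keep the header with `@` swapped for `>`, keep the sequence, and drop the `+` line and qualities. The code below is a rough Python equivalent that assumes four-line records. seqtk also accepts multi-line FASTQ, which this sketch does not. The function names are illustrative.

```python
import gzip


def fastq_to_fasta(lines):
    """Yield FASTA lines (without newlines) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        # The next three lines are: sequence, '+' separator, quality string.
        record = [next(it, None) for _ in range(3)]
        if None in record or not header.startswith("@"):
            raise ValueError(f"truncated or malformed FASTQ record: {header!r}")
        yield ">" + header[1:]
        yield record[0].rstrip("\n")


def convert(fastq_gz, fasta_out):
    """Convert a gzipped FASTQ file to an uncompressed FASTA file."""
    with gzip.open(fastq_gz, "rt") as src, open(fasta_out, "w") as dst:
        for line in fastq_to_fasta(src):
            dst.write(line + "\n")
```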
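# trio-binned assembly: -1 and -2 take the paternal and maternal yak databases;
# the key file then records which label goes with hap1 and hap2
shell: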
        """
        (
            hifiasm -o {params.prefix} -t {threads} {params.extras} \
                -1 {input.pat_yak} -2 {input.mat_yak} {input.fasta} \
            && (echo -e "hap1\t{params.hap1}\nhap2\t{params.hap2}" > {output_dir}/{trio_id}/hifiasm/{wildcards.sample}.asm.key.txt) \
        ) > {log} 2>&1
        """
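The hifiasm rule above finishes by writing `<sample>.asm.key.txt`. This file holds two tab-separated lines that pair `hap1` and `hap2` with `params.hap1` and `params.hap2`. The snippet doesn't show those values. Presumably they name the parent whose yak database went to `-1` and `-2` respectively. A downstream script could load the key with a hypothetical helper such as:

```python
def read_hap_key(text):
    """Parse the key written by the hifiasm rule: lines of
    'hap1<TAB>value' and 'hap2<TAB>value' (hypothetical helper)."""
    key = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # A line without a tab raises ValueError on unpacking.
        hap, value = line.split("\t", 1)
        key[hap] = value
    if set(key) != {"hap1", "hap2"}:
        raise ValueError(f"expected hap1 and hap2 entries, got {sorted(key)}")
    return key


# Hypothetical usage with a key file from the hifiasm output directory:
# with open("output_trio_assembly/trio1/hifiasm/child.asm.key.txt") as fh:
#     key = read_hap_key(fh.read())
```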
# convert the assembly graph (GFA) to FASTA
shell: "(gfatools gfa2fa {input} > {output}) 2> {log}"
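`gfatools gfa2fa` pulls sequences out of the assembly graph. In GFA 1, each segment is an `S` line with tab-separated fields: record type, segment name, sequence, and optional tags. Other record types share the same file. The sketch below mimics that extraction for `S` lines only. It illustrates the format and is not a replacement for gfatools. `gfa_segments_to_fasta` is a made-up name.

```python
def gfa_segments_to_fasta(lines):
    """Yield FASTA lines for GFA 1 segment ('S') records. All other record
    types (H, L, A, ...) are skipped, as are segments whose sequence is '*'."""
    for line in lines:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            continue  # sequence not stored in the GFA
        yield ">" + name
        yield seq
```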
# compress in place with bgzip (replaces {input} with {input}.gz)
shell: "(bgzip --threads {threads} {input}) > {log} 2>&1"
# evaluate phasing (switch and Hamming error) against the parental yak databases
shell: "(yak trioeval -t {threads} {input.pat_yak} {input.mat_yak} {input.fasta} > {output}) > {log} 2>&1"
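# align assembly contigs to the reference and coordinate-sort the alignments;
# params.minimap2_args presumably includes -a, because samtools sort needs SAM input
shell:
    """
    (minimap2 -t {params.minimap2_threads} {params.minimap2_args} {input.ref} \
            -R '{params.readgroup}' {input.assembly} \
            | samtools sort -@ {params.samtools_threads} {params.samtools_mem} - > {output}) > {log} 2>&1
    """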
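The alignment rule above passes `params.readgroup` to minimap2's `-R` option but doesn't show how the value is built. minimap2 expects a SAM `@RG` line whose fields are separated by a literal `\t`, e.g. `@RG\tID:foo\tSM:bar`. A hypothetical helper could build such a value and reject characters that would break the single-quoted shell argument:

```python
def minimap2_readgroup(rg_id, sample):
    """Build a value for minimap2 -R: an @RG line whose fields are separated
    by a literal backslash-t, which minimap2 expands to a real tab."""
    for value in (rg_id, sample):
        # Reject empty values, tabs, newlines, single quotes and backslashes:
        # they would corrupt the SAM header or break the '...' shell quoting.
        if not value or any(c in value for c in "\t\n'\\"):
            raise ValueError(f"unsafe read-group value: {value!r}")
    return f"@RG\\tID:{rg_id}\\tSM:{sample}"
```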
# index the sorted BAM
shell: "(samtools index -@ 3 {input}) > {log} 2>&1"
# write MD5 checksums of the input files (snippet from line 101 of main/Snakefile)
shell: "md5sum {input} > {output}"
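You may want to re-check the checksum file from Python, or somewhere `md5sum -c` isn't available. The sketch below reproduces `md5sum`'s default output line (`<digest>  <path>`, two spaces) with `hashlib`. All function names are hypothetical. Filenames containing newlines or backslashes, which `md5sum` escapes, are not handled.

```python
import hashlib


def md5_hex(fh, chunk_size=1 << 20):
    """MD5 hex digest of a binary file object, read in chunks so large
    assemblies never need to fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5sum_line(path):
    """One line in md5sum's default text-mode format: '<digest>  <path>'."""
    with open(path, "rb") as fh:
        return f"{md5_hex(fh)}  {path}"


def verify_md5(checksum_file):
    """Re-check each '<digest>  <path>' line, similar to `md5sum -c`.
    Returns {path: True/False}."""
    results = {}
    with open(checksum_file) as fh:
        for line in fh:
            if not line.strip():
                continue
            expected, path = line.rstrip("\n").split("  ", 1)
            with open(path, "rb") as data:
                results[path] = md5_hex(data) == expected
    return results
```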

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/juniper-lake/trio-assembly-snakemake
Name: trio-assembly-snakemake
Version: 1
Copyright: Public Domain
License: None