Snakemake module containing different analyses provided by parabricks.

Version: v1.1.0

Snakemake module containing an array of steps provided by the Parabricks toolkit.

:speech_balloon: Introduction

The module contains rules to align .fastq files and call variants in the resulting .bam files using Clara Parabricks. Using this module requires a server with access to one or more compatible NVIDIA GPUs. Input data should be trimmed .fastq files; we recommend generating these with hydra-genetics/prealignment for a smooth transition. To make use of read group information, add machine, flowcell and library specifics to units.tsv.

:heavy_exclamation_mark: Dependencies

In order to use this module, the following dependencies are required:

:school_satchel: Preparations

Sample and unit data

Input data should be added to samples.tsv and units.tsv. The following information needs to be added to these files:

| Column Id | Description |
| --- | --- |
| samples.tsv | |
| sample | unique sample/patient id, one per row |
| tumor_content | ratio of tumor cells to total cells |
| units.tsv | |
| sample | same sample/patient id as in samples.tsv |
| type | data type identifier (one letter), one of T (tumor), N (normal) or R (RNA) |
| platform | type of sequencing platform, e.g. NovaSeq |
| machine | specific machine id, e.g. NovaSeq instruments have @Axxxxx |
| flowcell | identifier of the flowcell used |
| lane | flowcell lane number |
| barcode | sequence library barcode/index; connect forward and reverse indices with +, e.g. ATGC+ATGC |
| fastq1/2 | absolute path to forward and reverse reads |
| adapter | adapter sequences to be trimmed, separated by comma |
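
As a rough illustration of how these columns fit together, the following Python sketch parses a small units.tsv and derives a SAM-style read-group line from the sample, flowcell, lane and barcode fields. The sample values, the `read_units` and `read_group` helpers and the exact tag layout are assumptions made for this example; the module may assemble its read groups differently.

```python
import csv
import io

# Hypothetical units.tsv content; every value is made up for illustration.
UNITS_TSV = (
    "sample\ttype\tplatform\tmachine\tflowcell\tlane\tbarcode\tfastq1\tfastq2\tadapter\n"
    "NA12878\tT\tNovaSeq\t@A00001\tHXXXXXXXX\tL001\tATGC+ATGC\t"
    "/data/NA12878_T_R1.fastq.gz\t/data/NA12878_T_R2.fastq.gz\t"
    "AGATCGGAAGAGC,AGATCGGAAGAGC\n"
)


def read_units(handle):
    """Return the rows of a tab-separated units file as dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def read_group(unit):
    """Build a SAM-style @RG line from one units.tsv row (assumed layout)."""
    fields = [
        "@RG",
        "ID:{sample}_{type}.{flowcell}.{lane}".format(**unit),
        "SM:{sample}_{type}".format(**unit),
        "PL:" + unit["platform"],
        "PU:{flowcell}.{lane}.{barcode}".format(**unit),
        "LB:{sample}_{type}".format(**unit),
    ]
    # SAM header fields are separated by tabs.
    return "\t".join(fields)


units = read_units(io.StringIO(UNITS_TSV))
print(read_group(units[0]))
```

Keeping flowcell, lane and barcode in separate columns means each sequencing unit can receive its own read group even when several units belong to the same sample.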

Reference data

Reference files should be specified in config.yaml in the section reference. A .fasta file is needed, as well as a .vcf file containing known indels used during the alignment process. For the RNA alignment part, genome_dir should specify a directory containing reference files generated by STAR.
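
A quick sanity check of the reference section before starting a run could look like the Python sketch below. Only `genome_dir` is named in the text; the key names `fasta` and `sites`, the placeholder paths and the helper `missing_reference_keys` are assumptions for illustration, not the module's actual schema.

```python
# Hypothetical contents of the `reference` section after loading config.yaml.
# `fasta` and `sites` are assumed key names; all paths are placeholders.
config = {
    "reference": {
        "fasta": "/ref/genome.fasta",
        "sites": "/ref/known_indels.vcf",
        "genome_dir": "/ref/star_index",
    }
}


def missing_reference_keys(cfg, need_rna=False):
    """Return the reference keys that are absent or empty in cfg."""
    required = ["fasta", "sites"]
    if need_rna:
        # The STAR directory is only needed for the RNA alignment part.
        required.append("genome_dir")
    reference = cfg.get("reference", {})
    return [key for key in required if not reference.get(key)]


print(missing_reference_keys(config, need_rna=True))
```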

:rocket: Usage

To use this module in your workflow, follow the description in the Snakemake docs. Add the module to your Snakefile like so:

module parabricks:
    snakefile:
        github(
            "hydra-genetics/parabricks",
            path="workflow/Snakefile",
            tag="1.0.0",
        )
    config:
        config


use rule * from parabricks as parabricks_*

Compatibility

Latest:

  • prealignment:v1.1.0

See the COMPATIBLITY.md file for a complete list of module compatibility.

Output files

The following output files should be targeted via another rule:

| File | Description |
| --- | --- |
| parabricks/pbrun_deepvariant/{sample}.vcf | variant call file generated by deepvariant |
| parabricks/pbrun_fq2bam/{sample}_{type}.bam | alignment file generated by BWA-mem |
| parabricks/pbrun_mutectcaller_t/{sample}_T.vcf | variant call file generated by Mutect2 using tumor-only mode |
| parabricks/pbrun_mutectcaller_tn/{sample}.vcf | variant call file generated by Mutect2 using tumor/normal mode |
| parabricks/pbrun_rna_fq2bam/{sample}_R.bam | alignment file generated by STAR |
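
One way to target these files is a plain Python helper that expands the path patterns for the available sample/type pairs. The sketch below is only an assumption about how a calling workflow might do this: the helper name `compile_targets`, the rule for choosing tumor-only versus tumor/normal calling, and the example samples are all hypothetical, and the deepvariant output is left out because the text does not say which unit type it applies to.

```python
def compile_targets(units):
    """List output files for (sample, type) pairs, following the table above."""
    types_by_sample = {}
    for sample, unit_type in units:
        types_by_sample.setdefault(sample, set()).add(unit_type)

    targets = []
    for sample in sorted(types_by_sample):
        types = types_by_sample[sample]
        # DNA units (tumor and normal) are aligned with fq2bam.
        for unit_type in sorted(types & {"T", "N"}):
            targets.append(f"parabricks/pbrun_fq2bam/{sample}_{unit_type}.bam")
        # RNA units are aligned with rna_fq2bam.
        if "R" in types:
            targets.append(f"parabricks/pbrun_rna_fq2bam/{sample}_R.bam")
        # Assumed policy: paired calling when a normal exists, else tumor-only.
        if {"T", "N"} <= types:
            targets.append(f"parabricks/pbrun_mutectcaller_tn/{sample}.vcf")
        elif "T" in types:
            targets.append(f"parabricks/pbrun_mutectcaller_t/{sample}_T.vcf")
    return targets


example_units = [("S1", "T"), ("S1", "N"), ("S2", "T"), ("S2", "R")]
for path in compile_targets(example_units):
    print(path)
```

In a Snakefile, the returned list could be passed as the input of a target rule such as `all`, so that Snakemake schedules the module's rules needed to produce each file.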

:judge: Rule Graph

[Figure: rule graph]

Code Snippets

shell:
    "{params.cuda} pbrun deepvariant "
    "--ref {input.fasta} "
    "--in-bam {input.bam} "
    "--num-gpus {params.num_gpus} "
    "--out-variants {output.vcf} "
    "{params.extra} "
    "--tmp-dir parabricks/pbrun_deepvariant/{wildcards.sample} &> {log}"

shell:
    "{params.cuda} pbrun fq2bam "
    "--ref {input.fasta} "
    "--in-fq {params.in_fq} "
    "--knownSites {input.sites} "
    "--num-gpus {params.num_gpus} "
    "--out-bam {output.bam} "
    "--out-duplicate-metrics {output.metrics} "
    "--out-recal-file {output.recal} "
    "{params.extra} "
    "--tmp-dir parabricks/pbrun_fq2bam/{wildcards.sample}_{wildcards.type} &> {log}"

shell:
    "{params.cuda} pbrun mutectcaller "
    "--ref {input.fasta} "
    "--in-tumor-bam {input.bam_t} "
    "--tumor-name {wildcards.sample}_T "
    "--in-tumor-recal-file {input.recal_t} "
    "--num-gpus {params.num_gpus} "
    "--out-vcf {output.vcf} "
    "{params.extra} "
    "--tmp-dir parabricks/pbrun_mutectcaller_t/{wildcards.sample} &> {log}"
From line 131 of rules/pbrun.smk

shell:
    "{params.cuda} pbrun mutectcaller "
    "--ref {input.fasta} "
    "--in-tumor-bam {input.bam_t} "
    "--tumor-name {wildcards.sample}_T "
    "--in-tumor-recal-file {input.recal_t} "
    "--in-normal-bam {input.bam_n} "
    "--normal-name {wildcards.sample}_N "
    "--in-normal-recal-file {input.recal_n} "
    "--num-gpus {params.num_gpus} "
    "--out-vcf {output.vcf} "
    "{params.extra} "
    "--tmp-dir parabricks/pbrun_mutectcaller_tn/{wildcards.sample} &> {log}"
From line 179 of rules/pbrun.smk

shell:
    "{params.cuda} pbrun rna_fq2bam "
    "--genome-lib-dir {input.genome_dir} "
    "--in-fq {params.in_fq} "
    "--num-gpus {params.num_gpus} "
    "--output-dir parabricks/pbrun_rna_fq2bam/ "
    "--out-bam {output.bam} "
    "--out-prefix {wildcards.sample}_{wildcards.type} "
    "--tmp-dir parabricks/pbrun_rna_fq2bam/{wildcards.sample}_{wildcards.type} "
    "{params.extra} "
    "--logfile {log}"
From line 226 of rules/pbrun.smk
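
Each shell directive above is a sequence of adjacent string literals, which Python joins into a single command template; this is why every fragment ends with a trailing space. The sketch below imitates the expansion of the deepvariant command with plain `str.format` and `SimpleNamespace` objects. All placeholder values, including the content of `params.cuda`, are hypothetical, and Snakemake's own formatting does more than this (for example, it joins lists of files).

```python
from types import SimpleNamespace

# Adjacent string literals are concatenated by Python into one template.
template = (
    "{params.cuda} pbrun deepvariant "
    "--ref {input.fasta} "
    "--in-bam {input.bam} "
    "--num-gpus {params.num_gpus} "
    "--out-variants {output.vcf} "
    "{params.extra} "
    "--tmp-dir parabricks/pbrun_deepvariant/{wildcards.sample} &> {log}"
)

# Placeholder values standing in for what Snakemake would supply (hypothetical).
command = template.format(
    params=SimpleNamespace(cuda="CUDA_VISIBLE_DEVICES=0", num_gpus=1, extra=""),
    input=SimpleNamespace(fasta="ref.fasta", bam="S1.bam"),
    output=SimpleNamespace(vcf="S1.vcf"),
    wildcards=SimpleNamespace(sample="S1"),
    log="S1.log",
)
print(command)
```

An empty `extra` simply leaves two consecutive spaces in the command, which the shell ignores.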


URL: https://github.com/hydra-genetics/parabricks
Copyright: Public Domain
License: GNU General Public License v3.0