Clinical Whole Genome Sequencing Pipeline


genome-seek 🔬

Whole Genome Clinical Sequencing Pipeline.


This is the home of the pipeline, genome-seek. Its long-term goals: to accurately call germline and somatic variants, to infer SVs & CNVs, and to boldly annotate variants like no pipeline before!

Overview

Welcome to genome-seek! Before getting started, we highly recommend reading through genome-seek's documentation.

The ./genome-seek pipeline is composed of several inter-related sub-commands to set up and run the pipeline across different systems. Each of the available sub-commands performs a different function.

genome-seek is a comprehensive clinical WGS pipeline that is focused on speed. Each tool in the pipeline was benchmarked and selected for its low run time without sacrificing accuracy or precision. The pipeline relies on technologies like Singularity1 to maintain the highest level of reproducibility, and it consists of a series of data processing and quality-control steps orchestrated by Snakemake2, a flexible and scalable workflow management system, to submit jobs to a cluster.

The pipeline is compatible with data generated from Illumina short-read sequencing technologies. As input, it accepts a set of FastQ files and can be run locally on a compute instance, on-premises using a cluster, or on the cloud (feature coming soon!). A user can define the method or mode of execution: the pipeline can submit jobs to a cluster using a job scheduler like SLURM, or run on AWS using Tibanna (feature coming soon!). This hybrid approach ensures the pipeline is accessible to all users.
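
For orientation, a typical launch might look like the sketch below. The sub-command and option names shown are illustrative assumptions (modeled on other OpenOmics pipelines), not the authoritative interface; always check ./genome-seek -h and the usage docs for the real options.

# Illustrative sketch only: the sub-command and flags below are
# assumptions, not the documented interface (see ./genome-seek -h)
./genome-seek run \
    --input /path/to/fastqs/*.R?.fastq.gz \
    --output /path/to/genome-seek_output \
    --mode slurm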

Before getting started, we highly recommend reading through the usage section of each available sub-command.

For more information about issues or troubleshooting a problem, please check out our FAQ prior to opening an issue on GitHub.

Dependencies

Requires: singularity>=3.5 snakemake>=7.8

At the moment, the pipeline uses a mixture of environment modules and Docker images; however, this will be changing soon! In the near future, the pipeline will only use Docker images. With that being said, snakemake and singularity must be installed on the target system. Snakemake orchestrates the execution of each step in the pipeline. To guarantee the highest level of reproducibility, each step of the pipeline relies on versioned images from DockerHub. Snakemake uses singularity to pull these images onto the local filesystem prior to job execution, and as such, snakemake and singularity will be the only two dependencies in the future.
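
In practice, the only site-specific setup is making snakemake and singularity available; Snakemake handles pulling and running the containers itself. The sketch below shows the manual equivalent of what Snakemake does for a containerized step (the image name is a placeholder for illustration only, not an actual genome-seek container):

# The image below is a placeholder for illustration; the pipeline's
# real containers are versioned images hosted on DockerHub.
singularity pull docker://example/tool:1.0.0

# Run containerized rules through Singularity and cache pulled images
# under a shared prefix so each image is only downloaded once.
snakemake --use-singularity \
    --singularity-prefix "$PWD/.singularity" \
    --cores 8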

Installation

Please clone this repository to your local filesystem using the following command:

# Clone Repository from Github
git clone https://github.com/OpenOmics/genome-seek.git
# Change your working directory
cd genome-seek/
# Add dependencies to $PATH
# Biowulf users should run
module load snakemake singularity
# Get usage information
./genome-seek -h

Contribute

This site is a living document, created for and by members like you. genome-seek is maintained by the members of NCBR and is improved by continuous feedback! We encourage you to contribute new content and make improvements to existing content via pull requests to our GitHub repository.

References

1. Kurtzer GM, Sochat V, Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5): e0177459.
2. Köster J, Rahmann S (2018). Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 34(20): 3600.

Code Snippets

shell: """
vcftools \\
    --gzvcf {input.vcf} \\
    --plink \\
    --out {params.intermediate} \\
    --chr {params.peddy_chr}
cut -f1-6 {params.intermediate}.ped \\
> {output.ped}
peddy -p {threads} \\
    --prefix {params.prefix} \\
    {input.vcf} \\
    {output.ped}
"""
shell: """
# Get biological sex
# predicted by peddy 
predicted_sex=$(awk -F ',' \\
    '$8=="{params.sample}" \\
    {{print $7}}' \\
    {input.csv}
)

# Copy over base ploidy 
# vcf for predicted sex 
# and add name to header
if [ "$predicted_sex" == "male" ]; then 
    cp {params.male_ploidy} {output.ploidy}
else
    # prediction is female
    cp {params.female_ploidy} {output.ploidy}
fi
sed -i 's/SAMPLENAME/{params.sample}/g' \\
    {output.ploidy}

# Delete Canvas checkpoints
if [ -d {params.checkpoints} ]; then
    # Forces Canvas to start 
    # over from the beginning
    rm -rf '{params.checkpoints}'
fi

# CANVAS in Germline WGS mode
export COMPlus_gcAllowVeryLargeObjects=1
Canvas.dll Germline-WGS \\
    -b {input.bam} \\
    -n {params.sample} \\
    -o {params.outdir} \\
    -r {params.canvas_kmer} \\
    --ploidy-vcf={output.ploidy} \\
    -g {params.canvas_genome} \\
    -f {params.canvas_filter} \\
    --sample-b-allele-vcf={input.vcf}

# Filter predicted CNVs
bcftools filter \\
    --include 'FILTER="PASS" && INFO/SVTYPE="CNV"' \\
    {output.vcf} \\
> {output.filtered}

# Rank and annotate CNVs
AnnotSV \\
    -annotationsDir  {params.annotsv_annot} \\
    -genomeBuild {params.annotsv_build} \\
    -outputDir {params.outdir} \\
    -outputFile {output.annotated} \\
    -SVinputFile {output.filtered} \\
    -snvIndelFiles {input.joint} \\
    -snvIndelSamples {params.sample}

# Check if AnnotSV silently failed
if [ ! -f "{output.annotated}" ]; then
    # AnnotSV failed to process
    # provided SVinputFile file, 
    # usually due to passing an 
    # empty filtered SV file
    echo "WARNING: AnnotSV silently failed..." 1>&2
    touch {output.annotated}
fi
"""
shell: """
# Run HMF Tools AMBER to generate B-allele frequencies for PURPLE
java -Xmx{params.memory}g -cp {params.amber_jar} \\
    com.hartwig.hmftools.amber.AmberApplication \\
        -tumor {params.tumor} {params.normal_name} \\
        -tumor_bam {input.tumor}  {params.normal_bam} \\
        -output_dir {params.outdir} \\
        -threads {threads} {params.tumor_flag} \\
        -loci {params.loci_ref}
"""
From line 232 of rules/cnv.smk
shell: """
# Run HMF Tools COBALT to compute read depth ratios for PURPLE
java -Xmx{params.memory}g -cp {params.cobalt_jar} \\
    com.hartwig.hmftools.cobalt.CountBamLinesApplication \\
        -tumor {params.tumor} {params.normal_name} \\
        -tumor_bam {input.tumor}  {params.normal_bam} \\
        -output_dir {params.outdir} \\
        -threads {threads} {params.tumor_flag} \\
        -gc_profile {params.gc_profile}
"""
From line 282 of rules/cnv.smk
shell: """
# Set output directories
# for Amber and Cobalt
amber_outdir="$(dirname "{input.amber}")"
cobalt_outdir="$(dirname "{input.cobalt}")"
echo "Amber output directory: $amber_outdir"
echo "Cobalt output directory: $cobalt_outdir"

# Run Purple to find CNVs,
# purity and ploidy, and 
# cancer driver events
java -Xmx{params.memory}g -jar {params.purple_jar} \\
    -tumor {params.tumor} {params.normal_name} \\
    -output_dir {params.outdir} \\
    -amber "$amber_outdir" \\
    -cobalt "$cobalt_outdir" \\
    -circos circos \\
    -gc_profile {params.gc_profile} \\
    -ref_genome {params.genome} \\
    -ref_genome_version {params.ref_ver} \\
    -run_drivers \\
    -driver_gene_panel {params.panel} \\
    -somatic_hotspots {params.somatic_hotspot} \\
    -germline_hotspots {params.germline_hotspot} \\
    -threads {threads} {params.tumor_flag} \\
    -somatic_vcf {input.vcf} {params.sv_option}
"""
shell: """
java -Xmx{params.memory}g -jar ${{GATK_JAR}} -T RealignerTargetCreator \\
    --use_jdk_inflater \\
    --use_jdk_deflater \\
    -I {input.bam} \\
    -R {params.genome} \\
    -o {output.intervals} \\
    {params.knowns}

java -Xmx{params.memory}g -jar ${{GATK_JAR}} -T IndelRealigner \\
    -R {params.genome} \\
    -I {input.bam} \\
    {params.knowns} \\
    --use_jdk_inflater \\
    --use_jdk_deflater \\
    -targetIntervals {output.intervals} \\
    -o {output.bam}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g' BaseRecalibrator \\
    --input {input.bam} \\
    --reference {params.genome} \\
    {params.knowns} \\
    --output {output.recal} \\
    {params.intervals}
"""
shell: """
# Create GatherBQSR list
find {params.bams} -iname '{params.sample}_recal*_data.grp' \\
    > {output.lsl}
# Gather per sample BQSR results
gatk --java-options '-Xmx{params.memory}g' GatherBQSRReports \\
    --use-jdk-inflater --use-jdk-deflater \\
    -I {output.lsl} \\
    -O {output.recal}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g' ApplyBQSR \\
    --use-jdk-inflater --use-jdk-deflater \\
    --reference {params.genome} \\
    --bqsr-recal-file {input.recal} \\
    --input {input.bam} \\
    --output {output.bam}

samtools index -@ {threads} {output.bam} {output.bai}
"""
shell: """
# Sets up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

run_deepvariant \\
    --model_type=WGS \\
    --ref={params.genome} \\
    --reads={input.bam} \\
    --output_gvcf={output.gvcf} \\
    --output_vcf={output.vcf} \\
    --num_shards={threads} \\
    --intermediate_results_dir=${{tmp}}
"""
shell: """
# Sets up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit.
# The GLnexus tmpdir should NOT exist
# prior to running it; if it does
# exist, GLnexus will immediately
# error out.
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp_parent=$(mktemp -d -p "{params.tmpdir}")
tmp_dne=$(echo "${{tmp_parent}}"| sed 's@$@/GLnexus.DB@')
trap 'rm -rf "${{tmp_parent}}"' EXIT

# Avoids ARG_MAX issue which will
# limit max length of a command
find {params.gvcfdir} -iname '*.g.vcf.gz' \\
> {output.gvcfs}

glnexus_cli \\
    --dir ${{tmp_dne}} \\
    --config DeepVariant_unfiltered \\
    --list {output.gvcfs} \\
    --threads {threads} \\
    --mem-gbytes {params.memory} \\
> {output.bcf}

bcftools norm \\
    -m - \\
    -Oz \\
    --threads {threads} \\
    -f {params.genome} \\
    -o {output.norm} \\
    {output.bcf}

bcftools view \\
    -Oz \\
    --threads {threads} \\
    -o {output.jvcf} \\
    {output.bcf}

bcftools index \\
    -f -t \\
    --threads {threads} \\
    {output.norm}

bcftools index \\
    -f -t \\
    --threads {threads} \\
    {output.jvcf}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g -XX:ParallelGCThreads={threads}' SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --sample-name {params.sample} \\
    --exclude-non-variants \\
    --output {output.vcf}
"""
shell: """
singularity exec -B {params.bind} {params.sif} \\
HLA-LA.pl \\
    --BAM {input.bam} \\
    --graph PRG_MHC_GRCh38_withIMGT \\
    --sampleID sample \\
    --maxThreads {threads} \\
    --workingDir {params.outdir}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g -XX:ParallelGCThreads={threads}' SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    -L {params.chrom} \\
    --exclude-non-variants \\
    --output {output.vcf}

tabix --force \\
    -p vcf {output.vcf}
"""
shell: """
# Environment variable for modules dir
export OC_MODULES="{params.module}"

oc run \\
    -t vcf \\
    -x \\
    --newlog \\
    --cleanrun \\
    --system-option "modules_dir={params.module}" \\
    -a {params.annot} \\
    -n {params.prefix} \\
    -l {params.genome} \\
    -d {params.outdir} \\
    --mp {threads} \\
    {input.vcf}
"""
    shell: """
# Create first filtering script,
# Filters based on AF 
mkdir -p {params.scripts}
cp {input.db} {output.filter_1} 
cat << EOF > {params.filter_1}
import sqlite3
import os

maf = str({params.maf_thres})
mafc = str(1 - float({params.maf_thres}))
so = "{params.so}"
conn = sqlite3.connect("{output.filter_1}")
conn.isolation_level = None
cursor = conn.cursor()
conn.execute('CREATE TABLE variant2 AS SELECT * FROM variant WHERE (base__so IN (' + '"' + '", "'.join(so.split(',')) + '"' + ')) AND ((gnomad__af IS NULL AND gnomad3__af IS NULL AND thousandgenomes__af IS NULL) OR (gnomad__af <= ' + maf +' OR gnomad__af >= '+mafc+' OR gnomad3__af <= '+maf+' OR gnomad3__af >= '+mafc+' OR thousandgenomes__af <= '+maf+' OR thousandgenomes__af >= '+mafc+'))')
conn.execute('DROP TABLE variant')
conn.execute('ALTER TABLE variant2 RENAME TO variant')
conn.execute('DELETE from sample WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('DELETE FROM gene WHERE base__hugo NOT IN (SELECT base__hugo FROM variant)')
conn.execute('DELETE FROM mapping WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('UPDATE info SET colval = (SELECT COUNT(*) FROM variant) WHERE colkey == "Number of unique input variants"')
conn.execute('VACUUM')
conn.close()
EOF

# Create second filtering script,
# Filters based on filters in config
cp {output.filter_1} {output.filter_2}
cat << EOF > {params.filter_2}
import pandas as pd
import sqlite3
import sys
import os

conn = sqlite3.connect("{output.filter_2}")
conn.isolation_level = None
cursor = conn.cursor()
filter = {params.secondary}

def keep(dd, used_annotators):
    final_annotators = {{annotator: dd[annotator] for annotator in used_annotators}}
    return final_annotators

def filtercol(dd, annot):
    if dd['relation']=='IN':
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + ' ("' + '", "'.join(dd['value'].split(',')) + '")'
    elif dd['relation']=='IS' or dd['relation']=='IS NOT':
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + ' ' + dd['value']
    else:
        return annot + '__'+ dd['col'] + ' ' + dd['relation'] + dd['value']    

def filterunit(annot):
    dd = filter[annot]
    if dd['col'].lower()=='multiple':
        cols = dd['cols'].split(',')
        relations = dd['relation'].split(',')
        values = dd['value'].split(',')
        return(' OR '.join([filtercol({{'col':cols[i], 'relation': relations[i], 'value': values[i]}}, annot) for i in range(len(cols))]))
    else:
        return(filtercol(dd, annot))

def filterunit_null(annot):
    dd = filter[annot]
    if dd['col'].lower()=='multiple':
        cols = dd['cols'].split(',')
        relations = dd['relation'].split(',')
        values = dd['value'].split(',')
        return(' AND '.join([annot + '__' + cols[i] + ' IS NULL' for i in range(len(cols))]))
    else:
        return(annot + '__' + dd['col'] + ' IS NULL')

# Find the intersection of annotators listed
# in the colnames of the SQLite variant table
# and annotators in the filter config. This
# step is required: if an OpenCRAVAT module
# has no annotations for the provided set of
# variants, that module's annotations may not
# exist in the SQLite table.
df = pd.read_sql_query("SELECT * FROM variant", conn)
table_var_annotators = set([col for col in df.columns])
filter_annotators = []
column2module = {{}}
for ann in set(filter.keys()):
    try:
        # Multiple column filters
        col_names = filter[ann]['cols']
        col_names = [c.strip() for c in col_names.split(',')]
    except KeyError:
        # One column filter 
        col_names = [filter[ann]['col'].strip()]
    for col in col_names:
        coln = '{{}}__{{}}'.format(ann, col)
        filter_annotators.append(coln)
        column2module[coln] = ann

filter_annotators = set(filter_annotators)
tmp_annotators = table_var_annotators.intersection(filter_annotators)
keep_annotators = set([column2module[ann] for ann in tmp_annotators])

# Sanity check 
if len(keep_annotators) == 0:
    print('WARNING: No filter annotators were provided that match oc run annotators.', file=sys.stderr)
    print('WARNING: The next filtering step may fail.', file=sys.stderr)

# Filter to avoid SQL filtering issues
filter = keep(filter, keep_annotators)
print('Apply final filters to SQLite: ', filter)

filter_query_nonnull = ' OR '.join([filterunit(annot) for annot in filter.keys()])
filter_query_null = ' AND '.join([filterunit_null(annot) for annot in filter.keys()])
filter_query = filter_query_nonnull + ' OR (' + filter_query_null + ')'
print(filter_query)
conn.execute('CREATE TABLE variant2 AS SELECT * FROM variant WHERE (' + filter_query + ')')
conn.execute('DROP TABLE variant')
conn.execute('ALTER TABLE variant2 RENAME TO variant')
conn.execute('DELETE from sample WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('DELETE FROM gene WHERE base__hugo NOT IN (SELECT base__hugo FROM variant)')
conn.execute('DELETE FROM mapping WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('UPDATE info SET colval = (SELECT COUNT(*) FROM variant) WHERE colkey == "Number of unique input variants"')
conn.execute('VACUUM')
conn.close()
EOF

# Create the column-fixing script, which
# adds min/max read depth and AF columns
cp {output.filter_2} {output.fixed}
cat << EOF > {params.fixed}
import sqlite3
import os
import pandas as pd

conn = sqlite3.connect("{output.fixed}")
conn.isolation_level = None
cursor = conn.cursor()
depth = pd.read_sql_query('SELECT base__uid,vcfinfo__tot_reads,vcfinfo__af from variant', conn, index_col = 'base__uid')
tot_read = depth.vcfinfo__tot_reads.str.split(';', expand = True).astype('float')
af = depth.vcfinfo__af.str.split(';', expand = True).apply(pd.to_numeric)
depth['vcfinfo__Max_read'] = tot_read.max(axis = 1)
depth['vcfinfo__Min_read'] = tot_read.min(axis = 1)
depth['vcfinfo__Max_af'] = af.max(axis = 1)
depth['vcfinfo__Min_af'] = af.min(axis = 1)
depth.to_sql('tmp', conn, if_exists = 'replace', index = True)
conn.execute('alter table variant add column vcfinfo__Max_read numeric(50)')
conn.execute('alter table variant add column vcfinfo__Min_read numeric(50)')
conn.execute('alter table variant add column vcfinfo__Max_af numeric(50)')
conn.execute('alter table variant add column vcfinfo__Min_af numeric(50)')
qry = 'update variant set vcfinfo__Max_read = (select vcfinfo__Max_read from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Max_read is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Min_read = (select vcfinfo__Min_read from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Min_read is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Max_af = (select vcfinfo__Max_af from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Max_af is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Min_af = (select vcfinfo__Min_af from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Min_af is NULL'
conn.execute(qry)
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Max_read','{{"index": null, "name": "vcfinfo__Max_read", "title": "Max reads", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Min_read','{{"index": null, "name": "vcfinfo__Min_read", "title": "Min reads", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Max_af','{{"index": null, "name": "vcfinfo__Max_af", "title": "Max AF", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Min_af','{{"index": null, "name": "vcfinfo__Min_af", "title": "Min AF", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.commit()
conn.execute('drop table tmp')
conn.execute('VACUUM')
conn.close()
EOF

echo 'Running first filtering script'
python3 {params.filter_1}

echo 'Running secondary filtering script'
python3 {params.filter_2}

echo 'Running column fixing script'
python3 {params.fixed}
"""
shell: """
oc util mergesqlite \\
    -o {output.merged} \\
    {input.dbs} 
"""
shell: """
# Environment variable for modules dir
export OC_MODULES="{params.module}"

oc run \\
    -t vcf \\
    -x \\
    --newlog \\
    --cleanrun \\
    --system-option "modules_dir={params.module}" \\
    -a {params.annot} \\
    -n {params.prefix} \\
    -l {params.genome} \\
    -d {params.outdir} \\
    --mp {threads} \\
    {input.vcfs}
"""
    shell: """
# Create first filtering script,
# Filters based on AF 
mkdir -p {params.scripts}
cp {input.db} {output.filter_1} 
cat << EOF > {params.filter_1}
import sqlite3
import os

maf = str({params.maf_thres})
mafc = str(1 - float({params.maf_thres}))
so = "{params.so}"
conn = sqlite3.connect("{output.filter_1}")
conn.isolation_level = None
cursor = conn.cursor()
conn.execute('CREATE TABLE variant2 AS SELECT * FROM variant WHERE (base__so IN (' + '"' + '", "'.join(so.split(',')) + '"' + ')) AND ((gnomad__af IS NULL AND gnomad3__af IS NULL AND thousandgenomes__af IS NULL) OR (gnomad__af <= ' + maf +' OR gnomad__af >= '+mafc+' OR gnomad3__af <= '+maf+' OR gnomad3__af >= '+mafc+' OR thousandgenomes__af <= '+maf+' OR thousandgenomes__af >= '+mafc+'))')
conn.execute('DROP TABLE variant')
conn.execute('ALTER TABLE variant2 RENAME TO variant')
conn.execute('DELETE from sample WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('DELETE FROM gene WHERE base__hugo NOT IN (SELECT base__hugo FROM variant)')
conn.execute('DELETE FROM mapping WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('UPDATE info SET colval = (SELECT COUNT(*) FROM variant) WHERE colkey == "Number of unique input variants"')
conn.execute('VACUUM')
conn.close()
EOF

# Create second filtering script,
# Filters based on filters in config
cp {output.filter_1} {output.filter_2}
cat << EOF > {params.filter_2}
import pandas as pd
import sqlite3
import sys
import os

conn = sqlite3.connect("{output.filter_2}")
conn.isolation_level = None
cursor = conn.cursor()
filter = {params.secondary}

def keep(dd, used_annotators):
    final_annotators = {{annotator: dd[annotator] for annotator in used_annotators}}
    return final_annotators

def filtercol(dd, annot):
    if dd['relation']=='IN':
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + ' ("' + '", "'.join(dd['value'].split(',')) + '")'
    elif dd['relation']=='IS' or dd['relation']=='IS NOT':
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + ' ' + dd['value']
    else:
        return annot + '__'+ dd['col'] + ' ' + dd['relation'] + dd['value']    

def filterunit(annot):
    dd = filter[annot]
    if dd['col'].lower()=='multiple':
        cols = dd['cols'].split(',')
        relations = dd['relation'].split(',')
        values = dd['value'].split(',')
        return(' OR '.join([filtercol({{'col':cols[i], 'relation': relations[i], 'value': values[i]}}, annot) for i in range(len(cols))]))
    else:
        return(filtercol(dd, annot))

def filterunit_null(annot):
    dd = filter[annot]
    if dd['col'].lower()=='multiple':
        cols = dd['cols'].split(',')
        relations = dd['relation'].split(',')
        values = dd['value'].split(',')
        return(' AND '.join([annot + '__' + cols[i] + ' IS NULL' for i in range(len(cols))]))
    else:
        return(annot + '__' + dd['col'] + ' IS NULL')

# Find the intersection of annotators listed
# in the colnames of the SQLite variant table
# and annotators in the filter config. This
# step is required: if an OpenCRAVAT module
# has no annotations for the provided set of
# variants, that module's annotations may not
# exist in the SQLite table.
df = pd.read_sql_query("SELECT * FROM variant", conn)
table_var_annotators = set([col for col in df.columns])
filter_annotators = []
column2module = {{}}
for ann in set(filter.keys()):
    try:
        # Multiple column filters
        col_names = filter[ann]['cols']
        col_names = [c.strip() for c in col_names.split(',')]
    except KeyError:
        # One column filter 
        col_names = [filter[ann]['col'].strip()]
    for col in col_names:
        coln = '{{}}__{{}}'.format(ann, col)
        filter_annotators.append(coln)
        column2module[coln] = ann

filter_annotators = set(filter_annotators)
tmp_annotators = table_var_annotators.intersection(filter_annotators)
keep_annotators = set([column2module[ann] for ann in tmp_annotators])

# Sanity check 
if len(keep_annotators) == 0:
    print('WARNING: No filter annotators were provided that match oc run annotators.', file=sys.stderr)
    print('WARNING: The next filtering step may fail.', file=sys.stderr)

# Filter to avoid SQL filtering issues
filter = keep(filter, keep_annotators)
print('Apply final filters to SQLite: ', filter)

filter_query_nonnull = ' OR '.join([filterunit(annot) for annot in filter.keys()])
filter_query_null = ' AND '.join([filterunit_null(annot) for annot in filter.keys()])
filter_query = filter_query_nonnull + ' OR (' + filter_query_null + ')'
print(filter_query)
conn.execute('CREATE TABLE variant2 AS SELECT * FROM variant WHERE (' + filter_query + ')')
conn.execute('DROP TABLE variant')
conn.execute('ALTER TABLE variant2 RENAME TO variant')
conn.execute('DELETE from sample WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('DELETE FROM gene WHERE base__hugo NOT IN (SELECT base__hugo FROM variant)')
conn.execute('DELETE FROM mapping WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('UPDATE info SET colval = (SELECT COUNT(*) FROM variant) WHERE colkey == "Number of unique input variants"')
conn.execute('VACUUM')
conn.close()
EOF

# Create the column-fixing script, which
# adds min/max read depth and AF columns
cp {output.filter_2} {output.fixed}
cat << EOF > {params.fixed}
import sqlite3
import os
import pandas as pd

conn = sqlite3.connect("{output.fixed}")
conn.isolation_level = None
cursor = conn.cursor()
depth = pd.read_sql_query('SELECT base__uid,vcfinfo__tot_reads,vcfinfo__af from variant', conn, index_col = 'base__uid')
tot_read = depth.vcfinfo__tot_reads.str.split(';', expand = True).astype('float')
af = depth.vcfinfo__af.str.split(';', expand = True).apply(pd.to_numeric)
depth['vcfinfo__Max_read'] = tot_read.max(axis = 1)
depth['vcfinfo__Min_read'] = tot_read.min(axis = 1)
depth['vcfinfo__Max_af'] = af.max(axis = 1)
depth['vcfinfo__Min_af'] = af.min(axis = 1)
depth.to_sql('tmp', conn, if_exists = 'replace', index = True)
conn.execute('alter table variant add column vcfinfo__Max_read numeric(50)')
conn.execute('alter table variant add column vcfinfo__Min_read numeric(50)')
conn.execute('alter table variant add column vcfinfo__Max_af numeric(50)')
conn.execute('alter table variant add column vcfinfo__Min_af numeric(50)')
qry = 'update variant set vcfinfo__Max_read = (select vcfinfo__Max_read from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Max_read is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Min_read = (select vcfinfo__Min_read from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Min_read is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Max_af = (select vcfinfo__Max_af from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Max_af is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Min_af = (select vcfinfo__Min_af from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Min_af is NULL'
conn.execute(qry)
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Max_read','{{"index": null, "name": "vcfinfo__Max_read", "title": "Max reads", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Min_read','{{"index": null, "name": "vcfinfo__Min_read", "title": "Min reads", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Max_af','{{"index": null, "name": "vcfinfo__Max_af", "title": "Max AF", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Min_af','{{"index": null, "name": "vcfinfo__Min_af", "title": "Min AF", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.commit()
conn.execute('drop table tmp')
conn.execute('VACUUM')
conn.close()
EOF

echo 'Running first filtering script'
python3 {params.filter_1}

echo 'Running secondary filtering script'
python3 {params.filter_2}

echo 'Running column fixing script'
python3 {params.fixed}
"""
shell: """
python3 {params.get_flowcell_lanes} \\
    {input.r1} \\
    {wildcards.name} \\
> {output.txt}
"""
From line 36 of rules/qc.smk
shell: """
fastqc \\
    {input.r1} \\
    {input.r2} \\
    -t {threads} \\
    -o {params.outdir}
"""
shell: """
fastq_screen --conf {params.fastq_screen_config} \\
    --outdir {params.outdir} \\
    --threads {threads} \\
    --subset 1000000 \\
    --aligner bowtie2 \\
    --force \\
    {input.fq1} \\
    {input.fq2}
"""
shell: """
fastqc -t {threads} \\
    -f bam \\
    --contaminants {params.adapters} \\
    -o {params.outdir} \\
    {input.bam} 
"""
shell: """
unset DISPLAY
qualimap bamqc -bam {input.bam} \\
    --java-mem-size=92G \\
    -c -ip --gd HUMAN \\
    -outdir {params.outdir} \\
    -outformat HTML \\
    -nt {threads} \\
    --skip-duplicated \\
    -nw 500 \\
    -p NON-STRAND-SPECIFIC
"""
shell: """
samtools flagstat --threads {threads} \\
    {input.bam} \\
> {output.txt}
"""
shell: """
bcftools stats \\
    {input.vcf} \\
> {output.txt}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g -XX:ParallelGCThreads={threads}' VariantEval \\
    -R {params.genome} \\
    -O {output.grp} \\
    --dbsnp {params.dbsnp} \\
    --eval {input.vcf} 
"""
shell: """
java -Xmx{params.memory}g -jar ${{SNPEFF_JAR}} \\
    -v -canon -c {params.config} \\
    -csvstats {output.csv} \\
    -stats {output.html} \\
    {params.genome} \\
    {input.vcf} > {output.vcf}
"""
From line 307 of rules/qc.smk
shell: """
vcftools \\
    --gzvcf {input.vcf} \\
    --het \\
    --out {params.prefix}
"""
shell: """
java -Xmx{params.memory}g -jar ${{PICARDJARPATH}}/picard.jar \\
    CollectVariantCallingMetrics \\
    INPUT={input.vcf} \\
    OUTPUT={params.prefix} \\
    DBSNP={params.dbsnp} \\
    VALIDATION_STRINGENCY=SILENT
"""
shell: """ 
echo "Extracting sites to estimate ancestry."
somalier extract \\
    -d {params.outdir} \\
    --sites {params.sites} \\
    -f {params.genome} \\
    {input.vcf}

# Check if pedigree file exists,
# pulled from patient database
pedigree_option=""
if [ -f "{params.ped}" ]; then
    # Use PED with relate command
    pedigree_option="-p {params.ped}"
fi
echo "Estimating relatedness with pedigree $pedigree_option"
somalier relate "$pedigree_option" \\
    -i -o {params.outdir}/relatedness \\
    {output.somalier}

echo "Estimating ancestry."
somalier ancestry \\
    --n-pcs=10 \\
    -o {params.outdir}/ancestry \\
    --labels {params.ancestry_db}/ancestry-labels-1kg.tsv \\
    {params.ancestry_db}/*.somalier ++ \\
    {output.somalier} || {{
# Somalier ancestry error,
# usually due to not finding
# any sites shared with its
# references; this is expected
# with sub-sampled datasets
echo "WARNING: Somalier ancestry failed..." 1>&2
touch {output.ancestry}
}}
"""
From line 410 of rules/qc.smk
shell: """
multiqc --ignore '*/.singularity/*' \\
    --ignore 'slurmfiles/' \\
    --exclude peddy \\
    -f --interactive \\
    -n {output.report} \\
    {params.workdir}
"""
shell: """
mkdir -p '{params.tmppath}'
octopus --threads {threads} \\
    -C cancer \\
    --working-directory {params.wd} \\
    --temp-directory-prefix {params.tmpdir} \\
    -R {params.genome} \\
    -I {input.normal} {input.tumor} {params.normal_option} \\
    -o {output.vcf} \\
    --forest-model {params.g_model} \\
    --somatic-forest-model {params.s_model} \\
    --annotations AC AD DP \\
    -T {params.chunk}
"""
shell: """
# Create list of chunks to merge
find {params.octopath} -iname '{params.tumor}.vcf.gz' \\
    > {output.lsl}
# Merge octopus chunk calls,
# contains both germline and
# somatic variants
bcftools concat \\
    --threads {threads} \\
    -d exact \\
    -a \\
    -f {output.lsl} \\
    -o {output.raw} \\
    -O v
# Filter Octopus callset for 
# variants with SOMATIC tag
grep -E "#|CHROM|SOMATIC" {output.raw} \\
    > {output.vcf}
"""
shell: """
octopus --threads {threads} \\
    --working-directory {params.wd} \\
    --temp-directory-prefix {params.tmpdir} \\
    -R {params.genome} \\
    -I {input.normal} \\
    -o {output.vcf} \\
    --forest-model {params.model} \\
    --annotations AC AD AF DP \\
    -T {params.chroms}
"""
shell: """
gatk Mutect2 \\
    -R {params.genome} \\
    -I {input.tumor} {params.i_option} {params.normal_option} \\
    --panel-of-normals {params.pon} \\
    --germline-resource {params.germsource} \\
    -L {params.chrom} \\
    -O {output.vcf} \\
    --f1r2-tar-gz {output.orien} \\
    --independent-mates
"""
shell: """
# Sets up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

java -Xmx{params.memory}g -Djava.io.tmpdir=${{tmp}} \\
    -XX:ParallelGCThreads={threads} -jar $GATK_JAR -T CombineVariants \\
    --use_jdk_inflater --use_jdk_deflater \\
    -R {params.genome} \\
    --filteredrecordsmergetype KEEP_UNCONDITIONAL \\
    --assumeIdenticalSamples \\
    -o {output.vcf} \\
    {params.multi_variant_option}
"""
shell: """
# Gather Mutect2 stats
gatk MergeMutectStats \\
    {params.multi_stats_option} \\
    -O {output.stats} &
# Learn read orientation model
# for artifact filtering 
gatk LearnReadOrientationModel \\
    --output {output.orien} \\
    {params.multi_orien_option} &
wait
"""
shell: """
gatk --java-options '-Xmx{params.memory}g' GetPileupSummaries \\
    -I {input.tumor} \\
    -V {params.gsnp} \\
    -L {params.gsnp} \\
    -O {output.summary}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g' GetPileupSummaries \\
    -I {input.normal} \\
    -V {params.gsnp} \\
    -L {params.gsnp} \\
    -O {output.summary}
"""
shell: """
gatk CalculateContamination \\
    -I {input.tumor} {params.normal_option} \\
    -O {output.summary}
"""
shell: """
# Mutect2 orientation bias filter
gatk FilterMutectCalls \\
    -R {params.genome} \\
    -V {input.vcf} \\
    --ob-priors {input.orien} \\
    --contamination-table {input.summary} \\
    -O {output.vcf} \\
    --stats {input.stats} 
"""
shell: """
# Prefilter and calculate position
# specific summary statistics 
MuSE call \\
    -n {threads} \\
    -f {params.genome} \\
    -O {params.tumor} \\
    {input.tumor} {params.normal_option} 
# Calculate cutoffs from a 
# sample specific error model
MuSE sump \\
    -n {threads} \\
    -G \\
    -I {output.txt} \\
    -O {output.vcf} \\
    -D {params.dbsnp}
# Renaming TUMOR/NORMAL in VCF 
# with real sample names
echo -e "TUMOR\\t{params.rename}{params.normal_header}" \\
> {output.header} 
bcftools reheader \\
    -o {output.final} \\
    -s {output.header} \\
    {output.vcf}
"""
shell: """
# Sets up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

# Delete any previous attempt's output
# directory to ensure a hard restart
if [ -d "{params.outdir}" ]; then
    rm -rf "{params.outdir}"
fi

# Configure Strelka somatic workflow
configureStrelkaSomaticWorkflow.py \\
    --referenceFasta {params.genome} \\
    --tumorBam {input.tumor} {params.normal_option} \\
    --runDir {params.outdir} \\
    --callRegions {params.regions}

# Call somatic variants with Strelka
echo "Starting Strelka workflow..."
{params.workflow} \\
    -m local \\
    -j {threads} \\
    -g {params.memory} 

# Combine and filter results
echo "Running CombineVariants..."
java -Xmx{params.memory}g -Djava.io.tmpdir=${{tmp}} \\
    -XX:ParallelGCThreads={threads} -jar $GATK_JAR -T CombineVariants \\
    --use_jdk_inflater --use_jdk_deflater \\
    -R {params.genome} \\
    --variant {output.snps} \\
    --variant {output.indels} \\
    --assumeIdenticalSamples \\
    --filteredrecordsmergetype KEEP_UNCONDITIONAL \\
    -o {output.vcf}

# Renaming TUMOR/NORMAL in 
# VCF with real sample names
echo -e "TUMOR\\t{params.tumor}{params.normal_header}" \\
> {output.header} 
bcftools reheader \\
    -o {output.final} \\
    -s {output.header} \\
    {output.vcf}
"""
shell: """
# Normalize VCF prior to SelectVariants,
# which needs multi-allelic sites
# to be split prior to running
echo "Running bcftools norm..."
bcftools norm \\
    -c w \\
    -m - \\
    -Ov \\
    --threads {threads} \\
    -f {params.genome} \\
    -o {output.norm} \\
    {input.vcf}
# Remove filtered sites and output
# variants not called in the PON
echo "Running SelectVariants..."
gatk --java-options "-Xmx{params.memory}g" SelectVariants \\
    -R {params.genome} \\
    --variant {output.norm} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.filt}
# Fix format number metadata, gatk 
# SelectVariants converts Number
# metadata incorrectly when it
# is set to Number=.
sed -i 's/Number=R/Number=./g' \\
    {output.filt}
"""
shell: """
# Filter call set to tumor sites
bcftools view \\
    -c1 \\
    -Oz \\
    -s '{params.sample}' \\
    -o {output.tmp} \\
    {input.vcf}
# Renaming sample name in VCF 
# to contain caller name
echo -e "{params.sample}\\t{params.rename}" \\
> {output.header} 
bcftools reheader \\
    -o {output.tumor} \\
    -s {output.header} \\
    {output.tmp}
# Create a VCF index for the intersect
bcftools index \\
    -f \\
    --tbi \\
    {output.tumor} 
"""
shell: """
# Filter call set to normal sites
bcftools view \\
    --force-samples \\
    -c1 \\
    -Oz \\
    -s '{params.sample}' \\
    -o {output.tmp} \\
    {input.vcf}
# Renaming sample name in VCF 
# to contain caller name
echo -e "{params.sample}\\t{params.rename}" \\
> {output.header} 
bcftools reheader \\
    -o {output.normal} \\
    -s {output.header} \\
    {output.tmp}
# Create a VCF index for the intersect
bcftools index \\
    -f \\
    --tbi \\
    {output.normal}
"""
shell: """
# Delete any previous attempt's output
# directory to ensure a hard restart
if [ -d "{params.isec_dir}" ]; then
    rm -rf "{params.isec_dir}"
fi
# Intersect somatic callset to find
# variants in at least two callers
bcftools isec \\
    -Oz \\
    -n+2 \\
    -c none \\
    -p {params.isec_dir} \\
    {input.tumors} 
# Create list of files to merge 
find {params.isec_dir} \\
    -name '*.vcf.gz' \\
    | sort \\
> {output.lsl} 
# Merge variants found in at 
# least two callers 
bcftools merge \\
    -Oz \\
    -o {output.merged} \\
    -l {output.lsl}
# Create a VCF index for the merged set
bcftools index \\
    -f \\
    --tbi \\
    {output.merged}
"""
shell: """
# vcf2maf needs an uncompressed VCF file
zcat {input.vcf} \\
> {output.vcf}
# Run VEP and convert VCF into MAF file
vcf2maf.pl \\
    --input-vcf {output.vcf} \\
    --output-maf {output.maf} \\
    --vep-path ${{VEP_HOME}} \\
    --vep-data {params.vep_data} \\
    --cache-version {params.ref_version} \\
    --ref-fasta {params.genome} \\
    --vep-forks {threads} \\
    --tumor-id {params.tumor} {params.normal_option} \\
    --ncbi-build {params.vep_build} \\
    --species {params.vep_species}
"""
shell: """
echo "Combining MAFs..."
head -2 {input.mafs[0]} > {output.maf}
awk 'FNR>2 {{print}}' {input.mafs} >> {output.maf}
"""
shell: """
Rscript {params.script} \\
    {params.wdir} \\
    {input.maf} \\
    {output.summary} \\
    {output.oncoplot}
"""
shell: """
# SigProfiler input directory must
# only contain input MAF
mkdir -p "{params.wdir}"
ln -sf {input.maf} {params.wdir}
python3 {params.script} \\
    -i {params.wdir}/ \\
    -o {params.odir}/ \\
    -p {params.sample} \\
    -r {params.genome}
"""
From line 1040 of rules/somatic.smk
shell: """
# Merge SigProfiler PDFs
pdfunite {input.pdfs} \\
    {output.pdf}
"""
From line 1075 of rules/somatic.smk
shell: """
# Delete any previous attempt's output
# directory to ensure a hard restart
if [ -d "{params.outdir}" ]; then
    rm -rf "{params.outdir}"
fi

# Configure Manta germline SV workflow 
configManta.py \\
    --callRegions {params.regions} \\
    --bam {input.bam} \\
    --referenceFasta {params.genome} \\
    --runDir {params.outdir}

# Call germline SV with Manta workflow
echo "Starting Manta workflow..."
{params.workflow} \\
    -j {threads} \\
    -g {params.memory} 
"""
From line 55 of rules/sv.smk
shell: """
# Delete any previous attempt's output
# directory to ensure a hard restart
if [ -d "{params.outdir}" ]; then
    rm -rf "{params.outdir}"
fi

# Configure Manta somatic SV workflow 
configManta.py {params.normal_option} \\
    --callRegions {params.regions} \\
    --tumorBam {input.tumor} \\
    --referenceFasta {params.genome} \\
    --runDir {params.outdir} \\
    --outputContig

# Call somatic SV with Manta workflow
echo "Starting Manta workflow..."
{params.workflow} \\
    -j {threads} \\
    -g {params.memory} 
"""
From line 108 of rules/sv.smk
shell: """
fastp -w {threads} \\
    --detect_adapter_for_pe \\
    --in1 {input.r1} \\
    --in2 {input.r2} \\
    --out1 {output.r1} \\
    --out2 {output.r2} \\
    --json {output.json} \\
    --html {output.html}
"""
shell: """
# Sets up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

bwa-mem2 mem \\
    -t {threads} \\
    -K 100000000 \\
    -M \\
    -R \'@RG\\tID:{params.sample}\\tSM:{params.sample}\\tPL:illumina\\tLB:{params.sample}\\tPU:{params.sample}\\tCN:ncbr\\tDS:wgs\' \\
    {params.genome} \\
    {input.r1} \\
    {input.r2} \\
| samblaster -M \\
| samtools sort -@{params.sort_threads} \\
    -T ${{tmp}} \\
    --write-index \\
    -m 10G - \\
    -o {output.bam}##idx##{output.bai}
"""
URL: https://openomics.github.io/genome-seek/
License: MIT License
