An easy-to-use, flexible variant calling pipeline for use on the Biowulf cluster at NIH

Version: v2.0

XAVIER - eXome Analysis and Variant explorER. This is the home of the pipeline, XAVIER. Its long-term goals: to accurately call germline and somatic variants, to infer CNVs, and to boldly annotate variants like no pipeline before!

Overview

Welcome to XAVIER! Before getting started, we highly recommend reading through XAVIER's documentation.

The xavier pipeline is composed of several interrelated subcommands that set up and run the pipeline across different systems. Each subcommand performs a different function:

xavier run: Run the XAVIER pipeline with your input files.
xavier unlock: Unlock a previous run's output directory.
xavier cache: Cache remote resources locally (coming soon!).

XAVIER is a comprehensive whole-exome sequencing pipeline following the Broad's set of best practices. It relies on technologies like Singularity [1] to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by Snakemake [2], a flexible and scalable workflow management system, to submit jobs to a cluster or cloud provider.

The pipeline is compatible with data generated from Illumina short-read sequencing technologies. As input, it accepts a set of FastQ or BAM files and can be run locally on a compute instance, on-premise using a cluster, or on the cloud (feature coming soon!). A user can define the method or mode of execution. The pipeline can submit jobs to a cluster using a job scheduler like SLURM, or run on AWS using Tibanna (feature coming soon!). A hybrid approach ensures the pipeline is accessible to all users.

Before getting started, we highly recommend reading through the usage section of each available subcommand.

For more information about issues or troubleshooting a problem, please check out our FAQ before opening an issue on GitHub.

Dependencies

Requires: singularity>=3.5 snakemake==6.X

Snakemake and Singularity must be installed on the target system. Snakemake orchestrates the execution of each step in the pipeline. To guarantee the highest level of reproducibility, each step relies on versioned images from DockerHub. Snakemake uses Singularity to pull these images onto the local filesystem prior to job execution, so Snakemake and Singularity are the only two dependencies.
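
A quick pre-flight check can confirm that both tools are reachable before launching a run. The sketch below is illustrative only and is not part of XAVIER: the function names are hypothetical, and it assumes version strings contain a dotted number such as 3.8.5.

```python
import re
import shutil


def missing_tools(required=("snakemake", "singularity"), which=shutil.which):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


def version_tuple(text):
    """Extract the first dotted version number, e.g. '3.8.5' -> (3, 8, 5)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum):
    """True if the parsed version is at least `minimum` (a tuple of ints)."""
    return version_tuple(version_text) >= minimum
```

For example, meets_minimum(reported_version, (3, 5)) expresses the singularity>=3.5 requirement once the tool's reported version string is in hand.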

Installation

Please clone this repository to your local filesystem using the following command:

# Clone Repository from Github
git clone https://github.com/CCBR/XAVIER.git
# Change your working directory
cd XAVIER/

Contribute

This site is a living document, created for and by members like you. XAVIER is maintained by the members of CCBR and is improved by continuous feedback! We encourage you to contribute new content and make improvements to existing content via pull request to our repository.

References

1. Kurtzer GM, Sochat V, Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5): e0177459.
2. Koster, J. and S. Rahmann (2018). "Snakemake-a scalable bioinformatics workflow engine." Bioinformatics 34(20): 3600.

Code Snippets

shell: """
myoutdir="$(dirname {output.cnvs})/{params.tumorsample}"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi

perl "{params.config_script}" \\
    "$myoutdir" \\
    {params.lengths} \\
    {params.chroms} \\
    {input.tumor} \\
    {input.normal} \\
    {params.pile} \\
    {params.fasta} \\
    {params.snps} \\
    {input.targets}

freec -conf "$myoutdir/freec_exome_config.txt"

cat "{params.sig_script}" | \\
    R --slave \\
    --args $myoutdir/{params.tumorsample}.bam_CNVs \\
    $myoutdir/{params.tumorsample}.bam_ratio.txt

mv $myoutdir/{params.tumorsample}.bam_CNVs.p.value.txt {output.cnvs}
cat "{params.plot_script}" | \\
    R --slave \\
    --args 2 \\
    $myoutdir/{params.tumorsample}.bam_ratio.txt \\
    $myoutdir/{params.tumorsample}.bam_BAF.txt
"""

SnakeMake From line 27 of rules/cnv.smk

shell: """
myoutdir="$(dirname {output.fit})/{params.tumorsample}"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi

gzip -c "$(dirname {input.freeccnvs})/{params.tumorsample}/{params.normalsample}.bam_minipileup.pileup" \\
    > "$myoutdir/{params.normalsample}.recal.bam_minipileup.pileup.gz"
gzip -c "$(dirname {input.freeccnvs})/{params.tumorsample}/{params.tumorsample}.bam_minipileup.pileup" \\
    > "$myoutdir/{params.tumorsample}.recal.bam_minipileup.pileup.gz"

sequenza-utils bam2seqz \\
    -p \\
    -gc {params.gc} \\
    -n "$myoutdir/{params.normalsample}.recal.bam_minipileup.pileup.gz" \\
    -t "$myoutdir/{params.tumorsample}.recal.bam_minipileup.pileup.gz" \\
    | gzip > "$myoutdir/{params.tumorsample}.seqz.gz"

sequenza-utils seqz_binning \\
    -w 100 \\
    -s "$myoutdir/{params.tumorsample}.seqz.gz" \\
    | tee "$myoutdir/{params.tumorsample}.bin100.seqz" \\
    | gzip > "$myoutdir/{params.tumorsample}.bin100.seqz.gz"

Rscript "{params.run_script}" \\
    "$myoutdir/{params.tumorsample}.bin100.seqz" \\
    "$myoutdir" \\
    "{params.normalsample}+{params.tumorsample}" \\
    {threads}

mv "$myoutdir/{params.normalsample}+{params.tumorsample}_alternative_solutions.txt" "{output.fit}"
rm "$myoutdir/{params.tumorsample}.bin100.seqz"
"""

SnakeMake sequenza-utils From line 75 of rules/cnv.smk

shell: """
myoutdir="$(dirname {output.cnvs})/{params.tumorsample}"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi

perl {params.config_script} \\
    "$myoutdir" \\
    {params.lengths} \\
    {params.chroms} \\
    {input.tumor} \\
    {input.normal} \\
    {params.pile} \\
    {params.fasta} \\
    {params.snps} \\
    {input.targets} \\
    {input.fit}

freec -conf "$myoutdir/freec_exome_config.txt"

cat "{params.sig_script}" | \\
    R --slave \\
    --args $myoutdir/{params.tumorsample}.bam_CNVs \\
    $myoutdir/{params.tumorsample}.bam_ratio.txt

mv $myoutdir/{params.tumorsample}.bam_CNVs.p.value.txt {output.cnvs}
cat "{params.plot_script}" | \\
    R --slave \\
    --args 2 \\
    $myoutdir/{params.tumorsample}.bam_ratio.txt \\
    $myoutdir/{params.tumorsample}.bam_BAF.txt
"""

SnakeMake From line 134 of rules/cnv.smk

shell: """
wget https://github.com/mikdio/SOBDetector/releases/download/v1.0.2/SOBDetector_v1.0.2.jar \\
    -O {output.SOBDetector_jar}
"""

SnakeMake From line 8 of rules/ffpe.smk

shell: """
if [ ! -d "$(dirname {output.pass1_vcf})" ]; then 
    mkdir -p "$(dirname {output.pass1_vcf})"
fi

echo "Running SOBDetector..."
# Try/catch for running SOBDetector
# with an empty input VCF file
java -jar {input.SOBDetector_jar} \\
    --input-type VCF \\
    --input-variants "{input.vcf}" \\
    --input-bam {input.bam} \\
    --output-variants {output.pass1_vcf} \\
    --only-passed false || {{
# Compare length of VCF header to 
# the total length of the file
header_length=$(grep '^#' "{input.vcf}" | wc -l)
file_length=$(cat "{input.vcf}" | wc -l)
if [ $header_length -eq $file_length ]; then
    # VCF file only contains header
    # File contains no variants, catch
    # problem so pipeline can continue
    cat "{input.vcf}" > {output.pass1_vcf}
else
    # SOBDetector failed for another reason
    echo "SOB Detector Failed... exiting now!" 1>&2
    exit 1
fi
}}

bcftools query \\
    -f '%INFO/numF1R2Alt\\t%INFO/numF2R1Alt\\t%INFO/numF1R2Ref\\t%INFO/numF2R1Ref\\t%INFO/numF1R2Other\\t%INFO/numF2R1Other\\t%INFO/SOB\\n' \\
    {output.pass1_vcf} \\
    | awk '{{if ($1 != "."){{tum_alt=$1+$2; tum_depth=$1+$2+$3+$4+$5+$6; if (tum_depth==0){{tum_af=1}} else {{tum_af=tum_alt/tum_depth }}; print tum_alt,tum_depth,tum_af,$7}}}}' > {output.pass1_info} 
"""

SnakeMake BCFtools From line 30 of rules/ffpe.smk
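
The fallback in the rule above treats a SOBDetector failure as harmless only when the input VCF contains no variant records, which it detects by comparing the number of header lines to the total number of lines. A minimal Python sketch of that check follows; the function name is hypothetical and the code is not taken from the pipeline.

```python
def is_header_only(lines):
    """True when every line starts with '#', i.e. the VCF has no variants.

    Mirrors the shell comparison of `grep '^#' | wc -l` against `wc -l`:
    the file is header-only when the two counts are equal.
    """
    lines = list(lines)
    header_count = sum(1 for line in lines if line.startswith("#"))
    return header_count == len(lines)
```

As in the shell version, a completely empty file also counts as header-only, because both counts are zero.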

shell: """
echo -e "#TUMOR.alt\\tTUMOR.depth\\tTUMOR.AF\\tSOB\\tFS\\tSOR\\tTLOD\\tReadPosRankSum" > {output.all_info_file}
cat {input.info_files} >> {output.all_info_file}

# Try/catch for calculating the
# mean and standard deviation
# with a set of empty input VCF files
all_length=$(tail -n+2 {output.all_info_file} | wc -l)
if [ $all_length -eq 0 ]; then 
    echo 'WARNING: All SOBDetector pass1 samples contained no variants.' \\
    | tee {output.params_file}
else
    # Calculate mean and standard deviation
    grep -v '^#' {output.all_info_file} \\
    | awk '{{ total1 += $1; ss1 += $1^2; total2 += $2; ss2 += $2^2; total3 += $3; ss3 += $3^2; total4 += $4; ss4 += $4^2 }} END {{ print total1/NR,total2/NR,total3/NR,total4/NR; print sqrt(ss1/NR-(total1/NR)^2),sqrt(ss2/NR-(total2/NR)^2),sqrt(ss3/NR-(total3/NR)^2),sqrt(ss4/NR-(total4/NR)^2) }}' > {output.params_file}
fi
"""

SnakeMake From line 77 of rules/ffpe.smk
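
The awk step in the rule above summarizes each column with a mean and a population standard deviation, using the identity sd = sqrt(E[x^2] - E[x]^2). The following Python sketch shows the same calculation under that reading; the function name is hypothetical, and the clamp to zero is an addition to guard against tiny negative values from floating-point rounding.

```python
import math


def column_stats(rows):
    """Return (means, stdevs) for equal-length numeric rows.

    The standard deviation is the population form,
    sqrt(mean(x^2) - mean(x)^2), computed per column.
    """
    n = len(rows)
    if n == 0:
        raise ValueError("at least one row is required")
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stdevs = [
        math.sqrt(max(sum(x * x for x in col) / n - mean ** 2, 0.0))
        for col, mean in zip(columns, means)
    ]
    return means, stdevs
```

Each term subtracts the square of the column mean, so every column uses the same exponent of 2.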

shell: """
if [ ! -d "$(dirname {output.pass2_vcf})" ]; then 
    mkdir -p "$(dirname {output.pass2_vcf})"
fi

echo "Running SOBDetector..."
# Try/catch for running SOBDetector
# with an empty input VCF file
bcf_annotate_option="-e 'INFO/pArtifact < 0.05' "
java -jar {input.SOBDetector_jar} \\
    --input-type VCF \\
    --input-variants "{input.vcf}" \\
    --input-bam "{input.bam}" \\
    --output-variants "{output.pass2_vcf}" \\
    --only-passed true \\
    --standardization-parameters "{input.params_file}" || {{
# Compare length of VCF header to 
# the total length of the file
header_length=$(grep '^#' "{input.vcf}" | wc -l)
file_length=$(cat "{input.vcf}" | wc -l)
if [ $header_length -eq $file_length ]; then
    # VCF file only contains header
    # File contains no variants, catch
    # problem so pipeline can continue
    cat "{input.vcf}" > {output.pass2_vcf}
else
    # SOBDetector failed for another reason
    echo "SOB Detector Failed... exiting now!" 1>&2
    exit 1
fi
}}

echo "Making info table..."
bcftools query \\
    -f '%INFO/numF1R2Alt\\t%INFO/numF2R1Alt\\t%INFO/numF1R2Ref\\t%INFO/numF2R1Ref\\t%INFO/numF1R2Other\\t%INFO/numF2R1Other\\t%INFO/SOB\\n' \\
    "{output.pass2_vcf}" \\
    | awk '{{if ($1 != "."){{tum_alt=$1+$2; tum_depth=$1+$2+$3+$4+$5+$6; if (tum_depth==0){{tum_af=1}} else {{tum_af=tum_alt/tum_depth }}; print tum_alt,tum_depth,tum_af,$7}}}}' > "{output.pass2_info}"

echo "Filtering out artifacts..."
if [ "{wildcards.vc_outdir}" == "{config[output_params][MERGED_SOMATIC_OUTDIR]}" ]; then
    echo "Adding 'set' annotation back from merged variants..."
    bgzip --threads {threads} -c "{input.vcf}" > "{input.vcf}.gz"
    bcftools index -f -t "{input.vcf}.gz"
    bgzip --threads {threads} -c "{output.pass2_vcf}" > "{output.pass2_vcf}.gz"
    bcftools index -f -t "{output.pass2_vcf}.gz"
    bcftools annotate \\
        -a "{input.vcf}.gz" \\
        -c "INFO/set" \\
        "$bcf_annotate_option" \\
        -Oz \\
        -o {output.filtered_vcf} {output.pass2_vcf}.gz
else
    bcftools filter \\
        "$bcf_annotate_option" \\
        -Oz \\
        -o {output.filtered_vcf} {output.pass2_vcf}
    bcftools index -f -t {output.filtered_vcf}
fi
"""

SnakeMake BCFtools From line 116 of rules/ffpe.smk
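
The awk one-liner that builds the info table derives three values from the six read-orientation counts: the alternate-allele count, the total depth, and the allele fraction, which defaults to 1 when the depth is zero. A small Python sketch of that arithmetic, with a hypothetical function name:

```python
def tumor_metrics(f1r2_alt, f2r1_alt, f1r2_ref, f2r1_ref, f1r2_other, f2r1_other):
    """Return (alt_count, depth, allele_fraction) from orientation counts.

    alt_count = F1R2 alt + F2R1 alt
    depth     = sum of all six counts
    fraction  = alt_count / depth, or 1 when depth is 0 (as in the awk step)
    """
    alt_count = f1r2_alt + f2r1_alt
    depth = alt_count + f1r2_ref + f2r1_ref + f1r2_other + f2r1_other
    allele_fraction = 1 if depth == 0 else alt_count / depth
    return alt_count, depth, allele_fraction
```

Records whose first field is "." are skipped by the awk filter before this arithmetic runs, so the sketch assumes numeric inputs.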

shell: """
echo -e "#ID\\tDefaultParam\\tCohortParam\\tTotalVariants" > {output.count_table}
echo -e "#SAMPLE_ID\\tParam\\tCHROM\\tPOS\\tnumF1R2Alt\\tnumF2R1Alt\\tnumF1R2Ref\\tnumF2R1Ref\\tnumF1R2Other\\tnumF2R1Other\\tSOB\\tpArtifact\\tFS\\tSOR\\tTLOD\\tReadPosRankSum" > {output.full_metric_table}

P1FILES=({input.pass1_vcf})
P2FILES=({input.pass2_vcf})
for (( i=0; i<${{#P1FILES[@]}}; i++ )); do
    MYID=$(basename -s ".sobdetect.vcf" ${{P1FILES[$i]}})
    echo "Collecting metrics from $MYID..."

    # grep may fail if input files do not contain any variants 
    total_count=$(grep -v ^# ${{P1FILES[$i]}} | wc -l) || total_count=0
    count_1p=$(bcftools query -f '%INFO/pArtifact\n' ${{P1FILES[$i]}} | awk '{{if ($1 != "." && $1 < 0.05){{print}}}}' | wc -l)
    count_2p=$(bcftools query -f '%INFO/pArtifact\n' ${{P2FILES[$i]}} | awk '{{if ($1 != "." && $1 < 0.05){{print}}}}' | wc -l)

    echo -e "$MYID\\t$count_1p\\t$count_2p\\t$total_count" >> {output.count_table}

    bcftools query -f '%CHROM\\t%POS\\t%INFO/numF1R2Alt\\t%INFO/numF2R1Alt\\t%INFO/numF1R2Ref\\t%INFO/numF2R1Ref\\t%INFO/numF1R2Other\\t%INFO/numF2R1Other\\t%INFO/SOB\\t%INFO/pArtifact\n' ${{P1FILES[$i]}} | awk -v id=$MYID 'BEGIN{{OFS="\t"}}{{print id,"PASS_1",$0}}' >> {output.full_metric_table}
    bcftools query -f '%CHROM\\t%POS\\t%INFO/numF1R2Alt\\t%INFO/numF2R1Alt\\t%INFO/numF1R2Ref\\t%INFO/numF2R1Ref\\t%INFO/numF1R2Other\\t%INFO/numF2R1Other\\t%INFO/SOB\\t%INFO/pArtifact\n' ${{P2FILES[$i]}} | awk -v id=$MYID 'BEGIN{{OFS="\t"}}{{print id,"PASS_2",$0}}' >> {output.full_metric_table}
done
"""

SnakeMake BCFtools From line 191 of rules/ffpe.smk
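
The per-sample counts in the rule above come from filtering the pArtifact values: missing values (".") are ignored and the rest are counted when they fall below 0.05. A Python sketch of that counting logic, with a hypothetical function name and the values assumed to arrive as strings, one per variant:

```python
def count_below(values, threshold=0.05):
    """Count numeric strings below `threshold`, ignoring '.' and blanks."""
    count = 0
    for raw in values:
        value = raw.strip()
        if not value or value == ".":
            continue  # missing annotation, not counted
        if float(value) < threshold:
            count += 1
    return count
```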

shell: """
filetype=$(file -b --mime-type {input.filtered_vcf})
if [ "$filetype" == "application/gzip" ] ; then
    zcat {input.filtered_vcf} > {output.filtered_vcf}
else
    cat {input.filtered_vcf} > {output.filtered_vcf}
fi
fi

vcf2maf.pl \\
    --input-vcf {output.filtered_vcf} \\
    --output-maf {output.maf} \\
    --tumor-id {params.tumorsample} \\
    --vep-path /opt/vep/src/ensembl-vep \\
    --vep-data {params.bundle} \\
    --ncbi-build {params.build} \\
    --species {params.species} \\
    --vep-forks {threads} \\
    --ref-fasta {params.genome} \\
    --vep-overwrite

"""

SnakeMake From line 232 of rules/ffpe.smk
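
The first step of the rule above normalizes its input to plain text whether or not the filtered VCF is gzip-compressed. The rule asks `file` for the MIME type; the sketch below swaps in a different technique, checking the two-byte gzip magic number directly, and uses a hypothetical function name.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def read_vcf_text(data: bytes) -> str:
    """Return VCF text from raw bytes, decompressing gzip input if needed."""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data.decode("utf-8")
```

Either branch yields the same uncompressed text, which is what vcf2maf.pl is given in the rule.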

shell: """    
echo "Combining MAFs..."
head -2 {input.mafs[0]} > {output.maf}
awk 'FNR>2 {{print}}' {input.mafs} >> {output.maf}
"""

SnakeMake From line 264 of rules/ffpe.smk
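
Combining MAFs keeps the two header lines from the first file only, then appends every line after the second from each file. A Python sketch of the same merge, operating on lists of lines; the function name is hypothetical.

```python
def combine_mafs(mafs, header_lines=2):
    """Merge MAF files given as lists of lines.

    The first `header_lines` lines of the first file are kept once;
    the remaining lines of every file are appended in order.
    """
    if not mafs:
        return []
    combined = list(mafs[0][:header_lines])
    for lines in mafs:
        combined.extend(lines[header_lines:])
    return combined
```

This assumes, as the shell version does, that every MAF starts with exactly two header lines.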

shell:
    """
    myoutdir="$(dirname {output.gzvcf})"
    if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi

    gatk --java-options '-Xmx24g' HaplotypeCaller \\
        --reference {params.genome} \\
        --input {input.bam} \\
        --use-jdk-inflater \\
        --use-jdk-deflater \\
        --emit-ref-confidence GVCF \\
        --annotation-group StandardAnnotation \\
        --annotation-group AS_StandardAnnotation \\
        --dbsnp {params.snpsites} \\
        --output {output.gzvcf} \\
        --intervals {params.chrom} \\
        --max-alternate-alleles 3
    """

SnakeMake gatk From line 27 of rules/germline.smk

shell:
    """
    input_str="--variant $(echo "{input.gzvcf}" | sed -e 's/ / --variant /g')"

    gatk --java-options '-Xmx24g' CombineGVCFs \\
        --reference {params.genome} \\
        --annotation-group StandardAnnotation \\
        --annotation-group AS_StandardAnnotation \\
        $input_str \\
        --output {output.gzvcf} \\
        --intervals {wildcards.chroms} \\
        --use-jdk-inflater \\
        --use-jdk-deflater
    """

SnakeMake gatk From line 68 of rules/germline.smk
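
Several rules turn a space-separated list of input files into a repeated flag (for example, one --variant per gVCF) with sed. The sed substitution splits on every space, so it assumes paths contain none. The Python sketch below builds the same argument list from a sequence of paths and quotes each one for the shell; both function names are hypothetical.

```python
import shlex


def repeat_flag(flag, paths):
    """Return [flag, path, flag, path, ...] for the given paths."""
    args = []
    for path in paths:
        args.extend([flag, path])
    return args


def to_shell(args):
    """Join arguments into one shell-safe string."""
    return " ".join(shlex.quote(arg) for arg in args)
```

Working from a list rather than a joined string keeps a path with spaces intact as a single argument.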

shell:
    """
    myoutdir="$(dirname {output.vcf})"
    if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi

    gatk --java-options '-Xmx96g' GenotypeGVCFs \\
        --reference {params.genome} \\
        --use-jdk-inflater \\
        --use-jdk-deflater \\
        --annotation-group StandardAnnotation \\
        --annotation-group AS_StandardAnnotation \\
        --dbsnp {params.snpsites} \\
        --output {output.vcf} \\
        --variant {input.gzvcf} \\
        --intervals {params.chr}
    """

SnakeMake gatk From line 106 of rules/germline.smk

shell:
    """
    # Avoids ARG_MAX issue which limits max length of a command
    ls --color=never -d $(dirname "{output.clist}")/raw_variants.*.vcf.gz > "{output.clist}"

    gatk MergeVcfs \\
        -R {params.genome} \\
        --INPUT {output.clist} \\
        --OUTPUT {output.vcf}
    """

SnakeMake gatk From line 142 of rules/germline.smk

shell:
    """
    gatk SelectVariants \\
    -V {input.vcf} \\
    -select-type SNP \\
    -O snps.vcf.gz

    gatk SelectVariants \\
    -V {input.vcf} \\
    -select-type INDEL \\
    -O indels.vcf.gz

    gatk VariantFiltration \\
    -V snps.vcf.gz \\
    -filter "QD < 2.0" --filter-name "QD2" \\
    -filter "QUAL < 30.0" --filter-name "QUAL30" \\
    -filter "SOR > 3.0" --filter-name "SOR3" \\
    -filter "FS > 60.0" --filter-name "FS60" \\
    -filter "MQ < 40.0" --filter-name "MQ40" \\
    -filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \\
    -filter "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \\
    -O {output.snpvcf}

    gatk VariantFiltration \\
    -V indels.vcf.gz \\
    -filter "QD < 2.0" --filter-name "QD2" \\
    -filter "QUAL < 30.0" --filter-name "QUAL30" \\
    -filter "FS > 200.0" --filter-name "FS200" \\
    -filter "ReadPosRankSum < -20.0" --filter-name "ReadPosRankSum-20" \\
    -O {output.indelvcf}

    gatk MergeVcfs \\
    -R {params.genome} \\
    --INPUT {output.indelvcf} \\
    --INPUT {output.snpvcf} \\
    --OUTPUT {output.vcf}
    """

SnakeMake gatk From line 176 of rules/germline.smk
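
The SNP hard filters in the rule above amount to a table of annotation thresholds, each with a filter name. The Python sketch below restates that table and reports which filters a set of annotations would trigger. It is illustrative only: the function name is hypothetical, and skipping absent annotations is an assumption of this sketch rather than a statement about GATK's behavior.

```python
import operator

# (filter name, annotation, comparison, threshold), copied from the SNP filters
SNP_FILTERS = [
    ("QD2", "QD", operator.lt, 2.0),
    ("QUAL30", "QUAL", operator.lt, 30.0),
    ("SOR3", "SOR", operator.gt, 3.0),
    ("FS60", "FS", operator.gt, 60.0),
    ("MQ40", "MQ", operator.lt, 40.0),
    ("MQRankSum-12.5", "MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum-8", "ReadPosRankSum", operator.lt, -8.0),
]


def failed_filters(annotations, filters=SNP_FILTERS):
    """Return the names of filters whose expression is true."""
    failed = []
    for name, key, compare, threshold in filters:
        value = annotations.get(key)
        if value is not None and compare(value, threshold):
            failed.append(name)
    return failed
```

A variant with an empty result would pass; any returned name corresponds to a value that VariantFiltration would write to the FILTER column.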

shell:
    """
    gatk SelectVariants \\
        -R {params.genome} \\
        --intervals {params.targets} \\
        --variant {input.vcf} \\
        --sample-name {params.Sname} \\
        --exclude-filtered \\
        --exclude-non-variants \\
        --output {output.vcf}
    """

SnakeMake gatk From line 235 of rules/germline.smk

shell: """
set -exo pipefail
if [ -d {params.outdir} ];then rm -rf {params.outdir};fi 
mkdir -p {params.outdir}
cd {params.outdir}
# last file in inputs is NIDAP_files.tsv ... col1 is file ... col2 is the same file hardlinked in the NIDAP folder
# this file is created in get_nidap_folder_input_files function
linking_file=$(echo {input}|awk '{{print $NF}}')
while read a b;do 
    ln $a $b
done < $linking_file
"""

SnakeMake From line 14 of rules/nidap.smk

shell: """ 
echo "Extracting sites to estimate ancestry"
somalier extract \\
    -d "$(dirname {output.somalierOut})" \\
    --sites {params.sites_vcf} \\
    -f {params.genomeFasta} \\
    {input.bam}
"""

SnakeMake From line 21 of rules/qc_human.smk

shell: """ 
echo "Estimating relatedness"
somalier relate \\
    -o "$(dirname {output.relatedness})/relatedness" \\
    {input.somalier}

echo "Estimating ancestry"
somalier ancestry \\
    -o "$(dirname {output.relatedness})/ancestry" \\
    --labels {params.ancestry_db}/ancestry-labels-1kg.tsv \\
    {params.ancestry_db}/*.somalier ++ \\
    {input.somalier}
Rscript {params.script_path_gender} \\
    {output.relatednessSamples} \\
    {output.finalFileGender}    

Rscript {params.script_path_samples} \\
    {output.relatedness} \\
    {output.finalFilePairs}

Rscript {params.script_path_pca} \\
    {output.ancestry} \\
    {output.finalFilePairs} \\
    {output.ancestoryPlot} \\
    {output.pairAncestoryHist}
"""

SnakeMake From line 60 of rules/qc_human.smk

shell: """
multiqc --ignore '*/.singularity/*' \\
    --ignore '*/*/*/*/*/*/*/*/pyflow.data/*' \\
    --ignore 'slurmfiles/' \\
    -f --interactive \\
    -n {output.report} \\
    {params.workdir}
"""

SnakeMake MultiQC From line 121 of rules/qc_human.smk

shell: """ 
echo "Extracting sites to estimate ancestry"
somalier extract \\
    -d "$(dirname {output.somalierOut})" \\
    --sites {params.sites_vcf} \\
    -f {params.genomeFasta} \\
    {input.bam}
"""

SnakeMake From line 21 of rules/qc_mm10.smk

shell: """ 
echo "Estimating relatedness"
somalier relate \\
    -o "$(dirname {output.relatedness})/relatedness" \\
    {input.somalier}

Rscript {params.script_path_gender} \\
    {output.relatednessSamples} \\
    {output.finalFileGender}    

Rscript {params.script_path_samples} \\
    {output.relatedness} \\
    {output.finalFilePairs}

"""

SnakeMake From line 59 of rules/qc_mm10.smk

shell: """
multiqc --ignore '*/.singularity/*' \\
    --ignore '*/*/*/*/*/*/*/*/pyflow.data/*' \\
    --ignore 'slurmfiles/' \\
    -f --interactive \\
    -n {output.report} \\
    {params.workdir}
"""

SnakeMake MultiQC From line 108 of rules/qc_mm10.smk

shell: """
if [ ! -d "$(dirname {output.txt})" ]; then 
    mkdir -p "$(dirname {output.txt})"
fi

python {params.get_flowcell_lanes} \\
    {input.r1} \\
    {wildcards.samples} > {output.txt}
"""

SnakeMake From line 25 of rules/qc.smk

shell: """
fastq_screen --conf {params.fastq_screen_config} \\
    --outdir {params.outdir} \\
    --threads {threads} \\
    --subset 1000000 \\
    --aligner bowtie2 \\
    --force \\
    {input.fq1} {input.fq2}
"""

SnakeMake Bowtie 2 From line 64 of rules/qc.smk

shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
{params.set_tmp}

# Copy kraken2 db to local node storage to reduce filesystem strain
cp -rv {params.bacdb} ${{tmp}}
kdb_base=$(basename {params.bacdb})
kraken2 --db ${{tmp}}/${{kdb_base}} \\
    --threads {threads} --report {output.taxa} \\
    --output {output.out} \\
    --gzip-compressed \\
    --paired {input.fq1} {input.fq2}
# Generate Krona Report
cut -f2,3 {output.out} | \\
    ktImportTaxonomy - -o {output.html}
"""

SnakeMake kraken2 Krona From line 103 of rules/qc.smk

shell: """
fastqc -t {threads} \\
    -f bam \\
    -o {params.outdir} \\
    {input.bam} 
"""

SnakeMake FastQC From line 145 of rules/qc.smk

shell: """
python {params.script_path_reformat_bed} \\
    --input_bed {input.targets} \\
    --output_bed {output.bed}.temp
python3 {params.script_path_correct_target_bed} {output.bed}.temp {output.bed}
rm -f {output.bed}.temp
"""

SnakeMake From line 175 of rules/qc.smk

shell: """
unset DISPLAY
qualimap bamqc -bam {input.bam} \\
    --java-mem-size=48G \\
    -c -ip \\
    -gff {input.bed} \\
    -outdir {params.outdir} \\
    -outformat HTML \\
    -nt {threads} \\
    --skip-duplicated \\
    -nw 500 \\
    -p NON-STRAND-SPECIFIC
"""

SnakeMake QualiMap From line 208 of rules/qc.smk

shell: """
samtools flagstat {input.bam} > {output.txt}
"""

SnakeMake SAMtools From line 243 of rules/qc.smk

shell: """
vcftools --gzvcf {input.vcf} --het --out {params.prefix}
"""

SnakeMake VCFtools From line 270 of rules/qc.smk

shell: """
java -Xmx24g -jar ${{PICARDJARPATH}}/picard.jar \\
    CollectVariantCallingMetrics \\
    INPUT={input.vcf} \\
    OUTPUT={params.prefix} \\
    DBSNP={params.dbsnp} Validation_Stringency=SILENT
"""

SnakeMake Picard From line 297 of rules/qc.smk

shell: """
bcftools stats {input.vcf} > {output.txt}
"""

SnakeMake BCFtools From line 328 of rules/qc.smk

shell: """
gatk --java-options '-Xmx12g -XX:ParallelGCThreads={threads}' VariantEval \\
    -R {params.genome} \\
    -O {output.grp} \\
    --dbsnp {params.dbsnp} \\
    --eval {input.vcf} 
"""

SnakeMake gatk From line 359 of rules/qc.smk

shell: """
java -Xmx12g -jar $SNPEFF_JAR \\
    -v -canon -c {params.config} \\
    -csvstats {output.csv} \\
    -stats {output.html} \\
    {params.genome} \\
    {input.vcf} > {output.vcf}
"""

SnakeMake From line 392 of rules/qc.smk

shell: """
if [ ! -d "$(dirname {output.split_bam})" ]; then
  mkdir -p "$(dirname {output.split_bam})"
fi

samtools view \\
    -b \\
    -o {output.split_bam} \\
    -@ {threads} \\
    {input.bam} {wildcards.chroms}

samtools index \\
    -@ {threads} \\
    {output.split_bam} {output.split_bam_idx}

cp {output.split_bam_idx} {output.split_bam}.bai
"""

SnakeMake SAMtools From line 18 of rules/somatic_snps.common.smk

shell: """
input_str="--input $(echo "{input.read_orientation_file}" | sed -e 's/ / --input /g')"

gatk LearnReadOrientationModel \\
    --output {output.model} \\
    $input_str
"""

SnakeMake gatk From line 51 of rules/somatic_snps.common.smk

shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
{params.set_tmp}

statfiles="--stats $(echo "{input.statsfiles}" | sed -e 's/ / --stats /g')"

gatk MergeMutectStats \\
    $statfiles \\
    -O {output.final}.stats

gatk FilterMutectCalls \\
    -R {params.genome} \\
    -V {input.vcf} \\
    --ob-priors {input.model} \\
    --contamination-table {input.summary} \\
    -O {output.marked_vcf} \\
    --stats {output.final}.stats

gatk SelectVariants \\
    -R {params.genome} \\
    --variant {output.marked_vcf} \\
    --exclude-filtered \\
    --output {output.final}

# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""

SnakeMake gatk BCFtools From line 84 of rules/somatic_snps.common.smk
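
The final awk and sed steps in the rule above replace a standalone ambiguous IUPAC code in the REF column (column 4) with N and drop blank lines. The Python sketch below shows the same idea. It is not the pipeline's code: the function name is hypothetical, it assumes tab-delimited records, and it leaves header lines untouched as a deliberate simplification.

```python
import re

# A single ambiguity code standing alone, e.g. "R" but not "AR"
AMBIGUOUS = re.compile(r"\b[WKYRSM]\b")


def mask_ambiguous_ref(lines):
    """Yield VCF lines with standalone ambiguity codes in REF set to N."""
    for line in lines:
        text = line.rstrip("\n")
        if not text:
            continue  # drop blank lines, like sed '/^$/d'
        if text.startswith("#"):
            yield text  # header lines pass through unchanged
            continue
        fields = text.split("\t")
        if len(fields) > 3:
            fields[3] = AMBIGUOUS.sub("N", fields[3])
        yield "\t".join(fields)
```

The word-boundary anchors play the role of awk's \y, so a multi-base REF allele that merely contains one of these letters is not modified.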

shell: """
input_str="-I $(echo "{input.vcf}" | sed -e 's/ / -I /g')"

gatk --java-options "-Xmx30g" MergeVcfs \\
    -O "{output.vcf}" \\
    -D {params.genomedict} \\
    $input_str
"""

SnakeMake gatk From line 135 of rules/somatic_snps.common.smk

shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
{params.set_tmp}

if [ ! -d "$(dirname {output.mergedvcf})" ]; then
  mkdir -p "$(dirname {output.mergedvcf})"
fi

input_str="--variant $(echo "{input.vcf}" | sed -e 's/ / --variant /g')"

java -Xmx60g -Djava.io.tmpdir=${{tmp}} -jar $GATK_JAR -T CombineVariants \\
    -R {params.genome} \\
    -nt {threads} \\
    --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \\
    --genotypemergeoption PRIORITIZE \\
    --rod_priority_list {params.rodprioritylist} \\
    --minimumN 1 \\
    -o {output.mergedvcf} \\
    {params.variantsargs}
"""

SnakeMake From line 162 of rules/somatic_snps.common.smk

shell: """

vcf2maf.pl \\
    --input-vcf {input.filtered_vcf} \\
    --output-maf {output.maf} \\
    --tumor-id {params.tumorsample} {params.normalsample} \\
    --vep-path /opt/vep/src/ensembl-vep \\
    --vep-data {params.bundle} \\
    --ncbi-build {params.build} \\
    --species {params.species} \\
    --vep-forks {threads} \\
    --ref-fasta {params.genome} \\
    --vep-overwrite

"""

SnakeMake From line 205 of rules/somatic_snps.common.smk

shell: """
echo "Combining MAFs..."
head -2 {input.mafs[0]} > {output.maf}
awk 'FNR>2 {{print}}' {input.mafs} >> {output.maf}
"""

SnakeMake From line 230 of rules/somatic_snps.common.smk

shell: """
if [ ! -d "$(dirname {output.vcf})" ]; 
    then mkdir -p "$(dirname {output.vcf})";
fi
gatk Mutect2 \\
    -R {params.genome} \\
    -I {input.tumor} \\
    -I {input.normal} \\
    -normal {params.normalsample} \\
    --panel-of-normals {params.pon} \\
    {params.germsource} \\
    -L {params.chrom} \\
    -O {output.vcf} \\
    --f1r2-tar-gz {output.read_orientation_file} \\
    --independent-mates
"""

SnakeMake gatk From line 24 of rules/somatic_snps.paired.smk

shell: """
# Run GetPileupSummaries in bg concurrently for a tumor/normal pair 
gatk --java-options '-Xmx48g' GetPileupSummaries \\
    -I {input.tumor} \\
    -V {params.germsource} \\
    -L {input.intervals} \\
    -O {output.tumor_summary} & \\
gatk --java-options '-Xmx48g' GetPileupSummaries \\
    -I {input.normal} \\
    -V {params.germsource} \\
    -L {input.intervals} \\
    -O {output.normal_summary} & \\
wait
"""

SnakeMake gatk From line 61 of rules/somatic_snps.paired.smk

shell: """
gatk CalculateContamination \\
    -I {input.tumor} \\
    --matched-normal {input.normal} \\
    -O {output.tumor_summary}
gatk CalculateContamination \\
    -I {input.normal} \\
    -O {output.normal_summary}
"""

SnakeMake gatk From line 94 of rules/somatic_snps.paired.smk

shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
{params.set_tmp}

workdir={params.basedir}
myoutdir="$(dirname {output.vcf})/{wildcards.samples}/{wildcards.chroms}"
if [ -d "$myoutdir" ]; then rm -r "$myoutdir"; fi
mkdir -p "$myoutdir"

configureStrelkaSomaticWorkflow.py \\
    --ref={params.genome} \\
    --tumor={input.tumor} \\
    --normal={input.normal} \\
    --runDir="$myoutdir" \\
    --exome
cd "$myoutdir"
./runWorkflow.py -m local -j {threads}

java -Xmx12g -Djava.io.tmpdir=${{tmp}} -XX:ParallelGCThreads={threads} \\
    -jar $GATK_JAR -T CombineVariants \\
    -R {params.genome} \\
    --variant results/variants/somatic.snvs.vcf.gz \\
    --variant results/variants/somatic.indels.vcf.gz \\
    --assumeIdenticalSamples \\
    --filteredrecordsmergetype KEEP_UNCONDITIONAL \\
    -o "$(basename {output.vcf})"

cd $workdir
mv "$myoutdir/$(basename {output.vcf})" "{output.vcf}"
"""

SnakeMake From line 131 of rules/somatic_snps.paired.smk

shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
{params.set_tmp}

gatk SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.filtered}

echo -e "TUMOR\t{params.tumorsample}\nNORMAL\t{params.normalsample}" > "{output.samplesfile}"

echo "Reheading VCFs with sample names..."
bcftools reheader \\
    -o "{output.final}" \\
    -s "{output.samplesfile}" "{output.filtered}"

# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort \\
    -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""

SnakeMake gatk BCFtools From line 188 of rules/somatic_snps.paired.smk

shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
{params.set_tmp}

if [ ! -d "$(dirname {output.vcf})" ]; then mkdir -p "$(dirname {output.vcf})"; fi

java -Xmx8g -Djava.io.tmpdir=${{tmp}} -jar ${{MUTECT_JAR}} \\
    --analysis_type MuTect \\
    --reference_sequence {params.genome} \\
    --normal_panel {params.pon} \\
    --vcf {output.vcf} \\
    {params.dbsnp_cosmic} \\
    --disable_auto_index_creation_and_locking_when_reading_rods \\
    --input_file:normal {input.normal} \\
    --input_file:tumor {input.tumor} \\
    --out {output.stats} \\
    -rf BadCigar
"""

SnakeMake MuTect From line 239 of rules/somatic_snps.paired.smk

shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
{params.set_tmp}

gatk SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --exclude-filtered \\
    --output {output.final}

# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""

SnakeMake gatk BCFtools From line 282 of rules/somatic_snps.paired.smk

shell: """
if [ ! -d "$(dirname {output.vcf})" ]; then mkdir -p "$(dirname {output.vcf})"; fi
VarDict \\
    -G {params.genome} \\
    -f 0.05 \\
    -N \"{params.tumorsample}|{params.normalsample}\" \\
    --nosv \\
    -b \"{input.tumor}|{input.normal}\" \\
    -t \\
    -Q 20 \\
    -c 1 \\
    -S 2 \\
    -E 3 {params.targets} \\
    | testsomatic.R \\
    | var2vcf_paired.pl \\
        -S \\
        -Q 20 \\
        -d 10 \\
        -M \\
        -N \"{params.tumorsample}|{params.normalsample}\" \\
        -f 0.05 > {output.vcf}
"""

SnakeMake From line 322 of rules/somatic_snps.paired.smk

shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
{params.set_tmp}

bcftools filter \\
    --exclude \'STATUS=\"Germline\" | STATUS=\"LikelyLOH\" | STATUS=\"AFDiff\"\' \\
    {input.vcf} > {output.filtered}

gatk SelectVariants \\
    -R {params.genome} \\
    --variant {output.filtered} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.final}

# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk 'BEGIN {{OFS = "\t"}} {{gsub(/\y[WKYRSM]\y/,"N",$4); print}}' \\
    | sed '/^$/d' > {output.norm}
"""

SnakeMake gatk BCFtools From line 369 of rules/somatic_snps.paired.smk

shell: """
# Sets up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
{params.set_tmp}

if [ ! -d "$(dirname {output.vcf})" ]; then mkdir -p "$(dirname {output.vcf})"; fi

tumor_purity=$( echo "1-$(printf '%.6f' $(tail -n -1 {input.tumor_summary} | cut -f2 ))" | bc -l)
normal_purity=$( echo "1-$(printf '%.6f' $(tail -n -1 {input.normal_summary} | cut -f2 ))" | bc -l)
varscan_opts="--strand-filter 1 --min-var-freq 0.01 --min-avg-qual 30 --somatic-p-value 0.05 --output-vcf 1 --normal-purity $normal_purity --tumor-purity $tumor_purity"
dual_pileup="samtools mpileup -d 10000 -q 15 -Q 15 -f {params.genome} {input.normal} {input.tumor}"
varscan_cmd="varscan somatic <($dual_pileup) {output.vcf} $varscan_opts --mpileup 1"    
eval "$varscan_cmd"

# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
awk 'BEGIN {{OFS = "\t"}} {{gsub(/\y[WKYRSM]\y/,"N",$4); print}}' {output.vcf}.snp \\
    | sed '/^$/d' > {output.vcf}.snp_temp
awk 'BEGIN {{OFS = "\t"}} {{gsub(/\y[WKYRSM]\y/,"N",$4); print}}' {output.vcf}.indel \\
    | sed '/^$/d' > {output.vcf}.indel_temp

java -Xmx12g -Djava.io.tmpdir=${{tmp}} -XX:ParallelGCThreads={threads} \\
    -jar $GATK_JAR -T CombineVariants \\
    -R {params.genome} \\
    --variant {output.vcf}.snp_temp \\
    --variant {output.vcf}.indel_temp \\
    --assumeIdenticalSamples \\
    --filteredrecordsmergetype KEEP_UNCONDITIONAL \\
    -o {output.vcf}     
"""

SnakeMake SAMtools VarScan From line 419 of rules/somatic_snps.paired.smk
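The purity values passed to VarScan in the rule above are computed as one minus the contamination estimate found in the second column of the last line of each summary file. The sketch below reproduces that arithmetic on a made-up summary table. The helper name `purity_from_summary`, the column header, and the numbers are illustrative assumptions, and awk stands in for the rule's `bc -l` call.

```shell
# purity_from_summary is a hypothetical helper: it reads the last line of a
# tab-separated summary file, takes column 2 (contamination), and prints
# 1 - contamination with six decimal places.
purity_from_summary() {
    tail -n 1 "$1" | cut -f2 | awk '{ printf "%.6f\n", 1 - $1 }'
}

# Toy summary table with made-up values
summary="$(mktemp)"
printf 'sample\tcontamination\terror\nTUMOR_A\t0.025000\t0.001000\n' > "$summary"

purity_from_summary "$summary"
rm -f "$summary"
```

A contamination estimate of 0.025 therefore corresponds to a purity of 0.975 under this definition.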

shell: """
# Sets up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
{params.set_tmp}

varscan filter \\
    {input.vcf} \\
    {params.filter_settings} > {output.filtered1}

gatk SelectVariants \\
    -R {params.genome} \\
    --variant {output.filtered1} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.filtered}

samplesFile="{output.samplesfile}"
echo -e "TUMOR\t{params.tumorsample}\nNORMAL\t{params.normalsample}" > "{output.samplesfile}"

bcftools reheader \\
    -o "{output.final}" \\
    -s "{output.samplesfile}" \\
    "{output.filtered}"

# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk 'BEGIN {{OFS = "\t"}} {{gsub(/\y[WKYRSM]\y/,"N",$4); print}}' \\
    | sed '/^$/d' > {output.norm}
"""

SnakeMake gatk BCFtools VarScan From line 481 of rules/somatic_snps.paired.smk

shell: """
if [ ! -d "$(dirname {output.vcf})" ]; then
    mkdir -p "$(dirname {output.vcf})"
fi

gatk Mutect2 \\
    -R {params.genome} \\
    -I {input.tumor} \\
    --panel-of-normals {params.pon} \\
    {params.germsource} \\
    -L {wildcards.chroms} \\
    -O {output.vcf} \\
    --f1r2-tar-gz {output.read_orientation_file} \\
    --independent-mates
"""

SnakeMake gatk From line 21 of rules/somatic_snps.tumor_only.smk

shell: """
# Sets up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
{params.set_tmp}

gatk --java-options "-Xmx10g -Djava.io.tmpdir=${{tmp}}" GetPileupSummaries \\
    -R {params.genome} \\
    -I {input.tumor} \\
    -V {params.germsource} \\
    -L {input.intervals} \\
    -O {output.pileup}
"""

SnakeMake gatk From line 56 of rules/somatic_snps.tumor_only.smk

shell: """
gatk CalculateContamination \\
    -I {input.pileup} \\
    -O {output.tumor_summary}
"""

SnakeMake gatk From line 86 of rules/somatic_snps.tumor_only.smk

shell: """
# Sets up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
{params.set_tmp}

if [ ! -d "$(dirname {output.vcf})" ]; then 
    mkdir -p "$(dirname {output.vcf})"
fi

java -Xmx8g -Djava.io.tmpdir=${{tmp}} -jar ${{MUTECT_JAR}} \\
    --analysis_type MuTect \\
    --reference_sequence {params.genome} \\
    --normal_panel {params.pon} \\
    --vcf {output.vcf} \\
    {params.dbsnp_cosmic} \\
    -L {wildcards.chroms} \\
    --disable_auto_index_creation_and_locking_when_reading_rods \\
    --input_file:tumor {input.tumor} \\
    --out {output.stats} \\
    -rf BadCigar
"""

SnakeMake MuTect From line 110 of rules/somatic_snps.tumor_only.smk

shell: """
# Sets up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
{params.set_tmp}

gatk SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --exclude-filtered \\
    --output {output.final}

# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk 'BEGIN {{OFS = "\t"}} {{gsub(/\y[WKYRSM]\y/,"N",$4); print}}' \\
    | sed '/^$/d' > {output.norm}
"""

SnakeMake gatk BCFtools From line 153 of rules/somatic_snps.tumor_only.smk

shell: """
if [ ! -d "$(dirname {output.vcf})" ]; then 
    mkdir -p "$(dirname {output.vcf})"
fi

VarDict \\
    -G {params.genome} \\
    -f 0.05 \\
    -x 500 \\
    --nosv \\
    -b {input.tumor} \\
    -t \\
    -Q 20 \\
    -c 1 \\
    -S 2 \\
    -E 3 {params.targets} \\
    | teststrandbias.R \\
    | var2vcf_valid.pl \\
        -N {wildcards.samples} \\
        -Q 20 \\
        -d 10 \\
        -v 6 \\
        -S \\
        -E \\
        -f 0.05 > {output.vcf}
"""

SnakeMake From line 191 of rules/somatic_snps.tumor_only.smk

shell: """
# Sets up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
{params.set_tmp}

gatk SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.final}

# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk 'BEGIN {{OFS = "\t"}} {{gsub(/\y[WKYRSM]\y/,"N",$4); print}}' \\
    | sed '/^$/d' > {output.norm}
"""

SnakeMake gatk BCFtools From line 240 of rules/somatic_snps.tumor_only.smk

shell: """
if [ ! -d "$(dirname {output.vcf})" ]; then
    mkdir -p "$(dirname {output.vcf})"
fi

varscan_opts="--strand-filter 0 --min-var-freq 0.01 --output-vcf 1 --variants 1"
pileup_cmd="samtools mpileup -d 100000 -q 15 -Q 15 -f {params.genome} {input.tumor}"
varscan_cmd="varscan mpileup2cns <($pileup_cmd) $varscan_opts"
eval "$varscan_cmd > {output.vcf}.gz"
eval "bcftools view -U {output.vcf}.gz > {output.vcf}"
"""

SnakeMake SAMtools BCFtools VarScan From line 279 of rules/somatic_snps.tumor_only.smk

shell: """
# Sets up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
{params.set_tmp}

varscan filter \\
    {input.vcf} \\
    {params.filter_settings} > {output.filtered1}

gatk SelectVariants \\
    -R {params.genome} \\
    --variant {output.filtered1} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.filtered}

samplesFile="{output.samplesfile}"
echo -e "TUMOR\t{params.tumorsample}\n" > "{output.samplesfile}"

bcftools reheader \\
    -o "{output.final}" \\
    -s "{output.samplesfile}" \\
    "{output.filtered}"

# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk 'BEGIN {{OFS = "\t"}} {{gsub(/\y[WKYRSM]\y/,"N",$4); print}}' \\
    | sed '/^$/d' > {output.norm}
"""

SnakeMake gatk BCFtools VarScan From line 319 of rules/somatic_snps.tumor_only.smk

shell: """
# Sets up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
{params.set_tmp}

mkdir -p fastqs
gatk SamToFastq \\
    --INPUT {input.bam} \\
    --FASTQ {output.r1} \\
    --SECOND_END_FASTQ {output.r2} \\
    --UNPAIRED_FASTQ {output.orphans} \\
    --TMP_DIR ${{tmp}} \\
    -R {params.genome}
"""

SnakeMake gatk From line 30 of rules/trim_map_preprocess.smk

shell: """
myoutdir="$(dirname {output.one})"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi
trimmomatic PE \\
    -threads {threads} \\
    -phred33 \\
    {input.r1} {input.r2} \\
    {output.one} {output.two} \\
    {output.three} {output.four} \\
    ILLUMINACLIP:{params.adapterfile}:3:30:10 \\
    LEADING:10 \\
    TRAILING:10 \\
    SLIDINGWINDOW:4:20 \\
    MINLEN:20      
"""

SnakeMake Trimmomatic From line 74 of rules/trim_map_preprocess.smk

shell: """
myoutdir="$(dirname {output})"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi
bwa mem -M \\
    -R \'@RG\\tID:{params.sample}\\tSM:{params.sample}\\tPL:illumina\\tLB:{params.sample}\\tPU:{params.sample}\\tCN:hgsc\\tDS:wes\' \\
    -t {threads} \\
    {params.genome} \\
    {input} | \\
samblaster -M | \\
samtools sort -@12 -m 4G - -o {output}
"""

SnakeMake SAMtools BWA SAMBLASTER From line 117 of rules/trim_map_preprocess.smk
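The `-R` argument in the rule above is written with `\\t` because Snakemake's Python string turns each `\\` into a single backslash, so the shell passes a literal `\t` to `bwa mem`, which converts it to a tab when writing the header. The sketch below prints the string that `bwa mem` would receive for a given sample. The helper name `make_read_group` and the sample name are hypothetical.

```shell
# make_read_group is a hypothetical helper; it prints the -R argument with
# literal backslash-t separators, which bwa mem converts to tabs itself.
make_read_group() {
    printf '@RG\\tID:%s\\tSM:%s\\tPL:illumina\\tLB:%s\\tPU:%s\\tCN:hgsc\\tDS:wes\n' \
        "$1" "$1" "$1" "$1"
}

make_read_group sampleA
```

As in the rule, the sample name is reused for the ID, SM, LB, and PU tags.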

shell: """
samtools index -@ 2 {input.bam} {output.bai}
"""

SnakeMake SAMtools From line 152 of rules/trim_map_preprocess.smk

shell: """
gatk --java-options '-Xmx48g' BaseRecalibrator \\
    --input {input.bam} \\
    --reference {params.genome} \\
    {params.knowns} \\
    --output {output.re} \\
    --intervals {params.intervals}

gatk --java-options '-Xmx48g' ApplyBQSR \\
    --reference {params.genome} \\
    --input {input.bam} \\
    --bqsr-recal-file {output.re} \\
    --output {output.bam} \\
    --use-jdk-inflater \\
    --use-jdk-deflater
"""

SnakeMake gatk From line 188 of rules/trim_map_preprocess.smk

shell: """
sample={wildcards.samples}
ID=$sample
PL="ILLUMINA"  # exposed as a config param
LB="na"        # exposed as a config param 

# Check whether the BAM already has an @RG header line;
# if so, reuse its ID, PL, and LB values
HEADER=`samtools view -H {input.bam} | grep ^@RG`
if [[ "$HEADER" != "" ]]; then
    t=(${{HEADER//\t/ }})
    echo ${{t[1]}}
    ID=`printf '%s\n' "${{t[@]}}" | grep -P '^ID' | cut -d":" -f2` #(${{t[1]//:/ }})
    PL=`printf '%s\n' "${{t[@]}}" | grep -P '^PL' | cut -d":" -f2` #(${{t[3]//:/ }})
    LB=`printf '%s\n' "${{t[@]}}" | grep -P '^LB' | cut -d":" -f2` #(${{t[2]//:/ }})
    if [[ "$ID" == "$sample" ]]; then
        echo "The header of the BAM file is correct"
    else
        ID=$sample
    fi
fi

gatk AddOrReplaceReadGroups \\
    --INPUT {input.bam} \\
    --OUTPUT {output.bam} \\
    --RGID ${{ID}} \\
    --RGLB ${{LB}} \\
    --RGPL ${{PL}} \\
    --RGSM ${{ID}} \\
    --RGPU na

samtools index -@ 2 {output.bam} {output.bai}
cp {output.bai} {output.bai2}
"""