Creating a synthetic tumor/normal sample pair from CHM1/CHM13 haploid cell lines.

Version: v1.0.0

This workflow generates a synthetic tumor/normal dataset based on the syndip benchmark.

Authors

  • Johannes Köster (@johanneskoester)

Usage

If you use this workflow in a paper, please credit the authors (and the authors of the syndip benchmark) by citing the URL of this (original) repository.

Perform a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores or run it in a cluster environment via

snakemake --use-conda --cluster qsub --jobs 100

or

snakemake --use-conda --drmaa --jobs 100

See the Snakemake documentation for further details.

Code Snippets

from cyvcf2 import VCF, Writer

vcf_in = VCF(snakemake.input[0])
vcf_in.add_info_to_header({"ID": "SOMATIC", "Number": 0, "Type": "Flag", "Description": "Somatic variant"})
vcf_in.add_info_to_header({"ID": "TAF", "Number": 1, "Type": "Float", "Description": "Tumor allele frequency"})
vcf_in.add_info_to_header({"ID": "NAF", "Number": 1, "Type": "Float", "Description": "Normal allele frequency"})

vcf_out = Writer(snakemake.output[0], vcf_in)

vaf = float(snakemake.wildcards.percentage) / 100

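for rec in vcf_in:
    # cyvcf2 reports each genotype as [allele_1, allele_2, phased];
    # going by the comment below, allele_1 is the CHM1 haplotype.
    if rec.genotypes == [[1, 0, True]]:
        # if in CHM1 only, consider as somatic
        rec.INFO["SOMATIC"] = True
        rec.INFO["TAF"] = vaf
        rec.INFO["NAF"] = 0.0
    elif rec.genotypes == [[1, 1, True]] or rec.genotypes == [[0, 1, True]]:
        # variant on the second haplotype (alone or with CHM1): not somatic
        rec.INFO["NAF"] = 1.0
        rec.INFO["TAF"] = 0.0
    else:
        # any other genotype (e.g. 0|0 or unphased) is dropped
        continue
    vcf_out.write_record(rec)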

vcf_out.close()
vcf_in.close()
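The genotype test in the script above can be tried out without cyvcf2 or a VCF file. The following is a minimal sketch, not part of the workflow: `classify_genotypes` is a hypothetical helper name, and it only mirrors the three branches of the loop on plain Python lists in the `[allele_1, allele_2, phased]` layout.

```python
def classify_genotypes(genotypes, vaf):
    """Return the INFO fields the script would set, or None if the record is skipped.

    `genotypes` mimics cyvcf2's `rec.genotypes` for a single-sample VCF:
    a list with one [allele_1, allele_2, phased] entry.
    This helper is hypothetical and exists only for illustration.
    """
    if genotypes == [[1, 0, True]]:
        # variant on the first haplotype only: flagged as somatic
        return {"SOMATIC": True, "TAF": vaf, "NAF": 0.0}
    if genotypes in ([[1, 1, True]], [[0, 1, True]]):
        # variant on the second haplotype (alone or on both): not somatic
        return {"NAF": 1.0, "TAF": 0.0}
    # everything else (0|0, unphased, other alleles) is dropped
    return None


# same conversion as in the script: a percentage wildcard becomes a fraction
vaf = float("25") / 100

for gt in ([[1, 0, True]], [[0, 1, True]], [[1, 1, True]], [[0, 0, True]]):
    print(gt, classify_genotypes(gt, vaf))
```

Note that the comparison is exact, so an unphased genotype such as `[[1, 0, False]]` falls through to the skip branch.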
shell:
    "perl -U `which fasterq-dump` -O srr {wildcards.accession}"

shell:
    "curl -L https://github.com/lh3/CHM-eval/releases/download/v0.4/CHM-evalkit-20180221.tar \
| tar xf -"

shell:
    "bedtools intersect -header -a {input}/full.38.vcf.gz -b {input}/full.38.bed.gz > {output}"

shell:
    "gzip -d -c {input}/full.38.bed.gz > {output}"

script:
    "scripts/compose-truth.py"

shell:
    "seqtk sample -s{params.seed} {input.normal[0]} {params.ns[1]} > {output[0]}; "
    "seqtk sample -s{params.seed} {input.normal[1]} {params.ns[1]} > {output[1]}; "
    "seqtk sample -s{params.seed} {input.tumor[0]} {params.ns[0]} >> {output[0]}; "
    "seqtk sample -s{params.seed} {input.tumor[1]} {params.ns[0]} >> {output[1]}; "
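The seqtk rule above draws `params.ns[0]` reads from the tumor input and `params.ns[1]` reads from the normal input, then concatenates them into one mixed sample. Using the same seed for both mates keeps the read pairs in sync. How `params.ns` is computed is not shown on this page; the sketch below is one plausible way to derive the two counts from a total read count and a mixture percentage. The function name, its arguments, and the rounding rule are all assumptions, not the workflow's actual code.

```python
def split_read_counts(total_reads, percentage):
    """Split a read budget into (n_tumor, n_normal) for a given mixture percentage.

    Hypothetical helper: the real workflow may compute params.ns differently.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    n_tumor = round(total_reads * percentage / 100)
    # give the remainder to the normal sample so the counts always add up
    n_normal = total_reads - n_tumor
    return n_tumor, n_normal


ns = split_read_counts(1000, 25)
print(ns)  # ns[0] reads from the tumor input, ns[1] from the normal input
```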
shell:
    "cp {input[0]} {output[0]}; "
    "cp {input[1]} {output[1]}"

shell:
    "sed 's/chr//g' {input} > {output}"
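The `sed 's/chr//g'` rule removes every occurrence of the string `chr` on every line, not only a leading contig prefix. The page does not show which file the rule is applied to, so the sample lines below are made up. The sketch reproduces the substitution in Python (`strip_chr` is a hypothetical name) to make the effect visible.

```python
def strip_chr(line):
    """Python equivalent of sed 's/chr//g' for a single line."""
    # str.replace removes all occurrences, matching sed's /g flag
    return line.replace("chr", "")


# illustrative lines only; not taken from the workflow's data
examples = [
    "chr1\t100\t200",
    "##contig=<ID=chr1,length=248956422>",
    "chrUn_KI270302v1\t0\t10",
]
for line in examples:
    print(repr(strip_chr(line)))
```

Because the pattern is unanchored, header lines and any other field containing `chr` are rewritten as well, which is what lets the substitution reach contig names inside header lines.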

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/koesterlab/chm-synthetic-tumor-normal
Name: chm-synthetic-tumor-normal
Version: v1.0.0
Copyright: Public Domain
License: None