Haplotype Phasing for large-scale genotype datasets

This pipeline performs internal (within-dataset) population phasing of genotype datasets using existing phasing software; the rules below call Eagle and SHAPEIT4.

Authors

Arjun Biddanda (@aabiddanda)

This was largely built while @aabiddanda was employed by 54Gene, but has been

Code Snippets

shell:
    "tabix -f {input}"

SnakeMake tabix From line 12 of rules/common.smk

shell:
    "bcftools index -f {input.bcf}"

SnakeMake BCFtools From line 24 of rules/common.smk

shell:
    "for i in {input.vcfs}; do tabix -l $i; done > {output}"

SnakeMake tabix From line 39 of rules/common.smk

shell:
    "bcftools query -l {input.vcf} > {output}"

SnakeMake BCFtools From line 52 of rules/common.smk

shell:
    "bcftools view -r {wildcards.chrom} --threads {threads} {input} -Oz -o {output}"

SnakeMake BCFtools From line 86 of rules/common.smk

shell:
    "bcftools concat -a -D -Ou {input.vcfs} | bcftools sort -Oz -o {output}"

SnakeMake BCFtools From line 105 of rules/common.smk

shell:
    "bcftools view -r {wildcards.chrom} {input.unphased_vcf} --threads {threads} -Ob -o {output.bcf}"

SnakeMake BCFtools From line 119 of rules/common.smk

run:
    if analysis_configs[wildcards.outfix]["reference_panel"] == "":
        shell("touch {output}")
    else:
        ref_panel_manifest = pd.read_csv(
            analysis_configs[wildcards.outfix]["reference_panel"], sep="\t"
        )
        for c in ref_panel_manifest.chroms:
            assert c in CHROM
        assert (
            ref_panel_manifest.chroms.size
            == np.unique(ref_panel_manifest.chroms.values).size
        )
        index_files = [
            pc.determine_index(r) for r in ref_panel_manifest.ref_panel.values
        ]
        ref_panel_manifest["file_index"] = index_files
        ref_panel_manifest.to_csv(str(output), sep="\t", index=False)

SnakeMake From line 127 of rules/common.smk
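The manifest checks in the rule above (every chromosome is recognized, and no chromosome is listed twice) can be sketched with the standard library alone. This is a minimal illustration, not the pipeline's code: `ALLOWED_CHROMS` is an assumed stand-in for the workflow's `CHROM` list, `validate_manifest` is a hypothetical helper name, and the file names in the example are made up. Only the `chroms` and `ref_panel` column names come from the rule itself.

```python
import csv
import io

# Assumption: stand-in for the workflow's CHROM list, whose contents are not shown.
ALLOWED_CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX"]


def validate_manifest(handle, allowed=ALLOWED_CHROMS):
    """Parse a tab-separated manifest with a `chroms` column and check it.

    Raises ValueError for an unknown or duplicated chromosome;
    otherwise returns the parsed rows as a list of dicts.
    """
    rows = list(csv.DictReader(handle, delimiter="\t"))
    seen = set()
    for row in rows:
        chrom = row["chroms"]
        if chrom not in allowed:
            raise ValueError(f"unknown chromosome: {chrom}")
        if chrom in seen:
            raise ValueError(f"duplicate chromosome: {chrom}")
        seen.add(chrom)
    return rows


# Hypothetical two-line manifest for illustration.
manifest = io.StringIO(
    "chroms\tref_panel\nchr1\tpanel.chr1.bcf\nchr2\tpanel.chr2.bcf\n"
)
rows = validate_manifest(manifest)
print([r["chroms"] for r in rows])
```

Raising `ValueError` instead of using bare `assert` keeps the checks active when Python runs with `-O` and gives a message that names the offending chromosome.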

run:
    if analysis_configs[wildcards.outfix]["recombination_maps"] == "":
        raise ValueError("Cannot convert a recombination map if none are provided!")
    else:
        recomb_map_manifest = pd.read_csv(
            analysis_configs[wildcards.outfix]["recombination_maps"],
            sep="\t",
            dtype=str,
        )
        for c in recomb_map_manifest.chroms:
            assert c in CHROM
        assert (
            recomb_map_manifest.chroms.size
            == np.unique(recomb_map_manifest.chroms.values).size
        )
        assert wildcards.chrom in recomb_map_manifest.chroms.values
        filename = recomb_map_manifest[
            recomb_map_manifest.chroms == wildcards.chrom
        ].recombination_map.values[0]
        transformed_df = pc.convert_hapmap_genmap(filename, wildcards.algo)
        transformed_df.to_csv(str(output), index=False, sep="\t")

SnakeMake From line 172 of rules/common.smk
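The body of `pc.convert_hapmap_genmap` is not shown here, so the following is only a sketch of what such a conversion might look like. It assumes a HapMap-style input with four tab-separated columns (chromosome, position in bp, rate in cM/Mb, map position in cM), that `wildcards.algo` takes the values `eagle` or `shapeit4`, and that the targets are a four-column Eagle layout and a three-column `pos chr cM` SHAPEIT4 layout. Check each tool's documentation before relying on these layouts. `convert_hapmap_map` and the example rows are hypothetical.

```python
import csv
import io

# Assumed output headers for each phasing tool.
HEADERS = {
    "eagle": ("chr", "position", "COMBINED_rate(cM/Mb)", "Genetic_Map(cM)"),
    "shapeit4": ("pos", "chr", "cM"),
}


def convert_hapmap_map(handle, algo):
    """Reorder HapMap-style genetic map columns for the given tool.

    Returns (header, rows); every field is kept as a string so that
    positions and rates are written back exactly as they were read.
    """
    if algo not in HEADERS:
        raise ValueError(f"unsupported algorithm: {algo}")
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the input header line
    rows = []
    for chrom, pos, rate, cm in reader:
        if algo == "eagle":
            rows.append((chrom, pos, rate, cm))
        else:  # shapeit4 drops the rate column
            rows.append((pos, chrom, cm))
    return HEADERS[algo], rows


# Made-up input rows for illustration.
EXAMPLE = (
    "Chromosome\tPosition(bp)\tRate(cM/Mb)\tMap(cM)\n"
    "chr1\t55550\t2.981822\t0.000000\n"
    "chr1\t82571\t2.082414\t0.080572\n"
)
header, rows = convert_hapmap_map(io.StringIO(EXAMPLE), "shapeit4")
print(header, rows)
```

Keeping fields as strings mirrors the `dtype=str` choice in the rule above and avoids reformatting values on the way through.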

shell:
    """
    cp ${{CONDA_PREFIX}}/share/eagle/tables/genetic_map_{params.hg_notation}_withX.txt.gz {output}
    """

SnakeMake From line 13 of rules/eagle.smk

shell:
    """
    eagle --chrom={wildcards.chrom}\
        --Kpbwt={params.kpbwt}\
        --pbwtIters={params.pbwt_iters}\
        --histFactor={params.hist_factor}\
        --genoErrProb={params.geno_err_prob}\
        --expectIBDcM={params.expect_ibd}\
        --numThreads={threads}\
        {params.imp_missing}\
        {params.vcf_target}\
        {params.ref_panel}\
        --geneticMapFile={input.genetic_map}\
        --outPrefix={params.outprefix} 2>&1 | tee {log}
    """

SnakeMake Eagle From line 76 of rules/eagle.smk
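The Eagle command above takes `{params.imp_missing}`, `{params.vcf_target}` and `{params.ref_panel}` as pre-built flag strings. How the workflow builds them is not shown; one plausible mapping, sketched below, uses Eagle's `--vcf` for phasing without a reference panel, `--vcfTarget` plus `--vcfRef` when a panel is supplied, and `--noImpMissing` to leave missing genotypes unimputed. The function name, its arguments and the file names are hypothetical.

```python
def eagle_input_flags(target_vcf, ref_panel=None, impute_missing=True):
    """Return the Eagle input flags for a target file and optional panel.

    Assumption: this mirrors what the workflow's params might expand to;
    it is not taken from the workflow's source.
    """
    flags = []
    if ref_panel:
        # Reference-based phasing: separate target and reference inputs.
        flags.append(f"--vcfTarget={target_vcf}")
        flags.append(f"--vcfRef={ref_panel}")
    else:
        # Within-cohort phasing: a single input file.
        flags.append(f"--vcf={target_vcf}")
    if not impute_missing:
        flags.append("--noImpMissing")
    return flags


print(" ".join(eagle_input_flags("target.chr20.bcf")))
print(" ".join(eagle_input_flags("target.chr20.bcf", "panel.chr20.bcf", False)))
```

Returning a list rather than one string makes it easy to leave a flag out entirely, which matches the way the rule interpolates possibly empty params.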

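run:
    # Read one sample ID per line, closing the file when done.
    with open(input.sample_list) as handle:
        sample_ids = [line.rstrip() for line in handle]
    fam_file_validator = FamFileValidator(params.fam)
    res = fam_file_validator.validate_fam(sample_ids)
    if res:
        # Validation passed: write a headerless, space-delimited FAM file.
        fam_file_validator.fam_df.to_csv(
            output[0], sep=" ", index=False, header=False
        )  # noqa
    else:
        # Validation failed: create an empty placeholder output.
        shell("touch {output}")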

SnakeMake From line 14 of rules/evaluation.smk

shell:
    "switchError --gen {input.gen} --hap {input.hap} --reg {wildcards.chrom} --fam {input.fam_file} --maf {params.maf} --out {params.outprefix} 2>&1 | tee {log}"

SnakeMake From line 49 of rules/evaluation.smk
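The `switchError` call above reports phasing accuracy. Its internals are not shown here, but the quantity it is named after is commonly defined as the fraction of consecutive heterozygous site pairs whose relative phase differs from the truth. A minimal sketch of that definition for one sample follows; `switch_error_rate` and the toy haplotypes are hypothetical.

```python
def switch_error_rate(truth, inferred):
    """Switch error rate for one sample.

    `truth` and `inferred` are equal-length lists of (allele_a, allele_b)
    tuples in site order. Homozygous sites are skipped because they carry
    no phase information.
    """
    orientation = []  # True if inferred phase matches truth at a het site
    for t, i in zip(truth, inferred):
        if t[0] == t[1]:
            continue
        if i == t:
            orientation.append(True)
        elif i == (t[1], t[0]):
            orientation.append(False)
        else:
            raise ValueError("inferred genotype does not match truth")
    if len(orientation) < 2:
        return 0.0
    # A switch is a change of orientation between neighbouring het sites.
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    return switches / (len(orientation) - 1)


truth = [(0, 1), (0, 1), (1, 0), (0, 1)]
inferred = [(0, 1), (1, 0), (0, 1), (0, 1)]
print(switch_error_rate(truth, inferred))
```

In this toy case the inferred phase flips after the first site and flips back after the third, giving two switches over three adjacent pairs.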

shell:
    """
    mv {input} resources/shapeit4.tar.gz
    cd resources/
    tar -zxvf shapeit4.tar.gz shapeit4-4.2.2/maps/
    cd shapeit4-4.2.2/maps/
    tar -xvf genetic_maps.{params.build}.tar.gz
    """

SnakeMake From line 25 of rules/shapeit4.smk

shell:
    """
    shapeit4 --input {input.unphased_bcf}\
        --map {input.genetic_map}\
        --seed {params.seed}\
        --region {wildcards.chrom}\
        --mcmc-iterations {params.mcmc_iterations}\
        {params.sequencing}\
        {params.ref_panel}\
        --thread {threads}\
        --output {output.phased_vcf} 2>&1 | tee {log}
    """