Variant Allele Frequency Lookup from gnomAD using RocksDB

Fast lookup interface for the allele frequency of gnomAD variants, backed by RocksDB.

Installation

conda install -c conda-forge python-rocksdb
pip install git+https://github.com/MuhammedHasan/gnomad_rocksdb.git

Download database

Download the prebuilt RocksDB database for gnomAD:

gnomad_rocksdb_download --version {version} --db_path {output_path}

Supported versions: 2.1.1 and 3.1.2.
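
If the download is scripted, the command line above can be assembled and checked in Python before it is handed to `subprocess`. This is a minimal sketch that assumes only the flags and versions listed above; `SUPPORTED_VERSIONS`, `build_download_command` and `download` are hypothetical names, not part of the package.

```python
import subprocess

# Versions listed as supported in this README.
SUPPORTED_VERSIONS = ("2.1.1", "3.1.2")


def build_download_command(version, db_path):
    """Return the argv list for gnomad_rocksdb_download (hypothetical helper)."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            "Unsupported gnomAD version %r; expected one of: %s"
            % (version, ", ".join(SUPPORTED_VERSIONS)))
    return ["gnomad_rocksdb_download",
            "--version", version,
            "--db_path", str(db_path)]


def download(version, db_path):
    """Run the download command and raise if it exits with a non-zero status."""
    subprocess.run(build_download_command(version, db_path), check=True)
```

Rejecting an unknown version before spawning the process gives a clearer error than a failed download would.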

Usage

from gnomad_rocksdb import GnomadMafDB
db = GnomadMafDB(db_path)
db.get('17:1000:A>C')
# 0.001
db.get('chr17:1000:A>C')
# 0.001
db['17:1000:A>C']
# 0.001
'17:1000:A>C' in db
# True
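
When many variants have to be looked up, the membership test and item access shown above can be combined so that missing variants get a default value instead of raising an error. This is a sketch only; `lookup_many` is a hypothetical helper, and a plain dict stands in for `GnomadMafDB` so that the example runs without a database (unlike the real class, the dict does not handle a `chr` prefix).

```python
def lookup_many(db, variants, default=None):
    """Map each variant string to its allele frequency, or `default` if absent."""
    result = {}
    for variant in variants:
        # Only the operations shown in the usage example are used here:
        # `variant in db` and `db[variant]`.
        result[variant] = db[variant] if variant in db else default
    return result


# A plain dict stands in for GnomadMafDB (assumption made for illustration).
fake_db = {"17:1000:A>C": 0.001}
frequencies = lookup_many(fake_db, ["17:1000:A>C", "17:2000:G>T"], default=0.0)
print(frequencies)
```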

Create Database

pip install tqdm kipoiseq snakemake cython cyvcf2
# modify workflow/config.yaml
python -m snakemake -j 1

Code Snippets

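# Backup script (the Snakefile runs a ./backup_rocksdb.py, presumably this file).
# `snakemake` is the object Snakemake injects into scripts run via `script:`.
import tarfile
import rocksdb


# Open the existing database and write a backup of it.
db = rocksdb.DB(snakemake.input['db'], rocksdb.Options())
backup = rocksdb.BackupEngine(snakemake.output['backup'])
backup.create_backup(db, flush_before_backup=True)

# Compress the backup directory; the context manager closes the archive
# even if adding files fails.
with tarfile.open(snakemake.output['backup_gzip'], "w:gz") as tar:
    tar.add(snakemake.output['backup'], arcname="backup")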
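
The archiving step of the backup script can be tried in isolation with the standard library alone. The sketch below packs a directory under the archive name `backup`, as the script does, and unpacks it again; `pack_backup` and `unpack_backup` are hypothetical helper names, and restoring a RocksDB database from the unpacked directory is not shown.

```python
import os
import tarfile


def pack_backup(backup_dir, archive_path):
    """Compress backup_dir into a .tar.gz, stored under the name 'backup'."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(backup_dir, arcname="backup")


def unpack_backup(archive_path, dest_dir):
    """Extract the archive into dest_dir and return the path of 'backup'."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer 'data' extraction filter where Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return os.path.join(dest_dir, "backup")
```

Because of the fixed `arcname`, the unpacked directory is always called `backup`, whatever the original backup directory was named.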

# Database creation script (the Snakefile runs a ./create_rocksdb.py, presumably this file).
from tqdm import tqdm
import rocksdb
from kipoiseq.extractors import MultiSampleVCF


batch_size = snakemake.params['batch_size']
db = rocksdb.DB(snakemake.output['db'],
                rocksdb.Options(create_if_missing=True))

for vcf_path in snakemake.input['vcfs']:
    print('Processing %s' % vcf_path)
    vcf = MultiSampleVCF(vcf_path)

    for batch in tqdm(vcf.batch_iter(batch_size=batch_size)):
        # Collect one batch of variants and write it in a single operation.
        batch_writer = rocksdb.WriteBatch()

        for v in batch:
            # Skip variants with an allele count of 0.
            if v.info['AC'] > 0:
                af = v.info['AF']
                # Key: variant string without 'chr'; value: AF as text.
                batch_writer.put(bytes(str(v).replace('chr', ''), 'utf-8'),
                                 bytes(str(af), 'utf-8'))
        db.write(batch_writer)
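
The key and value layout used by the script above can be illustrated without RocksDB or a VCF file. In this sketch the encoding follows the script (variant string with `chr` removed as the key, allele frequency as text as the value); the helper names, the `(variant_str, ac, af)` tuples and the decoding step are assumptions made for illustration, not the package's API.

```python
def encode_key(variant_str):
    """'chr17:1000:A>C' -> b'17:1000:A>C', the same transformation as the script."""
    return bytes(variant_str.replace("chr", ""), "utf-8")


def encode_value(af):
    """Store the allele frequency as UTF-8 text, as the script does."""
    return bytes(str(af), "utf-8")


def decode_value(raw):
    """Turn a stored value back into a float (assumed inverse of encode_value)."""
    return float(raw.decode("utf-8"))


def to_pairs(records):
    """Yield (key, value) byte pairs for records with AC > 0.

    `records` is an iterable of (variant_str, ac, af) tuples, a hypothetical
    stand-in for the VCF records iterated over in the script.
    """
    for variant_str, ac, af in records:
        if ac > 0:
            yield encode_key(variant_str), encode_value(af)


pairs = list(to_pairs([("chr17:1000:A>C", 3, 0.001),
                       ("chr17:2000:G>T", 0, 0.0)]))
print(pairs)
```

Stripping the prefix at write time means every key is stored in a single form, such as `17:1000:A>C`.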

Snakefile excerpts (rule bodies only):

run:
    shell("wget {params.url_vcf} -O {output.vcf}")
    shell("wget {params.url_tbi} -O {output.tbi}")

script:
    "./create_rocksdb.py"

script:
    "./backup_rocksdb.py"

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/MuhammedHasan/gnomad_rocksdb
Name: gnomad_rocksdb
Version: 1
Copyright: Public Domain
License: None