HIV Drug Resistance Profiling Pipeline using Bowtie2, Lofreq, and SierraPy

public 10mo ago 0 bookmarks

View Workflow

Help improve this workflow!

This workflow has been published but could be further improved with some additional meta data:

Keyword(s) in categories input, output, operation

You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .

Introduction

This is a pipeline for drug-resistance profiling from HIV whole genome NGS data. It takes in a HIV reference genome (currently HXB2) and a dataset of reads in (gzipped) fastq format, and performs the following steps:

aligns the reads to the reference,
performs variant calling, and
queries the HIVDB system for the presence and degree of drug resistance (requires internet connection).

Currently, the pipeline uses Bowtie2 for Step1, Lofreq for Step 2, and sierrapy for Step 3.

This code is under development. We hope to test and add more options.

Installation

This pipeline requires the package manager Conda and the workflow management system Snakemake . All other dependencies are handled automatically by Snakemake.

Install Conda

Download Miniconda3 installer for Linux from here , or for macOS from here Installation instructions are here . Once installation is complete, you can test your Miniconda installation by running:

$ conda list

Install Snakemake

Snakemake recommends installation via Conda:

$ conda install -c conda-forge mamba
$ mamba create -c conda-forge -c bioconda -n snakemake snakemake

This creates an isolated enviroment containing the latest Snakemake. To activate it:

$ conda activate snakemake

To test snakemake installation

$ snakemake --help

Download the pipeline

Clone this pipeline by clicking the Clone button on the top-right of this page, or download it by clicking the ellipsis next to the Clone button.

Quickstart Guide

Let's try running the pipeline on sample data provided in the test_data folder. With the snakemake conda environment activated, and from the top-level directory (i.e. the one that contains this readme file), run:

snakemake --use-conda --configfile config/config.sample.yaml -np

to do a dry-run. If snakemake does not complain and everything seems ok, then run:

snakemake --use-conda --configfile config/config.sample.yaml --cores all

The results can be found inside the newly created directory called test_result . Interpretation of drug-resistance as provided by sierrapy can be found inside the drug_resistance_report folder. Intermediate files such as the read-to-reference alignments and variant calls can be found in their respective folders.

Running the pipeline on your own data

To the run the pipeline on your own data, you need to specify in a config file (in YAML format) the paths to the input data (reads and reference), path to the output directory. Optionally, in this config file, you can also set parameters for the various tools that make up this pipeline. You can use config/config.template.yaml as a template. Once the configfile is ready, run the pipeline like above:

snakemake --use-conda --configfile /path/to/configfile -np

for a dry run, and

snakemake --use-conda --configfile /path/to/configfile --cores all

for the actual run.

Contact

This is an ongoing work. If you have questions, concerns, issues, or suggestions, please contact: Anish Shrestha, Bioinformatics Lab, De La Salle University Manila at [email protected] .

Code Snippets

shell:
    """
    samtools faidx {input.reference}
    samtools view -bt {params.last_index_basename} {input.sam} > {output}
    """

SnakeMake SAMtools From line 13 of rules/bam.smk

shell:
    "samtools sort -o {output} {input}"

SnakeMake SAMtools From line 26 of rules/bam.smk

shell:
    "samtools index {input}"

SnakeMake SAMtools From line 36 of rules/bam.smk

shell:
   "bowtie2-build {input.reference} {params.index_basename}"

SnakeMake Bowtie 2 From line 29 of rules/bowtie2.smk

shell:
    "bowtie2 --local --{params.preset} -x {params.index_basename}  -1 {input.sub1} -2 {input.sub2} | samtools view -bS - > {output}"

SnakeMake SAMtools Bowtie 2 From line 47 of rules/bowtie2.smk

shell:
    """
    seqtk trimfq {input.reads1} > {output.trim1}
    seqtk trimfq {input.reads2} > {output.trim2}
    """

SnakeMake seqtk From line 12 of rules/filter_reads.smk

shell:
    """
    seqtk sample -s {params.seed} {input.trim1} {params.n} > {output.sub1}
    seqtk sample -s {params.seed} {input.trim2} {params.n} > {output.sub2}
    """

SnakeMake seqtk From line 32 of rules/filter_reads.smk

shell:
    "python workflow/scripts/vcfToHivdb.py --vcfFile {input} --outFile {output}"

SnakeMake From line 8 of rules/hivdb_query.smk

shell:
    "python workflow/scripts/tabulateJson.py --jsonFile {input} --outFile {output}"

SnakeMake From line 18 of rules/hivdb_query.smk

shell:
   "lastdb {params.index_basename} {input.reference}"

SnakeMake From line 75 of rules/last.smk

shell:
    """
    set +o pipefail;
    gzip -cdf {input.query} | head -n 400000 > {params.last_sample1}
    last-train --verbose --revsym --matsym --gapsym --sample-number=400000 -Q1  {params.last_index_basename} {params.last_sample1} > {output}
    """

SnakeMake From line 90 of rules/last.smk

shell:
   "picard CreateSequenceDictionary R={input.reference} O={output}"

SnakeMake Picard From line 126 of rules/last.smk

shell:
    """
    fastq-interleave {input.reads1} {input.reads2} |
    lastal -Q1 -i1 -p {input.scoring} {params.last_index_basename} |
    last-pair-probs -f {params.mean} -s {params.std} -m 0.01 -d 0.1 |
    maf-convert sam -f {input.dict} > {output}
    """

SnakeMake From line 146 of rules/last.smk

shell:
    """
    lofreq faidx {input.reference}
    lofreq call-parallel --pp-threads {threads} -f {input.reference} -o {output} {input.bam}
    """

SnakeMake lofreq From line 12 of rules/variant_calling.smk

from tabulate import tabulate
import json

if __name__ == '__main__':

    import argparse
    parser = argparse.ArgumentParser(description='')
    parser.add_argument('--jsonFile', required=True)
    parser.add_argument('--outFile', required=True)
    args = parser.parse_args()

    data = json.load(open(args.jsonFile))
    vResults = data['validationResults']
    drugResistance = data['drugResistance']

    dbResults = ""

    # Printing the validation results
    # print("Validation Results")
    for v in vResults:
        dbResults += "\n" + v['level'] + ": " + v['message'] + "\n"

    # Printing drug resistance results
    for d in drugResistance:
        dbResults += "\n\nDrug resistance interpretation: " + d['gene']['name']
        dbResults += "\nVersion: " + d['version']['text'] + "("  + d['version']['publishDate'] + ")"

        drugClass = []
        drug = []
        score = []
        partialScores = []
        text = []
        ctr = 0

        # Store the relevant information into a new dictionary
        for d2 in d['drugScores']:
            drugClass.append(d2['drugClass']['name'])
            drug.append(d2['drug']['displayAbbr'])
            score.append(d2['score'])
            partialScores.append(d2['partialScores'])
            text.append(d2['text'])
            ctr+=1

        results = {
            'drugClass' : drugClass,
            'drug' : drug,
            'score' : score,
            'text' : text
        }

        # Display Drug Resistance Results Table
        #display(pd.DataFrame(results))
        dbResults += "\n"
        dbResults += tabulate(results, headers = 'keys', tablefmt = 'psql')
        dbResults += "\n"

        # Printing partial scores, mutations, and comments
        for p in partialScores:
            if(p):
                mutations = p[0]['mutations']

                for m in mutations:
                    dbResults += "\n"
                    dbResults += "Mutation: " + m['text'] + "\n"
                    dbResults += "Type: " + m['primaryType'] + "\n"
                    dbResults += "Comments:\n"

                    comments = m['comments']
                    for c in comments:
                        dbResults += c['text'] + "\n"

    #print(dbResults)

    f = open(args.outFile, 'w')
    f.write(dbResults)
    f.close()
    #print (output)

Python JSON tabulate From line 1 of scripts/tabulateJson.py

import vcf
#from pysam import VariantFile
import subprocess

#coordinates of the 3 genes in 1-based coordinates (because VCF is 1-based)
#protease
pStart = 2253
pEnd = 2549
#reverse-transcriptase
rtStart = 2550
rtEnd = 4229
#integrase
iStart = 4230
iEnd = 5093

#HIVDB protein sequences:
protease_hivdb="PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"

rt_hivdb = "PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI"\
     "GPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGL"\
     "KKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPSINNETPGIRYQYNVLP"\
     "QGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRT"\
     "KIEELRQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKD"\
     "SWTVNDIQKLVGKLNWASQIYAGIKVKQLCKLLRGTKALTEVIPLTEEAE"\
     "LELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLK"\
     "TGKYARMRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEA"\
     "WWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRET"\
     "KLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEVNIVTDSQ"\
     "YALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDK"\
     "LVSAGIRKVL"

integrase_hivdb = "FLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAM"\
            "HGQVDCSPGIWQLDCTHLEGKIILVAVHVASGYIEAEVIPAETGQETAYF"\
            "LLKLAGRWPVKTIHTDNGSNFTSTTVKAACWWAGIKQEFGIPYNPQSQGV"\
            "VESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERI"\
            "VDIIATDIQTKELQKQITKIQNFRVYYRDSRDPLWKGPAKLLWKGEGAVV"\
            "IQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED"

# #HXB2 translated protein sequences (translated based on gene coordinates above)
#protease_hxb2="PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
#rt_hxb2= "PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI"\
#          "GPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGL"\
#          "KKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLP"\
#          "QGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRT"\
#          "KIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKD"\
#          "SWTVNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAE"\
#          "LELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLK"\
#          "TGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWET"\
#          "WWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRET"\
#          "KLGKAGYVTNRGRQKVVTLTDTTNQKTELQAIYLALQDSGLEVNIVTDSQ"\
#          "YALGIIQAQPDQSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDK"\
#          "LVSAGIRKVL"
#integrase_hxb2 = "FLDGIDKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAM"\
#                  "HGQVDCSPGIWQLDCTHLEGKVILVAVHVASGYIEAEVIPAETGQETAYF"\
#                  "LLKLAGRWPVKTIHTDNGSNFTGATVRAACWWAGIKQEFGIPYNPQSQGV"\
#                  "VESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERI"\
#                  "VDIIATDIQTKELQKQITKIQNFRVYYRDSRNPLWKGPAKLLWKGEGAVV"\
#                  "IQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED"

protease_hxb2 = "CCTCAGGTCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTT"
rt_hxb2 = "CCCATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAGCTGAGACAACATCTGTTGAGGTGGGGACTTACCACACCAGACAAAAAACATCAGAAAGAACCTCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAAAGACAGCTGGACTGTCAATGACATACAGAAGTTAGTGGGGAAATTGAATTGGGCAAGTCAGATTTACCCAGGGATTAAAGTAAGGCAATTATGTAAACTCCTTAGAGGAACCAAAGCACTAACAGAAGTAATACCACTAACAGAAGAAGCAGAGCTAGAACTGGCAGAAAACAGAGAGATTCTAAAAGAACCAGTACATGGAGTGTATTATGACCCATCAAAAGACTTAATAGCAGAAATACAGAAGCAGGGGCAAGGCCAATGGACATATCAAATTTATCAAGAGCCATTTAAAAATCTGAAAACAGGAAAATATGCAAGAATGAGGGGTGCCCACACTAATGATGTAAAACAATTAACAGAGGCAGTGCAAAAAATAACCACAGAAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAAACTGCCCATACAAAAGGAAACATGGGAAACATGGTGGACAGAGTATTGGCAAGCCACCTGGATTCCTGAGTGGGAGTTTGTTAATACCCCTCCCTTAGTGAAATTATGGTACCAGTTAGAGAAAGAACCCATAGTAGGAGCAGAAACCTTCTATGTAGATGGGGCAGCTAACAGGGAGACTAAATTAGGAAAAGCAGGATATGTTACTAATAGAGGAAGACAAAAAGTTGTCACCCTAACTGACACAACAAATCAGAAGACTGAGTTACAAGCAATTTATCTAGCTTTGCAGGATTCGGGATTAGAAGTAAACATAGTAACAGACTCACAATATGCATTAGGAATCATTCAAGCACAACCAGATCAAAGTGAATCAGAGTTAGTCAATCAAATAATAGAGCAGTTAATAAAAAAGGAAAAGGTCTATCTGGCATGGGTACCAGCACACAAAGGAATTGGAGGAAATGAACAAGTAGATAAATTAGTCAGTGCTGGAATCAGGAAAGTACTA"
integrase_hxb2 = "TTTTTAGATGGAATAGATAAGGCCCAAGATGAACATGAGAAATATCACAGTAATTGGAGAGCAATGGCTAGTGATTTTAACCTGCCACCTGTAGTAGCAAAAGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAGGAGAAGCCATGCATGGACAAGTAGACTGTAGTCCAGGAATATGGCAACTAGATTGTACACATTTAGAAGGAAAAGTTATCCTGGTAGCAGTTCATGTAGCCAGTGGATATATAGAAGCAGAAGTTATTCCAGCAGAAACAGGGCAGGAAACAGCATATTTTCTTTTAAAATTAGCAGGAAGATGGCCAGTAAAAACAATACATACTGACAATGGCAGCAATTTCACCGGTGCTACGGTTAGGGCCGCCTGTTGGTGGGCGGGAATCAAGCAGGAATTTGGAATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCTATGAATAAAGAATTAAAGAAAATTATAGGACAGGTAAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGACAGCAGAAATCCACTTTGGAAAGGACCAGCAAAGCTCCTCTGGAAAGGTGAAGGGGCAGTAGTAATACAAGATAATAGTGACATAAAAGTAGTGCCAAGAAGAAAAGCAAAGATCATTAGGGATTATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAGGAT"

geneticCode ={
#Phenylalanine
"TTT": "F", "TTC": "F",
#Leucine
"TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
#Isoleucine
"ATT": "I", "ATC": "I","ATA": "I",
#Methioinine
"ATG": "M",
#Valine
"GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
#Serine
"TCT": "S", "TCC": "S", "TCA": "S","TCG": "S","AGT": "S", "AGC": "S",
#Proline
"CCT": "P", "CCC": "P", "CCA": "P","CCG": "P",
#Threonine
"ACT": "T","ACC": "T","ACA": "T","ACG": "T",
#Alanine
"GCT": "A","GCC": "A","GCA": "A","GCG": "A",
#Tyrosine
"TAT": "Y","TAC": "Y",
#Histidine
"CAT": "H","CAC": "H",
#Glutamine
"CAA": "Q","CAG": "Q",
#Asparagine
"AAT": "N","AAC": "N",
#Lysine
"AAA": "K","AAG": "K",
#Aspartic acid
"GAT": "D","GAC": "D",
#Glutamic acid
"GAA": "E","GAG": "E",
#Cysteine
"TGT": "C","TGC": "C",
#Tryptophan
"TGG": "W",
#Arginine
"CGT": "R","CGC": "R","CGA": "R","CGG": "R","AGA": "R","AGG": "R",
#Glycine
"GGT": "G","GGC": "G","GGA": "G","GGG": "G",
#stop
"TAA": "*","TAG": "*","TGA": "*"
}

def getCombinations(base1,  base2, base3,reference):
# returns all possible combinations

    mutations = []
    #print("Bases", base1, base2, base3)
    for i in range (len(base1)):
        for j in range (len(base2)):
            for k in range (len(base3)):
                codonStr = str(base1[i])+str(base2[j])+str(base3[k])
                #print("CODON:",codonStr)
                trans = geneticCode.get(codonStr,"!")
                #print("TRANS:",trans, reference)
                if trans!="!" and trans!=reference:
                    #add as a mutation only if it differs from the reference
                    if trans not in mutations:
                        #print("ALT codon",codonStr)
                        mutations.append(trans)
    return mutations


def getMutations_mult(mut,refG):
    #returns all possible amino acid mutations per position
    #protease
    mutList = []
    ref_gene = list(refG)
    codonct = 1
    #print("LENPROT",len(prot))
    for i in range(0,len(mut),3):
        #print("I",i,prot[i])
        #print("CODONCT ",codonct-1,ref_protease[codonct-1])
        #print("Base:",prot[i],prot[i+1],prot[i+2])
        combinations = getCombinations(mut[i],mut[i+1], mut[i+2],ref_gene[codonct-1])
        if len(combinations)>0:
            # print("MUTATION:", codonct, ref_gene[codonct-1], "comb", combinations)
            mutList.append([codonct,ref_gene[codonct-1],combinations])

        codonct = codonct + 1

    #print("M",mutList)
    return mutList


def createHIVDBRequest(prot,rt,integrase, out):
    cmd = "sierrapy mutations "
    #cmd = ""
    #protease
    # print("PROT1 ",len(prot))
    for i in prot:
        # print("PROT: ", i)
        for j in i[2]:
            #print("SIERRA: " , i[0],i[1], j)
            mut = "PR:"+i[1]+str(i[0])+j
            #print("SIERRA", mut)
            cmd = cmd + " "+mut
    #RT
    for i in rt:
        #print("PROT: ", i)
        for j in i[2]:
        #print("SIERRA: " , i[0],i[1], j)
            mut = "RT:"+i[1]+str(i[0])+j
            #print("SIERRA", mut)
            cmd = cmd + " "+mut
    #RT
    for i in integrase:
        #print("PROT: ", i)
        for j in i[2]:
            #print("SIERRA: " , i[0],i[1], j)
            mut = "RT:"+i[1]+str(i[0])+j
            #print("SIERRA", mut)
            cmd = cmd + " "+mut
    #protStr
    #cmd = cmd + " -o "+out
    #cmd = 'sierrapy mutations PR:L10I -o output_171.json'
    # print(cmd)
    #print (subprocess.check_output(cmd, shell=True))
    #output = subprocess.Popen(cmd, shell=True)
    output = subprocess.check_output(cmd, shell=True)
    file = open(out, 'w')
    file.write(output.decode("utf-8"))
    file.close()

if __name__ == '__main__':

    import argparse
    parser = argparse.ArgumentParser(description='')
    parser.add_argument('--vcfFile', required=True)
    parser.add_argument('--outFile', required=True)
    args = parser.parse_args()


    vcf_reader = vcf.Reader(open(args.vcfFile,'r'))

    # vcf_reader = VariantFile(snakemake.input[0])
    protease_seq = list(protease_hxb2)
    rt_seq = list(rt_hxb2)
    i_seq = list(integrase_hxb2)

    for record in vcf_reader:
        #print(record)
        if record.POS >=2253 and record.POS <2550: #protease
            alt = record.ALT
            temp = []
            for i in range(len(alt[0])):
                base = str(alt[0])[i]
                if base not in temp:
                    temp.append(base)
            protease_seq[record.POS-2253] = temp
        elif record.POS >=2550 and record.POS < 4229: #3869: #reverse transcriptase
            alt = record.ALT
            temp = []
            for i in range(len(alt[0])):
                base = str(alt[0])[i]
                if base not in temp:
                    temp.append(base)
            rt_seq[record.POS-2550] = temp
        elif record.POS >=4230 and record.POS <=5093: #5096 minus stop codon #integrase
            alt = record.ALT
            temp = []
            for i in range(len(alt[0])):
                base = str(alt[0])[i]
                if base not in temp:
                    temp.append(base)
            i_seq[record.POS-4230] = temp

    protease_mut = getMutations_mult(protease_seq,protease_hivdb)
    rt_mut = getMutations_mult(rt_seq,rt_hivdb)
    int_mut = getMutations_mult(i_seq,integrase_hivdb)

    #createHIVDBRequest(protease_mut, rt_mut,int_mut,snakemake.output[0])
    createHIVDBRequest(protease_mut, rt_mut,int_mut,args.outFile)