HIV Drug Resistance Profiling Pipeline using Bowtie2, Lofreq, and SierraPy

public public 8mo ago 0 bookmarks

Introduction

This is a pipeline for drug-resistance profiling from HIV whole genome NGS data. It takes in a HIV reference genome (currently HXB2) and a dataset of reads in (gzipped) fastq format, and performs the following steps:

  1. aligns the reads to the reference,

  2. performs variant calling, and

  3. queries the HIVDB system for the presence and degree of drug resistance (requires internet connection).

Currently, the pipeline uses Bowtie2 for Step1, Lofreq for Step 2, and sierrapy for Step 3.

This code is under development. We hope to test and add more options.

Installation

This pipeline requires the package manager Conda and the workflow management system Snakemake . All other dependencies are handled automatically by Snakemake.

Install Conda

Download Miniconda3 installer for Linux from here , or for macOS from here Installation instructions are here . Once installation is complete, you can test your Miniconda installation by running:

$ conda list

Install Snakemake

Snakemake recommends installation via Conda:

$ conda install -c conda-forge mamba
$ mamba create -c conda-forge -c bioconda -n snakemake snakemake

This creates an isolated enviroment containing the latest Snakemake. To activate it:

$ conda activate snakemake

To test snakemake installation

$ snakemake --help

Download the pipeline

Clone this pipeline by clicking the Clone button on the top-right of this page, or download it by clicking the ellipsis next to the Clone button.

Quickstart Guide

Let's try running the pipeline on sample data provided in the test_data folder. With the snakemake conda environment activated, and from the top-level directory (i.e. the one that contains this readme file), run:

snakemake --use-conda --configfile config/config.sample.yaml -np

to do a dry-run. If snakemake does not complain and everything seems ok, then run:

snakemake --use-conda --configfile config/config.sample.yaml --cores all

The results can be found inside the newly created directory called test_result . Interpretation of drug-resistance as provided by sierrapy can be found inside the drug_resistance_report folder. Intermediate files such as the read-to-reference alignments and variant calls can be found in their respective folders.

Running the pipeline on your own data

To the run the pipeline on your own data, you need to specify in a config file (in YAML format) the paths to the input data (reads and reference), path to the output directory. Optionally, in this config file, you can also set parameters for the various tools that make up this pipeline. You can use config/config.template.yaml as a template. Once the configfile is ready, run the pipeline like above:

snakemake --use-conda --configfile /path/to/configfile -np

for a dry run, and

snakemake --use-conda --configfile /path/to/configfile --cores all

for the actual run.

Contact

This is an ongoing work. If you have questions, concerns, issues, or suggestions, please contact: Anish Shrestha, Bioinformatics Lab, De La Salle University Manila at [email protected] .

Code Snippets

13
14
15
16
17
shell:
    """
    samtools faidx {input.reference}
    samtools view -bt {params.last_index_basename} {input.sam} > {output}
    """
26
27
shell:
    "samtools sort -o {output} {input}"
36
37
shell:
    "samtools index {input}"
29
30
shell:
   "bowtie2-build {input.reference} {params.index_basename}"
47
48
shell:
    "bowtie2 --local --{params.preset} -x {params.index_basename}  -1 {input.sub1} -2 {input.sub2} | samtools view -bS - > {output}"
12
13
14
15
16
shell:
    """
    seqtk trimfq {input.reads1} > {output.trim1}
    seqtk trimfq {input.reads2} > {output.trim2}
    """
32
33
34
35
36
shell:
    """
    seqtk sample -s {params.seed} {input.trim1} {params.n} > {output.sub1}
    seqtk sample -s {params.seed} {input.trim2} {params.n} > {output.sub2}
    """
8
9
shell:
    "python workflow/scripts/vcfToHivdb.py --vcfFile {input} --outFile {output}"
18
19
shell:
    "python workflow/scripts/tabulateJson.py --jsonFile {input} --outFile {output}"
75
76
shell:
   "lastdb {params.index_basename} {input.reference}"
SnakeMake From line 75 of rules/last.smk
90
91
92
93
94
95
shell:
    """
    set +o pipefail;
    gzip -cdf {input.query} | head -n 400000 > {params.last_sample1}
    last-train --verbose --revsym --matsym --gapsym --sample-number=400000 -Q1  {params.last_index_basename} {params.last_sample1} > {output}
    """
SnakeMake From line 90 of rules/last.smk
126
127
shell:
   "picard CreateSequenceDictionary R={input.reference} O={output}"
146
147
148
149
150
151
152
shell:
    """
    fastq-interleave {input.reads1} {input.reads2} |
    lastal -Q1 -i1 -p {input.scoring} {params.last_index_basename} |
    last-pair-probs -f {params.mean} -s {params.std} -m 0.01 -d 0.1 |
    maf-convert sam -f {input.dict} > {output}
    """
SnakeMake From line 146 of rules/last.smk
12
13
14
15
16
shell:
    """
    lofreq faidx {input.reference}
    lofreq call-parallel --pp-threads {threads} -f {input.reference} -o {output} {input.bam}
    """
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
from tabulate import tabulate
import json

if __name__ == '__main__':

    import argparse
    parser = argparse.ArgumentParser(description='')
    parser.add_argument('--jsonFile', required=True)
    parser.add_argument('--outFile', required=True)
    args = parser.parse_args()

    data = json.load(open(args.jsonFile))
    vResults = data['validationResults']
    drugResistance = data['drugResistance']

    dbResults = ""

    # Printing the validation results
    # print("Validation Results")
    for v in vResults:
        dbResults += "\n" + v['level'] + ": " + v['message'] + "\n"

    # Printing drug resistance results
    for d in drugResistance:
        dbResults += "\n\nDrug resistance interpretation: " + d['gene']['name']
        dbResults += "\nVersion: " + d['version']['text'] + "("  + d['version']['publishDate'] + ")"

        drugClass = []
        drug = []
        score = []
        partialScores = []
        text = []
        ctr = 0

        # Store the relevant information into a new dictionary
        for d2 in d['drugScores']:
            drugClass.append(d2['drugClass']['name'])
            drug.append(d2['drug']['displayAbbr'])
            score.append(d2['score'])
            partialScores.append(d2['partialScores'])
            text.append(d2['text'])
            ctr+=1

        results = {
            'drugClass' : drugClass,
            'drug' : drug,
            'score' : score,
            'text' : text
        }

        # Display Drug Resistance Results Table
        #display(pd.DataFrame(results))
        dbResults += "\n"
        dbResults += tabulate(results, headers = 'keys', tablefmt = 'psql')
        dbResults += "\n"

        # Printing partial scores, mutations, and comments
        for p in partialScores:
            if(p):
                mutations = p[0]['mutations']

                for m in mutations:
                    dbResults += "\n"
                    dbResults += "Mutation: " + m['text'] + "\n"
                    dbResults += "Type: " + m['primaryType'] + "\n"
                    dbResults += "Comments:\n"

                    comments = m['comments']
                    for c in comments:
                        dbResults += c['text'] + "\n"

    #print(dbResults)

    f = open(args.outFile, 'w')
    f.write(dbResults)
    f.close()
    #print (output)
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
import vcf
#from pysam import VariantFile
import subprocess

#coordinates of the 3 genes in 1-based coordinates (because VCF is 1-based)
#protease
pStart = 2253
pEnd = 2549
#reverse-transcriptase
rtStart = 2550
rtEnd = 4229
#integrase
iStart = 4230
iEnd = 5093

#HIVDB protein sequences:
protease_hivdb="PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"

rt_hivdb = "PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI"\
     "GPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGL"\
     "KKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPSINNETPGIRYQYNVLP"\
     "QGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRT"\
     "KIEELRQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKD"\
     "SWTVNDIQKLVGKLNWASQIYAGIKVKQLCKLLRGTKALTEVIPLTEEAE"\
     "LELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLK"\
     "TGKYARMRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEA"\
     "WWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRET"\
     "KLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEVNIVTDSQ"\
     "YALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDK"\
     "LVSAGIRKVL"

integrase_hivdb = "FLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAM"\
            "HGQVDCSPGIWQLDCTHLEGKIILVAVHVASGYIEAEVIPAETGQETAYF"\
            "LLKLAGRWPVKTIHTDNGSNFTSTTVKAACWWAGIKQEFGIPYNPQSQGV"\
            "VESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERI"\
            "VDIIATDIQTKELQKQITKIQNFRVYYRDSRDPLWKGPAKLLWKGEGAVV"\
            "IQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED"

# #HXB2 translated protein sequences (translated based on gene coordinates above)
#protease_hxb2="PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
#rt_hxb2= "PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI"\
#          "GPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGL"\
#          "KKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLP"\
#          "QGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRT"\
#          "KIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKD"\
#          "SWTVNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAE"\
#          "LELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLK"\
#          "TGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWET"\
#          "WWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRET"\
#          "KLGKAGYVTNRGRQKVVTLTDTTNQKTELQAIYLALQDSGLEVNIVTDSQ"\
#          "YALGIIQAQPDQSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDK"\
#          "LVSAGIRKVL"
#integrase_hxb2 = "FLDGIDKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAM"\
#                  "HGQVDCSPGIWQLDCTHLEGKVILVAVHVASGYIEAEVIPAETGQETAYF"\
#                  "LLKLAGRWPVKTIHTDNGSNFTGATVRAACWWAGIKQEFGIPYNPQSQGV"\
#                  "VESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERI"\
#                  "VDIIATDIQTKELQKQITKIQNFRVYYRDSRNPLWKGPAKLLWKGEGAVV"\
#                  "IQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED"

protease_hxb2 = "CCTCAGGTCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTT"
rt_hxb2 = "CCCATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAGCTGAGACAACATCTGTTGAGGTGGGGACTTACCACACCAGACAAAAAACATCAGAAAGAACCTCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAAAGACAGCTGGACTGTCAATGACATACAGAAGTTAGTGGGGAAATTGAATTGGGCAAGTCAGATTTACCCAGGGATTAAAGTAAGGCAATTATGTAAACTCCTTAGAGGAACCAAAGCACTAACAGAAGTAATACCACTAACAGAAGAAGCAGAGCTAGAACTGGCAGAAAACAGAGAGATTCTAAAAGAACCAGTACATGGAGTGTATTATGACCCATCAAAAGACTTAATAGCAGAAATACAGAAGCAGGGGCAAGGCCAATGGACATATCAAATTTATCAAGAGCCATTTAAAAATCTGAAAACAGGAAAATATGCAAGAATGAGGGGTGCCCACACTAATGATGTAAAACAATTAACAGAGGCAGTGCAAAAAATAACCACAGAAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAAACTGCCCATACAAAAGGAAACATGGGAAACATGGTGGACAGAGTATTGGCAAGCCACCTGGATTCCTGAGTGGGAGTTTGTTAATACCCCTCCCTTAGTGAAATTATGGTACCAGTTAGAGAAAGAACCCATAGTAGGAGCAGAAACCTTCTATGTAGATGGGGCAGCTAACAGGGAGACTAAATTAGGAAAAGCAGGATATGTTACTAATAGAGGAAGACAAAAAGTTGTCACCCTAACTGACACAACAAATCAGAAGACTGAGTTACAAGCAATTTATCTAGCTTTGCAGGATTCGGGATTAGAAGTAAACATAGTAACAGACTCACAATATGCATTAGGAATCATTCAAGCACAACCAGATCAAAGTGAATCAGAGTTAGTCAATCAAATAATAGAGCAGTTAATAAAAAAGGAAAAGGTCTATCTGGCATGGGTACCAGCACACAAAGGAATTGGAGGAAATGAACAAGTAGATAAATTAGTCAGTGCTGGAATCAGGAAAGTACTA"
integrase_hxb2 = "TTTTTAGATGGAATAGATAAGGCCCAAGATGAACATGAGAAATATCACAGTAATTGGAGAGCAATGGCTAGTGATTTTAACCTGCCACCTGTAGTAGCAAAAGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAGGAGAAGCCATGCATGGACAAGTAGACTGTAGTCCAGGAATATGGCAACTAGATTGTACACATTTAGAAGGAAAAGTTATCCTGGTAGCAGTTCATGTAGCCAGTGGATATATAGAAGCAGAAGTTATTCCAGCAGAAACAGGGCAGGAAACAGCATATTTTCTTTTAAAATTAGCAGGAAGATGGCCAGTAAAAACAATACATACTGACAATGGCAGCAATTTCACCGGTGCTACGGTTAGGGCCGCCTGTTGGTGGGCGGGAATCAAGCAGGAATTTGGAATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCTATGAATAAAGAATTAAAGAAAATTATAGGACAGGTAAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGACAGCAGAAATCCACTTTGGAAAGGACCAGCAAAGCTCCTCTGGAAAGGTGAAGGGGCAGTAGTAATACAAGATAATAGTGACATAAAAGTAGTGCCAAGAAGAAAAGCAAAGATCATTAGGGATTATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAGGAT"

geneticCode ={
#Phenylalanine
"TTT": "F", "TTC": "F",
#Leucine
"TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
#Isoleucine
"ATT": "I", "ATC": "I","ATA": "I",
#Methioinine
"ATG": "M",
#Valine
"GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
#Serine
"TCT": "S", "TCC": "S", "TCA": "S","TCG": "S","AGT": "S", "AGC": "S",
#Proline
"CCT": "P", "CCC": "P", "CCA": "P","CCG": "P",
#Threonine
"ACT": "T","ACC": "T","ACA": "T","ACG": "T",
#Alanine
"GCT": "A","GCC": "A","GCA": "A","GCG": "A",
#Tyrosine
"TAT": "Y","TAC": "Y",
#Histidine
"CAT": "H","CAC": "H",
#Glutamine
"CAA": "Q","CAG": "Q",
#Asparagine
"AAT": "N","AAC": "N",
#Lysine
"AAA": "K","AAG": "K",
#Aspartic acid
"GAT": "D","GAC": "D",
#Glutamic acid
"GAA": "E","GAG": "E",
#Cysteine
"TGT": "C","TGC": "C",
#Tryptophan
"TGG": "W",
#Arginine
"CGT": "R","CGC": "R","CGA": "R","CGG": "R","AGA": "R","AGG": "R",
#Glycine
"GGT": "G","GGC": "G","GGA": "G","GGG": "G",
#stop
"TAA": "*","TAG": "*","TGA": "*"
}

def getCombinations(base1,  base2, base3,reference):
# returns all possible combinations

    mutations = []
    #print("Bases", base1, base2, base3)
    for i in range (len(base1)):
        for j in range (len(base2)):
            for k in range (len(base3)):
                codonStr = str(base1[i])+str(base2[j])+str(base3[k])
                #print("CODON:",codonStr)
                trans = geneticCode.get(codonStr,"!")
                #print("TRANS:",trans, reference)
                if trans!="!" and trans!=reference:
                    #add as a mutation only if it differs from the reference
                    if trans not in mutations:
                        #print("ALT codon",codonStr)
                        mutations.append(trans)
    return mutations


def getMutations_mult(mut,refG):
    #returns all possible amino acid mutations per position
    #protease
    mutList = []
    ref_gene = list(refG)
    codonct = 1
    #print("LENPROT",len(prot))
    for i in range(0,len(mut),3):
        #print("I",i,prot[i])
        #print("CODONCT ",codonct-1,ref_protease[codonct-1])
        #print("Base:",prot[i],prot[i+1],prot[i+2])
        combinations = getCombinations(mut[i],mut[i+1], mut[i+2],ref_gene[codonct-1])
        if len(combinations)>0:
            # print("MUTATION:", codonct, ref_gene[codonct-1], "comb", combinations)
            mutList.append([codonct,ref_gene[codonct-1],combinations])

        codonct = codonct + 1

    #print("M",mutList)
    return mutList


def createHIVDBRequest(prot,rt,integrase, out):
    cmd = "sierrapy mutations "
    #cmd = ""
    #protease
    # print("PROT1 ",len(prot))
    for i in prot:
        # print("PROT: ", i)
        for j in i[2]:
            #print("SIERRA: " , i[0],i[1], j)
            mut = "PR:"+i[1]+str(i[0])+j
            #print("SIERRA", mut)
            cmd = cmd + " "+mut
    #RT
    for i in rt:
        #print("PROT: ", i)
        for j in i[2]:
        #print("SIERRA: " , i[0],i[1], j)
            mut = "RT:"+i[1]+str(i[0])+j
            #print("SIERRA", mut)
            cmd = cmd + " "+mut
    #RT
    for i in integrase:
        #print("PROT: ", i)
        for j in i[2]:
            #print("SIERRA: " , i[0],i[1], j)
            mut = "RT:"+i[1]+str(i[0])+j
            #print("SIERRA", mut)
            cmd = cmd + " "+mut
    #protStr
    #cmd = cmd + " -o "+out
    #cmd = 'sierrapy mutations PR:L10I -o output_171.json'
    # print(cmd)
    #print (subprocess.check_output(cmd, shell=True))
    #output = subprocess.Popen(cmd, shell=True)
    output = subprocess.check_output(cmd, shell=True)
    file = open(out, 'w')
    file.write(output.decode("utf-8"))
    file.close()

if __name__ == '__main__':

    import argparse
    parser = argparse.ArgumentParser(description='')
    parser.add_argument('--vcfFile', required=True)
    parser.add_argument('--outFile', required=True)
    args = parser.parse_args()


    vcf_reader = vcf.Reader(open(args.vcfFile,'r'))

    # vcf_reader = VariantFile(snakemake.input[0])
    protease_seq = list(protease_hxb2)
    rt_seq = list(rt_hxb2)
    i_seq = list(integrase_hxb2)

    for record in vcf_reader:
        #print(record)
        if record.POS >=2253 and record.POS <2550: #protease
            alt = record.ALT
            temp = []
            for i in range(len(alt[0])):
                base = str(alt[0])[i]
                if base not in temp:
                    temp.append(base)
            protease_seq[record.POS-2253] = temp
        elif record.POS >=2550 and record.POS < 4229: #3869: #reverse transcriptase
            alt = record.ALT
            temp = []
            for i in range(len(alt[0])):
                base = str(alt[0])[i]
                if base not in temp:
                    temp.append(base)
            rt_seq[record.POS-2550] = temp
        elif record.POS >=4230 and record.POS <=5093: #5096 minus stop codon #integrase
            alt = record.ALT
            temp = []
            for i in range(len(alt[0])):
                base = str(alt[0])[i]
                if base not in temp:
                    temp.append(base)
            i_seq[record.POS-4230] = temp

    protease_mut = getMutations_mult(protease_seq,protease_hivdb)
    rt_mut = getMutations_mult(rt_seq,rt_hivdb)
    int_mut = getMutations_mult(i_seq,integrase_hivdb)

    #createHIVDBRequest(protease_mut, rt_mut,int_mut,snakemake.output[0])
    createHIVDBRequest(protease_mut, rt_mut,int_mut,args.outFile)
ShowHide 9 more snippets with no or duplicated tags.

Free

Created: 8mo ago
Updated: 8mo ago
Maitainers: public
URL: https://github.com/bioinfodlsu/hiv_hts_pipeline
Name: hiv_hts_pipeline
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...