Snakemake Workflow for Autoimmune Disease Data Extraction from GWAS and PGS Catalogs

This repository contains a Snakemake workflow used to extract autoimmune disease-related data from the GWAS Catalog and the PGS Catalog by Experimental Factor Ontology (EFO) ID. Running it regenerates the supplementary tables and summary figures reported in:

Rochi Saurabh, Cesaire Fouodo, Inke R. König, Hauke Busch and Inken Wohlers.
A survey of genome-wide association studies (GWAS), polygenic scores (PGS) and UK Biobank (UKB) highlights resources for autoimmune disease genetics.

Code Snippets

shell: "head -n 1 {input} > {output}"
SnakeMake From line 40 of main/Snakefile
run:
    # Read the EFO IDs of interest, one per line.
    efo_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            efo_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split("\t")
            mapped_trait = s[35]
            # Keep the header and rows whose mapped trait is a single EFO ID from the list.
            if ("," not in mapped_trait and mapped_trait.split("/")[-1] in efo_ids) or line[:10] == "DATE ADDED":
                f_out.write(line)
SnakeMake From line 50 of main/Snakefile
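The filtering step above can be sketched as a standalone function, which may be easier to test outside Snakemake. This is a minimal sketch and not the workflow's own code: the name `filter_by_efo`, its parameters, and the example URIs are assumptions made for illustration, while the column index 35 and the `DATE ADDED` header prefix are taken from the snippet.

```python
def filter_by_efo(lines, efo_ids, column=35, header_prefix="DATE ADDED", sep="\t"):
    """Yield the header line and every line whose `column` holds one wanted EFO ID.

    Hypothetical helper; the defaults mirror the snippet above.
    """
    wanted = set(efo_ids)  # a set gives constant-time membership tests
    for line in lines:
        if line.startswith(header_prefix):
            yield line
            continue
        fields = line.rstrip("\n").split(sep)
        if len(fields) <= column:
            # Short line: skipped here, whereas plain indexing would raise IndexError.
            continue
        mapped_trait = fields[column]
        # A comma marks several mapped traits; such rows are skipped, as in the snippet.
        if "," not in mapped_trait and mapped_trait.split("/")[-1] in wanted:
            yield line


if __name__ == "__main__":
    def make_row(uri, width=38):
        # Build an illustrative tab-separated row with the URI in column index 35.
        fields = [""] * width
        fields[35] = uri
        return "\t".join(fields) + "\n"

    example = [
        "DATE ADDED TO CATALOG\tPUBMEDID\n",
        make_row("http://www.ebi.ac.uk/efo/EFO_0000685"),
        make_row("http://www.ebi.ac.uk/efo/EFO_0000001"),
    ]
    print(len(list(filter_by_efo(example, ["EFO_0000685"]))))
```

Two choices differ from the snippet: the IDs are held in a set rather than a list, and the trailing newline is stripped before splitting, so the check would also work if the trait column were the last one.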
shell: "cat {input} | cut -f 36 | sort | uniq -c | sort -k1 -n -r > {output}"
SnakeMake From line 66 of main/Snakefile
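For readers less familiar with shell pipelines, the `cut | sort | uniq -c | sort -k1 -n -r` pattern used here and in the following rules tallies how often each value occurs in one column and lists the most frequent first. A rough Python equivalent is sketched below; `count_column` is a hypothetical name, and edge cases (lines lacking the delimiter, the order of ties) are not guaranteed to match `cut` and `sort` exactly.

```python
from collections import Counter


def count_column(lines, field, sep="\t"):
    """Count the values in the 1-based `field` column, most frequent first.

    Hypothetical helper; `field` is 1-based to match `cut -f`.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        # Lines with too few fields are counted under the empty string.
        value = parts[field - 1] if len(parts) >= field else ""
        counts[value] += 1
    # most_common() sorts by count in descending order.
    return counts.most_common()
```

Calling `count_column(open(path), 36)` on the filtered association file would then correspond to the `cut -f 36` rule above, with `path` standing in for that rule's input.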
run:
    with open(input[0], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split("\t")
            mapped_trait = s[35]
            # Keep the header and rows mapped to the EFO ID given by the wildcard.
            if mapped_trait.split("/")[-1] == wildcards.efo_id or line[:10] == "DATE ADDED":
                f_out.write(line)
SnakeMake From line 72 of main/Snakefile
shell: "cat {input} | cut -f 37 | sort | uniq -c | sort -k1 -n -r > {output}"
SnakeMake From line 88 of main/Snakefile
shell: "cat {input} | cut -f 24 | sort | uniq -c | sort -k1 -n -r > {output}"
SnakeMake From line 95 of main/Snakefile
shell: "cat {input} | cut -f 22 | sort | uniq -c | sort -k1 -n -r > {output}"
SnakeMake From line 102 of main/Snakefile
shell: "cat {input} | cut -f 14 | sort | uniq -c | sort -k1 -n -r > {output}"
SnakeMake From line 109 of main/Snakefile
run:
    # Read the EFO IDs of interest, one per line.
    efo_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            efo_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split("\t")
            mapped_trait = s[13]
            # Keep the header and rows whose mapped trait is a single EFO ID from the list.
            if ("," not in mapped_trait and mapped_trait.split("/")[-1] in efo_ids) or line[:10] == "DATE ADDED":
                f_out.write(line)
SnakeMake From line 123 of main/Snakefile
run:
    with open(input[0], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split("\t")
            mapped_trait = s[13]
            # Keep the header and rows mapped to the EFO ID given by the wildcard.
            if mapped_trait.split("/")[-1] == wildcards.efo_id or line[:10] == "DATE ADDED":
                f_out.write(line)
SnakeMake From line 139 of main/Snakefile
shell: "cat {input} | cut -f 15 | sort | uniq -c | sort -k1 -n -r > {output}"
SnakeMake From line 152 of main/Snakefile
run:
    # Read the EFO IDs of interest, one per line.
    efo_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            efo_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_efo = s[4]
            # Keep the header and scores annotated with a single EFO ID from the list.
            if ("|" not in pgs_efo and pgs_efo in efo_ids) or line[:15] == "Polygenic Score":
                f_out.write(line)
SnakeMake From line 162 of main/Snakefile
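The PGS rules split each line on every comma, which is only safe as long as no field before the inspected column contains a quoted comma. Whether that holds for a given catalog download is not something this page states, so the following is a cautious sketch that parses rows with Python's `csv` module instead. The function name `filter_csv_by_id`, its parameters, and the sample rows are assumptions made for illustration.

```python
import csv
import io


def filter_csv_by_id(in_handle, out_handle, wanted_ids, column, header_prefix):
    """Copy the header and the rows whose `column` equals one wanted ID.

    Hypothetical helper; rows listing several IDs joined by "|" are skipped,
    as in the snippet above.
    """
    wanted = set(wanted_ids)
    reader = csv.reader(in_handle)
    writer = csv.writer(out_handle, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # ignore blank lines
        is_header = row[0].startswith(header_prefix)
        value = row[column] if len(row) > column else ""
        if is_header or ("|" not in value and value in wanted):
            writer.writerow(row)


if __name__ == "__main__":
    # Illustrative data: the third field of the first data row contains a quoted comma.
    sample = io.StringIO(
        "Polygenic Score (PGS) ID,Name,Trait,Label,EFO ID\n"
        'PGS000001,n1,"Arthritis, rheumatoid",label,EFO_0000685\n'
        "PGS000002,n2,t,label,EFO_0000001\n"
    )
    result = io.StringIO()
    filter_csv_by_id(sample, result, ["EFO_0000685"], 4, "Polygenic Score")
    print(result.getvalue(), end="")
```

With a plain `split(",")`, the first data row in this example would be indexed one field too early and would not match.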
run:
    with open(input[0], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            mapped_trait = s[4]
            # Keep the header and scores annotated with the EFO ID given by the wildcard.
            if mapped_trait == wildcards.efo_id or line[:15] == "Polygenic Score":
                f_out.write(line)
SnakeMake From line 181 of main/Snakefile
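# "|| true" keeps the rule from failing if a command in the pipeline exits non-zero (for example, grep -v when it selects no lines).
shell: "cat {input} | grep -v 'Polygenic Score (PGS) ID' | cut -f 1 -d ',' | sort | uniq > {output} || true"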
SnakeMake From line 192 of main/Snakefile
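# "|| true" keeps the rule from failing if a command in the pipeline exits non-zero (for example, grep -v when it selects no lines).
shell: "cat {input} | grep -v 'Polygenic Score (PGS) ID' | cut -f 12 -d ',' | sort | uniq > {output} || true"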
SnakeMake From line 197 of main/Snakefile
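# "|| true" keeps the rule from failing if a command in the pipeline exits non-zero (for example, grep -v when it selects no lines).
shell: "cat {input} | grep -v 'PGS Performance Metric (PPM) ID' | cut -f 4 -d ',' | sort | uniq > {output} || true"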
SnakeMake From line 202 of main/Snakefile
# Collect the values from both inputs in a temporary file, then count them.
# "|| true" lets the next command run even if a pipeline exits non-zero.
shell: "cat {input[0]} | grep -v 'Polygenic Score (PGS) ID' | cut -f 12 -d ',' > {output}.tmp || true; " + \
       "cat {input[1]} | grep -v 'PGS Performance Metric (PPM) ID' | cut -f 4 -d ',' >> {output}.tmp || true; " + \
       "cat {output}.tmp | sort | uniq -c > {output}; " + \
       "rm {output}.tmp; "
SnakeMake From line 208 of main/Snakefile
run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[0]
            # Keep the header and rows whose PGS ID is in the list.
            if pgs_id in pgs_ids or line[:15] == "Polygenic Score":
                f_out.write(line)
SnakeMake From line 217 of main/Snakefile
run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[1]
            # Keep the header and performance metrics of the listed scores.
            if pgs_id in pgs_ids or line[:22] == "PGS Performance Metric":
                f_out.write(line)
SnakeMake From line 233 of main/Snakefile
run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[1]
            # Keep the header and sample sets of the listed scores.
            if pgs_id in pgs_ids or line[:14] == "PGS Sample Set":
                f_out.write(line)
SnakeMake From line 249 of main/Snakefile
shell: "cat {input} | cut -f 1 -d ',' | sort | uniq -c > {output}"
SnakeMake From line 264 of main/Snakefile
shell: "cat {input} | cut -f 12 | sort | uniq -c > {output}"
SnakeMake From line 269 of main/Snakefile
shell: "cat {input} | cut -f 5 -d ',' | sort | uniq -c | sort -k1 -n -r > {output}"
SnakeMake From line 275 of main/Snakefile
shell: "cat {input} | cut -f 1 -d ',' | sort | uniq > {output}"
SnakeMake From line 281 of main/Snakefile
run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[0]
            # Keep the header and rows whose PGS ID is in the list.
            if pgs_id in pgs_ids or line[:15] == "Polygenic Score":
                f_out.write(line)
SnakeMake From line 288 of main/Snakefile
run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[1]
            # Keep the header and performance metrics of the listed scores.
            if pgs_id in pgs_ids or line[:15] == "PGS Performance":
                f_out.write(line)
SnakeMake From line 305 of main/Snakefile
run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[1]
            # Keep the header and sample sets of the listed scores.
            if pgs_id in pgs_ids or line[:14] == "PGS Sample Set":
                f_out.write(line)
SnakeMake From line 322 of main/Snakefile
run:
    # Read the EFO IDs of interest, one per line.
    efo_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            efo_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.strip("\n").split("\t")
            assert len(s) == 7
            ukb_efo = s[2]
            # Keep the header and rows mapped to a single EFO ID from the list.
            if ("|" not in ukb_efo and ukb_efo in efo_ids) or line[:5] == "ZOOMA":
                f_out.write(line)
SnakeMake From line 339 of main/Snakefile
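The filter conditions in these rules combine `not`, `and`, and `or`. Python evaluates `in` first, then `not`, then `and`, then `or`, so a condition written without grouping parentheses still means "single matching ID, or header line". The sketch below spells this out with named intermediate values; both function names and the example IDs are hypothetical.

```python
def keep_line(ukb_efo, efo_ids, line):
    """Explicit form: keep header lines and rows with one matching EFO ID."""
    is_header = line.startswith("ZOOMA")
    is_single_match = "|" not in ukb_efo and ukb_efo in efo_ids
    return is_single_match or is_header


def keep_line_unparenthesized(ukb_efo, efo_ids, line):
    """Same test written without grouping, relying on operator precedence."""
    return not "|" in ukb_efo and ukb_efo in efo_ids or line[:5] == "ZOOMA"


if __name__ == "__main__":
    ids = {"EFO_0000685"}
    for efo, text in [("EFO_0000685", "row"), ("EFO_0000685|EFO_0003767", "row")]:
        print(keep_line(efo, ids, text) == keep_line_unparenthesized(efo, ids, text))
```

The explicit form is longer but makes the intent visible at a glance, which can help when the same condition is repeated across many rules.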

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/iwohlers/2022_autoimmune_review
Name: 2022_autoimmune_review
Version: 1
Copyright: Public Domain
License: None