16S pipeline using QIIME2, created with Snakemake


QIIME2 workflow for 16S analysis, created with Snakemake.

Authors

  • Ann-Kathrin Brüggemann (@AKBrueggemann)

  • Thomas Battenfeld (@thomasbtf)

Usage

If you use this workflow in a paper, please give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).

Step 1: Obtain a copy of this workflow

If you want to add your own changes to the workflow, create a GitHub repository of your own from this template and then clone that new repository:

  1. Create a new GitHub repository using this workflow as a template.

  2. Clone the newly created repository to your local system, into the place where you want to perform the data analysis.

If you just want to use this workflow locally, simply clone it or download it as a zip file.

Once the folder structure is in place on your local machine, please add a "data" folder manually.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the files in the config/ folder: adjust config.yaml to configure the workflow execution and metadata.txt to specify your sample setup.

Some important parameters you should check and set in config.yaml according to your own fastq files are the primers for the forward and reverse reads, the datatype to be used by QIIME2, and the min-seq-length. Depending on the sequencing run, the read length can vary.
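
For orientation, the corresponding entries in config.yaml could look roughly like the sketch below. The exact key names and the primer sequences are only illustrative (the primers shown are the widely used 515F/806R pair); check the comments in the shipped config.yaml for the keys your copy actually uses.

primer1: GTGYCAGCMGCCGCGGTAA
primer2: GGACTACNVGGGTWTCTAAT
datatype: SampleData[PairedEndSequencesWithQuality]
min-seq-length: 100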

The default parameters for filtering and truncation were validated with the help of a mock community and tuned to recover all bacteria from that community.

In addition, you need to adapt the metadata parameters to your data: please change the names of the metadata columns used to match your own information.

If your metadata does not contain numeric values, please use the "reduced-analysis" option in the config file, as the workflow currently cannot run the full analysis on purely categorical metadata. We plan to fix this in the future.

The workflow can perform clustering and denoising either with vsearch, producing OTUs, or with DADA2, producing ASVs. You can choose the mode by setting the variable "DADA2" to True (use DADA2) or False (use vsearch).
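
As a sketch (the option names below are taken from the description above; the exact spelling may differ in your config.yaml):

DADA2: True             # denoise with DADA2 and create ASVs
# DADA2: False          # cluster with vsearch and create OTUs instead
reduced-analysis: True  # enable if your metadata is purely categorical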

Please make sure that the names of your fastq files are correctly formatted. They should look like this:

samplename_SNumber_Lane_R1/R2_001.fastq.gz
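
For example, a paired-end sample called "sample1" would typically consist of the two files

sample1_S1_L001_R1_001.fastq.gz
sample1_S1_L001_R2_001.fastq.gz

which follow the Casava-style naming that the workflow relies on when importing the data into QIIME2.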

Step 3: Install Snakemake

Create a Snakemake environment using mamba via:

mamba create -c conda-forge -c bioconda -n snakemake snakemake

For installation details, see the instructions in the Snakemake documentation.

Step 4: Execute workflow

Activate the conda environment:

conda activate snakemake

Fill in metadata.txt with the information about your samples:

Please be careful not to include spaces between the commas. If there is a column that you don't have any information for, please leave it empty and simply go on with the next column.
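
A minimal metadata.txt could look like the following sketch. Apart from sample_name and run_date, which the workflow reads directly, the columns and the type annotations in the second row are illustrative; adapt them to your own study design:

sample_name,run_date,condition
#q2:types,categorical,categorical
sampleA,2022-02-24,control
sampleB,2022-02-24,treatment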

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Executing the workflow takes two steps:

Data preparation: snakemake --cores $N --use-conda data_prep
Workflow execution: snakemake --cores $N --use-conda

using $N cores.
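
For example, with 4 cores the two calls would be:

snakemake --cores 4 --use-conda data_prep
snakemake --cores 4 --use-conda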

Step 5: Investigate results

After successful execution, the workflow provides you with a compressed folder holding all relevant results, ready to decompress or to download to your local machine. The archive 16S-report.tar.gz holds several QIIME2 artifacts that can be inspected via QIIME2 View. The zipped folder report.zip contains the Snakemake HTML report, which includes graphics, the DAG of the executed jobs, and HTML files leading you directly to the QIIME2 results without the need to use QIIME2 View.

This report can, e.g., be forwarded to your collaborators.

Step 6: Commit changes

Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:

git commit -a
git push

Step 7: Obtain updates from upstream

Whenever you want to synchronize your workflow copy with new developments from upstream, do the following.

  1. Once, register the upstream repository in your local copy: git remote add -f upstream git@github.com:snakemake-workflows/16S.git, or git remote add -f upstream https://github.com/snakemake-workflows/16S.git if you have not set up SSH keys.

  2. Update the upstream version: git fetch upstream.

  3. Create a diff with the current version: git diff HEAD upstream/master workflow > upstream-changes.diff.

  4. Investigate the changes: vim upstream-changes.diff.

  5. Apply the modified diff via: git apply upstream-changes.diff.

  6. Carefully check whether you need to update the config files: git diff HEAD upstream/master config. If so, do it manually, and only where necessary, since you would otherwise likely overwrite your settings and samples.

Step 8: Contribute back

In case you have also changed or added steps, please consider contributing them back to the original repository:

  1. Fork the original repo to a personal or lab account.

  2. Clone the fork to your local system, to a different place than where you ran your analysis.

  3. Copy the modified files from your analysis to the clone of your fork, e.g., cp -r workflow path/to/fork. Make sure not to accidentally copy config file contents or sample sheets. Instead, manually update the example config files if necessary.

  4. Commit and push your changes to your fork.

  5. Create a pull request against the original repository.

Testing

Test cases are in the subfolder .test. They are automatically executed via continuous integration with GitHub Actions.

Tools

A list of the tools used in this pipeline:

Tool Link
QIIME2 doi.org/10.1038/s41587-019-0209-9
Snakemake doi.org/10.12688/f1000research.29032.1
FastQC www.bioinformatics.babraham.ac.uk/projects/fastqc
MultiQC doi.org/10.1093/bioinformatics/btw354
pandas pandas.pydata.org
kraken2 doi.org/10.1186/s13059-019-1891-0
vsearch github.com/torognes/vsearch
DADA2 doi.org/10.1038/nmeth.3869
songbird doi.org/10.1038/s41467-019-10656-5
bowtie2 doi.org/10.1038/nmeth.1923
ANCOM doi.org/10.3402/mehd.v26.27663

Code Snippets

shell:
    "mkdir {output.dirc} \n"
    "bowtie2-build {input} {output.dirc}/{params.filename}"
shell:
    "bowtie2 -p 8 -x {input.db}/bowtie_host_DB "
    "-1 {input.read1} "
    "-2 {input.read2} "
    "-S {output} "
    "2> {log} "
shell:
    "bowtie2 -x {input.db}/bowtie_host_DB {input.read1} -S {output} "
    "2> {log} "
shell:
    "samtools view -bS {input} > {output} "
    "2> {log} "
shell:
    "samtools view -b -f 12 -F 256 "
    "{input} > {output} "
    "2> {log} "
shell:
    "samtools sort -n -m 5G -@ 2 {input} -o {output.sorted} 2> {log} \n"
    "samtools fastq -@ 8 {output.sorted} "
    "-1 {output.read1} "
    "-2 {output.read2} "
    "-0 /dev/null -s /dev/null -n "
    "2> {log} "
shell:
    "samtools view -b -F 256 "
    "{input} > {output} "
    "2> {log} "
shell:
    "samtools sort -n -m 5G -@ 2 {input} -o {output.sorted} 2> {log} \n"
    "samtools fastq -0 {output.read1} {output.sorted} "
    "2> {log} "
shell:
    "samtools view -F 4 -c {input} > {output} "
    "2> {log} "
script:
    "../scripts/bowtie_frequency.py"
shell:
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--m-metadata-file {input.direc}/chimeras.qza "
    "--p-no-exclude-ids "
    "--o-filtered-table {output.table} \n"
    "qiime feature-table filter-seqs "
    "--i-data {input.seqs} "
    "--m-metadata-file {input.direc}/chimeras.qza "
    "--p-no-exclude-ids "
    "--o-filtered-data {output.seqs}"
shell:
    "qiime feature-classifier classify-consensus-vsearch "
    "--i-query {input.query} "
    "--i-reference-reads {input.reference_reads} "
    "--i-reference-taxonomy {input.reference_taxonomy} "
    "--p-maxaccepts {params.maxaccepts} "
    "--p-maxrejects {params.maxrejects} "
    "--p-perc-identity {params.perc_identity} "
    "--p-threads 4 "
    "--o-classification {output} "
    "--verbose"
shell:
    "qiime taxa barplot "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output}"
shell:
    "qiime vsearch dereplicate-sequences "
    "--i-sequences {input} "
    "--o-dereplicated-table {output.table} "
    "--o-dereplicated-sequences {output.seqs} "
    "--verbose 2> {log}"
shell:
    "qiime vsearch cluster-features-de-novo "
    "--i-table {input.table} "
    "--i-sequences {input.seqs} "
    "--p-perc-identity {params.perc_identity} "
    "--p-threads {params.threads} "
    "--o-clustered-table {output.table} "
    "--o-clustered-sequences {output.seqs} "
    "--verbose 2> {log}"
shell:
    "qiime feature-classifier classify-consensus-vsearch "
    "--i-query {input.query} "
    "--i-reference-reads {input.reference_reads} "
    "--i-reference-taxonomy {input.reference_taxonomy} "
    "--p-maxaccepts {params.maxaccepts} "
    "--p-maxrejects {params.maxrejects} "
    "--p-perc-identity {params.perc_identity} "
    "--p-query-cov {params.query_cov} "
    "--p-min-consensus {params.min_consensus} "
    "--p-threads {params.threads} "
    "--o-classification {output.tax} "
    "--o-search-results {output.search} "
    "--verbose 2> {log} "
shell:
    "qiime phylogeny align-to-tree-mafft-fasttree "
    "--i-sequences {input} "
    "--o-alignment {output.alignment} "
    "--o-masked-alignment {output.masked_alignment} "
    "--o-tree {output.tree} "
    "--o-rooted-tree {output.rooted_tree} "
    "--verbose 2> {log}"
shell:
    "qiime diversity core-metrics-phylogenetic "
    "--i-phylogeny {input.phylogeny} "
    "--i-table {input.table} "
    "--p-sampling-depth {params.depth} "
    "--m-metadata-file {input.metadata} "
    "--output-dir {output} "
    "--verbose 2> {log}"
shell:
    "qiime repeat-rarefy repeat-rarefy "
    "--i-table {input.table} "
    "--p-sampling-depth {params.sampling_depth} "
    "--p-repeat-times {params.repeats} "
    "--o-rarefied-table {output.table} "
    "--verbose 2> {log}"
shell:
    "qiime quality-filter q-score "
    "--i-demux {input} "
    "--p-min-quality {params.min_quality} "
    "--p-min-length-fraction {params.min_length_frac} "
    "--p-max-ambiguous {params.max_ambig} "
    "--o-filtered-sequences {output.filtering} "
    "--o-filter-stats {output.stats} "
    "--verbose 2> {log}"
shell:
    "qiime quality-filter q-score "
    "--i-demux {input} "
    "--p-min-quality {params.min_quality} "
    "--p-min-length-fraction {params.min_length_frac} "
    "--p-max-ambiguous {params.max_ambig} "
    "--o-filtered-sequences {output.filtering} "
    "--o-filter-stats {output.stats} "
    "--verbose 2> {log}"
shell:
    "qiime vsearch uchime-denovo "
    "--i-table {input.table} "
    "--i-sequences {input.seqs} "
    "--p-minh {params.minh} "
    "--output-dir {output.direc} \n"
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--m-metadata-file {output.direc}/chimeras.qza "
    "--p-exclude-ids "
    "--o-filtered-table {output.table} \n"
    "qiime feature-table filter-seqs "
    "--i-data {input.seqs} "
    "--m-metadata-file {output.direc}/chimeras.qza "
    "--p-exclude-ids "
    "--o-filtered-data {output.seqs} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table filter-seqs "
    "--i-data {input.seq} "
    "--m-metadata-file {input.seq} "
    "--p-where 'length(sequence) > {params.min_length}' "
    "--o-filtered-data {output.seq} \n"
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--m-metadata-file {output.seq} "
    "--o-filtered-table {output.table} "
    "--verbose 2> {log}"
shell:
    "qiime dada2 denoise-paired "
    "--i-demultiplexed-seqs {input} "
    "--p-trunc-len-f {params.trunc_len_f} "
    "--p-trunc-len-r {params.trunc_len_r} "
    "--p-trim-left-f {params.trim_left_f} "
    "--p-trim-left-r  {params.trim_left_r} "
    "--p-max-ee-f {params.max_ee_f} "
    "--p-max-ee-r {params.max_ee_r} "
    "--p-trunc-q {params.trunc_q} "
    "--p-min-overlap {params.min_overlap} "
    "--p-pooling-method {params.pooling_method} "
    "--p-chimera-method {params.chimera_method} "
    "--p-min-fold-parent-over-abundance {params.min_fold_parent_over_abundance} "
    "--p-n-threads {params.threads} "
    "--p-n-reads-learn {params.n_reads_learn} "
    "--o-table {output.table} "
    "--o-representative-sequences {output.seq} "
    "--o-denoising-stats {output.stats} "
    "--verbose 2> {log}"
shell:
    "qiime dada2 denoise-paired "
    "--i-demultiplexed-seqs {input} "
    "--p-trunc-len {params.trunc_len_f} "
    "--p-trim-left {params.trim_left_f} "
    "--p-max-ee {params.max_ee_f} "
    "--p-trunc-q {params.trunc_q} "
    "--p-pooling-method {params.pooling_method} "
    "--p-chimera-method {params.chimera_method} "
    "--p-min-fold-parent-over-abundance {params.min_fold_parent_over_abundance} "
    "--p-n-threads {params.threads} "
    "--p-n-reads-learn {params.n_reads_learn} "
    "--o-table {output.table} "
    "--o-representative-sequences {output.seq} "
    "--o-denoising-stats {output.stats} "
    "--verbose 2> {log}"
script:
    "../scripts/relative_abundance.py"
shell:
    "value=$(<{input.abundance}) \n"
    "echo $value \n"
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--p-min-frequency $value "
    "--o-filtered-table {output.table} "
    "--verbose 2> {log} \n"
    "qiime feature-table filter-seqs "
    "--i-data {input.seqs} "
    "--i-table {output.table} "
    "--p-no-exclude-ids "
    "--o-filtered-data {output.seqs} "
    "--verbose 2> {log} "
shell:
    "qiime quality-control exclude-seqs "
    "--i-query-sequences {input.seq} "
    "--i-reference-sequences {input.ref_seq} "
    "--p-threads {params.threads} "
    "--p-perc-identity {params.perc_identity} "
    "--p-perc-query-aligned {params.perc_query_aligned} "
    "--o-sequence-hits {output.human_hit} "
    "--o-sequence-misses {output.seq} "
    "--verbose 2> {log} \n"
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--m-metadata-file {output.seq} "
    "--o-filtered-table {output.table} "
    "--verbose 2> {log} "
shell:
    "qiime taxa collapse "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--p-level 6 "
    "--o-collapsed-table {output} "
    "--verbose 2> {log} "
shell:
    "qiime taxa filter-table "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--p-exclude mitochondria,chloroplast "
    "--o-filtered-table {output.table} "
    "--verbose 2> {log} \n"
    "qiime taxa filter-seqs "
    "--i-sequences {input.seq} "
    "--i-taxonomy {input.taxonomy} "
    "--p-exclude mitochondria,chloroplast "
    "--o-filtered-sequences {output.seq} "
    "--verbose 2> {log} "
shell:
    "qiime tools export "
    "--input-path {input.table} "
    "--output-path {output.table_biom} \n"
    "qiime tools export "
    "--input-path {input.taxonomy} "
    "--output-path {output.taxa_biom} \n"
    "qiime feature-table presence-absence "
    "--i-table {input.table} "
    "--o-presence-absence-table {output.table_binary} \n"
    "qiime tools export "
    "--input-path {output.table_binary} "
    "--output-path {output.binary_biom}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
script:
    "../scripts/rename_qzv.py"
script:
    "../scripts/extract_reports.py"
script:
    "../scripts/extract_beta_corr.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_beta_corr.py"
shell:
    "snakemake --nolock --report {output} --report-stylesheet resources/custom-stylesheet.css "
    "{params.for_testing} "
    "> {log} 2>&1"
shell:
    "tar -czvf {output} {params.directory} "
script:
    "../scripts/yaml_to_table.py"
shell:
    """
    mkdir results/{wildcards.date}/16S-report
    cp -r {input} results/{wildcards.date}/16S-report/
    tar -czvf results/{wildcards.date}/16S-report.tar.gz results/{wildcards.date}/16S-report/
    cp results/{wildcards.date}/16S-report.tar.gz {params.outpath}
    rm -r results/{wildcards.date}/16S-report
    """
shell:
    "qiime tools export "
    "--input-path {input.table} "
    "--output-path {output.table_biom} \n"
    "qiime tools export "
    "--input-path {input.taxonomy} "
    "--output-path {output.taxa_biom} \n"
    "qiime feature-table presence-absence "
    "--i-table {input.table} "
    "--o-presence-absence-table {output.table_binary} \n"
    "qiime tools export "
    "--input-path {output.table_binary} "
    "--output-path {output.binary_biom}"
script:
    "../scripts/rename_qzv.py"
script:
    "../scripts/extract_reports.py"
script:
    "../scripts/extract_beta_corr.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_beta_corr.py"
shell:
    "snakemake --nolock --report {output} --report-stylesheet resources/custom-stylesheet.css "
    "{params.for_testing} "
    "> {log} 2>&1"
shell:
    "tar -czvf {output} {params.directory} "
script:
    "../scripts/yaml_to_table.py"
shell:
    """
    mkdir results/{wildcards.date}/16S-report
    cp -r {input} results/{wildcards.date}/16S-report/
    tar -czvf results/{wildcards.date}/16S-report.tar.gz results/{wildcards.date}/16S-report/
    cp results/{wildcards.date}/16S-report.tar.gz {params.outpath}
    rm -r results/{wildcards.date}/16S-report
    """
script:
    "../scripts/create_sample_metadata.py"
shell:
    "cd resources; "
    "wget {params.genomic}; "
    "wget {params.kraken}; "
    "wget {params.seq}; "
    "wget {params.tax}; "
shell:
    "gzip -dc {input} > {output} 2> {log}; "
script:
    "../scripts/lower_to_upper_fasta.py"
shell:
    "qiime tools import "
    "--input-path {input} "
    "--output-path {output} "
    "--type 'FeatureData[Sequence]' "
    "2> {log} "
shell:
    "tar -zxvf {input} -C resources;"
shell:
    "qiime tools import "
    "--type {params.datatype} "
    "--input-path {params.direc} "
    "--input-format CasavaOneEightSingleLanePerSampleDirFmt "
    "--output-path {output} "
    "2> {log} "
shell:
    "qiime tools import "
    "--type {params.datatype} "
    "--input-path {params.dirc} "
    "--input-format CasavaOneEightSingleLanePerSampleDirFmt "
    "--output-path {output} "
    "2> {log} "
shell:
    "qiime tools import "
    "--type {params.datatype} "
    "--input-path {params.dirc} "
    "--input-format CasavaOneEightSingleLanePerSampleDirFmt "
    "--output-path {output} "
    "2> {log} "
shell:
    """
    if [[ '${params.datatype}' == '$SampleData[PairedEndSequencesWithQuality]' ]] 
    then 
        qiime cutadapt trim-paired \
        --i-demultiplexed-sequences {input} \
        --p-adapter-f {params.adapter1} \
        --p-front-f {params.primer1} \
        --p-front-r {params.primer2} \
        --p-adapter-r {params.adapter2} \
        --p-error-rate {params.error_rate} \
        --p-times {params.rep_times} \
        --p-overlap {params.overlap} \
        --p-minimum-length {params.min_length} \
        --o-trimmed-sequences {output} \
        --verbose 2> {log}
    else 
        qiime cutadapt trim-single \
        --i-demultiplexed-sequences {input} \
        --p-cores 10 \
        --p-adapter {params.primer1} \
        --p-front {params.primer2} \
        --p-error-rate {params.error_rate} \
        --p-times {params.rep_times} \
        --p-overlap {params.overlap} \
        --p-minimum-length {params.min_length} \
        --o-trimmed-sequences {output} \
        --verbose 2> {log}
    fi
    """
shell:
    "qiime vsearch merge-pairs "
    "--i-demultiplexed-seqs {input} "
    "--p-allowmergestagger "
    "--p-minovlen {params.minovlen} "
    "--p-minlen {params.minlen} "
    "--p-maxdiffs {params.maxdiffs} "
    "--p-threads {params.threads} "
    "--o-merged-sequences {output} "
    "--verbose 2> {log}"
shell:
    "(kraken2 --db {input.db} --memory-mapping --threads {params.threads} --paired --use-names --report {output.report} {input.read1} {input.read2}) 2> {log}"
shell:
    "(kraken2 --db {input.db} --memory-mapping --threads {params.threads} --use-names --report {output.report} {input.read}) 2> {log}"
wrapper:
    "v1.3.1/bio/fastqc"
wrapper:
    "v1.23.3/bio/multiqc"
shell:
    "qiime demux summarize "
    "--i-data {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table summarize "
    "--i-table {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime vsearch fastq-stats "
    "--i-sequences {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table tabulate-seqs "
    "--i-data {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
script:
    "../scripts/extract_humancount.py"
shell:
    "mkdir {output}"
shell:
    "qiime feature-table heatmap "
    "--i-table {input} "
    "--m-sample-metadata-file config/pep/sample.tsv "
    "--m-sample-metadata-column {params.metadata} "
    "--p-cluster {params.cluster} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime taxa barplot "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime tools export "
    "--input-path {input} "
    "--output-path {output} "
    "--verbose 2> {log}"
script:
    "../scripts/rename_taxonomy.py"
shell:
    """
    biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}

    biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy
    """
shell:
    """
    biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}

    biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy
    """
script:
    "../scripts/binaryheatmap.py"
script:
    "../scripts/absolute_taxabarplot.py"
shell:
    "qiime tools export "
    "--input-path {input.table} "
    "--output-path {output.table_biom} \n"
    "qiime tools export "
    "--input-path {input.taxonomy} "
    "--output-path {output.taxa_biom} \n"
    "qiime feature-table presence-absence "
    "--i-table {input.table} "
    "--o-presence-absence-table {output.table_binary} \n"
    "qiime tools export "
    "--input-path {output.table_binary} "
    "--output-path {output.binary_biom}"
script:
    "../scripts/rename_qzv.py"
script:
    "../scripts/extract_reports.py"
shell:
    "snakemake --nolock --report {output} --report-stylesheet resources/custom-stylesheet.css "
    "{params.for_testing} "
    "> {log} 2>&1"
shell:
    "tar -czvf {output} {params.directory} "
shell:
    "qiime feature-table summarize "
    "--i-table {input.table_wh} "
    "--o-visualization {output.visual_wh} "
    "--verbose 2> {log} \n"
    "qiime feature-table summarize "
    "--i-table {input.table_woh} "
    "--o-visualization {output.visual_woh} "
    "--verbose 2> {log}"
script:
    "../scripts/sample_freq_difference.py"
script:
    "../scripts/yaml_to_table.py"
shell:
    """
    mkdir results/{wildcards.date}/16S-report
    cp -r {input} results/{wildcards.date}/16S-report/
    tar -czvf results/{wildcards.date}/16S-report.tar.gz results/{wildcards.date}/16S-report/
    cp results/{wildcards.date}/16S-report.tar.gz {params.outpath}
    rm -r results/{wildcards.date}/16S-report
    """
shell:
    "qiime diversity alpha "
    "--i-table {input} "
    "--p-metric {params.metric} "
    "--o-alpha-diversity {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity alpha-group-significance "
    "--i-alpha-diversity {input} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output} "
    "--verbose 2> {log} "
shell:
    "qiime diversity alpha-correlation "
    "--i-alpha-diversity {input} "
    "--m-metadata-file {params.metadata} "
    "--p-method {params.method} "
    "--p-intersect-ids "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity beta "
    "--i-table {input} "
    "--p-metric {params.metric} "
    "--p-pseudocount {params.pseudocount} "
    "--p-n-jobs {params.n_jobs} "
    "--o-distance-matrix {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity beta-group-significance "
    "--i-distance-matrix {input} "
    "--m-metadata-file config/pep/sample.tsv "
    "--m-metadata-column {wildcards.metadata_column} "
    "--o-visualization {output.out} "
    "--verbose 2> {log}"
shell:
    "qiime diversity beta-correlation "
    "--i-distance-matrix {input} "
    "--m-metadata-file {params.metadata_file} "
    "--m-metadata-column {wildcards.metadata_column} "
    "--p-method {params.method} "
    "--p-permutations {params.permutations} "
    "--p-intersect-ids "
    "--p-label1 jaccard-distance-matrix "
    "--p-label2 {params.metadata} "
    "--o-metadata-distance-matrix {output.distance_matrix} "
    "--o-mantel-scatter-visualization {output.mantel_scatter_vis} "
    "--verbose 2> {log}"
shell:
    "qiime diversity alpha-phylogenetic "
    "--i-phylogeny {input.phylogeny} "
    "--i-table {input.table} "
    "--p-metric faith_pd "
    "--o-alpha-diversity {output} "
    "--verbose 2> {log} "
shell:
    "qiime diversity beta-phylogenetic "
    "--i-table {input.table} "
    "--i-phylogeny {input.phylogeny} "
    "--p-metric {params.metrics} "
    "--p-threads {params.threads} "
    "--p-variance-adjusted {params.variance_adjusted} "
    "--o-distance-matrix {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity pcoa "
    "--i-distance-matrix {input} "
    "--o-pcoa {output} "
    "--verbose 2> {log} "
shell:
    "qiime emperor plot "
    "--i-pcoa {input} "
    "--m-metadata-file {params.metadata} "
    "--o-visualization {output} "
    "--verbose 2> {log} "
shell:
    "qiime demux summarize "
    "--i-data {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table summarize "
    "--i-table {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table summarize "
    "--i-table {input.table_wh} "
    "--o-visualization {output.visual_wh} "
    "--verbose 2> {log} \n"
    "qiime feature-table summarize "
    "--i-table {input.table_woh} "
    "--o-visualization {output.visual_woh} "
    "--verbose 2> {log}"
shell:
    "qiime vsearch fastq-stats "
    "--i-sequences {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table tabulate-seqs "
    "--i-data {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
script:
    "../scripts/extract_humancount.py"
shell:
    "mkdir {output}"
shell:
    "qiime feature-table heatmap "
    "--i-table {input} "
    "--m-sample-metadata-file config/pep/sample.tsv "
    "--m-sample-metadata-column {params.metadata} "
    "--p-cluster {params.cluster} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime taxa barplot "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime tools export "
    "--input-path {input} "
    "--output-path {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity alpha-rarefaction "
    "--i-table {input.table} "
    "--i-phylogeny {input.phylogeny} "
    "--p-max-depth {params.max_depth} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output.alpha} "
    "--verbose 2> {log} \n"
    "qiime diversity beta-rarefaction "
    "--i-table {input.table} "
    "--i-phylogeny {input.phylogeny} "
    "--p-metric {params.metric} "
    "--p-clustering-method {params.clustering_method} "
    "--m-metadata-file config/pep/sample.tsv "
    "--p-sampling-depth {params.sampling_depth} "
    "--o-visualization {output.beta} "
    "--verbose 2> {log}"
shell:
    "qiime gneiss correlation-clustering "
    "--i-table {input} "
    "--o-clustering {output.tree} \n"

    "qiime gneiss dendrogram-heatmap "
    "--i-table {input} "
    "--i-tree {output.tree} "
    "--m-metadata-file config/pep/sample.tsv "
    "--m-metadata-column {params.metadata} "
    "--p-color-map seismic "
    "--o-visualization {output.heatmap_gneiss} "
    "--verbose 2> {log}"
script:
    "../scripts/rename_taxonomy.py"
shell:
    """
    (biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}) > {log} 2>&1

    (biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy) > {log} 2>&1
    """
shell:
    """
    (biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}) > {log} 2>&1

    (biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy) > {log} 2>&1
    """
script:
    "../scripts/binaryheatmap.py"
script:
    "../scripts/absolute_taxabarplot.py"
shell:
    "songbird multinomial "
    "--input-biom {input} "
    "--metadata-file config/pep/sample.tsv "
    "--formula  {params.formula} "
    "--epochs 10000 "
    "--differential-prior {params.differential_prior} "
    "--min-sample-count {params.min_sample_count} "
    "--min-feature-count {params.min_feature_count} "
    "--summary-interval {params.summary_interval} "
    "--summary-dir {output} "
    "2> {log}"
script:
    "../scripts/featuretable_songbird.py"
shell:
    "qurro "
    "--ranks {params.differentials} "
    "--table {input.table} "
    "--sample-metadata {params.metadata_file} "
    "--feature-metadata {input.feature_metadata} "
    "--output-dir {output.plot} "
    "2> {log} "
shell:
    "qiime composition add-pseudocount "
    "--i-table {input} "
    "--o-composition-table {output.pseudocount_table} \n"
    "qiime composition ancom "
    "--i-table {output.pseudocount_table} "
    "--m-metadata-file {params.metadata_file} "
    "--m-metadata-column {params.metadata_column} "
    "--o-visualization {output.ancom_output} "
    "--verbose 2> {log}"
script:
    "../scripts/sample_freq_difference.py"
script:
    "../scripts/prepare_diversity.py"
script:
    "../scripts/plot_distance.py"
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from pylab import savefig
import sys


sys.stderr = open(snakemake.log[0], "w")

df = pd.read_csv(str(snakemake.input), delimiter="\t", header=1, index_col="taxonomy")
df.drop("#OTU ID", axis=1, inplace=True)
df_reduced = df.groupby(df.index).sum()
species = df_reduced.index.unique()
for i in df_reduced.index:
    if "Unassigned" not in i:
        df_reduced.index = df_reduced.index.str.split("; s__").str[0]
df_reduced_second = df_reduced.groupby(df_reduced.index).sum()
for i in df_reduced_second.index:
    if "Chloroplast" in i:
        df_reduced_second.drop(index=i, inplace=True)
df_trans = df_reduced_second.T
df_trans.replace(-np.inf, 0, inplace=True)
np.random.seed(100)
mycolors = np.random.choice(
    list(mpl.colors.XKCD_COLORS.keys()), len(species), replace=False
)
fig = df_trans.plot(kind="bar", stacked=True, color=mycolors, figsize=(16, 8))
handles, labels = fig.get_legend_handles_labels()
lgd = fig.legend(
    handles,
    labels,
    loc="center left",
    bbox_to_anchor=(1, 0.5),
    borderaxespad=0.0,
    fontsize=8,
)
plt.yscale("log")
if len(df) < 40:
    plt.xticks(fontsize=7, rotation=45)
if len(df) > 40:
    plt.xticks(fontsize=5, rotation=45)
if len(df) > 100:
    plt.xticks(fontsize=4, rotation=45)
plt.xlabel("Sample name")
plt.ylabel("Log10 abundance")
plt.title("Taxa-bar-plot of log10(abundance)")
plt.savefig(str(snakemake.output), bbox_extra_artists=(lgd,), bbox_inches="tight")
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import numpy as np
from pylab import savefig
import sys


sys.stderr = open(snakemake.log[0], "w")

# Reading information from a file, holding binary information whether an OTU is present or absent in a sample

# Reading the file, deleting unnecessary columns
df = pd.read_csv(str(snakemake.input), delimiter="\t", header=1, index_col="taxonomy")
del df["#OTU ID"]
# Creating a specific colour scheme
my_colors = [(0.2, 0.6, 0.3), (0.1, 0.4, 0.5)]
fig, ax = plt.subplots(figsize=(20, 15))
# Creating a heatmap with seaborn
heatmap = sb.heatmap(
    df, cmap=my_colors, linewidth=0.01, linecolor="Black", cbar=False, yticklabels=True
)
# Setting the properties for the legend
cbar = heatmap.figure.colorbar(heatmap.collections[0])
cbar.set_ticks([0.25, 0.75])
cbar.set_ticklabels(["absent", "present"])
heatmap.set_yticklabels(heatmap.get_ymajorticklabels(), fontsize=6)
if len(df) < 40:
    plt.xticks(fontsize=8, rotation=45)
if len(df) > 40:
    plt.xticks(fontsize=6, rotation=45)
if len(df) > 100:
    plt.xticks(fontsize=5, rotation=45)
heatmap.set(xlabel="Sample", ylabel="Taxonomic classification of bacteria")
# Setting the orientation of the plot
plt.subplots_adjust(bottom=0.2, left=0.4)
plt.show()

figure = heatmap.get_figure()
figure.savefig(str(snakemake.output), dpi=400)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import numpy as np
import os
from pylab import savefig
import sys


sys.stderr = open(snakemake.log[0], "w")

path = os.path.dirname(str(snakemake.input[0]))

inputs = []
for file in os.listdir(path):
    if file.endswith(".txt"):
        inputs.append(os.path.join(path, file))
# concatenate all txt files in a file
with open(str(snakemake.output), "w") as outfile:
    i = 0
    outfile.write("Sample: Difference without human " + "\n")
    while i < len(inputs):
        with open(inputs[i], encoding="utf-8", errors="ignore") as infile:
            path_name = inputs[i].split("_")[0]
            name = path_name.split("/")[-1] + ": " + infile.read()
            outfile.write(name)
            i += 1
import os
import pandas as pd
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import move, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")

# Creating a metadata sample-sheet, that holds additional information concerning the samples.
# Metadata information is needed to be able to create plots and for metadata-specific analysis.

config = snakemake.config

DATA_PATH = str(config["data"])
IN_PATH = str(config["input"])
# today = date.today().strftime("%Y-%m-%d")
incoming_files = []
# Running through all files in the IN_PATH and checking them
# Adding the files to a list, when they meet the requirements
for f in listdir(IN_PATH):
    if (
        path.isfile(path.join(IN_PATH, f))
        and "ndetermined" not in f
        and ".fastq.gz" in f
        and os.stat(IN_PATH + f).st_size > 100
    ):
        incoming_files.append(f)
    else:
        print(f, "not used")
metadata = pd.read_csv(str(snakemake.input), header=0, delimiter=",")
date = metadata["run_date"].iloc[1]
print(date)
# Adding a direcory for the specific date as subdirectory to the DATA_PATH
DATA_PATH += date
if not os.path.isdir(DATA_PATH):
    mkdir(DATA_PATH)
# Checking if some files are already present in the DATA_PATH, those are not copied
data_files = [f for f in listdir(DATA_PATH) if path.isfile(path.join(DATA_PATH, f))]
files_not_to_copy = [f for f in data_files if f in incoming_files]
files_to_copy = [f for f in incoming_files if f not in data_files]
for file in files_to_copy:
    if file.endswith(".fastq.gz") and not "ndetermined" in file:
        copy2(IN_PATH + file, DATA_PATH)
print(files_to_copy)

# Reading the sample names and the metadata from the file-names and the metadata.csv file, that needs to be provided
files = os.listdir(DATA_PATH)
print(files)
sample_list = []
for name in files:
    sample = name.split("_")[0]
    sample_list.append(sample)
sample_list = list(set(sample_list))
metadata = pd.read_csv(str(snakemake.input), header=0, delimiter=",")
metadata.columns = metadata.columns.str.lower()
# Samples, that are not mentioned in the metadata.csv are excluded from the sample-metadata-sheet
for name in sample_list:
    if name not in metadata["sample_name"].tolist():
        print("No metadata was found for " + name)
        sample_list.remove(name)
        # Also remove the fastq file from folder?

# Error if names don't fit
metadata_name_list = metadata["sample_name"].tolist()
metadata_name_list.remove("#q2:types")
if len(sample_list) == len(metadata_name_list):
    if set(sample_list) != set(metadata_name_list):
        print(
            "There seems to be a discrepancy between fastq file and metadata. Please check the spelling in the metadata.txt."
        )
        raise Exception(
            "There seems to be a discrepancy between fastq file and metadata. Please check the spelling in the metadata.txt."
        )

# Replacing empty metadata-columns with 0 and after that removing them from the file
metadata.fillna(0, inplace=True)
metadata.rename(columns={"sample_name": "sample-ID"}, inplace=True)
metadata.to_csv(snakemake.output.metadata, sep="\t", index=False)

# Creating a sample_info df, that holds paths to the fastq files and the date.
# Common functions can read information from sample_info.txt instead from the data directories
data = [metadata["sample-ID"], metadata["run_date"]]
header = ["sample-ID", "run_date"]
sample_info = pd.concat(data, axis=1, keys=header)
if "SampleData[PairedEndSequencesWithQuality]" in str(snakemake.params.datatype):
    sample_info["path1"] = ""
    sample_info["path2"] = ""
elif "SampleData[SequencesWithQuality]" in str(snakemake.params.datatype):
    sample_info["path1"] = ""
sample_info.set_index("sample-ID", inplace=True)
sample_info.drop(labels=["#q2:types"], axis=0, inplace=True)
i = 0
while i < len(sample_info.index):
    for file in files_to_copy:
        if sample_info.index[i] in file and len(sample_info.index[i]) == len(
            file.split("_")[0]
        ):
            if "SampleData[PairedEndSequencesWithQuality]" in str(
                snakemake.params.datatype
            ):
                if "R1" in file:
                    sample_info.loc[sample_info.index[i], "path1"] = (
                        DATA_PATH + "/" + file
                    )
                elif "R2" in file:
                    sample_info.loc[sample_info.index[i], "path2"] = (
                        DATA_PATH + "/" + file
                    )
            elif "SampleData[SequencesWithQuality]" in str(snakemake.params.datatype):
                if "R1" in file:
                    sample_info.loc[sample_info.index[i], "path1"] = (
                        DATA_PATH + "/" + file
                    )
        else:
            continue
    i = i + 1
sample_info.to_csv(snakemake.output.sample_info, sep=",", mode="a")
import os
import pandas as pd
import shutil
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import copy, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")
# Function to extract results from the qiime2 artifacts, already saved in a zip file

"""print("results/2022-02-24/visual/unzipped/")
os.walk("results/2022-02-24/visual/unzipped/")
for x in os.walk("results/2022-02-24/visual/unzipped/"):
    print(x[0])"""
# Reading the zip-files from the directory
dirlist = os.listdir(str(snakemake.input))
# Getting the subdirectories of the files
subdir = []
i = 0
while i < len(dirlist):
    directory = str(snakemake.input) + "/" + dirlist[i] + "/"
    subdir.append(directory)
    i = i + 1
"""z = 0
while z < len(subdir):
    if path.isdir(subdir[z]):
        dir = os.listdir(subdir[z])
        print(dir)
        subdir[z] = subdir[z] + dir[0] + "/"
        print(subdir[z])
        z = 1 + z
    else:
        z = 1 + z"""
# Iterating through the different subdirectories and copying the important information into the output directory
b = 0
path_name = str(snakemake.output)
print(path_name)
plot_name = path_name.split("/")[-1]
while b < len(subdir):
    datadir = subdir[b] + "data/"
    if plot_name == subdir[b].split("/")[-2]:
        html = datadir
        shutil.copytree(html, str(snakemake.output))
    b = b + 1
import os
import shutil
import gzip
import zipfile
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import copy, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")

file = str(snakemake.input)
name = os.path.splitext(file)[0]
shutil.copy(file, name + ".zip")
zipped_file = name + ".zip"

with zipfile.ZipFile(zipped_file, "r") as zip_ref:
    name = zipped_file.split("/")[-1]
    new_dir = str(snakemake.params.between) + "/" + name
    zip_ref.extractall(os.path.splitext(new_dir)[0] + "/")
directory = os.listdir(str(snakemake.params.between))
subdir = os.listdir(str(snakemake.params.between) + "/" + directory[0])
orig_dir = str(snakemake.params.between) + "/" + directory[0] + "/" + subdir[0]
new_dir = str(snakemake.params.between) + "/" + directory[0]
print(new_dir)
dir_name = new_dir.split("/")[-1]
for f in os.listdir(orig_dir):
    path = orig_dir + "/" + f
    print(path)
    unzipped_path = str(snakemake.params.between) + "/" + dir_name + "/" + f
    shutil.move(path, unzipped_path)
os.rmdir(orig_dir)

dirlist = os.listdir(str(snakemake.params.between))
subdir = []
i = 0
while i < len(dirlist):
    directory = str(snakemake.params.between) + "/" + dirlist[i] + "/"
    subdir.append(directory)
    i = i + 1

b = 0
while b < len(subdir):
    datadir = subdir[b] + "/" + "data/"
    if "human-count" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.human_count)
    b += 1
import os
import pandas as pd
import shutil
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import copy, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")

# Function to extract results from the qiime2 artifacts, already saved in a zip file

"""print("results/2022-02-24/visual/unzipped/")
os.walk("results/2022-02-24/visual/unzipped/")
for x in os.walk("results/2022-02-24/visual/unzipped/"):
    print(x[0])"""
# Reading the zip-files from the directory
dirlist = os.listdir(str(snakemake.input))
# Getting the subdirectories of the files
subdir = []
i = 0
while i < len(dirlist):
    directory = str(snakemake.input) + "/" + dirlist[i] + "/"
    subdir.append(directory)
    i = i + 1
"""z = 0
while z < len(subdir):
    if path.isdir(subdir[z]):
        dir = os.listdir(subdir[z])
        print(dir)
        subdir[z] = subdir[z] + dir[0] + "/"
        print(subdir[z])
        z = 1 + z
    else:
        z = 1 + z"""
# Iterating through the different subdirectories and copying the important information into the output directory
b = 0
while b < len(subdir):
    datadir = subdir[b] + "data/"
    if "beta-rarefaction" in subdir[b]:
        svg = datadir + "heatmap.svg"
        shutil.copy(svg, snakemake.output.beta_svg)
        html = datadir
        shutil.copytree(html, snakemake.output.beta_html)
    elif "heatmap" in subdir[b] and "gneiss" not in subdir[b]:
        svg = datadir + "feature-table-heatmap.svg"
        shutil.copy(svg, snakemake.output.heatmap)
    elif "taxonomy" in subdir[b]:
        tsv = datadir + "metadata.tsv"
        shutil.copy(tsv, snakemake.output.taxonomy_tsv)
    elif "taxa-bar-plots" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.taxa_barplot)
    elif "heatmap_gneiss" in subdir[b]:
        svg = datadir + "heatmap.svg"
        shutil.copy(svg, snakemake.output.gneiss)
    if "alpha-rarefaction" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.alpha_html)
    if "paired-seqs" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.paired_seqs)
    if "fastq_stats" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.fastq_stats)
    if "demux-joined-filter-stats" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.demux_filter_stats)
    if "dada2-stats" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.dada2)
    b = b + 1
import os
import pandas as pd
import shutil
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import copy, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")

# Reading the zip-files from the directory
dirlist = os.listdir(str(snakemake.input))
# Getting the subdirectories of the files
subdir = []
i = 0
while i < len(dirlist):
    directory = str(snakemake.input) + "/" + dirlist[i] + "/"
    subdir.append(directory)
    i = i + 1

# Iterating through the different subdirectories and copying the important information into the output directory
b = 0
path_name = str(snakemake.output)
print(path_name)
plot_name = path_name.split("/")[-1]
while b < len(subdir):
    datadir = subdir[b] + "data/"
    if plot_name == subdir[b].split("/")[-2]:
        html = datadir
        shutil.copytree(html, str(snakemake.output))
    b = b + 1
import pandas as pd
import os
import sys


sys.stderr = open(snakemake.log[0], "w")

differentials_df = pd.read_csv(
    str(snakemake.params), header=0, index_col="featureid", sep="\t"
)
taxa_df = pd.read_csv(str(snakemake.input.taxa), header=1, sep="\t")
taxa_small = taxa_df[["#OTU ID", "taxonomy"]].copy()
taxa_small_new = taxa_small.set_index(["#OTU ID"])
diff_taxa_df = pd.concat([differentials_df, taxa_small_new], axis=1)
diff_taxa_df_new = diff_taxa_df.set_index(["taxonomy"])
diff_taxa_df_new.to_csv(str(snakemake.output.diff), sep="\t", index=True)
taxa_small_new.rename_axis("featurue-id", inplace=True)
taxa_small_new.to_csv(str(snakemake.output.feature_meta), sep="\t", index=True)
import pandas as pd
import sys


sys.stderr = open(snakemake.log[0], "w")

# Copy the reference fasta character by character, converting everything to upper case
fasta = open(str(snakemake.input), "r")
fasta_upper_file = open(str(snakemake.output), "w")
print("read!")
for x in fasta.read():
    fasta_upper = x.upper()
    fasta_upper_file.write(fasta_upper)
print("ready")
fasta.close()
fasta_upper_file.close()
import os
import shutil
import sys
import pandas as pd
import seaborn as sb
import numpy as np
import matplotlib.pyplot as plt

sys.stderr = open(snakemake.log[0], "w")

# Find the exported diversity result whose name matches the requested output file
# and plot its distance matrix as a heatmap (upper triangle masked)
directories = os.listdir(str(snakemake.input))

i = 0
while i < len(directories):
    output_name = os.path.split(str(snakemake.output))[1]
    name = output_name.split(".")[0]
    if name in directories[i]:
        df = pd.read_csv(
            str(snakemake.input)
            + "/"
            + directories[i]
            + "/data"
            + "/distance-matrix.tsv",
            sep="\t",
            header=0,
            index_col=0,
        )
        matrix = np.triu(df)
        fig, ax = plt.subplots(figsize=(20, 15))
        heatmap = sb.heatmap(
            df, mask=matrix, cbar_kws={"label": "Distance"}, annot=True, fmt=".1f"
        )
        heatmap.set(xlabel="Samplename", ylabel="Samplename")
        figure = heatmap.get_figure()
        figure.savefig(str(snakemake.output), dpi=400)
    i = i + 1
import os
import shutil
import sys
import zipfile

sys.stderr = open(snakemake.log[0], "w")

zipped_files = []
i = 0
os.mkdir(str(snakemake.output))
while i < len(snakemake.input):
    file = str(snakemake.input[i])
    name_path = os.path.splitext(file)[0]
    name = os.path.split(name_path)[1]
    output_path = str(snakemake.output) + "/" + name + ".zip"
    shutil.copy2(file, output_path)
    zipped_files.append(output_path)
    i += 1

j = 0
while j < len(zipped_files):
    with zipfile.ZipFile(zipped_files[j], "r") as zip_ref:
        name = zipped_files[j].split("/")[-1]
        print(name)
        new_dir = str(snakemake.output) + "/" + name
        print(new_dir)
        zip_ref.extractall(os.path.splitext(new_dir)[0] + "/")
        os.remove(new_dir)
    j = j + 1

directory = os.listdir(str(snakemake.output))
print(directory)

z = 0
while z < len(directory):
    subdir = os.listdir(str(snakemake.output) + "/" + directory[z])
    orig_dir = str(snakemake.output) + "/" + directory[z] + "/" + subdir[0]
    new_dir = str(snakemake.output) + "/" + directory[z]
    for f in os.listdir(orig_dir):
        path = orig_dir + "/" + f
        shutil.move(path, new_dir)
    os.rmdir(orig_dir)
    z = z + 1
import pandas as pd
import gzip
import shutil
import os
import zipfile
import sys


sys.stderr = open(snakemake.log[0], "w")

# Extracting the number of total features over all samples from the sample-table.
# Multiplying the number with the relative abundance filtering value to create a threshold for the
# qiime filtering.

# Reading the sample-table, creating a zip file
file = str(snakemake.input)
name = os.path.splitext(file)[0]
shutil.copy(file, name + ".zip")
filename = name + ".zip"
# Extract zip files to folder
with zipfile.ZipFile(filename, "r") as zip_ref:
    name = filename.split("/")[-1]
    dir_name = os.path.dirname(str(snakemake.input))
    new_dir = dir_name + "/" + name
    zip_ref.extractall(os.path.splitext(new_dir)[0] + "/")
name = name.split(".")[0]
directory = os.path.dirname(str(snakemake.input)) + "/" + name
# Moving the folder inventory one folder up
b = 0
subdir = os.listdir(directory)
while b < len(subdir):
    orig_dir = directory + "/" + subdir[b]
    new_dir = directory
    for f in os.listdir(orig_dir):
        path = orig_dir + "/" + f
        shutil.move(path, new_dir)
    b = b + 1
# Read the specific csv holding the information, creating a dataframe, adding up all feature frequencies
datadir = str(snakemake.output.feature_table) + "/"
csv = datadir + "sample-frequency-detail.csv"
frequency = pd.read_csv(csv, header=0, delimiter=",")
number = frequency["0"].sum()
# Creating the abundance threshold and storing it in an output file
abundance = float(str(snakemake.params))
endnumber = number * abundance
endnumber = int(endnumber)
if endnumber == 0:
    endnumber += 1
with open(str(snakemake.output.abundance), "w") as f:
    f.write(str(endnumber))
import pandas as pd
import gzip
import shutil
import os
import zipfile
import sys


sys.stderr = open(snakemake.log[0], "w")

# Renaming the qiime2 artifacts for viewing to zip files so the information can be accessed

# Iterating over the input files and renaming them to zip
i = 0
zipped_files = []
while i < len(snakemake.input):
    file = snakemake.input[i]
    name = os.path.splitext(file)[0]
    shutil.copy(file, name + ".zip")
    zipped_files.append(name + ".zip")
    i = i + 1
z = 0
# opening the zip files and saving them to the output directory
while z < len(zipped_files):
    with zipfile.ZipFile(zipped_files[z], "r") as zip_ref:
        name = zipped_files[z].split("/")[-1]
        new_dir = str(snakemake.output) + "/" + name
        zip_ref.extractall(os.path.splitext(new_dir)[0] + "/")
        os.remove(zipped_files[z])
    z = z + 1
directory = os.listdir(str(snakemake.output))
# Iterating through the directory holding the unzipped files, moving the file content one folder up.
# Folder created while unzipping can be removed, because it gives no further information
j = 0
while j < len(directory):
    subdir = os.listdir(str(snakemake.output) + "/" + directory[j])
    orig_dir = str(snakemake.output) + "/" + directory[j] + "/" + subdir[0]
    new_dir = str(snakemake.output) + "/" + directory[j]
    for f in os.listdir(orig_dir):
        path = orig_dir + "/" + f
        shutil.move(path, new_dir)
    os.rmdir(orig_dir)
    j = j + 1
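Within the workflow this script is presumably driven by a rule that collects the visualisation artifacts and unpacks them into one report directory. A minimal sketch of such a rule (rule name, artifact paths and script location are illustrative):

rule extract_visualisations:
    input:
        "results/taxa-bar-plots.qzv",
        "results/alpha-rarefaction.qzv",
    output:
        directory("results/html-reports"),
    log:
        "logs/extract_visualisations.log",
    script:
        "../scripts/extract_visualisations.py"  # the script shown above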
import pandas as pd
import os
import sys


sys.stderr = open(snakemake.log[0], "w")

# Renaming the headers in the taxonomy.tsv file, so that they match with the headers of the table-biom file.
# When the headers are equal, the files can be merged into one file.

taxonomy_file = str(snakemake.input) + "/" + "taxonomy.tsv"
print(os.listdir(str(snakemake.input))[0])
taxonomy_df = pd.read_csv(taxonomy_file, delimiter="\t", header=0)
taxonomy_df.rename(
    columns={"Feature ID": "#OTU ID", "Taxon": "taxonomy", "Consensus": "confidence"},
    inplace=True,
)
taxonomy_df.to_csv(str(snakemake.output), sep="\t", index=False)
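Once the headers match, the taxonomy can be attached to the exported BIOM table. One way this merge might be done is with the biom command line tool; a sketch under that assumption (rule name, paths and environment file are illustrative):

rule merge_taxonomy_into_biom:
    input:
        biom="results/table.biom",                # exported feature table (illustrative path)
        taxonomy="results/taxonomy-renamed.tsv",  # file written by the script above
    output:
        "results/table-with-taxonomy.biom",
    log:
        "logs/merge_taxonomy_into_biom.log",
    conda:
        "../envs/biom.yaml"
    shell:
        "biom add-metadata -i {input.biom} -o {output}"
        " --observation-metadata-fp {input.taxonomy}"
        " --sc-separated taxonomy"
        " > {log} 2>&1"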
import pandas as pd
import sys


sys.stderr = open(snakemake.log[0], "w")

# Comparing the per-sample feature counts with and without human reads and
# writing a table of the counts and their difference.

whuman = str(snakemake.params.visual_wh)
wohuman = str(snakemake.params.visual_woh)

whuman_df = pd.read_csv(whuman, sep=",", header=0, index_col=0)
wohuman_df = pd.read_csv(wohuman, sep=",", header=0, index_col=0)

whuman_df.rename(columns={"0": "whuman"}, inplace=True)
wohuman_df.rename(columns={"0": "wohuman"}, inplace=True)

combined = pd.concat([whuman_df, wohuman_df], axis=1)

for sample in combined.index:
    if combined.at[sample, "whuman"] == combined.at[sample, "wohuman"]:
        combined.drop([sample], inplace=True)
# combined.drop_duplicates(subset = ["whuman", "wohuman"], inplace = True)

for sample in combined.index:
    difference = combined.at[sample, "whuman"] - combined.at[sample, "wohuman"]
    combined.at[sample, "difference"] = difference
combined.index.name = "Sample"
combined.rename(
    columns={"whuman": "Features with human", "wohuman": "Features without human"},
    inplace=True,
)
combined.to_csv(str(snakemake.output))
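To make the comparison logic concrete, here is a minimal, self-contained sketch with made-up numbers (illustrative data only, not from any real run):

import pandas as pd

# Two single-column frequency tables, indexed by sample, each with a column
# literally named "0", matching what the script above expects.
whuman_df = pd.DataFrame({"0": [1200, 800, 500]}, index=["S1", "S2", "S3"])
wohuman_df = pd.DataFrame({"0": [1000, 800, 450]}, index=["S1", "S2", "S3"])

whuman_df = whuman_df.rename(columns={"0": "whuman"})
wohuman_df = wohuman_df.rename(columns={"0": "wohuman"})

combined = pd.concat([whuman_df, wohuman_df], axis=1)
# Samples whose counts are identical with and without human reads (here S2) are dropped
combined = combined[combined["whuman"] != combined["wohuman"]].copy()
combined["difference"] = combined["whuman"] - combined["wohuman"]
print(combined)  # S1 and S3 remain, with differences 200 and 50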
import sys
from pathlib import Path
import oyaml as yaml
from prettytable import PrettyTable
import argparse
import os

"""

Example :

Input YAML :
~~~~~~~~~~
metadata:
   name: nginx-deployment
    labels:
      app: nginx

Output Table :
~~~~~~~~~~~~
metadata:
+--------+------------------+----------+-----------------------------------------------------------+
| Field  | Example Value    | Required | Description                                               |
+--------+------------------+----------+-----------------------------------------------------------+
| name   | nginx-deployment |    --    | Lorem ipsum dolor sit amet, consecteteur adipiscing elit. |
| labels | --               |    --    | Lorem ipsum.                                              |
|   app  | nginx            |    --    | Lorem ipsum.                                              |
+--------+------------------+----------+-----------------------------------------------------------+	    

"""

# parser = argparse.ArgumentParser(description='YAML file to (HTML) table converter',
#                epilog='text table will be printed as STDOUT - html table will be save in html file ')
# parser.add_argument('--inputFile', dest='inputfile', required=True, help="input yaml file to process")
# parser.add_argument('--out', dest='format', choices=['txt', 'html', 'text'], help="convert yaml to text table or html "
#                                                                                  "table")
# parser.add_argument('--outpath', dest='outputfile', required=True, help="path to the output file")
# args = parser.parse_args()

sys.stderr = open(snakemake.log[0], "w")

outputFmt = "html"
INPUT_YAML = str(snakemake.input)

if outputFmt == "text" or outputFmt == "txt":
    PRINT_HTML = False
else:
    PRINT_HTML = True

in_file = Path(INPUT_YAML)
if not in_file.is_file():
    sys.exit("Input file [" + INPUT_YAML + "] does not exist")

SPACE_CHAR = "~"
OUTPUT_HTMl = str(snakemake.output)

CSS_TEXT = """
        <html>
        <head>
          <meta name="viewport" content="width=device-width, initial-scale=1">
          <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
          <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
          <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js"></script>
          <style>
        body
        {
            padding-left: 20px;
        }
        th:nth-child(1) {
          width: 200px;
          }

        /* the second */
        th:nth-child(2) {
          width: 200px;
        }

        /* the third */
        th:nth-child(3) {
          width: 100px;
         }
         /* the third */
         th:nth-child(4) {
          width: 420px;
         }

         pre {
            white-space: -moz-pre-wrap; /* Mozilla, supported since 1999 */
            white-space: -pre-wrap; /* Opera */
            white-space: -o-pre-wrap; /* Opera */
            white-space: pre-wrap; /* CSS3 - Text module (Candidate Recommendation) http://www.w3.org/TR/css3-text/#white-space */
            word-wrap: break-word; /* IE 5.5+ */
            width: 725px
         }
          </style>
        </head>
        """


def listToString(inList):
    """Convert list to String"""
    ret = ""
    for line in inList:
        ret = ret + line
    return ret


def printDic(inDictionary, inPTable, indent):
    """
    Iterate over the dictionary.
        If needed, call the same function again (recursively) until we reach a plain key : value pair.
        Add each key and value as a row to the pretty table object (inPTable).
    """
    global SPACE_CHAR  # Placeholder character that will be replaced when this table is printed as text/html

    # Go over the dictionary
    for item in inDictionary:
        if isinstance(
            item, dict
        ):  # If it is again a dictionary, call the same function with this new dictionary
            inPTable.add_row([SPACE_CHAR, SPACE_CHAR])
            printDic(item, inPTable, indent)
        else:
            # Two ways to get the next item, based on the input type
            if isinstance(inDictionary, dict):
                moreStuff = inDictionary.get(item)
            elif isinstance(inDictionary, list):
                # If it is a simple array/list we just print all its values and we are done
                for _item in inDictionary:
                    inPTable.add_row([indent + _item, SPACE_CHAR + SPACE_CHAR])
                break

            # If it is a dictionary or a list, process it accordingly
            if isinstance(moreStuff, dict):
                inPTable.add_row([indent + item, SPACE_CHAR + SPACE_CHAR])
                printDic(moreStuff, inPTable, SPACE_CHAR + SPACE_CHAR + indent)
            elif isinstance(moreStuff, list):
                # If we are not in nested call (as indent is empty string) we add one extra row in table (for clarity)
                # if indent is "":
                #    inPTable.add_row([SPACE_CHAR, SPACE_CHAR, SPACE_CHAR])
                #
                # inPTable.add_row([indent + item, "", ""])
                for dicInDic in moreStuff:
                    if dicInDic is not None:
                        if isinstance(dicInDic, dict):
                            printDic(
                                dicInDic,
                                inPTable,
                                SPACE_CHAR
                                + SPACE_CHAR
                                + SPACE_CHAR
                                + SPACE_CHAR
                                + indent,
                            )
                        else:
                            inPTable.add_row([indent + item, dicInDic])

            else:
                # Most calls eventually end up here -
                # this adds the key and its value as a table row
                inPTable.add_row([indent + item, inDictionary[item]])


"""
    Read given yaml file 
       process each to level element build table (HTML) out of it and print it console/file 
"""
with open(INPUT_YAML) as file:
    # The FullLoader parameter handles the conversion from YAML
    # scalar values to the Python dictionary format
    yaml_file_object = yaml.load(file, Loader=yaml.FullLoader)

    if PRINT_HTML:
        html_st = []
        f = open(OUTPUT_HTMl, "w")
        html_st.append(CSS_TEXT)
    i = 0
    for key in yaml_file_object:
        body_st = []
        prettyTable = PrettyTable()

        prettyTable.field_names = ["Field", "Example Value"]

        if not PRINT_HTML:
            prettyTable.align["Field"] = "l"
            prettyTable.align["Example Value"] = "l"

        if isinstance(yaml_file_object, list):
            dic = yaml_file_object[i]
            i += 1
        elif isinstance(yaml_file_object, dict):
            dic = yaml_file_object.get(key)

        if isinstance(dic, dict) or isinstance(dic, list):
            printDic(dic, prettyTable, "")
            if isinstance(yaml_file_object, dict):
                yaml_snippet = yaml.dump({key: dic})

            else:
                yaml_snippet = yaml.dump(dic)

        else:
            prettyTable.add_row([key, dic])
            yaml_snippet = yaml.dump({key: dic})

        if isinstance(yaml_file_object, dict):
            if PRINT_HTML:
                body_st.append("<h2>" + key + "</h2>")
            else:
                print("=> " + key + ":")

        table = prettyTable.get_html_string(
            attributes={
                "name": key,
                "id": key,
                "class": "table table-striped table-condensed",
                "style": "width: 1450px;table-layout: fixed;overflow-wrap: "
                "break-word;",
            }
        )
        table = table.replace(SPACE_CHAR, "&nbsp;")
        body_st.append(table)

        if PRINT_HTML:
            html_st.append(" ".join(body_st))
        else:
            print(str(prettyTable).replace(SPACE_CHAR, " "))

    if PRINT_HTML:
        html_st.append("</html>")
        f.write(" ".join(html_st))
        f.close()
        print("File " + OUTPUT_HTMl + " has been generated")
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
# Set this to False if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)

if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)

output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)
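A rule driving this MultiQC wrapper could look roughly as follows; the sample list, paths and environment file are illustrative, and the wrapper is referenced via a local script here only for symmetry with the embedded code:

rule multiqc:
    input:
        expand("results/fastqc/{sample}_fastqc.zip", sample=["sampleA", "sampleB"]),
    output:
        "results/qc/multiqc.html",
    params:
        extra="",  # additional multiqc command line arguments
    log:
        "logs/multiqc.log",
    conda:
        "../envs/multiqc.yaml"
    script:
        "../scripts/multiqc_wrapper.py"  # the wrapper code shown above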
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path
import re
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)
    # Remove file extension(s) (similar to the internal fastqc approach)
    base = re.sub("\\.gz$", "", base)
    base = re.sub("\\.bz2$", "", base)
    base = re.sub("\\.txt$", "", base)
    base = re.sub("\\.fastq$", "", base)
    base = re.sub("\\.fq$", "", base)
    base = re.sub("\\.sam$", "", base)
    base = re.sub("\\.bam$", "", base)

    return base


# Run fastqc in a temporary directory; since there can be race conditions
# if multiple jobs use the same fastqc output dir, we create a temp dir per job.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")


Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/IKIM-Essen/16S
Name: 16s
Version: v0.5.0

Downloaded: 0
Copyright: Public Domain
License: MIT License