16S pipeline using QIIME2, created with Snakemake


QIIME2 workflow for 16S analysis, created with Snakemake.

Authors

  • Ann-Kathrin Brüggemann (@AKBrueggemann)

  • Thomas Battenfeld (@thomasbtf)

Usage

If you use this workflow in a paper, please give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).

Step 1: Obtain a copy of this workflow

If you want to add your own changes to the workflow, create a GitHub repository of your own from this template and then clone that new repository:

  1. Create a new GitHub repository using this workflow as a template.

  2. Clone the newly created repository to your local system, into the place where you want to perform the data analysis.

If you just want to use this workflow locally, simply clone it or download it as a zip file.

Once the folder structure is in place on your local machine, please add a "data" folder manually.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the files in the config/ folder: adjust config.yaml to configure the workflow execution and metadata.txt to specify your sample setup.

Some important parameters you should check and set in config.yaml according to your own fastq files are the primers for the forward and reverse reads, the datatype to be used by QIIME2, and the min-seq-length. Depending on the sequencing run, the read length can vary.
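
For orientation, the corresponding entries in config.yaml could look roughly like the sketch below. The exact key names and the primer sequences are only illustrative (the primers shown are the widely used 515F/806R pair); check the comments in the shipped config.yaml for the keys your copy actually uses.

primer1: GTGYCAGCMGCCGCGGTAA
primer2: GGACTACNVGGGTWTCTAAT
datatype: SampleData[PairedEndSequencesWithQuality]
min-seq-length: 100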

The default parameters for filtering and truncation were validated with the help of a mock community and tuned to recover all bacteria from that community.

In addition, you need to adapt the metadata parameters to your data: please change the names of the metadata columns used to match your own information.

If your metadata does not contain numeric values, please use the "reduced-analysis" option in the config file, as the workflow currently cannot run the full analysis on purely categorical metadata. We plan to fix this in the future.

The workflow can perform clustering and denoising either with vsearch, producing OTUs, or with DADA2, producing ASVs. You can choose the mode by setting the variable "DADA2" to True (use DADA2) or False (use vsearch).
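
As a sketch (the option names below are taken from the description above; the exact spelling may differ in your config.yaml):

DADA2: True             # denoise with DADA2 and create ASVs
# DADA2: False          # cluster with vsearch and create OTUs instead
reduced-analysis: True  # enable if your metadata is purely categorical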

Please make sure that the names of your fastq files are correctly formatted. They should look like this:

samplename_SNumber_Lane_R1/R2_001.fastq.gz
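
For example, a paired-end sample called "sample1" would typically consist of the two files

sample1_S1_L001_R1_001.fastq.gz
sample1_S1_L001_R2_001.fastq.gz

which follow the Casava-style naming that the workflow relies on when importing the data into QIIME2.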

Step 3: Install Snakemake

Create a Snakemake environment using mamba via:

mamba create -c conda-forge -c bioconda -n snakemake snakemake

For installation details, see the instructions in the Snakemake documentation.

Step 4: Execute workflow

Activate the conda environment:

conda activate snakemake

Fill in metadata.txt with the information about your samples:

Please be careful not to include spaces between the commas. If there is a column that you don't have any information for, please leave it empty and simply go on with the next column.
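
A minimal metadata.txt could look like the following sketch. Apart from sample_name and run_date, which the workflow reads directly, the columns and the type annotations in the second row are illustrative; adapt them to your own study design:

sample_name,run_date,condition
#q2:types,categorical,categorical
sampleA,2022-02-24,control
sampleB,2022-02-24,treatment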

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Executing the workflow takes two steps:

Data preparation: snakemake --cores $N --use-conda data_prep
Workflow execution: snakemake --cores $N --use-conda

using $N cores.
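
For example, with 4 cores the two calls would be:

snakemake --cores 4 --use-conda data_prep
snakemake --cores 4 --use-conda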

Step 5: Investigate results

After successful execution, the workflow provides you with a compressed folder holding all relevant results, ready to decompress or to download to your local machine. The archive 16S-report.tar.gz holds several QIIME2 artifacts that can be inspected via QIIME2 View. The zipped folder report.zip contains the Snakemake HTML report, which includes graphics, the DAG of the executed jobs, and HTML files leading you directly to the QIIME2 results without the need to use QIIME2 View.

This report can, e.g., be forwarded to your collaborators.

Step 6: Commit changes

Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:

git commit -a
git push

Step 7: Obtain updates from upstream

Whenever you want to synchronize your workflow copy with new developments from upstream, do the following.

  1. Once, register the upstream repository in your local copy: git remote add -f upstream git@github.com:snakemake-workflows/16S.git, or git remote add -f upstream https://github.com/snakemake-workflows/16S.git if you have not set up SSH keys.

  2. Update the upstream version: git fetch upstream.

  3. Create a diff with the current version: git diff HEAD upstream/master workflow > upstream-changes.diff.

  4. Investigate the changes: vim upstream-changes.diff.

  5. Apply the modified diff via: git apply upstream-changes.diff.

  6. Carefully check whether you need to update the config files: git diff HEAD upstream/master config. If so, do it manually, and only where necessary, since you would otherwise likely overwrite your settings and samples.

Step 8: Contribute back

In case you have also changed or added steps, please consider contributing them back to the original repository:

  1. Fork the original repo to a personal or lab account.

  2. Clone the fork to your local system, to a different place than where you ran your analysis.

  3. Copy the modified files from your analysis to the clone of your fork, e.g., cp -r workflow path/to/fork. Make sure not to accidentally copy config file contents or sample sheets. Instead, manually update the example config files if necessary.

  4. Commit and push your changes to your fork.

  5. Create a pull request against the original repository.

Testing

Test cases are in the subfolder .test. They are automatically executed via continuous integration with GitHub Actions.

Tools

A list of the tools used in this pipeline:

Tool Link
QIIME2 doi.org/10.1038/s41587-019-0209-9
Snakemake doi.org/10.12688/f1000research.29032.1
FastQC www.bioinformatics.babraham.ac.uk/projects/fastqc
MultiQC doi.org/10.1093/bioinformatics/btw354
pandas pandas.pydata.org
kraken2 doi.org/10.1186/s13059-019-1891-0
vsearch github.com/torognes/vsearch
DADA2 doi.org/10.1038/nmeth.3869
songbird doi.org/10.1038/s41467-019-10656-5
bowtie2 doi.org/10.1038/nmeth.1923
ANCOM doi.org/10.3402/mehd.v26.27663

Code Snippets

shell:
    "mkdir {output.dirc} \n"
    "bowtie2-build {input} {output.dirc}/{params.filename}"
shell:
    "bowtie2 -p 8 -x {input.db}/bowtie_host_DB "
    "-1 {input.read1} "
    "-2 {input.read2} "
    "-S {output} "
    "2> {log} "
shell:
    "bowtie2 -x {input.db}/bowtie_host_DB {input.read1} -S {output} "
    "2> {log} "
shell:
    "samtools view -bS {input} > {output} "
    "2> {log} "
shell:
    "samtools view -b -f 12 -F 256 "
    "{input} > {output} "
    "2> {log} "
shell:
    "samtools sort -n -m 5G -@ 2 {input} -o {output.sorted} 2> {log} \n"
    "samtools fastq -@ 8 {output.sorted} "
    "-1 {output.read1} "
    "-2 {output.read2} "
    "-0 /dev/null -s /dev/null -n "
    "2> {log} "
shell:
    "samtools view -b -F 256 "
    "{input} > {output} "
    "2> {log} "
shell:
    "samtools sort -n -m 5G -@ 2 {input} -o {output.sorted} 2> {log} \n"
    "samtools fastq -0 {output.read1} {output.sorted} "
    "2> {log} "
shell:
    "samtools view -F 4 -c {input} > {output} "
    "2> {log} "
script:
    "../scripts/bowtie_frequency.py"
shell:
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--m-metadata-file {input.direc}/chimeras.qza "
    "--p-no-exclude-ids "
    "--o-filtered-table {output.table} \n"
    "qiime feature-table filter-seqs "
    "--i-data {input.seqs} "
    "--m-metadata-file {input.direc}/chimeras.qza "
    "--p-no-exclude-ids "
    "--o-filtered-data {output.seqs}"
shell:
    "qiime feature-classifier classify-consensus-vsearch "
    "--i-query {input.query} "
    "--i-reference-reads {input.reference_reads} "
    "--i-reference-taxonomy {input.reference_taxonomy} "
    "--p-maxaccepts {params.maxaccepts} "
    "--p-maxrejects {params.maxrejects} "
    "--p-perc-identity {params.perc_identity} "
    "--p-threads 4 "
    "--o-classification {output} "
    "--verbose"
shell:
    "qiime taxa barplot "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output}"
shell:
    "qiime vsearch dereplicate-sequences "
    "--i-sequences {input} "
    "--o-dereplicated-table {output.table} "
    "--o-dereplicated-sequences {output.seqs} "
    "--verbose 2> {log}"
shell:
    "qiime vsearch cluster-features-de-novo "
    "--i-table {input.table} "
    "--i-sequences {input.seqs} "
    "--p-perc-identity {params.perc_identity} "
    "--p-threads {params.threads} "
    "--o-clustered-table {output.table} "
    "--o-clustered-sequences {output.seqs} "
    "--verbose 2> {log}"
shell:
    "qiime feature-classifier classify-consensus-vsearch "
    "--i-query {input.query} "
    "--i-reference-reads {input.reference_reads} "
    "--i-reference-taxonomy {input.reference_taxonomy} "
    "--p-maxaccepts {params.maxaccepts} "
    "--p-maxrejects {params.maxrejects} "
    "--p-perc-identity {params.perc_identity} "
    "--p-query-cov {params.query_cov} "
    "--p-min-consensus {params.min_consensus} "
    "--p-threads {params.threads} "
    "--o-classification {output.tax} "
    "--o-search-results {output.search} "
    "--verbose 2> {log} "
shell:
    "qiime phylogeny align-to-tree-mafft-fasttree "
    "--i-sequences {input} "
    "--o-alignment {output.alignment} "
    "--o-masked-alignment {output.masked_alignment} "
    "--o-tree {output.tree} "
    "--o-rooted-tree {output.rooted_tree} "
    "--verbose 2> {log}"
shell:
    "qiime diversity core-metrics-phylogenetic "
    "--i-phylogeny {input.phylogeny} "
    "--i-table {input.table} "
    "--p-sampling-depth {params.depth} "
    "--m-metadata-file {input.metadata} "
    "--output-dir {output} "
    "--verbose 2> {log}"
shell:
    "qiime repeat-rarefy repeat-rarefy "
    "--i-table {input.table} "
    "--p-sampling-depth {params.sampling_depth} "
    "--p-repeat-times {params.repeats} "
    "--o-rarefied-table {output.table} "
    "--verbose 2> {log}"
shell:
    "qiime quality-filter q-score "
    "--i-demux {input} "
    "--p-min-quality {params.min_quality} "
    "--p-min-length-fraction {params.min_length_frac} "
    "--p-max-ambiguous {params.max_ambig} "
    "--o-filtered-sequences {output.filtering} "
    "--o-filter-stats {output.stats} "
    "--verbose 2> {log}"
shell:
    "qiime quality-filter q-score "
    "--i-demux {input} "
    "--p-min-quality {params.min_quality} "
    "--p-min-length-fraction {params.min_length_frac} "
    "--p-max-ambiguous {params.max_ambig} "
    "--o-filtered-sequences {output.filtering} "
    "--o-filter-stats {output.stats} "
    "--verbose 2> {log}"
shell:
    "qiime vsearch uchime-denovo "
    "--i-table {input.table} "
    "--i-sequences {input.seqs} "
    "--p-minh {params.minh} "
    "--output-dir {output.direc} \n"
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--m-metadata-file {output.direc}/chimeras.qza "
    "--p-exclude-ids "
    "--o-filtered-table {output.table} \n"
    "qiime feature-table filter-seqs "
    "--i-data {input.seqs} "
    "--m-metadata-file {output.direc}/chimeras.qza "
    "--p-exclude-ids "
    "--o-filtered-data {output.seqs} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table filter-seqs "
    "--i-data {input.seq} "
    "--m-metadata-file {input.seq} "
    "--p-where 'length(sequence) > {params.min_length}' "
    "--o-filtered-data {output.seq} \n"
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--m-metadata-file {output.seq} "
    "--o-filtered-table {output.table} "
    "--verbose 2> {log}"
shell:
    "qiime dada2 denoise-paired "
    "--i-demultiplexed-seqs {input} "
    "--p-trunc-len-f {params.trunc_len_f} "
    "--p-trunc-len-r {params.trunc_len_r} "
    "--p-trim-left-f {params.trim_left_f} "
    "--p-trim-left-r  {params.trim_left_r} "
    "--p-max-ee-f {params.max_ee_f} "
    "--p-max-ee-r {params.max_ee_r} "
    "--p-trunc-q {params.trunc_q} "
    "--p-min-overlap {params.min_overlap} "
    "--p-pooling-method {params.pooling_method} "
    "--p-chimera-method {params.chimera_method} "
    "--p-min-fold-parent-over-abundance {params.min_fold_parent_over_abundance} "
    "--p-n-threads {params.threads} "
    "--p-n-reads-learn {params.n_reads_learn} "
    "--o-table {output.table} "
    "--o-representative-sequences {output.seq} "
    "--o-denoising-stats {output.stats} "
    "--verbose 2> {log}"
shell:
    "qiime dada2 denoise-paired "
    "--i-demultiplexed-seqs {input} "
    "--p-trunc-len {params.trunc_len_f} "
    "--p-trim-left {params.trim_left_f} "
    "--p-max-ee {params.max_ee_f} "
    "--p-trunc-q {params.trunc_q} "
    "--p-pooling-method {params.pooling_method} "
    "--p-chimera-method {params.chimera_method} "
    "--p-min-fold-parent-over-abundance {params.min_fold_parent_over_abundance} "
    "--p-n-threads {params.threads} "
    "--p-n-reads-learn {params.n_reads_learn} "
    "--o-table {output.table} "
    "--o-representative-sequences {output.seq} "
    "--o-denoising-stats {output.stats} "
    "--verbose 2> {log}"
script:
    "../scripts/relative_abundance.py"
shell:
    "value=$(<{input.abundance}) \n"
    "echo $value \n"
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--p-min-frequency $value "
    "--o-filtered-table {output.table} "
    "--verbose 2> {log} \n"
    "qiime feature-table filter-seqs "
    "--i-data {input.seqs} "
    "--i-table {output.table} "
    "--p-no-exclude-ids "
    "--o-filtered-data {output.seqs} "
    "--verbose 2> {log} "
shell:
    "qiime quality-control exclude-seqs "
    "--i-query-sequences {input.seq} "
    "--i-reference-sequences {input.ref_seq} "
    "--p-threads {params.threads} "
    "--p-perc-identity {params.perc_identity} "
    "--p-perc-query-aligned {params.perc_query_aligned} "
    "--o-sequence-hits {output.human_hit} "
    "--o-sequence-misses {output.seq} "
    "--verbose 2> {log} \n"
    "qiime feature-table filter-features "
    "--i-table {input.table} "
    "--m-metadata-file {output.seq} "
    "--o-filtered-table {output.table} "
    "--verbose 2> {log} "
shell:
    "qiime taxa collapse "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--p-level 6 "
    "--o-collapsed-table {output} "
    "--verbose 2> {log} "
shell:
    "qiime taxa filter-table "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--p-exclude mitochondria,chloroplast "
    "--o-filtered-table {output.table} "
    "--verbose 2> {log} \n"
    "qiime taxa filter-seqs "
    "--i-sequences {input.seq} "
    "--i-taxonomy {input.taxonomy} "
    "--p-exclude mitochondria,chloroplast "
    "--o-filtered-sequences {output.seq} "
    "--verbose 2> {log} "
shell:
    "qiime tools export "
    "--input-path {input.table} "
    "--output-path {output.table_biom} \n"
    "qiime tools export "
    "--input-path {input.taxonomy} "
    "--output-path {output.taxa_biom} \n"
    "qiime feature-table presence-absence "
    "--i-table {input.table} "
    "--o-presence-absence-table {output.table_binary} \n"
    "qiime tools export "
    "--input-path {output.table_binary} "
    "--output-path {output.binary_biom}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
script:
    "../scripts/rename_qzv.py"
script:
    "../scripts/extract_reports.py"
script:
    "../scripts/extract_beta_corr.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_beta_corr.py"
shell:
    "snakemake --nolock --report {output} --report-stylesheet resources/custom-stylesheet.css "
    "{params.for_testing} "
    "> {log} 2>&1"
shell:
    "tar -czvf {output} {params.directory} "
script:
    "../scripts/yaml_to_table.py"
shell:
    """
    mkdir results/{wildcards.date}/16S-report
    cp -r {input} results/{wildcards.date}/16S-report/
    tar -czvf results/{wildcards.date}/16S-report.tar.gz results/{wildcards.date}/16S-report/
    cp results/{wildcards.date}/16S-report.tar.gz {params.outpath}
    rm -r results/{wildcards.date}/16S-report
    """
shell:
    "qiime tools export "
    "--input-path {input.table} "
    "--output-path {output.table_biom} \n"
    "qiime tools export "
    "--input-path {input.taxonomy} "
    "--output-path {output.taxa_biom} \n"
    "qiime feature-table presence-absence "
    "--i-table {input.table} "
    "--o-presence-absence-table {output.table_binary} \n"
    "qiime tools export "
    "--input-path {output.table_binary} "
    "--output-path {output.binary_biom}"
script:
    "../scripts/rename_qzv.py"
script:
    "../scripts/extract_reports.py"
script:
    "../scripts/extract_beta_corr.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_significance.py"
script:
    "../scripts/extract_beta_corr.py"
shell:
    "snakemake --nolock --report {output} --report-stylesheet resources/custom-stylesheet.css "
    "{params.for_testing} "
    "> {log} 2>&1"
shell:
    "tar -czvf {output} {params.directory} "
script:
    "../scripts/yaml_to_table.py"
shell:
    """
    mkdir results/{wildcards.date}/16S-report
    cp -r {input} results/{wildcards.date}/16S-report/
    tar -czvf results/{wildcards.date}/16S-report.tar.gz results/{wildcards.date}/16S-report/
    cp results/{wildcards.date}/16S-report.tar.gz {params.outpath}
    rm -r results/{wildcards.date}/16S-report
    """
script:
    "../scripts/create_sample_metadata.py"
shell:
    "cd resources; "
    "wget {params.genomic}; "
    "wget {params.kraken}; "
    "wget {params.seq}; "
    "wget {params.tax}; "
shell:
    "gzip -dc {input} > {output} 2> {log}; "
script:
    "../scripts/lower_to_upper_fasta.py"
shell:
    "qiime tools import "
    "--input-path {input} "
    "--output-path {output} "
    "--type 'FeatureData[Sequence]' "
    "2> {log} "
shell:
    "tar -zxvf {input} -C resources;"
shell:
    "qiime tools import "
    "--type {params.datatype} "
    "--input-path {params.direc} "
    "--input-format CasavaOneEightSingleLanePerSampleDirFmt "
    "--output-path {output} "
    "2> {log} "
shell:
    "qiime tools import "
    "--type {params.datatype} "
    "--input-path {params.dirc} "
    "--input-format CasavaOneEightSingleLanePerSampleDirFmt "
    "--output-path {output} "
    "2> {log} "
shell:
    "qiime tools import "
    "--type {params.datatype} "
    "--input-path {params.dirc} "
    "--input-format CasavaOneEightSingleLanePerSampleDirFmt "
    "--output-path {output} "
    "2> {log} "
shell:
    """
    if [[ '${params.datatype}' == '$SampleData[PairedEndSequencesWithQuality]' ]] 
    then 
        qiime cutadapt trim-paired \
        --i-demultiplexed-sequences {input} \
        --p-adapter-f {params.adapter1} \
        --p-front-f {params.primer1} \
        --p-front-r {params.primer2} \
        --p-adapter-r {params.adapter2} \
        --p-error-rate {params.error_rate} \
        --p-times {params.rep_times} \
        --p-overlap {params.overlap} \
        --p-minimum-length {params.min_length} \
        --o-trimmed-sequences {output} \
        --verbose 2> {log}
    else 
        qiime cutadapt trim-single \
        --i-demultiplexed-sequences {input} \
        --p-cores 10 \
        --p-adapter {params.primer1} \
        --p-front {params.primer2} \
        --p-error-rate {params.error_rate} \
        --p-times {params.rep_times} \
        --p-overlap {params.overlap} \
        --p-minimum-length {params.min_length} \
        --o-trimmed-sequences {output} \
        --verbose 2> {log}
    fi
    """
shell:
    "qiime vsearch merge-pairs "
    "--i-demultiplexed-seqs {input} "
    "--p-allowmergestagger "
    "--p-minovlen {params.minovlen} "
    "--p-minlen {params.minlen} "
    "--p-maxdiffs {params.maxdiffs} "
    "--p-threads {params.threads} "
    "--o-merged-sequences {output} "
    "--verbose 2> {log}"
shell:
    "(kraken2 --db {input.db} --memory-mapping --threads {params.threads} --paired --use-names --report {output.report} {input.read1} {input.read2}) 2> {log}"
shell:
    "(kraken2 --db {input.db} --memory-mapping --threads {params.threads} --use-names --report {output.report} {input.read}) 2> {log}"
wrapper:
    "v1.3.1/bio/fastqc"
wrapper:
    "v1.23.3/bio/multiqc"
shell:
    "qiime demux summarize "
    "--i-data {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table summarize "
    "--i-table {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime vsearch fastq-stats "
    "--i-sequences {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table tabulate-seqs "
    "--i-data {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
script:
    "../scripts/extract_humancount.py"
shell:
    "mkdir {output}"
shell:
    "qiime feature-table heatmap "
    "--i-table {input} "
    "--m-sample-metadata-file config/pep/sample.tsv "
    "--m-sample-metadata-column {params.metadata} "
    "--p-cluster {params.cluster} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime taxa barplot "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime tools export "
    "--input-path {input} "
    "--output-path {output} "
    "--verbose 2> {log}"
script:
    "../scripts/rename_taxonomy.py"
shell:
    """
    biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}

    biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy
    """
shell:
    """
    biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}

    biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy
    """
script:
    "../scripts/binaryheatmap.py"
script:
    "../scripts/absolute_taxabarplot.py"
shell:
    "qiime tools export "
    "--input-path {input.table} "
    "--output-path {output.table_biom} \n"
    "qiime tools export "
    "--input-path {input.taxonomy} "
    "--output-path {output.taxa_biom} \n"
    "qiime feature-table presence-absence "
    "--i-table {input.table} "
    "--o-presence-absence-table {output.table_binary} \n"
    "qiime tools export "
    "--input-path {output.table_binary} "
    "--output-path {output.binary_biom}"
script:
    "../scripts/rename_qzv.py"
script:
    "../scripts/extract_reports.py"
shell:
    "snakemake --nolock --report {output} --report-stylesheet resources/custom-stylesheet.css "
    "{params.for_testing} "
    "> {log} 2>&1"
shell:
    "tar -czvf {output} {params.directory} "
shell:
    "qiime feature-table summarize "
    "--i-table {input.table_wh} "
    "--o-visualization {output.visual_wh} "
    "--verbose 2> {log} \n"
    "qiime feature-table summarize "
    "--i-table {input.table_woh} "
    "--o-visualization {output.visual_woh} "
    "--verbose 2> {log}"
script:
    "../scripts/sample_freq_difference.py"
script:
    "../scripts/yaml_to_table.py"
shell:
    """
    mkdir results/{wildcards.date}/16S-report
    cp -r {input} results/{wildcards.date}/16S-report/
    tar -czvf results/{wildcards.date}/16S-report.tar.gz results/{wildcards.date}/16S-report/
    cp results/{wildcards.date}/16S-report.tar.gz {params.outpath}
    rm -r results/{wildcards.date}/16S-report
    """
shell:
    "qiime diversity alpha "
    "--i-table {input} "
    "--p-metric {params.metric} "
    "--o-alpha-diversity {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity alpha-group-significance "
    "--i-alpha-diversity {input} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output} "
    "--verbose 2> {log} "
shell:
    "qiime diversity alpha-correlation "
    "--i-alpha-diversity {input} "
    "--m-metadata-file {params.metadata} "
    "--p-method {params.method} "
    "--p-intersect-ids "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity beta "
    "--i-table {input} "
    "--p-metric {params.metric} "
    "--p-pseudocount {params.pseudocount} "
    "--p-n-jobs {params.n_jobs} "
    "--o-distance-matrix {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity beta-group-significance "
    "--i-distance-matrix {input} "
    "--m-metadata-file config/pep/sample.tsv "
    "--m-metadata-column {wildcards.metadata_column} "
    "--o-visualization {output.out} "
    "--verbose 2> {log}"
shell:
    "qiime diversity beta-correlation "
    "--i-distance-matrix {input} "
    "--m-metadata-file {params.metadata_file} "
    "--m-metadata-column {wildcards.metadata_column} "
    "--p-method {params.method} "
    "--p-permutations {params.permutations} "
    "--p-intersect-ids "
    "--p-label1 jaccard-distance-matrix "
    "--p-label2 {params.metadata} "
    "--o-metadata-distance-matrix {output.distance_matrix} "
    "--o-mantel-scatter-visualization {output.mantel_scatter_vis} "
    "--verbose 2> {log}"
shell:
    "qiime diversity alpha-phylogenetic "
    "--i-phylogeny {input.phylogeny} "
    "--i-table {input.table} "
    "--p-metric faith_pd "
    "--o-alpha-diversity {output} "
    "--verbose 2> {log} "
shell:
    "qiime diversity beta-phylogenetic "
    "--i-table {input.table} "
    "--i-phylogeny {input.phylogeny} "
    "--p-metric {params.metrics} "
    "--p-threads {params.threads} "
    "--p-variance-adjusted {params.variance_adjusted} "
    "--o-distance-matrix {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity pcoa "
    "--i-distance-matrix {input} "
    "--o-pcoa {output} "
    "--verbose 2> {log} "
shell:
    "qiime emperor plot "
    "--i-pcoa {input} "
    "--m-metadata-file {params.metadata} "
    "--o-visualization {output} "
    "--verbose 2> {log} "
shell:
    "qiime demux summarize "
    "--i-data {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table summarize "
    "--i-table {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table summarize "
    "--i-table {input.table_wh} "
    "--o-visualization {output.visual_wh} "
    "--verbose 2> {log} \n"
    "qiime feature-table summarize "
    "--i-table {input.table_woh} "
    "--o-visualization {output.visual_woh} "
    "--verbose 2> {log}"
shell:
    "qiime vsearch fastq-stats "
    "--i-sequences {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime feature-table tabulate-seqs "
    "--i-data {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
script:
    "../scripts/extract_humancount.py"
shell:
    "mkdir {output}"
shell:
    "qiime feature-table heatmap "
    "--i-table {input} "
    "--m-sample-metadata-file config/pep/sample.tsv "
    "--m-sample-metadata-column {params.metadata} "
    "--p-cluster {params.cluster} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime taxa barplot "
    "--i-table {input.table} "
    "--i-taxonomy {input.taxonomy} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime metadata tabulate "
    "--m-input-file {input} "
    "--o-visualization {output} "
    "--verbose 2> {log}"
shell:
    "qiime tools export "
    "--input-path {input} "
    "--output-path {output} "
    "--verbose 2> {log}"
shell:
    "qiime diversity alpha-rarefaction "
    "--i-table {input.table} "
    "--i-phylogeny {input.phylogeny} "
    "--p-max-depth {params.max_depth} "
    "--m-metadata-file config/pep/sample.tsv "
    "--o-visualization {output.alpha} "
    "--verbose 2> {log} \n"
    "qiime diversity beta-rarefaction "
    "--i-table {input.table} "
    "--i-phylogeny {input.phylogeny} "
    "--p-metric {params.metric} "
    "--p-clustering-method {params.clustering_method} "
    "--m-metadata-file config/pep/sample.tsv "
    "--p-sampling-depth {params.sampling_depth} "
    "--o-visualization {output.beta} "
    "--verbose 2> {log}"
shell:
    "qiime gneiss correlation-clustering "
    "--i-table {input} "
    "--o-clustering {output.tree} \n"

    "qiime gneiss dendrogram-heatmap "
    "--i-table {input} "
    "--i-tree {output.tree} "
    "--m-metadata-file config/pep/sample.tsv "
    "--m-metadata-column {params.metadata} "
    "--p-color-map seismic "
    "--o-visualization {output.heatmap_gneiss} "
    "--verbose 2> {log}"
script:
    "../scripts/rename_taxonomy.py"
shell:
    """
    (biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}) > {log} 2>&1

    (biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy) > {log} 2>&1
    """
shell:
    """
    (biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}) > {log} 2>&1

    (biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy) > {log} 2>&1
    """
script:
    "../scripts/binaryheatmap.py"
script:
    "../scripts/absolute_taxabarplot.py"
shell:
    "songbird multinomial "
    "--input-biom {input} "
    "--metadata-file config/pep/sample.tsv "
    "--formula  {params.formula} "
    "--epochs 10000 "
    "--differential-prior {params.differential_prior} "
    "--min-sample-count {params.min_sample_count} "
    "--min-feature-count {params.min_feature_count} "
    "--summary-interval {params.summary_interval} "
    "--summary-dir {output} "
    "2> {log}"
script:
    "../scripts/featuretable_songbird.py"
shell:
    "qurro "
    "--ranks {params.differentials} "
    "--table {input.table} "
    "--sample-metadata {params.metadata_file} "
    "--feature-metadata {input.feature_metadata} "
    "--output-dir {output.plot} "
    "2> {log} "
shell:
    "qiime composition add-pseudocount "
    "--i-table {input} "
    "--o-composition-table {output.pseudocount_table} \n"
    "qiime composition ancom "
    "--i-table {output.pseudocount_table} "
    "--m-metadata-file {params.metadata_file} "
    "--m-metadata-column {params.metadata_column} "
    "--o-visualization {output.ancom_output} "
    "--verbose 2> {log}"
script:
    "../scripts/sample_freq_difference.py"
script:
    "../scripts/prepare_diversity.py"
script:
    "../scripts/plot_distance.py"
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from pylab import savefig
import sys


sys.stderr = open(snakemake.log[0], "w")

df = pd.read_csv(str(snakemake.input), delimiter="\t", header=1, index_col="taxonomy")
df.drop("#OTU ID", axis=1, inplace=True)
df_reduced = df.groupby(df.index).sum()
species = df_reduced.index.unique()
for i in df_reduced.index:
    if "Unassigned" not in i:
        df_reduced.index = df_reduced.index.str.split("; s__").str[0]
df_reduced_second = df_reduced.groupby(df_reduced.index).sum()
for i in df_reduced_second.index:
    if "Chloroplast" in i:
        df_reduced_second.drop(index=i, inplace=True)
df_trans = df_reduced_second.T
df_trans.replace(-np.inf, 0, inplace=True)
np.random.seed(100)
mycolors = np.random.choice(
    list(mpl.colors.XKCD_COLORS.keys()), len(species), replace=False
)
fig = df_trans.plot(kind="bar", stacked=True, color=mycolors, figsize=(16, 8))
handles, labels = fig.get_legend_handles_labels()
lgd = fig.legend(
    handles,
    labels,
    loc="center left",
    bbox_to_anchor=(1, 0.5),
    borderaxespad=0.0,
    fontsize=8,
)
plt.yscale("log")
if len(df) < 40:
    plt.xticks(fontsize=7, rotation=45)
if len(df) > 40:
    plt.xticks(fontsize=5, rotation=45)
if len(df) > 100:
    plt.xticks(fontsize=4, rotation=45)
plt.xlabel("Sample name")
plt.ylabel("Log10 abundance")
plt.title("Taxa-bar-plot of log10(abundance)")
plt.savefig(str(snakemake.output), bbox_extra_artists=(lgd,), bbox_inches="tight")
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import numpy as np
from pylab import savefig
import sys


sys.stderr = open(snakemake.log[0], "w")

# Reading information from a file, holding binary information whether an OTU is present or absent in a sample

# Reading the file, deleting unnecessary columns
df = pd.read_csv(str(snakemake.input), delimiter="\t", header=1, index_col="taxonomy")
del df["#OTU ID"]
# Creating a specific colour scheme
my_colors = [(0.2, 0.6, 0.3), (0.1, 0.4, 0.5)]
fig, ax = plt.subplots(figsize=(20, 15))
# Creating a heatmap with seaborn
heatmap = sb.heatmap(
    df, cmap=my_colors, linewidth=0.01, linecolor="Black", cbar=False, yticklabels=True
)
# Setting the properties for the legend
cbar = heatmap.figure.colorbar(heatmap.collections[0])
cbar.set_ticks([0.25, 0.75])
cbar.set_ticklabels(["absent", "present"])
heatmap.set_yticklabels(heatmap.get_ymajorticklabels(), fontsize=6)
if len(df) < 40:
    plt.xticks(fontsize=8, rotation=45)
if len(df) > 40:
    plt.xticks(fontsize=6, rotation=45)
if len(df) > 100:
    plt.xticks(fontsize=5, rotation=45)
heatmap.set(xlabel="Sample", ylabel="Taxonomic classification of bacteria")
# Setting the orientation of the plot
plt.subplots_adjust(bottom=0.2, left=0.4)
plt.show()

figure = heatmap.get_figure()
figure.savefig(str(snakemake.output), dpi=400)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import numpy as np
import os
from pylab import savefig
import sys


sys.stderr = open(snakemake.log[0], "w")

path = os.path.dirname(str(snakemake.input[0]))

inputs = []
for file in os.listdir(path):
    if file.endswith(".txt"):
        inputs.append(os.path.join(path, file))
# concatenate all txt files in a file
with open(str(snakemake.output), "w") as outfile:
    i = 0
    outfile.write("Sample: Difference without human " + "\n")
    while i < len(inputs):
        with open(inputs[i], encoding="utf-8", errors="ignore") as infile:
            path_name = inputs[i].split("_")[0]
            name = path_name.split("/")[-1] + ": " + infile.read()
            outfile.write(name)
            i += 1
import os
import pandas as pd
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import move, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")

# Creating a metadata sample-sheet, that holds additional information concerning the samples.
# Metadata information is needed to be able to create plots and for metadata-specific analysis.

config = snakemake.config

DATA_PATH = str(config["data"])
IN_PATH = str(config["input"])
# today = date.today().strftime("%Y-%m-%d")
incoming_files = []
# Running through all files in the IN_PATH and checking them
# Adding the files to a list, when they meet the requirements
for f in listdir(IN_PATH):
    if (
        path.isfile(path.join(IN_PATH, f))
        and "ndetermined" not in f
        and ".fastq.gz" in f
        and os.stat(IN_PATH + f).st_size > 100
    ):
        incoming_files.append(f)
    else:
        print(f, "not used")
metadata = pd.read_csv(str(snakemake.input), header=0, delimiter=",")
date = metadata["run_date"].iloc[1]
print(date)
# Adding a direcory for the specific date as subdirectory to the DATA_PATH
DATA_PATH += date
if not os.path.isdir(DATA_PATH):
    mkdir(DATA_PATH)
# Checking if some files are already present in the DATA_PATH, those are not copied
data_files = [f for f in listdir(DATA_PATH) if path.isfile(path.join(DATA_PATH, f))]
files_not_to_copy = [f for f in data_files if f in incoming_files]
files_to_copy = [f for f in incoming_files if f not in data_files]
for file in files_to_copy:
    if file.endswith(".fastq.gz") and not "ndetermined" in file:
        copy2(IN_PATH + file, DATA_PATH)
print(files_to_copy)

# Reading the sample names and the metadata from the file-names and the metadata.csv file, that needs to be provided
files = os.listdir(DATA_PATH)
print(files)
sample_list = []
for name in files:
    sample = name.split("_")[0]
    sample_list.append(sample)
sample_list = list(set(sample_list))
metadata = pd.read_csv(str(snakemake.input), header=0, delimiter=",")
metadata.columns = metadata.columns.str.lower()
# Samples, that are not mentioned in the metadata.csv are excluded from the sample-metadata-sheet
for name in sample_list:
    if name not in metadata["sample_name"].tolist():
        print("No metadata was found for " + name)
        sample_list.remove(name)
        # Also remove the fastq file from folder?

# Error if names don't fit
metadata_name_list = metadata["sample_name"].tolist()
metadata_name_list.remove("#q2:types")
if len(sample_list) == len(metadata_name_list):
    if set(sample_list) != set(metadata_name_list):
        print(
            "There seems to be a discrepancy between fastq file and metadata. Please check the spelling in the metadata.txt."
        )
        raise Exception(
            "There seems to be a discrepancy between fastq file and metadata. Please check the spelling in the metadata.txt."
        )

# Replacing empty metadata-columns with 0 and after that removing them from the file
metadata.fillna(0, inplace=True)
metadata.rename(columns={"sample_name": "sample-ID"}, inplace=True)
metadata.to_csv(snakemake.output.metadata, sep="\t", index=False)

# Creating a sample_info df, that holds paths to the fastq files and the date.
# Common functions can read information from sample_info.txt instead from the data directories
data = [metadata["sample-ID"], metadata["run_date"]]
header = ["sample-ID", "run_date"]
sample_info = pd.concat(data, axis=1, keys=header)
if "SampleData[PairedEndSequencesWithQuality]" in str(snakemake.params.datatype):
    sample_info["path1"] = ""
    sample_info["path2"] = ""
elif "SampleData[SequencesWithQuality]" in str(snakemake.params.datatype):
    sample_info["path1"] = ""
sample_info.set_index("sample-ID", inplace=True)
sample_info.drop(labels=["#q2:types"], axis=0, inplace=True)
i = 0
while i < len(sample_info.index):
    for file in files_to_copy:
        if sample_info.index[i] in file and len(sample_info.index[i]) == len(
            file.split("_")[0]
        ):
            if "SampleData[PairedEndSequencesWithQuality]" in str(
                snakemake.params.datatype
            ):
                if "R1" in file:
                    sample_info.loc[sample_info.index[i], "path1"] = (
                        DATA_PATH + "/" + file
                    )
                elif "R2" in file:
                    sample_info.loc[sample_info.index[i], "path2"] = (
                        DATA_PATH + "/" + file
                    )
            elif "SampleData[SequencesWithQuality]" in str(snakemake.params.datatype):
                if "R1" in file:
                    sample_info.loc[sample_info.index[i], "path1"] = (
                        DATA_PATH + "/" + file
                    )
        else:
            continue
    i = i + 1
sample_info.to_csv(snakemake.output.sample_info, sep=",", mode="a")
import os
import pandas as pd
import shutil
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import copy, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")
# Function to extract results from the qiime2 artifacts, already saved in a zip file

"""print("results/2022-02-24/visual/unzipped/")
os.walk("results/2022-02-24/visual/unzipped/")
for x in os.walk("results/2022-02-24/visual/unzipped/"):
    print(x[0])"""
# Reading the zip-files from the directory
dirlist = os.listdir(str(snakemake.input))
# Getting the subdirectories of the files
subdir = []
i = 0
while i < len(dirlist):
    directory = str(snakemake.input) + "/" + dirlist[i] + "/"
    subdir.append(directory)
    i = i + 1
"""z = 0
while z < len(subdir):
    if path.isdir(subdir[z]):
        dir = os.listdir(subdir[z])
        print(dir)
        subdir[z] = subdir[z] + dir[0] + "/"
        print(subdir[z])
        z = 1 + z
    else:
        z = 1 + z"""
# Iterating through the different subdirectories and copying the important information into the output directory
b = 0
path_name = str(snakemake.output)
print(path_name)
plot_name = path_name.split("/")[-1]
while b < len(subdir):
    datadir = subdir[b] + "data/"
    if plot_name == subdir[b].split("/")[-2]:
        html = datadir
        shutil.copytree(html, str(snakemake.output))
    b = b + 1
import os
import shutil
import gzip
import zipfile
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import copy, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")

file = str(snakemake.input)
name = os.path.splitext(file)[0]
shutil.copy(file, name + ".zip")
zipped_file = name + ".zip"

with zipfile.ZipFile(zipped_file, "r") as zip_ref:
    name = zipped_file.split("/")[-1]
    new_dir = str(snakemake.params.between) + "/" + name
    zip_ref.extractall(os.path.splitext(new_dir)[0] + "/")
directory = os.listdir(str(snakemake.params.between))
subdir = os.listdir(str(snakemake.params.between) + "/" + directory[0])
orig_dir = str(snakemake.params.between) + "/" + directory[0] + "/" + subdir[0]
new_dir = str(snakemake.params.between) + "/" + directory[0]
print(new_dir)
dir_name = new_dir.split("/")[-1]
for f in os.listdir(orig_dir):
    path = orig_dir + "/" + f
    print(path)
    unzipped_path = str(snakemake.params.between) + "/" + dir_name + "/" + f
    shutil.move(path, unzipped_path)
os.rmdir(orig_dir)

dirlist = os.listdir(str(snakemake.params.between))
subdir = []
i = 0
while i < len(dirlist):
    directory = str(snakemake.params.between) + "/" + dirlist[i] + "/"
    subdir.append(directory)
    i = i + 1

b = 0
while b < len(subdir):
    datadir = subdir[b] + "/" + "data/"
    if "human-count" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.human_count)
    b += 1
import os
import pandas as pd
import shutil
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import copy, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")

# Function to extract results from the qiime2 artifacts, already saved in a zip file

"""print("results/2022-02-24/visual/unzipped/")
os.walk("results/2022-02-24/visual/unzipped/")
for x in os.walk("results/2022-02-24/visual/unzipped/"):
    print(x[0])"""
# Reading the zip-files from the directory
dirlist = os.listdir(str(snakemake.input))
# Getting the subdirectories of the files
subdir = []
i = 0
while i < len(dirlist):
    directory = str(snakemake.input) + "/" + dirlist[i] + "/"
    subdir.append(directory)
    i = i + 1
"""z = 0
while z < len(subdir):
    if path.isdir(subdir[z]):
        dir = os.listdir(subdir[z])
        print(dir)
        subdir[z] = subdir[z] + dir[0] + "/"
        print(subdir[z])
        z = 1 + z
    else:
        z = 1 + z"""
# Iterating through the different subdirectories and copying the important information into the output directory
b = 0
while b < len(subdir):
    datadir = subdir[b] + "data/"
    if "beta-rarefaction" in subdir[b]:
        svg = datadir + "heatmap.svg"
        shutil.copy(svg, snakemake.output.beta_svg)
        html = datadir
        shutil.copytree(html, snakemake.output.beta_html)
    elif "heatmap" in subdir[b] and "gneiss" not in subdir[b]:
        svg = datadir + "feature-table-heatmap.svg"
        shutil.copy(svg, snakemake.output.heatmap)
    elif "taxonomy" in subdir[b]:
        tsv = datadir + "metadata.tsv"
        shutil.copy(tsv, snakemake.output.taxonomy_tsv)
    elif "taxa-bar-plots" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.taxa_barplot)
    elif "heatmap_gneiss" in subdir[b]:
        svg = datadir + "heatmap.svg"
        shutil.copy(svg, snakemake.output.gneiss)
    if "alpha-rarefaction" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.alpha_html)
    if "paired-seqs" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.paired_seqs)
    if "fastq_stats" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.fastq_stats)
    if "demux-joined-filter-stats" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.demux_filter_stats)
    if "dada2-stats" in subdir[b]:
        html = datadir
        shutil.copytree(html, snakemake.output.dada2)
    b = b + 1
import os
import pandas as pd
import shutil
from os.path import isfile, join
from os import path, getcwd, listdir
from os import mkdir
from shutil import copy, copy2
from datetime import date, datetime
import sys


sys.stderr = open(snakemake.log[0], "w")

# Reading the zip-files from the directory
dirlist = os.listdir(str(snakemake.input))
# Getting the subdirectories of the files
subdir = []
i = 0
while i < len(dirlist):
    directory = str(snakemake.input) + "/" + dirlist[i] + "/"
    subdir.append(directory)
    i = i + 1

# Iterating through the different subdirectories and copying the important information into the output directory
b = 0
path_name = str(snakemake.output)
print(path_name)
plot_name = path_name.split("/")[-1]
while b < len(subdir):
    datadir = subdir[b] + "data/"
    if plot_name == subdir[b].split("/")[-2]:
        html = datadir
        shutil.copytree(html, str(snakemake.output))
    b = b + 1
import pandas as pd
import os
import sys


sys.stderr = open(snakemake.log[0], "w")

differentials_df = pd.read_csv(
    str(snakemake.params), header=0, index_col="featureid", sep="\t"
)
taxa_df = pd.read_csv(str(snakemake.input.taxa), header=1, sep="\t")
taxa_small = taxa_df[["#OTU ID", "taxonomy"]].copy()
taxa_small_new = taxa_small.set_index(["#OTU ID"])
diff_taxa_df = pd.concat([differentials_df, taxa_small_new], axis=1)
diff_taxa_df_new = diff_taxa_df.set_index(["taxonomy"])
diff_taxa_df_new.to_csv(str(snakemake.output.diff), sep="\t", index=True)
taxa_small_new.rename_axis("featurue-id", inplace=True)
taxa_small_new.to_csv(str(snakemake.output.feature_meta), sep="\t", index=True)
import pandas as pd
import sys


sys.stderr = open(snakemake.log[0], "w")

# Copy the reference fasta character by character, converting everything to upper case
fasta = open(str(snakemake.input), "r")
fasta_upper_file = open(str(snakemake.output), "w")
print("read!")
for x in fasta.read():
    fasta_upper = x.upper()
    fasta_upper_file.write(fasta_upper)
print("ready")
fasta.close()
fasta_upper_file.close()
import os
import shutil
import sys
import pandas as pd
import seaborn as sb
import numpy as np
import matplotlib.pyplot as plt

sys.stderr = open(snakemake.log[0], "w")

# Find the exported diversity result whose name matches the requested output file
# and plot its distance matrix as a heatmap (upper triangle masked)
directories = os.listdir(str(snakemake.input))

i = 0
while i < len(directories):
    output_name = os.path.split(str(snakemake.output))[1]
    name = output_name.split(".")[0]
    if name in directories[i]:
        df = pd.read_csv(
            str(snakemake.input)
            + "/"
            + directories[i]
            + "/data"
            + "/distance-matrix.tsv",
            sep="\t",
            header=0,
            index_col=0,
        )
        matrix = np.triu(df)
        fig, ax = plt.subplots(figsize=(20, 15))
        heatmap = sb.heatmap(
            df, mask=matrix, cbar_kws={"label": "Distance"}, annot=True, fmt=".1f"
        )
        heatmap.set(xlabel="Samplename", ylabel="Samplename")
        figure = heatmap.get_figure()
        figure.savefig(str(snakemake.output), dpi=400)
    i = i + 1
import os
import shutil
import sys
import zipfile

sys.stderr = open(snakemake.log[0], "w")

zipped_files = []
i = 0
os.mkdir(str(snakemake.output))
while i < len(snakemake.input):
    file = str(snakemake.input[i])
    name_path = os.path.splitext(file)[0]
    name = os.path.split(name_path)[1]
    output_path = str(snakemake.output) + "/" + name + ".zip"
    shutil.copy2(file, output_path)
    zipped_files.append(output_path)
    i += 1

j = 0
while j < len(zipped_files):
    with zipfile.ZipFile(zipped_files[j], "r") as zip_ref:
        name = zipped_files[j].split("/")[-1]
        print(name)
        new_dir = str(snakemake.output) + "/" + name
        print(new_dir)
        zip_ref.extractall(os.path.splitext(new_dir)[0] + "/")
        os.remove(new_dir)
    j = j + 1

directory = os.listdir(str(snakemake.output))
print(directory)

z = 0
while z < len(directory):
    subdir = os.listdir(str(snakemake.output) + "/" + directory[z])
    orig_dir = str(snakemake.output) + "/" + directory[z] + "/" + subdir[0]
    new_dir = str(snakemake.output) + "/" + directory[z]
    for f in os.listdir(orig_dir):
        path = orig_dir + "/" + f
        shutil.move(path, new_dir)
    os.rmdir(orig_dir)
    z = z + 1
import pandas as pd
import gzip
import shutil
import os
import zipfile
import sys


sys.stderr = open(snakemake.log[0], "w")

# Extracting the number of total features over all samples from the sample-table.
# Multiplying the number with the relative abundance filtering value to create a threshold for the
# qiime filtering.

# Reading the sample-table, creating a zip file
file = str(snakemake.input)
name = os.path.splitext(file)[0]
shutil.copy(file, name + ".zip")
filename = name + ".zip"
# Extract zip files to folder
with zipfile.ZipFile(filename, "r") as zip_ref:
    name = filename.split("/")[-1]
    dir_name = os.path.dirname(str(snakemake.input))
    new_dir = dir_name + "/" + name
    zip_ref.extractall(os.path.splitext(new_dir)[0] + "/")
name = name.split(".")[0]
directory = os.path.dirname(str(snakemake.input)) + "/" + name
# Moving the folder inventory one folder up
b = 0
subdir = os.listdir(directory)
while b < len(subdir):
    orig_dir = directory + "/" + subdir[b]
    new_dir = directory
    for f in os.listdir(orig_dir):
        path = orig_dir + "/" + f
        shutil.move(path, new_dir)
    b = b + 1
# Read the specific csv holding the information, creating a dataframe, adding up all feature frequencies
datadir = str(snakemake.output.feature_table) + "/"
csv = datadir + "sample-frequency-detail.csv"
frequency = pd.read_csv(csv, header=0, delimiter=",")
number = frequency["0"].sum()
# Creating the abundance threshold and storing it in an output file
abundance = float(str(snakemake.params))
endnumber = number * abundance
endnumber = int(endnumber)
if endnumber == 0:
    endnumber += 1
with open(str(snakemake.output.abundance), "w") as f:
    f.write(str(endnumber))
import pandas as pd
import gzip
import shutil
import os
import zipfile
import sys


sys.stderr = open(snakemake.log[0], "w")

# Renaming the qiime2 artifacts for viewing to zip files so the information can be accessed

# Iterating over the input files and renaming them to zip
i = 0
zipped_files = []
while i < len(snakemake.input):
    file = snakemake.input[i]
    name = os.path.splitext(file)[0]
    shutil.copy(file, name + ".zip")
    zipped_files.append(name + ".zip")
    i = i + 1
z = 0
# opening the zip files and saving them to the output directory
while z < len(zipped_files):
    with zipfile.ZipFile(zipped_files[z], "r") as zip_ref:
        name = zipped_files[z].split("/")[-1]
        new_dir = str(snakemake.output) + "/" + name
        zip_ref.extractall(os.path.splitext(new_dir)[0] + "/")
        os.remove(zipped_files[z])
    z = z + 1
directory = os.listdir(str(snakemake.output))
# Iterating through the directory holding the unzipped files, moving the file content one folder up.
# Folder created while unzipping can be removed, because it gives no further information
j = 0
while j < len(directory):
    subdir = os.listdir(str(snakemake.output) + "/" + directory[j])
    orig_dir = str(snakemake.output) + "/" + directory[j] + "/" + subdir[0]
    new_dir = str(snakemake.output) + "/" + directory[j]
    for f in os.listdir(orig_dir):
        path = orig_dir + "/" + f
        shutil.move(path, new_dir)
    os.rmdir(orig_dir)
    j = j + 1
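Within the workflow this script is presumably driven by a rule that collects the visualisation artifacts and unpacks them into one report directory. A minimal sketch of such a rule (rule name, artifact paths and script location are illustrative):

rule extract_visualisations:
    input:
        "results/taxa-bar-plots.qzv",
        "results/alpha-rarefaction.qzv",
    output:
        directory("results/html-reports"),
    log:
        "logs/extract_visualisations.log",
    script:
        "../scripts/extract_visualisations.py"  # the script shown above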
import pandas as pd
import os
import sys


sys.stderr = open(snakemake.log[0], "w")

# Renaming the headers in the taxonomy.tsv file, so that they match with the headers of the table-biom file.
# When the headers are equal, the files can be merged into one file.

taxonomy_file = str(snakemake.input) + "/" + "taxonomy.tsv"
print(os.listdir(str(snakemake.input))[0])
taxonomy_df = pd.read_csv(taxonomy_file, delimiter="\t", header=0)
taxonomy_df.rename(
    columns={"Feature ID": "#OTU ID", "Taxon": "taxonomy", "Consensus": "confidence"},
    inplace=True,
)
taxonomy_df.to_csv(str(snakemake.output), sep="\t", index=False)
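Once the headers match, the taxonomy can be attached to the exported BIOM table. One way this merge might be done is with the biom command line tool; a sketch under that assumption (rule name, paths and environment file are illustrative):

rule merge_taxonomy_into_biom:
    input:
        biom="results/table.biom",                # exported feature table (illustrative path)
        taxonomy="results/taxonomy-renamed.tsv",  # file written by the script above
    output:
        "results/table-with-taxonomy.biom",
    log:
        "logs/merge_taxonomy_into_biom.log",
    conda:
        "../envs/biom.yaml"
    shell:
        "biom add-metadata -i {input.biom} -o {output}"
        " --observation-metadata-fp {input.taxonomy}"
        " --sc-separated taxonomy"
        " > {log} 2>&1"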
import pandas as pd
import sys


sys.stderr = open(snakemake.log[0], "w")

# Comparing the per-sample feature counts with and without human reads and
# writing a table of the counts and their difference.

whuman = str(snakemake.params.visual_wh)
wohuman = str(snakemake.params.visual_woh)

whuman_df = pd.read_csv(whuman, sep=",", header=0, index_col=0)
wohuman_df = pd.read_csv(wohuman, sep=",", header=0, index_col=0)

whuman_df.rename(columns={"0": "whuman"}, inplace=True)
wohuman_df.rename(columns={"0": "wohuman"}, inplace=True)

combined = pd.concat([whuman_df, wohuman_df], axis=1)

for sample in combined.index:
    if combined.at[sample, "whuman"] == combined.at[sample, "wohuman"]:
        combined.drop([sample], inplace=True)
# combined.drop_duplicates(subset = ["whuman", "wohuman"], inplace = True)

for sample in combined.index:
    difference = combined.at[sample, "whuman"] - combined.at[sample, "wohuman"]
    combined.at[sample, "difference"] = difference
combined.index.name = "Sample"
combined.rename(
    columns={"whuman": "Features with human", "wohuman": "Features without human"},
    inplace=True,
)
combined.to_csv(str(snakemake.output))
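To make the comparison logic concrete, here is a minimal, self-contained sketch with made-up numbers (illustrative data only, not from any real run):

import pandas as pd

# Two single-column frequency tables, indexed by sample, each with a column
# literally named "0", matching what the script above expects.
whuman_df = pd.DataFrame({"0": [1200, 800, 500]}, index=["S1", "S2", "S3"])
wohuman_df = pd.DataFrame({"0": [1000, 800, 450]}, index=["S1", "S2", "S3"])

whuman_df = whuman_df.rename(columns={"0": "whuman"})
wohuman_df = wohuman_df.rename(columns={"0": "wohuman"})

combined = pd.concat([whuman_df, wohuman_df], axis=1)
# Samples whose counts are identical with and without human reads (here S2) are dropped
combined = combined[combined["whuman"] != combined["wohuman"]].copy()
combined["difference"] = combined["whuman"] - combined["wohuman"]
print(combined)  # S1 and S3 remain, with differences 200 and 50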
import sys
from pathlib import Path
import oyaml as yaml
from prettytable import PrettyTable
import argparse
import os

"""

Example :

Input YAML :
~~~~~~~~~~
metadata:
   name: nginx-deployment
    labels:
      app: nginx

Output Table :
~~~~~~~~~~~~
metadata:
+--------+------------------+----------+-----------------------------------------------------------+
| Field  | Example Value    | Required | Description                                               |
+--------+------------------+----------+-----------------------------------------------------------+
| name   | nginx-deployment |    --    | Lorem ipsum dolor sit amet, consecteteur adipiscing elit. |
| labels | --               |    --    | Lorem ipsum.                                              |
|   app  | nginx            |    --    | Lorem ipsum.                                              |
+--------+------------------+----------+-----------------------------------------------------------+	    

"""

# parser = argparse.ArgumentParser(description='YAML file to (HTML) table converter',
#                epilog='text table will be printed as STDOUT - html table will be save in html file ')
# parser.add_argument('--inputFile', dest='inputfile', required=True, help="input yaml file to process")
# parser.add_argument('--out', dest='format', choices=['txt', 'html', 'text'], help="convert yaml to text table or html "
#                                                                                  "table")
# parser.add_argument('--outpath', dest='outputfile', required=True, help="path to the output file")
# args = parser.parse_args()

sys.stderr = open(snakemake.log[0], "w")

outputFmt = "html"
INPUT_YAML = str(snakemake.input)

if outputFmt == "text" or outputFmt == "txt":
    PRINT_HTML = False
else:
    PRINT_HTML = True

in_file = Path(INPUT_YAML)
if not in_file.is_file():
    sys.exit("Input file [" + INPUT_YAML + "] does not exist")

SPACE_CHAR = "~"
OUTPUT_HTMl = str(snakemake.output)

CSS_TEXT = """
        <html>
        <head>
          <meta name="viewport" content="width=device-width, initial-scale=1">
          <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
          <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
          <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js"></script>
          <style>
        body
        {
            padding-left: 20px;
        }
        th:nth-child(1) {
          width: 200px;
          }

        /* the second */
        th:nth-child(2) {
          width: 200px;
        }

        /* the third */
        th:nth-child(3) {
          width: 100px;
         }
         /* the third */
         th:nth-child(4) {
          width: 420px;
         }

         pre {
            white-space: -moz-pre-wrap; /* Mozilla, supported since 1999 */
            white-space: -pre-wrap; /* Opera */
            white-space: -o-pre-wrap; /* Opera */
            white-space: pre-wrap; /* CSS3 - Text module (Candidate Recommendation) http://www.w3.org/TR/css3-text/#white-space */
            word-wrap: break-word; /* IE 5.5+ */
            width: 725px
         }
          </style>
        </head>
        """


def listToString(inList):
    """Convert list to String"""
    ret = ""
    for line in inList:
        ret = ret + line
    return ret


def printDic(inDictionary, inPTable, indent):
    """
    Iterate over the dictionary.
        If needed, call the same function again (recursively) until we reach a plain key : value pair.
        Add each key and value as a row to the pretty table object (inPTable).
    """
    global SPACE_CHAR  # Placeholder character that will be replaced when this table is printed as text/html

    # Go over the dictionary
    for item in inDictionary:
        if isinstance(
            item, dict
        ):  # If it is again a dictionary, call the same function with this new dictionary
            inPTable.add_row([SPACE_CHAR, SPACE_CHAR])
            printDic(item, inPTable, indent)
        else:
            # Two ways to get the next item, based on the input type
            if isinstance(inDictionary, dict):
                moreStuff = inDictionary.get(item)
            elif isinstance(inDictionary, list):
                # If it is a simple array/list we just print all its values and we are done
                for _item in inDictionary:
                    inPTable.add_row([indent + _item, SPACE_CHAR + SPACE_CHAR])
                break

            # If it is a dictionary or a list, process it accordingly
            if isinstance(moreStuff, dict):
                inPTable.add_row([indent + item, SPACE_CHAR + SPACE_CHAR])
                printDic(moreStuff, inPTable, SPACE_CHAR + SPACE_CHAR + indent)
            elif isinstance(moreStuff, list):
                # If we are not in nested call (as indent is empty string) we add one extra row in table (for clarity)
                # if indent is "":
                #    inPTable.add_row([SPACE_CHAR, SPACE_CHAR, SPACE_CHAR])
                #
                # inPTable.add_row([indent + item, "", ""])
                for dicInDic in moreStuff:
                    if dicInDic is not None:
                        if isinstance(dicInDic, dict):
                            printDic(
                                dicInDic,
                                inPTable,
                                SPACE_CHAR
                                + SPACE_CHAR
                                + SPACE_CHAR
                                + SPACE_CHAR
                                + indent,
                            )
                        else:
                            inPTable.add_row([indent + item, dicInDic])

            else:
                # Most calls eventually end up here -
                # this adds the key and its value as a table row
                inPTable.add_row([indent + item, inDictionary[item]])


"""
    Read given yaml file 
       process each to level element build table (HTML) out of it and print it console/file 
"""
with open(INPUT_YAML) as file:
    # The FullLoader parameter handles the conversion from YAML
    # scalar values to the Python dictionary format
    yaml_file_object = yaml.load(file, Loader=yaml.FullLoader)

    if PRINT_HTML:
        html_st = []
        f = open(OUTPUT_HTMl, "w")
        html_st.append(CSS_TEXT)
    i = 0
    for key in yaml_file_object:
        body_st = []
        prettyTable = PrettyTable()

        prettyTable.field_names = ["Field", "Example Value"]

        if not PRINT_HTML:
            prettyTable.align["Field"] = "l"
            prettyTable.align["Example Value"] = "l"

        if isinstance(yaml_file_object, list):
            dic = yaml_file_object[i]
            i += 1
        elif isinstance(yaml_file_object, dict):
            dic = yaml_file_object.get(key)

        if isinstance(dic, dict) or isinstance(dic, list):
            printDic(dic, prettyTable, "")
            if isinstance(yaml_file_object, dict):
                yaml_snippet = yaml.dump({key: dic})

            else:
                yaml_snippet = yaml.dump(dic)

        else:
            prettyTable.add_row([key, dic])
            yaml_snippet = yaml.dump({key: dic})

        if isinstance(yaml_file_object, dict):
            if PRINT_HTML:
                body_st.append("<h2>" + key + "</h2>")
            else:
                print("=> " + key + ":")

        table = prettyTable.get_html_string(
            attributes={
                "name": key,
                "id": key,
                "class": "table table-striped table-condensed",
                "style": "width: 1450px;table-layout: fixed;overflow-wrap: "
                "break-word;",
            }
        )
        table = table.replace(SPACE_CHAR, "&nbsp;")
        body_st.append(table)

        if PRINT_HTML:
            html_st.append(" ".join(body_st))
        else:
            print(str(prettyTable).replace(SPACE_CHAR, " "))

    if PRINT_HTML:
        html_st.append("</html>")
        f.write(" ".join(html_st))
        f.close()
        print("File " + OUTPUT_HTMl + " has been generated")
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
# Set this to False if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)

if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)

output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)
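A rule driving this MultiQC wrapper could look roughly as follows; the sample list, paths and environment file are illustrative, and the wrapper is referenced via a local script here only for symmetry with the embedded code:

rule multiqc:
    input:
        expand("results/fastqc/{sample}_fastqc.zip", sample=["sampleA", "sampleB"]),
    output:
        "results/qc/multiqc.html",
    params:
        extra="",  # additional multiqc command line arguments
    log:
        "logs/multiqc.log",
    conda:
        "../envs/multiqc.yaml"
    script:
        "../scripts/multiqc_wrapper.py"  # the wrapper code shown above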
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path
import re
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)
    # Remove file extension(s) (similar to the internal fastqc approach)
    base = re.sub("\\.gz$", "", base)
    base = re.sub("\\.bz2$", "", base)
    base = re.sub("\\.txt$", "", base)
    base = re.sub("\\.fastq$", "", base)
    base = re.sub("\\.fq$", "", base)
    base = re.sub("\\.sam$", "", base)
    base = re.sub("\\.bam$", "", base)

    return base


# Run fastqc in a temporary directory; since there can be race conditions
# if multiple jobs use the same fastqc output dir, we create a temp dir per job.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")


Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/IKIM-Essen/16S
Name: 16s
Version: v0.5.0

Downloaded: 0
Copyright: Public Domain
License: MIT License