RADSeq tool with Snakemake workflow integration for analysis of RAD sequencing data.


NodeRAD is a Snakemake workflow for the analysis of single-end reads from RAD sequencing without a reference genome. It detects loci and genomic variants using sequencing error and heterozygosity rates. For more information, please have a look at the associated bachelor thesis.

Note: Currently the workflow is limited to diploid species.

Authors

  • Antonie Vietor (@AntonieV)

This workflow is part of a bachelor thesis. The topic and the underlying model were developed by:

  • Johannes Köster (@johanneskoester), https://koesterlab.github.io

Usage

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).

Step 1: Obtain a copy of this workflow

  1. Create a new GitHub repository using this workflow as a template.

  2. Clone the newly created repository to your local system, into the place where you want to perform the data analysis.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yaml to configure the workflow execution, and samples.tsv to specify your sample setup.

Step 3: Install Snakemake

  1. Install conda.

  2. Install mamba in conda base and create a snakemake conda environment:

 conda install -n base -c conda-forge mamba
 conda create -c bioconda -c conda-forge -n snakemake snakemake

For installation details, see the instructions in the Snakemake documentation.

Step 4: Execute workflow

You can run the workflow on some example data through the script start.sh. To use your own data, change the paths in config/config.yaml for samples, fastq-data and eval-data (the latter only if there is data from a ddRAGE simulation).
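
The three keys named above come straight from this README; how their values are laid out in config/config.yaml is an assumption here. A minimal sketch (not part of the workflow) to inspect the configured paths before a run:

# Hedged sketch: print the paths referenced in config/config.yaml.
# The key names ("samples", "fastq-data", "eval-data") are taken from this
# README; they may be nested differently in the actual config file.
import yaml                      # pyyaml
from pathlib import Path

with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

for key in ("samples", "fastq-data", "eval-data"):
    value = config.get(key, "<missing or nested under another key>")
    print(f"{key}: {value}")
    if isinstance(value, str) and not Path(value).exists():
        print(f"  warning: path {value!r} does not exist")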

Activate the conda environment:

conda activate snakemake

Test your configuration by performing a dry-run via

snakemake --use-conda --conda-frontend mamba -n

Execute the workflow locally via

snakemake --use-conda --conda-frontend mamba --cores $N

using $N cores or run it in a cluster environment via

snakemake --use-conda --conda-frontend mamba --cluster qsub --jobs 100

or

snakemake --use-conda --conda-frontend mamba --drmaa --jobs 100

If you want to fix not only the software stack but also the underlying OS, use

snakemake --use-conda --conda-frontend mamba --use-singularity

in combination with any of the modes above. See the Snakemake documentation for further details.

Step 5: Investigate results

After successful execution, you can create a self-contained interactive HTML report with all results via:

snakemake --report report.html

This report can, e.g., be forwarded to your collaborators. An example (using some trivial test data) can be seen here.
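
Besides the report, the workflow writes a simple tab-separated, per-sample VCF (see the noderad_main.py and vcf_to_fasta.py snippets below). A minimal sketch for loading one of these files, with a hypothetical output path:

# Hedged sketch: load a per-sample VCF produced by the workflow into pandas,
# the same way scripts/vcf_to_fasta.py reads it. The path below is an assumption.
import pandas as pd

vcf = pd.read_csv("results/vcf/sample1.vcf", sep="\t")
print(vcf[["#CHROM", "POS", "REF", "ALT"]].head())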

Step 6: Commit changes

Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:

git commit -a
git push

Step 7: Obtain updates from upstream

Whenever you want to synchronize your workflow copy with new developments from upstream, do the following.

  1. Once, register the upstream repository in your local copy: git remote add -f upstream git@github.com:TharjaX/NodeRAD.git or, if you have not set up SSH keys, git remote add -f upstream https://github.com/TharjaX/NodeRAD.git.

  2. Update the upstream version: git fetch upstream.

  3. Create a diff with the current version: git diff HEAD upstream/master workflow > upstream-changes.diff.

  4. Investigate the changes: vim upstream-changes.diff.

  5. Apply the modified diff via: git apply upstream-changes.diff.

  6. Carefully check whether you need to update the config files: git diff HEAD upstream/master config. If so, do it manually, and only where necessary, since you would otherwise likely overwrite your settings and samples.

Step 8: Contribute back

In case you have also changed or added steps, please consider contributing them back to the original repository:

  1. Fork the original repo to a personal or lab account.

  2. Clone the fork to your local system, to a different place than where you ran your analysis.

  3. Copy the modified files from your analysis to the clone of your fork, e.g., cp -r workflow path/to/fork. Make sure not to accidentally copy config file contents or sample sheets. Instead, manually update the example config files if necessary.

  4. Commit and push your changes to your fork.

  5. Create a pull request against the original repository.

Testing

More test cases are in the subfolder .test. To run the workflow with one of the test data sets, adjust the paths for samples, fastq-data and eval-data in .test/config/config.yaml to the desired test data set and run the script start_test.sh. Note that for the different data sets you may have to adjust the threshold values in the configuration file. Test cases are also executed automatically via continuous integration with GitHub Actions.

Code Snippets
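
Each snippet below is the tail of a Snakemake rule (a script, shell or wrapper directive), followed by the full scripts and wrapper sources. As a hedged illustration of how such a directive is embedded in a rule (the rule name, file paths and log are assumptions, not copied from the repository):

# Illustrative sketch only: how a wrapper directive from the list below sits
# inside a Snakemake rule. Rule name, paths and log file are assumptions.
rule samtools_faidx:
    input:
        "results/simulated/{sample}.fasta",
    output:
        "results/simulated/{sample}.fasta.fai",
    log:
        "logs/samtools_faidx/{sample}.log",
    wrapper:
        "v1.25.0/bio/samtools/faidx"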

The directives used by the workflow's rules are:

  • script: "../scripts/sim_to_fasta.py"
  • script: "../scripts/vcf_to_fasta.py"
  • wrapper: "v1.25.0/bio/samtools/faidx"
  • shell: "makeblastdb -in {input.fasta_sim} -input_type fasta -dbtype nucl -parse_seqids"
  • shell: "blastn -query {input.res} -outfmt 6 -db {params.dbname} -out {output.blast}"
  • shell: "echo -ne \"{params.header}\" | cat - {input} > {output}"
  • script: "../scripts/plots_blast.R"
  • wrapper: "v1.25.0/bio/minimap2/index"
  • wrapper: "v1.25.0/bio/minimap2/aligner"
  • wrapper: "v1.25.0/bio/samtools/view"
  • script: "../scripts/noderad_main.py"
  • wrapper: "v1.25.0/bio/fastqc"
  • wrapper: "v1.25.0/bio/multiqc"

scripts/noderad_main.py:

import sys
import gzip
import pysam
from graph_tool.all import *
import numpy as np
import graph_operations
import likelihood_operations

sys.stderr = open(snakemake.log[0], "w")

sample = snakemake.wildcards.get('sample')

# input
bam = snakemake.input.get("bam")
reads = snakemake.input.get("fastq")

# required output
vcf = open(snakemake.output.get("vcf", ""), "w")
vcf_header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t{}\n".format(sample)
vcf.write(vcf_header)

# optional output
graph_xml = snakemake.output.get("graph_xml", "")
output_figure = snakemake.output.get("graph_figure", "")
connected_components_xml = snakemake.output.get("connected_components_xml", "")
connected_components_figure = snakemake.output.get("connected_components_figure", "")
dir_subgraphs = snakemake.output.get("components_subgraphs", "")

# params for ploidy, threshold, noise and cluster-size
threshold = snakemake.params.get("threshold_max_edit_distance", "")
ploidy = snakemake.params.get("ploidy", "")  # a diploid chromosome set is determined for the prototype
noise_small = snakemake.params.get("treshold_seq_noise_small", "")
noise_large = snakemake.params.get("treshold_seq_noise_large", "")
cluster_size = snakemake.params.get("treshold_cluster_size", "")

# get reads format
format = ""  # format of reads
if reads.endswith((".fastq", ".fq", ".fastq.gz", ".fq.gz")):
    format = "fastq"
if reads.endswith((".fasta", ".fa", ".fasta.gz", ".fa.gz")):
    format = "fasta"

# default value of threshold for maximum distance value
if threshold == "":
    threshold = 23
else:
    threshold = int(threshold)

# default value of noise
if not noise_small:
    noise_small = 1
if not noise_large:
    noise_large = 1

# init graph
graph = Graph(directed=True)

# set graph properties
for (name, prop, prop_type) in graph_operations.set_properties():
    if name.startswith("g_"):
        vars()[name] = graph.new_graph_property(prop_type)
        graph.graph_properties[prop] = vars()[name]
    if name.startswith("v_"):
        vars()[name] = graph.new_vertex_property(prop_type)
        graph.vertex_properties[prop] = vars()[name]
    if name.startswith("e_"):
        vars()[name] = graph.new_edge_property(prop_type)
        graph.edge_properties[prop] = vars()[name]

# set ploidy, sequencing error and heterozygosity
graph.graph_properties["ploidy"] = ploidy

graph.graph_properties["ins-error-rates"] = snakemake.params.get("err_ins", "")
graph.graph_properties["del-error-rates"] = snakemake.params.get("err_del", "")

graph.graph_properties["subst-heterozygosity"] = snakemake.params.get("heterozyg_subst", "")
graph.graph_properties["ins-heterozygosity"] = snakemake.params.get("heterozyg_ins", "")
graph.graph_properties["del-heterozygosity"] = snakemake.params.get("heterozyg_del", "")

# create first empty node for graph
node = graph.add_vertex()
v_id[node] = "{idx}_{sample}".format(idx=0, sample=sample)
v_name[node] = ""
v_seq[node] = ""
v_q_qual[node] = ""

# add reads as vertices of the graph
if reads.endswith(".gz"):
    with gzip.open(reads, "rt") as _reads:
        graph = graph_operations.set_nodes(graph, _reads, format, sample)
else:
    with open(reads, "r") as _reads:
        graph = graph_operations.set_nodes(graph, _reads, format, sample)

# add edges from all-vs-all alignment of reads (please see rule minimap2)
verbose = pysam.set_verbosity(0)  # https://github.com/pysam-developers/pysam/issues/939
bam = pysam.AlignmentFile(bam, "rb")
pysam.set_verbosity(verbose)
for read in bam.fetch(until_eof=True):
    graph = graph_operations.set_edges(graph, read, threshold)
bam.close()
graph.remove_vertex(0)


# write log files
sys.stderr.write(
    "graph construction summary for sample {}:"
    "\n nodes:\t{}\n edges:\t{}\n".format(sample, graph.num_vertices(), graph.num_edges()))

graph_operations.save_and_draw_graph(graph, xml_out=graph_xml,
                                     figure_out=output_figure)

# In case the subgraph directory is expected as optional output but some samples do not produce
# subgraphs, this preserves the empty directories for those samples so they are not removed by snakemake.
graph_operations.set_dir(dir_subgraphs)

# step 1: extract connected components
message = "CONNECTED COMPONENTS based on the graph construction from the edit distances (minimap2)"
connected_components = graph_operations.get_components(graph, message, snakemake.wildcards.get('sample'),
                                                       dir_subgraphs, connected_components_xml,
                                                       connected_components_figure, v_color="component-label")

graph = None
loc_nr = 0
for (comp, comp_nr) in connected_components:
    # step 2: likelihood of allele fractions
    alleles = likelihood_operations.get_candidate_alleles(comp, comp.vertices(), noise_small, noise_large, cluster_size)
    n = len(alleles)
    vafs_candidates = list(likelihood_operations.get_candidate_vafs(n, ploidy))
    nodes = list([comp.vertex_index[node] for node in comp.vertices()])
    read_allele_likelihoods = {}

    # calculate the likelihood over ALL reads
    vafs_likelihoods = [likelihood_operations.calc_vafs_likelihood(comp, vafs, nodes, alleles, read_allele_likelihoods) for vafs in
                        vafs_candidates]
    if not vafs_likelihoods:  # case empty list, e.g. if the treshold-seq-noise value is set too large
        continue
    max_likelihood_idx = np.argmax(vafs_likelihoods)

    # obtain ML solution for step 2
    max_likelihood_vafs = vafs_candidates[max_likelihood_idx]

    # write to log file
    sys.stderr.write(
        "\n\nStats for component {} in sample {} with {} alleles and ploidy = {}:\n".format(comp_nr, sample,
                                                                                            n, ploidy))
    sys.stderr.write("\n\tMaximum vafs likelihood:\n")
    for vaf, allele in zip(max_likelihood_vafs, alleles):
        sys.stderr.write("\t\t{} for allele {}\n".format(vaf, allele))

    # step 3: likelihood of loci given alleles and fractions
    loci_candidates = list(likelihood_operations.get_candidate_loci(n, ploidy, max_likelihood_vafs))
    loci_likelihoods = {}

    loci_likelihoods = [
        likelihood_operations.calc_loci_likelihoods(comp, max_likelihood_vafs, alleles, loci, loci_likelihoods)
        for loci in loci_candidates
    ]
    max_likelihood_idx = np.argmax(loci_likelihoods)
    max_likelihood_loci = loci_candidates[max_likelihood_idx]

    # write to log file
    sys.stderr.write("\n\tMaximum loci likelihood calculations:\n")
    sys.stderr.write("\t\tloci_likelihoods:\n\t\t\t{}\n\t\tmax_likelihood_idx:\n\t\t\t{}"
                     "\n\t\tmax_likelihood_loci:\n\t\t\t{}\n".format(loci_likelihoods, max_likelihood_idx,
                                                                     max_likelihood_loci))

    # step 4: results output to VCF file
    loci_alleles = likelihood_operations.get_sorted_loci_alleles(alleles, max_likelihood_loci)
    gt_indices = likelihood_operations.get_gt_indices(alleles, max_likelihood_loci, loci_alleles)

    for gt_idx_locus in list(set(gt_indices)):
        vcf.write("{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\tGT\t{gt}\n".format(chrom="LOC{}".format(loc_nr),
                                                                             pos="1", ref=loci_alleles[0],
                                                                             alt=', '.join(loci_alleles[1:]) if len(loci_alleles) > 1 else ".",
                                                                             gt=likelihood_operations.get_genotype(gt_idx_locus)))
        loc_nr += 1

scripts/plots_blast.R:

library("tidyverse")
library("stringr")

log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

blast <- snakemake@input[[1]]
blast_data <- read_tsv(blast, col_names = TRUE, trim_ws = TRUE)
names(blast_data) <- c('res_id','sim_id','identity', 'len_alignment', 'mismatches', 'gap_opens', 'q_start', 'q_end', 's_start', 's_end', 'evalue', 'bit_score')

blast_data <- blast_data %>%
  mutate(res_id=as.character(res_id)) %>%
  separate(res_id, c("sample", "individual", "locus"), sep = "\\|", extra="merge", convert=TRUE) #%>%

for(i in blast_data$locus) {
  if(nchar(strsplit(i, "-")[[1]][1])<5){
    blast_data$locus[blast_data$locus==i] <- str_replace(blast_data$locus[blast_data$locus==i], 'LOC', 'LOC0')
  }
}

plot <- ggplot(data = blast_data, aes(y=locus, x=identity)) +
  geom_bar(width = 1.0, position = "dodge", stat="identity", aes(fill = identity), colour="Black") +
  scale_fill_gradient(low="mediumpurple3",high="green3", name = "Identity") +
  ggtitle("Identity [%] of loci identified by NodeRAD vs. simulated loci") +
  xlab("Identity [%]") +
  ylab("Locus") +
  theme_minimal() +
  theme(aspect.ratio = 2.5/1.5, plot.title = element_text(hjust = 0.5), legend.position = "right", legend.key.size = unit(0.8, "cm"), axis.text.y = element_text(hjust = 0))
plot
ggsave(snakemake@output[["ident"]], width = 7, height = 7)

blast_data$intervals <- cut(blast_data$identity, seq(0,100,by=1))
plot <- ggplot(blast_data, aes(intervals)) +
    geom_histogram(stat= "count", aes(fill = ..count..)) +
    xlab("Identity [%]") +
    ylab("Counts") +
    scale_fill_gradient(name = "Counts")+
    theme(aspect.ratio = 2.5/1.5, plot.title = element_text(hjust = 0.5), legend.position = "right") +
    ggtitle("Histogram of number of alleles identified by NodeRAD\nwith respect to their similarity to simulated data.")
plot
ggsave(snakemake@output[["ident_hist"]], width = 7, height = 7)

plot <- ggplot(data = blast_data, aes(x=locus, y=bit_score, group = individual)) +
  geom_line(color = "gray70") +
  geom_point(aes(color = bit_score), size =3) +
  geom_point(shape = 1,size = 3, colour = "black") +
  scale_color_gradient(low="red", high="green", name = "Bit score") +
  ggtitle("Bitscores of loci identified by NodeRAD vs. simulated loci") +
  xlab("Locus") +
  ylab("Bit score") +
  theme_minimal() +
  theme(axis.text.x = element_text(color = "black", size = 7, angle = 90, hjust = 0, face = "plain"), plot.title = element_text(hjust = 0.5), legend.position = "right", legend.key.size = unit(0.4, "cm"), axis.text.y = element_text(hjust = 0))
plot
ggsave(snakemake@output[["bit_scores"]], width = 7, height = 7)

plot <- ggplot(data = blast_data, aes(x=locus, y=evalue, group = individual)) +
  geom_line(color = "gray70") +
  geom_point(aes(color = evalue), size =3) +
  geom_point(shape = 1,size = 3, colour = "black") +
  scale_color_gradient(low="green", high="red", name = "E-value") +
  ggtitle("E-Values of loci identified by NodeRAD vs. simulated loci") +
  xlab("Locus") +
  ylab("E-Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(color = "black", size = 7, angle = 90, hjust = 0, face = "plain"), plot.title = element_text(hjust = 0.5), legend.position = "right", legend.key.size = unit(0.4, "cm"), axis.text.y = element_text(hjust = 0))
plot
ggsave(snakemake@output[["evalues"]], width = 7, height = 7)

scripts/sim_to_fasta.py:

import sys
import yaml
from itertools import chain
import re

sys.stderr = open(snakemake.log[0], "w")

sample = snakemake.wildcards.get('sample')

# input
sim_in = snakemake.input.get("sim_data_stats")
individual_name=snakemake.params.get("individual")
individual = snakemake.params.get("individual").replace("_", " ")

fasta = ""

# read yaml file
yaml_in = open(sim_in, 'r')
yaml_in_load = list(yaml.load_all(yaml_in, Loader=yaml.FullLoader))
sim_data_file = yaml_in_load[1]

# parse fasta data from yaml
for key_loc in sim_data_file.keys():
    # loci sequences
    seq = sim_data_file[key_loc]["p5 seq"]

    # loci mutations of given individual
    indiv_data = sim_data_file[key_loc]["individuals"][individual]
    mutations = list([indiv_data[item]["mutations"] for item in indiv_data])
    # modifying the sequence when mutations are present for given individual and locus
    index = 0
    if mutations:
        for mutation in mutations:
            if mutation:
                mut_list = [re.split("@", item)[1] for item in mutation]
                mut_tuples = [tuple(re.split(":", mut)) for mut in mut_list]
                for (pos, mut) in mut_tuples:
                    if '(' in pos or ')' in pos:
                        pos = re.split("\(", re.split("\)", pos)[0])[-1]
                    idx = int(pos)  # nucleotide position indexing in ddRAGE starts at 0,
                    # but it should start at position 1 https://varnomen.hgvs.org/bg-material/numbering/
                    if ">" in mut:
                        mut_subs = re.split(">", mut)
                        if seq[idx] == mut_subs[0]:
                            seq = seq[:idx] + mut_subs[-1] + seq[idx + 1:]
                    if "+" in mut:
                        mut_subs = re.split('\+', mut)[-1]
                        seq = seq[:idx] + mut_subs + seq[idx:]
                    if "-" in mut:
                        mut_subs = re.split('\-', mut)[-1]
                        if all([seq[idx] == mut_subs[i] for i in range(len(mut_subs))]):
                            seq = seq[:idx] + seq[idx + len(mut_subs):]
            # parsing to fasta format
            fasta_id = ">{}|{}-{}|simulated".format(individual_name, key_loc.replace(" ", "_"), index)
            fasta += "{}\n{}\n".format(fasta_id, seq)
            index += 1
    else:
        # parsing to fasta format
        fasta_id = ">{}|{}|simulated".format(individual_name, key_loc.replace(" ", "_"))
        fasta += "{}\n{}\n".format(fasta_id, seq)
# write fasta file
fasta_file = open(snakemake.output.get("fasta_sim", ""), "w")
fasta_file.write(fasta)
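
For orientation, the edits handled above come in three forms inferred from this parsing code: substitutions ("ref>alt"), insertions ("+seq") and deletions ("-seq"), each attached to a 0-based position. A small self-contained sketch (the toy sequence and mutation strings are made up for illustration):

# Illustrative sketch only: apply the three ddRAGE-style edit types handled in
# the script above to a toy sequence. Example strings are assumptions.
def apply_edit(seq, pos, mut):
    if ">" in mut:                           # substitution, e.g. "C>G"
        ref, alt = mut.split(">")
        return seq[:pos] + alt + seq[pos + 1:] if seq[pos] == ref else seq
    if "+" in mut:                           # insertion, e.g. "+TT"
        return seq[:pos] + mut.split("+")[-1] + seq[pos:]
    if "-" in mut:                           # deletion, e.g. "-TA"
        removed = mut.split("-")[-1]
        return seq[:pos] + seq[pos + len(removed):]
    return seq

seq = "ACGTACGT"
print(apply_edit(seq, 1, "C>G"))   # substitution -> AGGTACGT
print(apply_edit(seq, 2, "+TT"))   # insertion    -> ACTTGTACGT
print(apply_edit(seq, 3, "-TA"))   # deletion     -> ACGCGT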

scripts/vcf_to_fasta.py:

import sys
import pandas as pd

sys.stderr = open(snakemake.log[0], "w")

sample = snakemake.wildcards.get('sample')

# input
vcf_in = pd.read_csv(snakemake.input.get("vcf_data"), sep="\t")[["#CHROM", "REF", "ALT"]]

individual = snakemake.params.get("individual").replace(" ", "_")

prefix = "{sample}|{indiv}|".format(sample=sample,indiv=individual)
fasta = ""

for i in range(len(vcf_in["#CHROM"])):
    name = vcf_in["#CHROM"][i]
    fasta += ">{prefix}{loc}-REF\n{ref}\n".format(prefix=prefix, loc=name, ref=vcf_in["REF"][i])
    variants = str(vcf_in["ALT"][i]).split(",")
    if not variants[0] == ".":
        idx = 1
        for var in variants:
            fasta += ">{prefix}{loc}-ALT_{idx}\n{alt}\n".format(prefix=prefix, loc=name, idx=idx, alt=var)
            idx += 1

# write fasta file
fasta_file = open(snakemake.output.get("vcf_fasta", ""), "w")
fasta_file.write(fasta)

wrapper v1.25.0/bio/fastqc:

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path
import re
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)
    # Remove file extension(s) (similar to the internal fastqc approach)
    base = re.sub("\\.gz$", "", base)
    base = re.sub("\\.bz2$", "", base)
    base = re.sub("\\.txt$", "", base)
    base = re.sub("\\.fastq$", "", base)
    base = re.sub("\\.fq$", "", base)
    base = re.sub("\\.sam$", "", base)
    base = re.sub("\\.bam$", "", base)

    return base


# Run fastqc. Since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")

wrapper v1.25.0/bio/minimap2/aligner:

__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import infer_out_format
from snakemake_wrapper_utils.samtools import get_samtools_opts


samtools_opts = get_samtools_opts(snakemake, parse_output=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
sort = snakemake.params.get("sorting", "none")
sort_extra = snakemake.params.get("sort_extra", "")

out_ext = infer_out_format(snakemake.output[0])

pipe_cmd = ""
if out_ext != "PAF":
    # Add option for SAM output
    extra += " -a"

    # Determine which pipe command to use for converting to bam or sorting.
    if sort == "none":
        if out_ext != "SAM":
            # Simply convert to output format using samtools view.
            pipe_cmd = f"| samtools view -h {samtools_opts}"

    elif sort in ["coordinate", "queryname"]:
        # Add name flag if needed.
        if sort == "queryname":
            sort_extra += " -n"

        # Sort alignments.
        pipe_cmd = f"| samtools sort {sort_extra} {samtools_opts}"

    else:
        raise ValueError(f"Unexpected value for params.sort: {sort}")


shell(
    "(minimap2"
    " -t {snakemake.threads}"
    " {extra} "
    " {snakemake.input.target}"
    " {snakemake.input.query}"
    " {pipe_cmd}"
    " > {snakemake.output[0]}"
    ") {log}"
)

wrapper v1.25.0/bio/minimap2/index:

__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "[email protected]"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(minimap2 -t {snakemake.threads} {extra} "
    "-d {snakemake.output[0]} {snakemake.input.target}) {log}"
)

wrapper v1.25.0/bio/multiqc:

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
# Set this to False if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)

if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)

output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)

wrapper v1.25.0/bio/samtools/faidx:

__author__ = "Michael Chambers"
__copyright__ = "Copyright 2019, Michael Chambers"
__email__ = "[email protected]"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts

samtools_opts = get_samtools_opts(
    snakemake, parse_threads=False, parse_write_index=False, parse_output_format=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("samtools faidx {samtools_opts} {extra} {snakemake.input[0]} {log}")

wrapper v1.25.0/bio/samtools/view:

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts

samtools_opts = get_samtools_opts(snakemake)
extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)


shell("samtools view {samtools_opts} {extra} {snakemake.input[0]} {region} {log}")

Repository: https://github.com/AntonieV/NodeRAD
License: MIT License