Long Read Pipeline with Alignment and Dependency Information

long read pipeline

:speech_balloon: Introduction

The module consists of alignment ....

:heavy_exclamation_mark: Dependencies

In order to use this module, the following dependencies are required:

:school_satchel: Preparations

Sample data

Input data should be added to `samples.tsv` and `units.tsv`. The following information needs to be added to these files:

Column Id	Description
`samples.tsv`
sample	unique sample/patient id, one per row
`units.tsv`
sample	same sample/patient id as in `samples.tsv`
type	data type identifier (one letter), one of T (tumor), N (normal), R (RNA)
platform	type of sequencing platform, e.g. `NovaSeq`
machine	specific machine id, e.g. NovaSeq instruments have `@Axxxxx`
flowcell	identifier of flowcell used
lane	flowcell lane number
barcode	sequence library barcode/index, connect forward and reverse indices by `+` , e.g. `ATGC+ATGC`
fastq1/2	absolute path to forward and reverse reads
adapter	adapter sequences to be trimmed, separated by comma
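
The column requirements above can be checked before a run. The following is a minimal sketch, not part of the module: the function name `check_tables`, the split of `fastq1/2` into two columns named `fastq1` and `fastq2`, and every value in the example rows are assumptions made for illustration.

```python
import csv
import io

# Columns listed in the table above; "fastq1" and "fastq2" are an assumed
# expansion of the "fastq1/2" row.
UNITS_COLUMNS = [
    "sample", "type", "platform", "machine", "flowcell",
    "lane", "barcode", "fastq1", "fastq2", "adapter",
]
VALID_TYPES = {"T", "N", "R"}


def check_tables(samples_text, units_text):
    """Return a list of problems found in samples.tsv / units.tsv content."""
    problems = []
    sample_rows = csv.DictReader(io.StringIO(samples_text), delimiter="\t")
    samples = [row["sample"] for row in sample_rows]
    if len(samples) != len(set(samples)):
        problems.append("samples.tsv: sample ids must be unique")

    unit_rows = csv.DictReader(io.StringIO(units_text), delimiter="\t")
    missing = [c for c in UNITS_COLUMNS if c not in (unit_rows.fieldnames or [])]
    if missing:
        return problems + ["units.tsv: missing columns " + ", ".join(missing)]

    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(unit_rows, start=2):
        if row["sample"] not in samples:
            problems.append(f"units.tsv line {line_no}: unknown sample {row['sample']}")
        if row["type"] not in VALID_TYPES:
            problems.append(f"units.tsv line {line_no}: type must be T, N or R")
    return problems


# Hypothetical example content (all values are placeholders).
SAMPLES_TSV = "sample\nNA12878\n"
UNITS_TSV = "\t".join(UNITS_COLUMNS) + "\n" + "\t".join([
    "NA12878", "N", "NovaSeq", "@A00001", "FLOWCELL1", "1", "ATGC+ATGC",
    "/data/NA12878_R1.fastq.gz", "/data/NA12878_R2.fastq.gz", "ADAPTER1,ADAPTER2",
]) + "\n"

if __name__ == "__main__":
    print(check_tables(SAMPLES_TSV, UNITS_TSV))
```

An empty list means both tables satisfy the checks. The sketch only looks at column presence, sample ids and the `type` letter; it does not verify that the FASTQ paths exist.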

:white_check_mark: Testing

The workflow repository contains a small test dataset in `.tests/integration`, which can be run like so:

$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --use-singularity

:rocket: Usage

To use this module in your workflow, follow the description in the Snakemake docs. Add the module to your Snakefile like so:

module long_read:
    snakefile:
        github(
            "long_read",
            path="workflow/Snakefile",
            tag="1.0.0",
        )
    config:
        config


use rule * from long_read as long_read_*

Output files

The following output files should be targeted via another rule:

File	Description
`long_read/PATH/FILE`	DESCRIPTION

:judge: Rule Graph

Code Snippets

wrapper:
    "v1.28.0/bio/minimap2/aligner"

SnakeMake From line 33 of rules/minimap2.smk

shell:
    "(pbsv discover "
    "{input.bam} "
    "{output.svsig} "
    "{params.extra}) "
    "&> {log}"

SnakeMake From line 32 of rules/pbsv.smk

shell:
    "(pbsv call "
    "{input.ref} "
    "-r {input.svsig} "
    "{output.vcf} "
    "{params.extra}) "
    "&> {log}"

SnakeMake From line 68 of rules/pbsv.smk

shell:
    "sniffles -i {input.bam} "
    "--reference {input.fasta} "
    "-t {threads} "
    "{params.non_germline} "
    "{params.extra} "
    "-v {output.vcf} &> {log} "

SnakeMake sniffles From line 31 of rules/sniffles.smk

__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import infer_out_format
from snakemake_wrapper_utils.samtools import get_samtools_opts


samtools_opts = get_samtools_opts(snakemake, parse_output=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
sort = snakemake.params.get("sorting", "none")
sort_extra = snakemake.params.get("sort_extra", "")

out_ext = infer_out_format(snakemake.output[0])

pipe_cmd = ""
if out_ext != "PAF":
    # Add option for SAM output
    extra += " -a"

    # Determine which pipe command to use for converting to bam or sorting.
    if sort == "none":
        if out_ext != "SAM":
            # Simply convert to output format using samtools view.
            pipe_cmd = f"| samtools view -h {samtools_opts}"

    elif sort in ["coordinate", "queryname"]:
        # Add name flag if needed.
        if sort == "queryname":
            sort_extra += " -n"

        # Sort alignments.
        pipe_cmd = f"| samtools sort {sort_extra} {samtools_opts}"

    else:
        raise ValueError(f"Unexpected value for params.sorting: {sort}")


shell(
    "(minimap2"
    " -t {snakemake.threads}"
    " {extra} "
    " {snakemake.input.target}"
    " {snakemake.input.query}"
    " {pipe_cmd}"
    " > {snakemake.output[0]}"
    ") {log}"
)
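
The branching in the wrapper above is easier to follow when isolated from Snakemake. The sketch below restates it as a pure function so the combinations of output format and `sorting` can be tried directly. The function name `build_pipe_cmd` is hypothetical, the format strings `"PAF"`, `"SAM"` and `"BAM"` are assumed from the comparisons in the wrapper, and `samtools_opts` is passed in as a plain string rather than computed by `get_samtools_opts`.

```python
def build_pipe_cmd(out_ext, sort="none", sort_extra="", samtools_opts=""):
    """Mirror the wrapper's branching.

    Returns (extra_flag, pipe_cmd): the flag appended to the minimap2 call
    and the command the minimap2 output is piped into (empty if none).
    """
    extra_flag = ""
    pipe_cmd = ""
    if out_ext != "PAF":
        # Anything other than PAF needs SAM output from minimap2.
        extra_flag = "-a"
        if sort == "none":
            if out_ext != "SAM":
                # Convert SAM to the requested format.
                parts = ["| samtools view -h", samtools_opts]
                pipe_cmd = " ".join(p for p in parts if p)
        elif sort in ("coordinate", "queryname"):
            if sort == "queryname":
                # Sort by read name instead of coordinate.
                sort_extra = f"{sort_extra} -n".strip()
            parts = ["| samtools sort", sort_extra, samtools_opts]
            pipe_cmd = " ".join(p for p in parts if p)
        else:
            raise ValueError(f"Unexpected value for sorting: {sort}")
    return extra_flag, pipe_cmd


if __name__ == "__main__":
    for fmt, sorting in [("PAF", "none"), ("SAM", "none"),
                         ("BAM", "none"), ("BAM", "queryname")]:
        print(fmt, sorting, build_pipe_cmd(fmt, sorting))
```

As in the wrapper, PAF output bypasses samtools entirely, so the `sorting` value is never inspected in that case.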