Long Read Pipeline with Alignment and Dependency Information

long read pipeline

:speech_balloon: Introduction

The module consists of alignment ....

:heavy_exclamation_mark: Dependencies

In order to use this module, the following dependencies are required:

:school_satchel: Preparations

Sample data

Input data should be added to samples.tsv and units.tsv. The following information needs to be added to these files:

| File | Column Id | Description |
| --- | --- | --- |
| samples.tsv | sample | unique sample/patient id, one per row |
| units.tsv | sample | same sample/patient id as in samples.tsv |
| units.tsv | type | data type identifier (one letter), one of T (tumor), N (normal), R (RNA) |
| units.tsv | platform | type of sequencing platform, e.g. NovaSeq |
| units.tsv | machine | specific machine id, e.g. NovaSeq instruments have @Axxxxx |
| units.tsv | flowcell | identifier of the flowcell used |
| units.tsv | lane | flowcell lane number |
| units.tsv | barcode | sequence library barcode/index; connect forward and reverse indices with +, e.g. ATGC+ATGC |
| units.tsv | fastq1/2 | absolute paths to forward and reverse reads |
| units.tsv | adapter | adapter sequences to be trimmed, separated by commas |
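
The column rules above can be checked before a run. The following is a minimal sketch, not part of the module: the function name `validate_units`, the split of `fastq1/2` into two columns named `fastq1` and `fastq2`, the barcode pattern, and all example values are assumptions made for illustration.

```python
import csv
import io
import re

# Columns from the table above; "fastq1" and "fastq2" expand the "fastq1/2" row
# (assumed column names).
UNITS_COLUMNS = [
    "sample", "type", "platform", "machine", "flowcell",
    "lane", "barcode", "fastq1", "fastq2", "adapter",
]
VALID_TYPES = {"T", "N", "R"}
# One index, or forward and reverse indices joined by "+" (assumed alphabet ACGTN).
BARCODE_RE = re.compile(r"^[ACGTN]+(\+[ACGTN]+)?$")


def validate_units(units_text, sample_ids):
    """Return a list of human-readable problems found in units.tsv content."""
    reader = csv.DictReader(io.StringIO(units_text), delimiter="\t")
    missing = [c for c in UNITS_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(reader, start=2):
        if row["sample"] not in sample_ids:
            problems.append(f"line {line_no}: unknown sample {row['sample']!r}")
        if row["type"] not in VALID_TYPES:
            problems.append(f"line {line_no}: invalid type {row['type']!r}")
        if not BARCODE_RE.match(row["barcode"] or ""):
            problems.append(f"line {line_no}: malformed barcode {row['barcode']!r}")
    return problems


# Hypothetical rows; every value here is made up for illustration.
header = "\t".join(UNITS_COLUMNS)
good_row = "\t".join([
    "sample1", "N", "NovaSeq", "@A00001", "FLOWCELL1", "1", "ATGC+ATGC",
    "/data/sample1_R1.fastq.gz", "/data/sample1_R2.fastq.gz", "ACGT,ACGT",
])
bad_row = "\t".join([
    "sample2", "X", "NovaSeq", "@A00001", "FLOWCELL1", "1", "ATGC-ATGC",
    "/data/sample2_R1.fastq.gz", "/data/sample2_R2.fastq.gz", "ACGT,ACGT",
])
units_text = "\n".join([header, good_row, bad_row]) + "\n"

for problem in validate_units(units_text, {"sample1"}):
    print(problem)
```

The set passed as `sample_ids` would normally be read from the `sample` column of samples.tsv, so that every unit refers to a declared sample.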

:white_check_mark: Testing

The workflow repository contains a small test dataset in .tests/integration, which can be run like so:

$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --use-singularity

:rocket: Usage

To use this module in your workflow, follow the description in the Snakemake docs. Add the module to your Snakefile like so:

module long_read:
    snakefile:
        github(
            "hydra-genetics/long_read",
            path="workflow/Snakefile",
            tag="1.0.0",
        )
    config:
        config


use rule * from long_read as long_read_*

Output files

The following output files should be targeted via another rule:

File Description
long_read/PATH/FILE DESCRIPTION

:judge: Rule Graph

Code Snippets

wrapper:
    "v1.28.0/bio/minimap2/aligner"
shell:
    "(pbsv discover "
    "{input.bam} "
    "{output.svsig} "
    "{params.extra}) "
    "&> {log}"
SnakeMake From line 32 of rules/pbsv.smk
shell:
    "(pbsv call "
    "{input.ref} "
    "-r {input.svsig} "
    "{output.vcf} "
    "{params.extra}) "
    "&> {log}"
SnakeMake From line 68 of rules/pbsv.smk
shell:
    "sniffles -i {input.bam} "
    "--reference {input.fasta} "
    "-t {threads} "
    "{params.non_germline} "
    "{params.extra} "
    "-v {output.vcf} &> {log} "
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import infer_out_format
from snakemake_wrapper_utils.samtools import get_samtools_opts


samtools_opts = get_samtools_opts(snakemake, parse_output=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
sort = snakemake.params.get("sorting", "none")
sort_extra = snakemake.params.get("sort_extra", "")

out_ext = infer_out_format(snakemake.output[0])

pipe_cmd = ""
if out_ext != "PAF":
    # Add option for SAM output
    extra += " -a"

    # Determine which pipe command to use for converting to bam or sorting.
    if sort == "none":
        if out_ext != "SAM":
            # Simply convert to output format using samtools view.
            pipe_cmd = f"| samtools view -h {samtools_opts}"

    elif sort in ["coordinate", "queryname"]:
        # Add name flag if needed.
        if sort == "queryname":
            sort_extra += " -n"

        # Sort alignments.
        pipe_cmd = f"| samtools sort {sort_extra} {samtools_opts}"

    else:
        raise ValueError(f"Unexpected value for params.sorting: {sort}")


shell(
    "(minimap2"
    " -t {snakemake.threads}"
    " {extra} "
    " {snakemake.input.target}"
    " {snakemake.input.query}"
    " {pipe_cmd}"
    " > {snakemake.output[0]}"
    ") {log}"
)
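
The branching in the wrapper script above can be easier to follow as a pure function. The sketch below mirrors that logic under stated assumptions: `build_pipe_cmd` is a hypothetical helper, not part of the wrapper; it receives the output format as a string rather than calling `infer_out_format`; and it returns only the flag the wrapper would append to `extra` instead of modifying `extra` itself.

```python
def build_pipe_cmd(out_ext, sort="none", sort_extra="", samtools_opts=""):
    """Return (extra_flag, pipe_cmd) following the wrapper's branching.

    out_ext is the output format as a string, e.g. "PAF", "SAM" or "BAM".
    """
    if out_ext == "PAF":
        # PAF is minimap2's native output: no -a flag and no samtools step.
        # As in the wrapper, the sort setting is not even checked here.
        return "", ""

    extra_flag = "-a"  # ask minimap2 for SAM output

    if sort == "none":
        if out_ext == "SAM":
            parts = []  # SAM is written directly, nothing to pipe into
        else:
            # Convert the SAM stream to the requested format.
            parts = ["| samtools view -h", samtools_opts]
    elif sort in ("coordinate", "queryname"):
        if sort == "queryname":
            sort_extra = f"{sort_extra} -n".strip()
        parts = ["| samtools sort", sort_extra, samtools_opts]
    else:
        raise ValueError(f"Unexpected value for params.sorting: {sort}")

    # Drop empty pieces so the command has no stray spaces.
    return extra_flag, " ".join(p for p in parts if p)


for fmt, sorting in [("PAF", "none"), ("SAM", "none"), ("BAM", "none"), ("BAM", "queryname")]:
    print(fmt, sorting, build_pipe_cmd(fmt, sort=sorting))
```

One consequence worth noting: because the sort check sits inside the non-PAF branch, an invalid `sorting` value is silently ignored when the output is PAF and only raises an error for SAM, BAM, or CRAM output.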

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/hydra-genetics/long_read
Name: long_read
Version: 1
Copyright: Public Domain
License: GNU General Public License v3.0