An awesome Oxford Nanopore Pipeline for direct RNA-sequencing


modr 🔬

An Oxford Nanopore direct RNA-sequencing Pipeline


This is the home of the pipeline, modr. Its long-term goals: to accurately estimate known and novel transcript expression, to predict poly-A tail lengths, and to detect RNA modifications in Oxford Nanopore direct RNA-sequencing data like no pipeline before!

Overview

Welcome to modr! Before getting started, we highly recommend reading through modr's documentation.

The ./modr pipeline is composed of several inter-related sub commands for setting up and running the pipeline across different systems. Each of the available sub commands performs a different function.

modr is an awesome pipeline to detect known/novel transcript expression, poly-A tail lengths, and RNA-editing in Oxford Nanopore direct RNA sequencing data. It relies on technologies like Singularity [1] to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by Snakemake [2], a flexible and scalable workflow management system, to submit jobs to a cluster.

The pipeline is compatible with data generated from Nanopore sequencing technologies. As input, it accepts a set of FastQ & Fast5 files and can be run locally on a compute instance or on-premises using a cluster. A user can define the method or mode of execution. The pipeline can submit jobs to a cluster using a job scheduler like SLURM (more coming soon!). This hybrid approach ensures the pipeline is accessible to all users.
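
For illustration, a typical end-to-end run might look like the sketch below. The flag names shown are hypothetical placeholders, not the pipeline's documented interface; run ./modr -h and read the usage docs for the authoritative options.

# Hypothetical invocation: the run sub command exists,
# but these flag names are illustrative placeholders
./modr run \
    --input /path/to/*.fastq.gz \
    --output /path/to/output \
    --mode slurm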

Before getting started, we highly recommend reading through the usage section of each available sub command.

For more information about issues or troubleshooting a problem, please check out our FAQ prior to opening an issue on GitHub.

Dependencies

Requires: singularity>=3.5, snakemake>=6.0, conda/mamba (optional)

Snakemake must be installed on the target system. Snakemake is a workflow manager that orchestrates each step of the pipeline. The second dependency, i.e. singularity OR conda/mamba, handles the downloading/installation of any remaining software dependencies. By default, the pipeline will utilize singularity to guarantee the highest level of reproducibility; however, the --use-conda option of the run sub command can be provided to use conda/mamba instead of singularity. If possible, we recommend using singularity over conda for reproducibility; it is worth noting, however, that singularity and conda produce identical results for this pipeline.
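
Before launching a run, it is worth confirming that the required tools are on your $PATH and meet the minimum versions. A minimal sanity check, using each tool's standard version flag:

# Confirm the hard dependencies meet the minimums
# (snakemake>=6.0, singularity>=3.5)
snakemake --version
singularity --version
# Optional: only needed when running with --use-conda
conda --version || mamba --version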

If you are running the pipeline on Windows, please use the Windows Subsystem for Linux (WSL). Singularity can be installed on WSL following these instructions.

Installation

Please clone this repository to your local filesystem using the following command:

# Clone Repository from Github
git clone https://github.com/OpenOmics/modr.git
# Change your working directory
cd modr/
# Add dependencies to $PATH
# Biowulf users should run
module load snakemake singularity
# Get usage information
./modr -h

For more detailed installation instructions, please see our setup page.

Contribute

This site is a living document, created for and by members like you. modr is maintained by the members of OpenOmics and is improved by continuous feedback! We encourage you to contribute new content and make improvements to existing content via pull request to our GitHub repository.

References

1. Kurtzer GM, Sochat V, Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5): e0177459.
2. Köster J, Rahmann S (2018). Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 34(20): 3600.

Code Snippets

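
Differential alternative splicing between two conditions with flair diffSplice: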
shell: """
# Find differential alternative
# splicing events such as skipped
# exons, alternative 5'/3' splice 
# sites, and retained introns 
flair diffSplice \\
    --threads {threads} \\
    --test \\
    --isoforms {input.isoforms} \\
    --counts_matrix {input.counts} \\
    --conditionA {params.group1} \\
    --conditionB {params.group2} \\
    --out_dir {params.outdir} \\
    --out_dir_force
"""
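
Subsetting the counts matrix per contrast and testing differential expression/usage with flair diffExp: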
shell: """
# Subset counts matrix to contain 
# only samples belonging to the 
# contrast of interest; this 
# ensures the correct group is set
# as the baseline for the design
# matrix. By default, the first 
# group encountered in the sample 
# names becomes the intercept. 
# Get the first group's samples
grp1_samples=$(
    awk -F '\\t' -v OFS='_' \\
        '$2=="{params.group1}" {{print $1,$2,$3}}' \\
        {input.manifest}
)
# Get the second group's samples
grp2_samples=$(
    awk -F '\\t' -v OFS='_' \\
        '$2=="{params.group2}" {{print $1,$2,$3}}' \\
        {input.manifest}
)
# Create per-contrast counts matrix,
# extracting samples by column name
{params.script} -c ids ${{grp1_samples}} ${{grp2_samples}} \\
    -i {input.counts} \\
> {output.counts}

# Find differential gene/isoform 
# expression and find differential
# isoform usage
flair diffExp \\
    --threads {threads} \\
    --counts_matrix {output.counts} \\
    --out_dir {params.outdir} \\
    --exp_thresh 10 \\
    --out_dir_force
"""
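
Downloading the primary assembly reference genome sequence: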
shell: 
    """
    # Download primary assembly,
    # reference genome sequence
    wget -O - \\
        {params.uri} \\
    | gzip -d \\
    > {output.genome}
    """
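
Downloading the reference transcript sequences: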
shell: 
    """
    # Download transcript sequences
    wget -O - \\
        {params.uri} \\
    | gzip -d \\
    > {output.transcripts}
    """
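
Downloading the primary (PRI) gene annotation: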
shell: 
    """
    # Download PRI gene annotation
    wget -O - \\
        {params.uri} \\
    | gzip -d \\
    > {output.gtf}
    """
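
Genomic alignment with graphmap2, then sorting and indexing with samtools: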
shell: 
    """
    # Set up a temporary directory for
    # intermediate files, with a built-in
    # mechanism for deletion on exit
    if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
    tmp=$(mktemp -d -p "{params.tmpdir}")
    trap 'rm -rf "${{tmp}}"' EXIT

    # Convert any U bps to T 
    zcat {input.fq} \\
        | sed 's/U/T/g' \\
    >  {output.fq}

    # Align against genome with graphmap2
    graphmap2 align \\
        -x rnaseq \\
        -t {threads} \\
        -r {input.ref} \\
        -d {output.fq} \\
        -o {params.sam}

    # Sort, convert to BAM, and index
    samtools sort -@{threads} \\
        -T "${{tmp}}" \\
        -O bam \\
        -o {output.bam} \\
        {params.sam}
    samtools index -@{threads} \\
        {output.bam} \\
        {output.bai}
    """
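
Indexing reads and aligning event-level signal to the reference with nanopolish: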
shell: 
    """
    # Build an index mapping
    # basecalled reads to event 
    # signals from the sequencer
    nanopolish index \\
        -d {params.f5} \\
        {input.fq}
    # Overlay event signal to 
    # kmers of reference genome
    nanopolish eventalign \\
        --reads {input.fq} \\
        --bam {input.bam} \\
        --genome {input.ref} \\
        --summary {output.summary} \\
        --print-read-names \\
        --threads {threads} \\
        --scale-events \\
    > {output.events}
    """
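
Building a sequence dictionary and FASTA index for the reference: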
shell: 
    """
    # Build a sequence dictionary
    java -jar ${{PICARDJARPATH}}/picard.jar \\
        CreateSequenceDictionary \\
        R={input.ref} \\
        O={output.dct}
    # Create FASTA index
    samtools faidx \\
        {input.ref}
    """
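
Converting filtered alignments into a raw features TSV with sam2tsv: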
shell: 
    """
    # Convert SAM to raw features TSV
    samtools view \\
        -@{threads} \\
        -h \\
        -F 4 \\
        {input.bam} \\
    | java -jar ${{PICARDJARPATH}}/sam2tsv.jar -r {input.ref} \\
    | awk 'BEGIN{{FS=OFS="\\t"}} ($9 != "S") && ($9 != "H") && ($9 != "N")' - \\
    | awk 'BEGIN{{FS=OFS="\\t"}} \\
        ($7=="."){{$7="-99";}} \\
        ($4=="."){{$4="-99"}} \\
        ($5=="."){{$5="na"}} \\
        ($8=="."){{$8="na"}} \\
        ($9=="D"){{$6=" "}} \\
        ($2==16){{$2="n"}} \\
        ($2==0){{$2="p"}} 1' \\
    > {output.tsv}
    """
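
Combining chunked nanopolish event-level signal (Dinopore combine step):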
shell: 
    """
    # Setup for the combine step,
    # see github repo for more info:
    # https://github.com/darelab2014/Dinopore/blob/main/code/s2.Combine_raw_nnpl.sh
    cd "{params.outdir}"
    head -n 1 {input.events} \\
        > {params.header}
    noline=$(wc -l {input.events} | cut -d " " -f 1)
    tmp="{params.tmp}"
    num=2
    chunk=10000000
    num1=$(expr $num + $chunk)
    num2=$(expr $num1 + 1)
    filecount=1

    # Combine event-level signal information
    while [ $num -le $noline ]; do
        sed -n "$num,${{num1}}p; ${{num2}}q" {input.events} > "$tmp"
        chk=$(wc -l "$tmp" | cut -f 1 -d " ")
        echo "Processing reads $num to $num1 in file $filecount"
        if [ $chk -gt 0 ]; then
            Rscript ${{DINOPORE_CODE}}/s2.Combine_raw_nanopolish.R \\
                -f "$tmp" \\
                -t {threads} \\
                -o {output.events}.part${{filecount}} \\
                -s ${{noline}} \\
                -n ${{num}} \\
                -c ${{chunk}}
            if test -f tmp.eli; then
                eli=$(cat tmp.eli)
                rm "tmp.eli"
            else
                eli=0
            fi
        fi
        num=$(( $num1 - $eli + 1 ))
        num1=$(expr $num + $chunk)
        num2=$(expr $num1 + 1)
        filecount=$(expr $filecount + 1)
    done

    # Combine results of all parts
    cat ${{DINOPORE_CODE}}/misc/nnpl.header > {output.events}
    cat {output.events}.part* | grep -v contig >> {output.events}
    rm -f tmp.nnpl {output.events}.part*
    """
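
Generating the raw feature table from reads common to the alignment TSV and the nanopolish events (Dinopore):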
shell: 
    """
    # Set up a temporary directory for
    # intermediate files, with a built-in
    # mechanism for deletion on exit
    if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
    tmp=$(mktemp -d -p "{params.tmpdir}")
    trap 'rm -rf "${{tmp}}"' EXIT

    # Setup for the feature table step,
    # see github repo for more info:
    # https://github.com/darelab2014/Dinopore/blob/main/code/S3.Generate_raw_features.sh
    cd "{params.outdir}"
    num=1
    chunk=100000
    num1=$(expr $num + $chunk - 1)
    num2=$(expr $num1 + 1)
    filecount=1

    # Extract common reads between filtered
    # alignment TSV and nanopolish events 
    awk -F '\\t' 'NR!=1 {{print $1}}' {input.tsv} \\
        | LC_ALL=C uniq \\
        | LC_ALL=C sort -S {params.memory} --parallel={threads}  \\
    > {params.tsv}
    awk '{{print $2}}' {input.events} \\
        | LC_ALL=C uniq \\
        | LC_ALL=C sort -S {params.memory} --parallel={threads}  \\
    > {params.events}
    # Prints lines/reads common in both files
    comm -12 \\
        {params.tsv} \\
        {params.events} \\
    > {params.common}
    noline=$(wc -l {params.common} | cut -d " " -f 1)

    # Generate raw features files
    while [ $num -le $noline ]; do
        echo "Processing reads $num to $num1 in file $filecount"
        sed -n "$num,${{num1}}p; ${{num2}}q" {params.common} \\
        > {params.tmp}
        chk=$(wc -l {params.tmp} | cut -f 1 -d " ")
        if [ $chk -gt 0 ]; then
            # The headers can also be found here:
            # https://github.com/darelab2014/Dinopore/tree/main/code/misc
            cat ${{DINOPORE_CODE}}/misc/nnpl.header \\
            > {params.innnpl}
            cat ${{DINOPORE_CODE}}/misc/tsv.header \\
            > {params.intsv}
            LC_ALL=C grep \\
                -h \\
                -F \\
                -w \\
                -f {params.tmp} \\
                {input.tsv} \\
            >> {params.intsv}
            LC_ALL=C grep \\
                -h \\
                -F \\
                -w \\
                -f {params.tmp} \\
                {input.events} \\
            >> {params.innnpl}

            # Create chunked raw features files,
            # Rscript cannot use files with abs paths
            rel_intsv="$(basename {params.intsv})"
            rel_innnpl="$(basename {params.innnpl})"
            Rscript ${{DINOPORE_CODE}}/s3.Generate_raw_feature_table.R \\
                -n ${{rel_innnpl}} \\
                -t ${{rel_intsv}} \\
                -o {params.outname}.part${{filecount}}

            num=$num2
            num1=$(expr $num + $chunk - 1)
            num2=$(expr $num1 + 1)
            filecount=$(expr $filecount + 1)
            rm {params.tmp}
        fi
    done

    # Combine results of all parts
    cat ${{DINOPORE_CODE}}/misc/inae.header \\
    > {output.table}
    cat {params.outname}.part*.positive \\
        | LC_ALL=C sort -T "${{tmp}}" -k1,1 -k3,3n \\
        | grep -v contig \\
    >> {output.table}
    cat {params.outname}.part*.negative \\
        | LC_ALL=C sort -T "${{tmp}}" -k1,1 -k3,3n \\
        | grep -v contig \\
    >> {output.table}

    # Cleanup intermediate files
    rm -f {params.tsv} \\
        {params.events} \\
        {params.common} \\
        {params.intsv} \\
        {params.innnpl} \\
        {params.outname}.part*.positive \\
        {params.outname}.part*.negative
    """
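
Aggregating per-read features at each position within a group (Dinopore):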
shell: 
    """
    # Aggregate features of all
    # reads at the same position
    # across multiple samples in 
    # the same group
    cd "{params.outdir}"
    Rscript ${{DINOPORE_CODE}}/s4.Aggregating_reads_pos.R \\
        -t {threads} \\
        -o {output.table} \\
        -r _{params.group} 
    """
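
Pre-processing the feature matrix into CNN input (Dinopore):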
shell: 
    """
    # Transform/pre-process features
    # file to create input for CNN,
    # regression model
    cd "{params.outdir}"
    Rscript ${{DINOPORE_CODE}}/s5.Preprocess_data_matrix_inputCNN.R \\
        -t {threads} \\
        -i {input.table} \\
        -o {output.rdata} \\
        -c {params.truth}
    """
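
Predicting A-to-I RNA-editing classes (Dinopore):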
shell: 
    """
    # Predict A-to-I RNA-editing
    # Class 0: A (unmodified)
    # Class 1: I (edited)
    # Class 2: G (SNP)
    cd "{params.outdir}"
    DINOPORE_CODE="{params.code}"
    Rscript ${{DINOPORE_CODE}}/s6.Predict_test_data.R \\
        -t {threads} \\
        -i {input.rdata}
    """
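
Splice-aware genomic alignment with minimap2, plus BAM statistics: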
shell: 
    """
    # Set up a temporary directory for
    # intermediate files, with a built-in
    # mechanism for deletion on exit
    if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
    tmp=$(mktemp -d -p "{params.tmpdir}")
    trap 'rm -rf "${{tmp}}"' EXIT

    # Align against reference genome,
    # minimap2 automatically handles
    # conversion of U to T bps 
    minimap2 \\
        -ax splice \\
        --MD \\
        -uf \\
        -k14 \\
        {input.ref} \\
        {input.fq} \\
    | samtools sort -@{threads} \\
        -T "${{tmp}}" \\
        -O bam \\
        --write-index \\
        -o {output.bam}##idx##{output.bai} \\
        -

    # Gather BAM statistics
    samtools idxstats {output.bam} > {output.idx}
    samtools flagstat {output.bam} > {output.flagstat}
    samtools stats {output.bam} > {output.stats}
    """
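
Transcriptomic alignment with minimap2: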
shell: 
    """
    # Set up a temporary directory for
    # intermediate files, with a built-in
    # mechanism for deletion on exit
    if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
    tmp=$(mktemp -d -p "{params.tmpdir}")
    trap 'rm -rf "${{tmp}}"' EXIT

    # Align against the reference
    # transcriptome; minimap2 handles
    # conversion of U to T bps. See
    # this issue for the minimap2
    # author's recommendations for
    # aligning direct RNA reads against
    # the transcriptome:
    # https://github.com/lh3/minimap2/issues/340
    minimap2 \\
        -ax map-ont \\
        -N 10 \\
        -k10 \\
        {input.ref} \\
        {input.fq} \\
    | samtools sort -@{threads} \\
        -T "${{tmp}}" \\
        -O bam \\
        --write-index \\
        -o {output.bam}##idx##{output.bai} \\
        -
    """
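
Strand-specific, CPM-normalized BigWig coverage tracks with bamCoverage: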
shell: 
    """
    # CPM normalized BigWig
    # of the forward strand
    bamCoverage \\
        -b {input.bam} \\
        -o {output.fwd} \\
        --samFlagExclude 16 \\
        --normalizeUsing CPM

    # CPM normalized BigWig
    # of the reverse strand
    bamCoverage \\
        -b {input.bam} \\
        -o {output.rev} \\
        --samFlagInclude 16 \\
        --normalizeUsing CPM
    """
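
Indexing basecalled reads against the device's raw signal with nanopolish index: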
shell: """
# Build an index to map basecalled
# reads to ONT device's raw signal 
nanopolish index \\
    --directory={params.f5} \\
    {input.fq}
"""
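
Estimating poly-A tail lengths with nanopolish polya: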
shell: """
# Estimate polyA tail lengths
# for each transcript
nanopolish polya \\
    --threads={threads} \\
    --reads={input.fq} \\
    --bam={input.bam} \\
    --genome={input.ref} \\
> {output.polya}
"""
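
Building a sequencing summary file for pycoQC with Fast5_to_seq_summary: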
shell: """
# Create pycoQC summary file
Fast5_to_seq_summary \\
    --threads {threads} \\
    -f {params.f5path} \\
    -s {output.tsv} \\
    --verbose 2 \\
    --fields barcode_arrangement barcode_full_arrangement \\
        barcode_score calibration_strand_end \\
        calibration_strand_genome_template \\
        calibration_strand_identity calibration_strand_start \\
        called_events channel channel_digitisation \\
        channel_offset channel_range channel_sampling_rate \\
        device_id duration flow_cell_id mean_qscore_template \\
        protocol_run_id read_id read_number run_id sample_id \\
        sequence_length_template skip_prob start_mux \\
        start_time stay_prob step_prob strand_score
"""
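
Rendering the pycoQC report: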
shell: """
pycoQC \\
    -f {input.tsv} \\
    -o {output.html} \\
    -j {output.json} 
"""
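
Per-sample quality control with FastQC: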
shell: """
fastqc \\
    -t {threads} \\
    -o {params.outdir} \\
    {input.fq}
"""
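
Alignment-based QC plots with NanoPlot: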
shell: """
NanoPlot \\
    -t {threads} \\
    --bam {input.bam} \\
    -o {params.outdir} \\
    --no_static \\
    --verbose
"""
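
Alignment summary metrics with NanoStat: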
shell: """
NanoStat \\
    -t {threads} \\
    --bam {input.bam} \\
> {output.metrics}
"""
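
Aggregating QC reports with MultiQC: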
shell: """
multiqc \\
    --ignore '*/.singularity/*' \\
    -f \\
    -c {params.conf} \\
    --interactive \\
    --outdir {params.outdir} \\
    {params.wdir}
"""
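
Transcript abundance estimation with NanoCount: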
shell: """
NanoCount \\
    -i {input.bam} \\
    --extra_tx_info \\
    -e {params.em_iter} \\
    -o {output.counts} 
"""
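
Merging per-sample NanoCount results into raw and TPM counts matrices: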
shell: """
# Estimated counts matrix,
# raw counts 
{params.script} \\
    --input {input.counts} \\
    --output {output.est_count} \\
    --join-on transcript_name \\
    --extract est_count \\
    --clean-suffix '.nanocount.transcripts.tsv' \\
    --nan-values 0.0
# TPM counts matrix,
# normalized counts
{params.script} \\
    --input {input.counts} \\
    --output {output.tpm} \\
    --join-on transcript_name \\
    --extract tpm \\
    --clean-suffix '.nanocount.transcripts.tsv' \\
    --nan-values 0.0
"""
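
Converting BAM to BED12 and correcting misaligned splice sites with flair correct: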
shell: """
# Convert BAM into BED12
bam2Bed12 \\
    -i {input.bam} \\
> {output.bed12}
# Correct misaligned splice
# sites with flair correct 
flair correct \\
    -q {output.bed12} \\
    -g {input.ref} \\
    -f {input.gtf} \\
    -t {threads} \\
    -o {params.prefix} \\
    --nvrna
"""
From line 102 of rules/quant.smk
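
Collapsing corrected reads into high-confidence isoforms with flair collapse: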
shell: """
# Set up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

# Merge the corrected BED files
cat {input.corrected} > {output.merged}

# Find high-confidence isoforms.
# Flair does not like transcript 
# IDs reported by GENCODE  
flair collapse \\
    --temp_dir "${{tmp}}" \\
    --threads {threads} \\
    --gtf {input.gtf} \\
    --genome {input.genome} \\
    --annotation_reliant generate \\
    --check_splice \\
    --stringent \\
    --support 1 \\
    --reads {input.reads} \\
    --query {output.merged} \\
    --output {params.prefix}
"""
From line 157 of rules/quant.smk
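
Building the sample manifest and an isoform-by-sample counts matrix with flair quantify: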
shell: """
# Set up a temporary directory for
# intermediate files, with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

# Create the sample manifest file:
# a TSV containing each sample's
# basename, group, batch, and the
# path to its FastQ file. The first
# three columns of the manifest
# cannot contain any underscores,
# and hyphens should not be used
# in the groups/condition column.
awk -F '\\t' -v OFS='\\t' \\
    '{{print \\
        $1, \\
        $3, \\
        "batch1", \\
        "{params.workdir}/"$1"/fastqs/"$1".filtered.fastq.gz" \\
    }}' {params.groups} \\
    | awk -F '\\t' -v OFS='\\t' \\
        '{{gsub("_","-",$1); \\
            gsub("_","",$2); \\
            gsub("_","-",$3); \\
            print \\
        }}' \\
>  {output.manifest}

# Create isoform-by-sample counts 
# matrix that can be used in the 
# diffExp and diffSplice modules
flair quantify \\
    -r {output.manifest} \\
    -i {input.fa} \\
    --isoform_bed {input.bed} \\
    --check_splice \\
    --stringent \\
    --temp_dir "${{tmp}}" \\
    --threads {threads} \\
    --tpm \\
    --output {params.prefix}
"""
From line 224 of rules/quant.smk
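
Structural and quality annotation of collapsed isoforms with SQANTI3: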
shell: """
# Structural and quality annotation
# of collapsed transcript from flair
sqanti3_qc.py \\
    {input.fa} \\
    {input.gtf} \\
    {input.genome} \\
    --aligner_choice=minimap2 \\
    --cpus {threads} \\
    --output {params.prefix} \\
    --dir {params.outdir} \\
    --CAGE_peak {params.cage_peak} \\
    --polyA_motif_list {params.polya_motif} \\
    --isoAnnotLite \\
    --force_id_ignore \\
    --fasta
"""
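
Filtering isoforms with SQANTI3's machine-learning filter: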
shell: """
# Applies ML filter on selected
# SQANTI3 QC classification file
sqanti3_filter.py ML \\
    --output {params.prefix} \\
    --dir {params.outdir} \\
    --gtf {input.gtf} \\
    --isoforms {input.fa} \\
    {input.txt}
"""
From line 102 of rules/sqanti.smk
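
Read trimming (the command is assembled from rule parameters):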
shell: 
    """
    # Trim reads; the full command is
    # assembled from the rule's prefix
    # and suffix parameters
    {params.prefix} {input.fq} {params.suffix} {output.fq}
    """
From line 55 of rules/trim.smk
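
Quality filtering with NanoFilt: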
shell: 
    """
    # Nanofilt requires uncompressed input
    gunzip -c {input.fq} \\
        | NanoFilt -q {params.qual_filt} \\
        | gzip \\
    > {output.flt}
    """

URL: https://openomics.github.io/modr/
Name: modr
Version: v0.5.0
Copyright: Public Domain
License: MIT License