RNA Biology Pipeline to Characterize protein-RNA Interactions

Version: v2.3.1

An RNA Biology pipeline to characterize protein-RNA interactions.

iCLIPv2 2

1. Getting Started!

If you use this workflow in a paper, please credit the authors by citing the URL of the original repository and, if available, its DOI (to be announced).

1.1 Download the workflow

Please clone this repository to your local filesystem using the following command:

# Clone the repository from GitHub
git clone https://github.com/RBL-NCI/iCLIP.git
# Change your working directory to the iCLIP repo
cd iCLIP/

1.2 Add snakemake to PATH

Please make sure that snakemake>=5.19 is in your $PATH. If you are working on Biowulf, load the following environment module:

# Recommend running snakemake>=5.19
module load snakemake/5.24.1
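
The version requirement can be checked before anything is launched. The following is a minimal sketch and not part of the repository: version_ge and check_snakemake are hypothetical helper names, and the comparison assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
# Assumes sort supports -V (version sort), as GNU coreutils does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_snakemake: report whether the snakemake found on PATH meets the minimum.
# The minimum (5.19) is taken from the requirement stated above.
check_snakemake() {
    local min="5.19" found
    if ! command -v snakemake >/dev/null 2>&1; then
        echo "snakemake not found in PATH" >&2
        return 1
    fi
    found="$(snakemake --version)"
    if version_ge "$found" "$min"; then
        echo "snakemake $found satisfies >=$min"
    else
        echo "snakemake $found is older than $min" >&2
        return 1
    fi
}
```

Running check_snakemake after the module load prints either a confirmation or the reason the check failed.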

1.3 Configure workflow

Configure the workflow to your needs by editing the files in the config/ folder. Adjust snakemake_config.yaml to configure workflow execution and cluster_config.yml to configure the cluster settings. To describe your samples, either create multiplex.tsv and samples.tsv files or edit the example manifests in the manifest/ folder.

1.4 Dry-run the workflow

Run the following command to dry-run the snakemake pipeline:

sh run_snakemake.sh dry-run

Review the log to ensure there are no workflow errors.

2. Usage

Submit master job to the cluster:

sh run_snakemake.sh cluster

Submit master job locally:

sh run_snakemake.sh local

3. Contribute

This section is for new developers working with the iCLIP pipeline. If you have added new features or made other changes, please consider contributing them back to the original repository:

Fork the original repo to a personal or org account.
Clone the fork to your local filesystem.
Copy the modified files to the cloned fork.
Commit and push your changes to your fork.
Create a pull request to this repository.

Code Snippets

shell:
    """
    set -exo pipefail

    if [[ {params.b_qc_flag} == "PROCESS" ]]; then
        gunzip -c  {input.fq} \\
            | awk 'NR%4==2 {{print substr($0, {params.start_pos}, {params.bc_len});}}' \\
            | LC_ALL=C sort --buffer-size={params.memG} --parallel={threads} --temporary-directory=/lscratch/${{SLURM_JOB_ID}} -n \\
            | uniq -c > {output.counts};

        Rscript {params.R} --sample_manifest {sample_manifest} \\
            --multiplex_manifest {multiplex_manifest} \\
            --barcode_input {output.counts} \\
            --mismatch {params.mm} \\
            --mpid {wildcards.mp} \\
            --output_dir {params.base} \\
            --qc_dir {params.base}
    else
        echo "Barcode QC checking was skipped for this run" > {output.txt}
        touch {output.png}
        touch {output.counts}
    fi
    """

SnakeMake From line 518 of workflow/Snakefile
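
The barcode-counting step in the rule above can be tried on a toy file outside Snakemake. This is a minimal sketch under stated assumptions: count_barcodes is a hypothetical function name, the reads are invented, and the braces are not doubled because the code is plain shell rather than a Snakemake shell string. It uses a plain sort where the rule uses sort -n; in both cases identical barcodes end up adjacent, which is all uniq -c needs.

```shell
# count_barcodes FASTQ_GZ START LEN
# Print "count<TAB>barcode" for the LEN bases starting at 1-based position
# START of every read. In a FASTQ file the sequence is line 2 of each
# 4-line record, hence NR % 4 == 2.
count_barcodes() {
    gunzip -c "$1" \
        | awk -v start="$2" -v len="$3" 'NR % 4 == 2 { print substr($0, start, len) }' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $1 "\t" $2 }'
}

# Toy input: three reads, two of which carry the barcode ACGT at positions 4-7.
demo_dir="$(mktemp -d)"
printf '%s\n' \
    '@read1' 'NNNACGTNNGGGG' '+' 'IIIIIIIIIIIII' \
    '@read2' 'NNNACGTNNCCCC' '+' 'IIIIIIIIIIIII' \
    '@read3' 'NNNTTGANNAAAA' '+' 'IIIIIIIIIIIII' \
    | gzip -c > "$demo_dir/demo.fastq.gz"

count_barcodes "$demo_dir/demo.fastq.gz" 4 4
rm -r "$demo_dir"
```

The resulting table has one row per distinct barcode, which is the shape of input the barcode QC script receives through --barcode_input.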

shell:
    """
    set -exo pipefail

    # set tmp dir
    tmp_dir="/lscratch/${{SLURM_JOB_ID}}"
    export tmp_dir

    # run ultraplex to remove adaptors, separate barcodes
    # output files to tmp scratch dir
    ultraplex \\
        --threads {threads} \\
        --barcodes {input.barcodes} \\
        --directory $tmp_dir \\
        --inputfastq {input.f1} \\
        --final_min_length {params.ml} \\
        --phredquality {params.pq} \\
        --fiveprimemismatches {params.mm} \\
        --ultra 

    # move files to final location after they are zipped
    mv $tmp_dir/* {params.out_dir}
    """

SnakeMake From line 571 of workflow/Snakefile

shell:
    """
    set -exo pipefail

    # create empty file
    touch {output.fastq}

    # moves files to project output dir
    {params.cmd}
    """

SnakeMake From line 608 of workflow/Snakefile

shell:
    """
    set -exo pipefail

    # Rename files
    {params.cmd} 
    """

SnakeMake From line 632 of workflow/Snakefile

shell:
    """
    set -exo pipefail

    # run FASTQC
    fastqc {input.fastq} -o {params.base}
    """

SnakeMake FastQC From line 653 of workflow/Snakefile

shell:
    """
    set -exo pipefail

    # set tmp dir
    tmp_dir="/lscratch/${{SLURM_JOB_ID}}"
    export tmp_dir

    # Decompress the input file into the tmp dir
    gunzip -c {input.filtered} > ${{tmp_dir}}/{params.tmp};

    # Run FastQ Screen
    fastq_screen --conf {params.conf_species} \\
        --outdir {params.base_species} \\
        --threads {threads} \\
        --subset 1000000 \\
        --aligner bowtie2 \\
        --force \\
        ${{tmp_dir}}/{params.tmp};
    fastq_screen --conf {params.conf_rrna} \\
        --outdir {params.base_rrna} \\
        --threads {threads} \\
        --subset 1000000 \\
        --aligner bowtie2 \\
        --force \\
        ${{tmp_dir}}/{params.tmp};

    # Remove the decompressed tmp file
    rm ${{tmp_dir}}/{params.tmp}

    # Run FastQ Validator
    mkdir -p {params.base_val}
    {params.fastq_v} \\
        --disableSeqIDCheck \\
        --noeof \\
        --printableErrors 100000000 \\
        --baseComposition \\
        --avgQual \\
        --file {input.filtered} > {output.log};
    """

SnakeMake Bowtie 2 From line 695 of workflow/Snakefile

shell:
    """
    set -exo pipefail

    # set tmp dir
    tmp_dir="/lscratch/${{SLURM_JOB_ID}}"
    export tmp_dir

    # STAR cannot handle sorting large files - allow samtools to sort output files
    STAR \
    --runMode alignReads \
    --genomeDir {params.s_index} \
    --sjdbGTFfile {params.s_gtf} \
    --readFilesCommand zcat \
    --readFilesIn {input.f1} \
    --outFileNamePrefix $tmp_dir/{params.out_prefix} \
    --outReadsUnmapped Fastx \
    --outSAMtype BAM Unsorted \
    --alignEndsType {params.s_atype} \
    --alignIntronMax {params.s_intron} \
    --alignSJDBoverhangMin {params.s_sjdb} \
    --alignSJoverhangMin {params.s_asj} \
    --alignTranscriptsPerReadNmax {params.s_transc} \
    --alignWindowsPerReadNmax {params.s_windows} \
    --limitBAMsortRAM {params.s_bam_limit} \
    --limitOutSJcollapsed {params.s_sjcol} \
    --outFilterMatchNmin {params.s_match} \
    --outFilterMatchNminOverLread {params.s_readmatch} \
    --outFilterMismatchNmax {params.s_mismatch} \
    --outFilterMismatchNoverReadLmax {params.s_readmm} \
    --outFilterMultimapNmax {params.s_fmm} \
    --outFilterMultimapScoreRange {params.s_mmscore} \
    --outFilterScoreMin {params.s_score} \
    --outFilterType {params.s_ftype} \
    --outSAMattributes {params.s_att} \
    --outSAMunmapped {params.s_unmap} \
    --outSJfilterCountTotalMin {params.s_sjmin} \
    --outSJfilterOverhangMin {params.s_overhang} \
    --outSJfilterReads {params.s_sjreads} \
    --seedMultimapNmax {params.s_smm} \
    --seedNoneLociPerWindow {params.s_loci} \
    --seedPerReadNmax {params.s_read} \
    --seedPerWindowNmax {params.s_wind} \
    --sjdbScore {params.s_sj} \
    --winAnchorMultimapNmax {params.s_anchor} \
    --quantMode {params.s_quantmod}

    # sort file
    samtools sort -m 80G -T $tmp_dir $tmp_dir/{params.out_prefix}Aligned.out.bam -o $tmp_dir/{params.out_prefix}Aligned.sortedByCoord.out.bam

    # move STAR files and final log file to output
    mv $tmp_dir/{params.out_prefix}Aligned.sortedByCoord.out.bam {output.bam}
    mv $tmp_dir/{params.out_prefix}Log.final.out {output.log}

    # move mates to unmapped file
    touch {output.unmapped}
    for f in $tmp_dir/{params.out_prefix}Unmapped.out.mate*; do cat $f >> {output.unmapped}; done
    """

SnakeMake SAMtools STAR From line 785 of workflow/Snakefile

shell:
    """
    set -exo pipefail

    # set tmp dir
    tmp_dir="/lscratch/${{SLURM_JOB_ID}}"
    export tmp_dir

    # Index
    cp {input.bam} {output.bam}
    samtools index -@ {threads} {output.bam};

    # Run samstats
    samtools stats --threads {threads} {output.bam} > {output.samstat}
    """

SnakeMake SAMtools From line 859 of workflow/Snakefile

    shell:
        """
        # set fail count
        fail=0

        # create output file
        if [[ -f {output.qc_raw_counts} ]]; then rm {output.qc_raw_counts}; fi 
        touch {output.qc_raw_counts}

        for f in {input.stats}; do
            # check samstats file to determine number of reads and reads mapped
            raw_count=`cat $f | grep "raw total sequences" | awk -F"\t" '{{print $3}}'`
            mapped_count=`cat $f | grep "reads mapped:" | awk -F"\t" '{{print $3}}'`
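            # mapped fraction in [0, 1]; bc keeps the decimals that bash integer division would truncate
            found_percentage=$(echo "scale=4; $mapped_count / $raw_count" | bc)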

            # compare found_percentage with count_threshold; a lower value fails the sample
            # (fail is initialised before the loop, so it is not reset for each sample)
            if [ 1 -eq "$(echo "${{found_percentage}} < {params.count_threshold}" | bc)" ]; then
                flag="sample failed"
                fail=$((fail + 1))
            else
                flag="sample passed"
            fi

            # put data into output
            echo "$f\t$found_percentage\t$flag" >> {output.qc_raw_counts}
        done

        # write the project-level pass/fail marker file
        if [ 1 -eq "$(echo "${{fail}} > 0" | bc)" ]; then
            echo "Check sample log {output.qc_raw_counts} to review what sample(s) failed" > {params.qc_base}_fail.txt
        else
            touch {params.qc_base}_pass.txt
        fi
        """

SnakeMake From line 892 of workflow/Snakefile
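
The alignment check above reads two summary lines from each samtools stats report. The following standalone sketch shows one way to compute the same comparison. It is not the pipeline's code: mapped_fraction and qc_flag are hypothetical names, the report is a toy file, and the threshold is assumed to be a fraction between 0 and 1.

```shell
# mapped_fraction STATS_FILE
# Print mapped reads / raw reads (4 decimals) from the "SN" summary lines
# of a samtools stats report.
mapped_fraction() {
    awk -F '\t' '
        $1 == "SN" && $2 == "raw total sequences:" { raw = $3 }
        $1 == "SN" && $2 == "reads mapped:"        { mapped = $3 }
        END {
            if (raw > 0) printf "%.4f\n", mapped / raw
            else         print "0.0000"
        }' "$1"
}

# qc_flag FRACTION THRESHOLD
# Print "sample failed" when FRACTION is below THRESHOLD, else "sample passed".
# THRESHOLD is assumed to be a fraction between 0 and 1.
qc_flag() {
    awk -v f="$1" -v t="$2" 'BEGIN {
        if (f + 0 < t + 0) print "sample failed"
        else               print "sample passed"
    }'
}

# Toy report containing only the two lines the check needs.
demo_stats="$(mktemp)"
printf 'SN\traw total sequences:\t1000\nSN\treads mapped:\t900\n' > "$demo_stats"
frac="$(mapped_fraction "$demo_stats")"
printf '%s\t%s\t%s\n' "$demo_stats" "$frac" "$(qc_flag "$frac" 0.5)"
rm -f "$demo_stats"
```

Doing the division inside awk keeps the decimal places, and matching the field exactly avoids picking up related lines such as "reads mapped and paired:".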

shell:
    """
    set -exo pipefail

    multiqc -f -v \\
        -c {params.qc_config} \\
        -d -dd 1 \\
        {params.dir_post} \\
        {params.dir_screen_rrna} \\
        {params.dir_screen_species} \\
        -o {params.out_dir}
    """

SnakeMake MultiQC From line 949 of workflow/Snakefile

shell:
    """
    if [[ {params.m_flag} == "Y" ]] && [[ {params.b_flag} == "PROCESS" ]] ; then
        Rscript -e 'library(rmarkdown); \
        rmarkdown::render("{params.R}",
            output_file = "{output.html}", \
                params= list(log_list = "{input.log_list}", \
                    b_txt = "{params.txt_bc}"))'
    else
        Rscript -e 'library(rmarkdown); \
            rmarkdown::render("{params.R}",
            output_file = "{output.html}", \
                params= list(log_list = "{input.log_list}"))'
    fi

    sh {params.qc} {params.log_dir} {params.single_qc} {params.proj_qc}
    """

SnakeMake From line 984 of workflow/Snakefile

shell:
    """
    set -exo pipefail

    # set tmp dir
    tmp_dir="/lscratch/${{SLURM_JOB_ID}}"
    export tmp_dir

    # Run UMI Tools Deduplication
    echo "Using the following UMI separator: {params.umi}"
    umi_tools dedup \\
        -I {input.f1} \\
        --method unique \\
        --multimapping-detection-method=NH \\
        --umi-separator={params.umi} \\
        -S ${{tmp_dir}}/{params.base}.bam \\
        --log2stderr;

    # Sort and Index
    samtools sort --threads {threads} -m 10G -T ${{tmp_dir}} \\
        ${{tmp_dir}}/{params.base}.bam \\
        -o {output.bam};
    samtools index -@ {threads} {output.bam};
    """

SnakeMake SAMtools umi_tools From line 1021 of workflow/Snakefile

shell:
    """
    set -exo pipefail
    # set tmp dir
    tmp_dir="/lscratch/${{SLURM_JOB_ID}}"
    export tmp_dir

    # if a read alignment has the tag "NH:i:1" then it is a unique alignment
    python {params.pyscript} --inputBAM {input.bam} --outputBAM ${{tmp_dir}}/{params.base}.unique.bam
    samtools index -@ {threads} ${{tmp_dir}}/{params.base}.unique.bam;

    # Create SAFs
    bedtools bamtobed -split -i {input.bam} \\
        | LC_ALL=C sort --buffer-size={params.memG} --parallel={threads} --temporary-directory=$tmp_dir -k1,1V -k2,2n  > {output.bed_all};
    bedtools bamtobed -split -i ${{tmp_dir}}/{params.base}.unique.bam \\
        | LC_ALL=C sort --buffer-size={params.memG} --parallel={threads} --temporary-directory=$tmp_dir -k1,1V -k2,2n > {output.bed_unique};
    # write the SAF header from a BEGIN block so awk does not wait for input on stdin
    awk 'BEGIN {{OFS="\\t"; print "GeneID","Chr","Start","End","Strand"}}' > {output.saf_all}
    awk 'BEGIN {{OFS="\\t"; print "GeneID","Chr","Start","End","Strand"}}' > {output.saf_unique}
    bedtools merge \\
        -c 6 -o count,distinct \\
        -bed -s -d 50 \\
        -i {output.bed_all} \\
        | awk '{{OFS="\\t"; print $1":"$2"-"$3"_"$5,$1,$2,$3,$5}}' >> {output.saf_all};
    bedtools merge \\
        -c 6 -o count,distinct \\
        -bed -s -d 50 \\
        -i {output.bed_unique} \\
        | awk '{{OFS="\\t"; print $1":"$2"-"$3"_"$5,$1,$2,$3,$5}}' >> {output.saf_unique}
    """

SnakeMake SAMtools BEDTools From line 1069 of workflow/Snakefile
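
The SAF tables built in the rule above have a fixed five-column layout, with a peak ID assembled from the merged interval. A minimal standalone sketch of that conversion follows. merged_to_saf is a hypothetical name, and the two input rows are invented to mimic the five columns (chromosome, start, end, read count, strand) that the bedtools merge call in the rule is expected to produce.

```shell
# merged_to_saf: read 5-column rows (chr, start, end, read count, strand)
# on stdin and write a SAF table with a header line.
# The peak ID has the form chr:start-end_strand, as in the rule above.
merged_to_saf() {
    awk 'BEGIN { OFS = "\t"; print "GeneID", "Chr", "Start", "End", "Strand" }
         { print $1 ":" $2 "-" $3 "_" $5, $1, $2, $3, $5 }'
}

# Toy merged intervals: one on each strand.
printf 'chr1\t100\t250\t12\t+\nchr2\t5\t60\t3\t-\n' | merged_to_saf
```

Because the header is printed from a BEGIN block, it is written even when no intervals arrive on standard input.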

shell:
    """
    set -exo pipefail
    # set tmp dir
    tmp_dir="/lscratch/${{SLURM_JOB_ID}}"
    export tmp_dir

    cp {input.bed_all} $tmp_dir/all.bed
    bgzip --threads {threads} --force $tmp_dir/all.bed
    mv $tmp_dir/all.bed.gz {output.bed_all}
    tabix -p bed -f {output.bed_all}

    cp {input.bed_unique} $tmp_dir/unique.bed
    bgzip --threads {threads} --force $tmp_dir/unique.bed
    mv $tmp_dir/unique.bed.gz {output.bed_unique}
    tabix -p bed -f {output.bed_unique}
    """

SnakeMake tabix From line 1116 of workflow/Snakefile

shell:
    """
    set -exo pipefail
    # Run for allreadpeaks
    featureCounts -F SAF \\
        -a {input.saf_all} \\
        -O \\
        -J \\
        --fraction \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.all_unique} \\
        {input.bam};
    featureCounts -F SAF \\
        -a {input.saf_all} \\
        -M \\
        -O \\
        -J \\
        --fraction \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.all_mm} \\
        {input.bam};
    featureCounts -F SAF \\
        -a {input.saf_all} \\
        -M \\
        -O \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.all_total} \\
        {input.bam};
    # Run for uniquereadpeaks
    featureCounts -F SAF \\
        -a {input.saf_unique} \\
        -O \\
        -J \\
        --fraction \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.unique_unique} \\
        {input.bam};
    featureCounts -F SAF \\
        -a {input.saf_unique} \\
        -M \\
        -O \\
        -J \\
        --fraction \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.unique_mm} \\
        {input.bam}
    featureCounts -F SAF \\
        -a {input.saf_unique} \\
        -M \\
        -O \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.unique_total} \\
        {input.bam}    
    """

SnakeMake FeatureCounts From line 1169 of workflow/Snakefile

shell:
    """
    Rscript {params.script} \\
        --ref_species {params.ref_sp} \\
        --refseq_rRNA {params.rrna_flag} \\
        --alias_path {params.a_path} \\
        --gencode_path {params.g_path} \\
        --refseq_path {params.rs_path} \\
        --canonical_path {params.c_path} \\
        --intron_path {params.i_path} \\
        --rmsk_path {params.r_path} \\
        --custom_path {params.custom_path} \\
        --out_dir {params.base} \\
        --reftable_path {params.a_config}
    """

SnakeMake From line 1260 of workflow/Snakefile

shell:
    '''
    # Setup tmp directory
    if [ -d "/lscratch/${{SLURM_JOB_ID}}" ]; then
        tmp_dir="/lscratch/${{SLURM_JOB_ID}}/"
    else
        tmp_dir="{out_dir}01_preprocess/07_rscripts/"
        if [ ! -d $tmp_dir ]; then mkdir $tmp_dir; fi
    fi

    #bash script to run bedtools and get site2peak lookuptable
    bash {params.bashscript} {input.mm_jcounts} {input.mm} {params.sp}_{params.p_type} {params.out} {params.pyscript} 
    # above bash script will create {output.splice_table}
    Rscript {params.script} \\
        --rscript {params.functions} \\
        --peak_type {params.p_type} \\
        --peak_unique {input.unique} \\
        --peak_all {input.mm} \\
        --peak_total {input.total} \\
        --join_junction {params.junc} \\
        --anno_anchor {params.anchor} \\
        --read_depth {params.r_depth} \\
        --demethod {params.d_method} \\
        --sample_id {params.sp} \\
        --ref_species {params.ref_sp} \\
        --splice_table {output.splice_table} \\
        --tmp_dir ${{tmp_dir}} \\
        --out_dir {params.out} \\
        --out_file {output.text} \\
        --out_dir_DEP {params.out_de} \\
        --output_file_error {params.error}
    '''

SnakeMake BEDTools From line 1316 of workflow/Snakefile
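
Several rules start by choosing between node-local scratch space and a fallback directory under the output tree. That choice can be factored into one helper, sketched below under stated assumptions: pick_tmp_dir is a hypothetical name, and the sketch differs from the rules in two ways: it uses mkdir -p, and it requires SLURM_JOB_ID to be set before it accepts the /lscratch path.

```shell
# pick_tmp_dir FALLBACK_DIR
# Print the per-job scratch directory when running under SLURM with
# /lscratch allocated; otherwise create FALLBACK_DIR and print that.
pick_tmp_dir() {
    local fallback="$1"
    local scratch="/lscratch/${SLURM_JOB_ID:-}"
    if [ -n "${SLURM_JOB_ID:-}" ] && [ -d "$scratch" ]; then
        printf '%s\n' "$scratch"
    else
        mkdir -p "$fallback"
        printf '%s\n' "$fallback"
    fi
}

# Example: outside a SLURM job this prints the fallback path.
demo_root="$(mktemp -d)"
tmp_dir="$(pick_tmp_dir "$demo_root/07_rscripts")"
echo "using tmp dir: $tmp_dir"
rm -r "$demo_root"
```

Using mkdir -p means a rerun does not fail when the fallback directory is left over from an earlier attempt.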

shell:
    '''
    # Setup tmp directory
    if [ -d "/lscratch/${{SLURM_JOB_ID}}" ]; then
        tmp_dir="/lscratch/${{SLURM_JOB_ID}}/"
    else
        tmp_dir="{out_dir}01_preprocess/07_rscripts/"
        if [[ ! -d $tmp_dir ]]; then mkdir $tmp_dir; fi
    fi

    #bash script to run bedtools and get site2peak lookuptable
    bash {params.bashscript} {input.mm_jcounts} {input.mm} {params.sp}_{params.p_type} {params.out} {params.pyscript}
    # above bash script will create {output.splice_table}

    Rscript {params.script} \\
        --rscript {params.functions} \\
        --peak_type {params.p_type} \\
        --peak_unique {input.unique} \\
        --peak_all {input.mm} \\
        --peak_total {input.total} \\
        --join_junction {params.junc} \\
        --anno_anchor {params.anchor} \\
        --read_depth {params.r_depth} \\
        --demethod {params.d_method} \\
        --sample_id {params.sp} \\
        --ref_species {params.ref_sp} \\
        --splice_table {output.splice_table} \\
        --tmp_dir ${{tmp_dir}} \\
        --out_dir {params.out} \\
        --out_file {output.text} \\
        --out_dir_DEP {params.out_m} \\
        --output_file_error {params.error}
    '''

SnakeMake BEDTools From line 1385 of workflow/Snakefile

shell:
    '''
    # Setup tmp directory
    if [ -d "/lscratch/${{SLURM_JOB_ID}}" ]; then
        tmp_dir="/lscratch/${{SLURM_JOB_ID}}/"
    else
        tmp_dir="{out_dir}01_preprocess/07_rscripts/"
        if [ ! -d $tmp_dir ]; then mkdir $tmp_dir; fi
    fi
    Rscript {params.script} \\
        --rscript {params.functions} \\
        --peak_type {params.p_type} \\
        --anno_anchor {params.anchor} \\
        --read_depth {params.r_depth} \\
        --sample_id {params.sp} \\
        --ref_species {params.ref_sp} \\
        --anno_dir {params.anno_dir} \\
        --reftable_path {params.a_config} \\
        --gencode_path {params.g_path} \\
        --intron_path {params.i_path} \\
        --rmsk_path {params.r_path} \\
        --tmp_dir ${{tmp_dir}} \\
        --out_dir {params.out} \\
        --out_file {output.outfile} \\
        --anno_strand {params.anno_strand} 
    '''

SnakeMake From line 1451 of workflow/Snakefile

shell:
    '''
    # Setup tmp directory
    if [ -d "/lscratch/${{SLURM_JOB_ID}}" ]; then
        tmp_dir="/lscratch/${{SLURM_JOB_ID}}"
    else
        tmp_dir="{out_dir}01_preprocess/07_rscripts/"
        if [ ! -d $tmp_dir ]; then mkdir $tmp_dir; fi
    fi
    tmp_dir_s=$tmp_dir/same
    tmp_dir_o=$tmp_dir/oppo

    # -p avoids a failure when the directories are left over from an earlier run
    mkdir -p $tmp_dir_s
    mkdir -p $tmp_dir_o

    Rscript {params.script} \\
        --rscript {params.functions} \\
        --peak_type {params.p_type} \\
        --anno_anchor {params.anchor} \\
        --read_depth {params.r_depth} \\
        --sample_id {params.sp} \\
        --ref_species {params.ref_sp} \\
        --anno_dir {params.anno_dir} \\
        --reftable_path {params.a_config} \\
        --gencode_path {params.g_path} \\
        --intron_path {params.i_path} \\
        --rmsk_path {params.r_path} \\
        --tmp_dir ${{tmp_dir_s}} \\
        --out_dir {params.out} \\
        --out_file {output.EIOut_SS} \\
        --anno_strand "SameStrand" 

    Rscript {params.script} \\
        --rscript {params.functions} \\
        --peak_type {params.p_type} \\
        --anno_anchor {params.anchor} \\
        --read_depth {params.r_depth} \\
        --sample_id {params.sp} \\
        --ref_species {params.ref_sp} \\
        --anno_dir {params.anno_dir} \\
        --reftable_path {params.a_config} \\
        --gencode_path {params.g_path} \\
        --intron_path {params.i_path} \\
        --rmsk_path {params.r_path} \\
        --tmp_dir ${{tmp_dir_o}} \\
        --out_dir {params.out} \\
        --out_file {output.EIOut_OS} \\
        --anno_strand "OppoStrand"
    '''

SnakeMake From line 1510 of workflow/Snakefile

shell:
    '''
    # Setup tmp directory
    if [ -d "/lscratch/${{SLURM_JOB_ID}}" ]; then
        tmp_dir="/lscratch/${{SLURM_JOB_ID}}/"
    else
        tmp_dir="{out_dir}01_preprocess/07_rscripts/"
        if [ ! -d $tmp_dir ]; then mkdir $tmp_dir; fi
    fi
    Rscript {params.script} \\
        --rscript {params.functions} \\
        --peak_type {params.p_type} \\
        --anno_anchor {params.anchor} \\
        --read_depth {params.r_depth} \\
        --sample_id {params.sp} \\
        --ref_species {params.ref_sp} \\
        --anno_dir {params.anno_dir} \\
        --reftable_path {params.a_config} \\
        --gencode_path {params.g_path} \\
        --intron_path {params.i_path} \\
        --rmsk_path {params.r_path} \\
        --tmp_dir ${{tmp_dir}} \\
        --out_dir {params.out} \\
        --out_file {output.outfile} \\
        --anno_strand {params.anno_strand}  
    '''

SnakeMake From line 1591 of workflow/Snakefile

shell:
    '''
    # Setup tmp directory
    if [ -d "/lscratch/${{SLURM_JOB_ID}}" ]; then
        tmp_dir="/lscratch/${{SLURM_JOB_ID}}/"
    else
        tmp_dir="{out_dir}01_preprocess/07_rscripts/"
        if [ ! -d $tmp_dir ]; then mkdir $tmp_dir; fi
    fi
    Rscript {params.script} \\
        --rscript {params.functions} \\
        --peak_type {params.p_type} \\
        --anno_anchor {params.anchor} \\
        --read_depth {params.r_depth} \\
        --sample_id {params.sp} \\
        --ref_species {params.ref_sp} \\
        --tmp_dir ${{tmp_dir}} \\
        --out_dir {params.out} \\
        --out_file {output.outfile}
    '''

SnakeMake From line 1647 of workflow/Snakefile

shell:
    """
    Rscript -e 'library(rmarkdown); \
    rmarkdown::render("{params.R}",
        output_file = "{output.o1}", \
        params= list(samplename = "{params.sp}", \
            peak_in = "{input.peak_in}", \
            output_table = "{output.o2}", \
            readdepth = "{params.r_depth}", \
            PeakIdnt = "{params.p_type}"))'
    """

SnakeMake From line 1689 of workflow/Snakefile

shell:
    """
    set -exo pipefail
    awk '$6=="+"' {input.bed_all} > {output.p_bed}
    awk '$6=="-"' {input.bed_all} > {output.n_bed}
    """

SnakeMake From line 1712 of workflow/Snakefile
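
The strand split above reads the BED file twice, once per strand. As an illustration only, the same split can be done in one pass by letting awk write to two files. split_bed_by_strand and the .pos.bed/.neg.bed suffixes are hypothetical names chosen for this sketch, and the input is an invented BED6 file.

```shell
# split_bed_by_strand BED OUT_PREFIX
# Write OUT_PREFIX.pos.bed and OUT_PREFIX.neg.bed according to the strand
# in BED column 6. Both files are created even when one strand is absent.
split_bed_by_strand() {
    awk -v pos="$2.pos.bed" -v neg="$2.neg.bed" '
        BEGIN { printf "" > pos; printf "" > neg }
        $6 == "+" { print > pos }
        $6 == "-" { print > neg }
    ' "$1"
}

# Toy BED6 input with two plus-strand records and one minus-strand record.
demo_dir="$(mktemp -d)"
printf 'chr1\t10\t20\tr1\t0\t+\nchr1\t30\t40\tr2\t0\t-\nchr2\t5\t9\tr3\t0\t+\n' > "$demo_dir/all.bed"
split_bed_by_strand "$demo_dir/all.bed" "$demo_dir/demo"
wc -l "$demo_dir/demo.pos.bed" "$demo_dir/demo.neg.bed"
rm -r "$demo_dir"
```

The two-command form in the rule is easier to map onto separate Snakemake outputs; the single pass mainly matters for very large BED files.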

shell:
    """
    manorm \\
        --p1 "{params.de_dir}/{params.gid_1}_{params.peak_id}readPeaks_forMANORM_{wildcards.strand}.bed" \\
        --p2 "{params.de_dir}/{params.gid_2}_{params.peak_id}readPeaks_forMANORM_{wildcards.strand}.bed" \\
        --r1 "{params.de_dir}/{params.gid_1}_ReadsforMANORM_{wildcards.strand}.bed" \\
        --r2 "{params.de_dir}/{params.gid_2}_ReadsforMANORM_{wildcards.strand}.bed" \\
        --s1 0 \\
        --s2 0 \\
        -p 1 \\
        -m 0 \\
        -w {params.manorm_w} \\
        -d {params.manorm_d} \\
        -n 10000 \\
        -s \\
        -o {params.base} \\
        --name1 {params.gid_1} \\
        --name2 {params.gid_2}

    # rename MANORM final output file
    mv {params.base}{wildcards.group_id}_all_MAvalues.xls {output.mavals}

    # rename individual file names for each sample
    mv {params.base}{params.gid_1}_MAvalues.xls {params.base}{params.gid_1}_{params.peak_id}readPeaks_MAvalues.xls
    mv {params.base}{params.gid_2}_MAvalues.xls {params.base}{params.gid_2}_{params.peak_id}readPeaks_MAvalues.xls

    # mv folders of figures, filters, tracks to new location
    # remove folders if they already exist
    if [[ -d {params.base}output_figures_{params.peak_id}readPeaks ]]; then rm -r {params.base}output_figures_{params.peak_id}readPeaks; fi
    if [[ -d {params.base}output_filters_{params.peak_id}readPeaks ]]; then rm -r {params.base}output_filters_{params.peak_id}readPeaks; fi
    if [[ -d {params.base}output_tracks_{params.peak_id}readPeaks ]]; then rm -r {params.base}output_tracks_{params.peak_id}readPeaks; fi
    mv {params.base}output_figures {params.base}output_figures_{params.peak_id}readPeaks
    mv {params.base}output_filters {params.base}output_filters_{params.peak_id}readPeaks
    mv {params.base}output_tracks {params.base}output_tracks_{params.peak_id}readPeaks
    """

SnakeMake MAnorm From line 1751 of workflow/Snakefile

shell:
    """
    set -exo pipefail

    ### for sample vs Bg comparison
    featureCounts -F SAF \\
        -a {params.smplSAF} \\
        -O \\
        --fraction \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.bkUniqcountsmplPk} \\
        {params.bkbam}
    featureCounts -F SAF \\
        -a {params.smplSAF} \\
        -M \\
        -O \\
        --fraction \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.bkMMcountsmplPk} \\
        {params.bkbam}
    Rscript {params.script} \\
        --samplename {params.gid_1} \\
        --background {params.gid_2} \\
        --peak_anno_g1 {params.anno_dir}/{params.gid_1}_annotation_{params.peak_id}readPeaks_final_table.txt \\
        --peak_anno_g2 {params.anno_dir}/{params.gid_2}_annotation_{params.peak_id}readPeaks_final_table.txt \\
        --Smplpeak_bkgroundCount_MM {output.bkMMcountsmplPk} \\
        --Smplpeak_bkgroundCount_unique {output.bkUniqcountsmplPk} \\
        --pos_manorm {params.de_dir}/{wildcards.group_id}/{wildcards.group_id}_P/{params.gid_1}_{params.peak_id}readPeaks_MAvalues.xls \\
        --neg_manorm {params.de_dir}/{wildcards.group_id}/{wildcards.group_id}_N/{params.gid_1}_{params.peak_id}readPeaks_MAvalues.xls \\
        --output_file {output.post_proc}

    ### for Bg vs sample comparison
    featureCounts -F SAF \\
        -a {params.bkSAF} \\
        -O \\
        --fraction \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.smplUniqcountbkPk} \\
        {params.smplbam}
    featureCounts -F SAF \\
        -a {params.bkSAF} \\
        -M \\
        -O \\
        --fraction \\
        --minOverlap 1 \\
        -s 1 \\
        -T {threads} \\
        -o {output.smplMMcountbkPk} \\
        {params.smplbam}
    Rscript {params.script} \\
        --samplename {params.gid_2} \\
        --background {params.gid_1} \\
        --peak_anno_g1 {params.anno_dir}/{params.gid_2}_annotation_{params.peak_id}readPeaks_final_table.txt \\
        --peak_anno_g2 {params.anno_dir}/{params.gid_1}_annotation_{params.peak_id}readPeaks_final_table.txt \\
        --Smplpeak_bkgroundCount_MM {output.smplMMcountbkPk} \\
        --Smplpeak_bkgroundCount_unique {output.smplUniqcountbkPk} \\
        --pos_manorm {params.de_dir}/{wildcards.group_id}/{wildcards.group_id}_P/{params.gid_2}_{params.peak_id}readPeaks_MAvalues.xls \\
        --neg_manorm {params.de_dir}/{wildcards.group_id}/{wildcards.group_id}_N/{params.gid_2}_{params.peak_id}readPeaks_MAvalues.xls \\
        --output_file {output.post_procRev}
    """

SnakeMake FeatureCounts From line 1825 of workflow/Snakefile

shell:
    """
    Rscript -e 'library(rmarkdown); \
    rmarkdown::render("{params.R}", \
        output_file = "{output.reportOut}", \
        params= list(peak_in="{input.post_proc}", \
            PeakIdnt="{params.p_id}",\
            samplename="{params.gid_1}", \
            background="{params.gid_2}", \
            pval="{params.pval}", \
            FC="{params.fc}", \
            incd_rRNA="{params.rrna}"\
            ))'

    Rscript -e 'library(rmarkdown); \
    rmarkdown::render("{params.R}", \
        output_file = "{output.reportRev}", \
        params= list(peak_in="{input.post_procRev}", \
            PeakIdnt="{params.p_id}",\
            samplename="{params.gid_2}", \
            background="{params.gid_1}", \
            pval="{params.pval}", \
            FC="{params.fc}", \
            incd_rRNA="{params.rrna}"\
            ))'
    """

SnakeMake From line 1911 of workflow/Snakefile

shell:
    """
    set -exo pipefail
    ################################################
    # Setup tmp directory
    ################################################
    if [ -d "/lscratch/${{SLURM_JOB_ID}}" ]; then
        tmp_dir="/lscratch/${{SLURM_JOB_ID}}"
    else
        tmp_dir="{out_dir}01_preprocess/05_tmp_bam"
        if [ ! -d $tmp_dir ]; then mkdir $tmp_dir; fi
    fi

    # Create header
    samtools view -@ {threads} -H {input.f1} > ${{tmp_dir}}/{params.base}.header.txt

    # Run
    samtools view -@ {threads} -F 16 {input.f1} \\
        | cat ${{tmp_dir}}/{params.base}.header.txt - \\
        | samtools view -Sb > {output.p_bam}
    samtools view -@ {threads} -f 16 {input.f1} \\
        | cat ${{tmp_dir}}/{params.base}.header.txt - \\
        | samtools view -Sb > {output.n_bam}
    """

SnakeMake SAMtools From line 1950 of workflow/Snakefile
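
The rule above separates alignments by SAM flag bit 16 (0x10, read on the reverse strand): samtools view -F 16 drops records with the bit set, and -f 16 keeps only those records. The sketch below shows the bit test on a few flag values. is_reverse and strand_of are hypothetical helper names, and the flags are examples rather than pipeline output.

```shell
# is_reverse FLAG: succeed when bit 0x10 (decimal 16) is set in the SAM flag.
is_reverse() {
    [ $(( $1 & 16 )) -ne 0 ]
}

# strand_of FLAG: print "-" for reverse-strand alignments, "+" otherwise.
strand_of() {
    if is_reverse "$1"; then echo "-"; else echo "+"; fi
}

# 0 = forward, 16 = reverse, 256 = secondary forward, 272 = secondary reverse
for flag in 0 16 256 272; do
    printf '%s\t%s\n' "$flag" "$(strand_of "$flag")"
done
```

Flags such as 272 combine several bits, which is why the test uses a bitwise AND rather than comparing the flag with 16 directly.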

shell:
    """
    Rscript {params.script} \\
        --samplename {params.gid_1} \\
        --background {params.gid_2} \\
        --strand {params.st} \\
        --sample_overlap {params.so} \\
        --samplemanifest {input.s_manifest} \\
        --input_dir {params.base} \\
        --output_table {output.table} \\
        --output_summary {output.summary} \\
        --output_figures {params.figs}
    """

SnakeMake From line 2012 of workflow/Snakefile

shell:
    """
    Rscript {params.script} \\
        --samplename {params.gid_1} \\
        --background {params.gid_2} \\
        --peak_type {params.p_type} \\
        --join_junction {params.junc} \\
        --samplemanifest {input.s_manifest} \\
        --pos_DB {input.table_p} \\
        --neg_DB {input.table_n} \\
        --anno_dir_sample {params.anno_dir_s} \\
        --reftable_path {params.ref_tab_config} \\
        --anno_dir_project {params.anno_dir_p} \\
        --ref_species {params.ref_sp} \\
        --gencode_path {params.g_path} \\
        --intron_path {params.i_path} \\
        --rmsk_path {params.r_path} \\
        --function_script {params.script_func} \\
        --out_dir {params.base}
    """

SnakeMake From line 2080 of workflow/Snakefile

shell:
    """
    Rscript -e 'library(rmarkdown); \
    rmarkdown::render("{params.R}",
        output_file = "{output.o1}", \
        params= list(peak_in="{input.final}", \
            DEGfolder="{params.base}",\
            PeakIdnt="{params.p_id}",\
            samplename="{params.gid_1}", \
            background="{params.gid_2}", \
            pval="{params.pval}", \
            FC="{params.FC}", \
            incd_rRNA="{params.rrna}"\
            ))'
    """