Analysis Pipeline for Ribosome Profiling-Based Identification of Open Reading Frames in Bacteria

public 1yr ago Version: 2021 0 bookmarks

View Workflow

Help improve this workflow!

This workflow has been published but could be further improved with some additional meta data:

Keyword(s) in categories input, output, operation

You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .

This repository contains all code to recreate the contents of [RiboReport - Benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbab549/6509045). Following are descriptions of the required steps to reproduce our analysis.

Processing of High Throughput Sequencing Data

In the publication, the data required for the generation of our result figures was created using the snakemake workflow described below. After the creation of the input data, the figures from the publication were generated by using the evaluation scripts provided in the evaluation folder. The generation and usage of the final tables and figures is described below.

Generation of the Dataset

Dependencies

miniconda3 .
snakemake >=6.4.1
singularity

Input data

For running the workflow, several input files are required:

genome.fa
annotation.gff
samples.tsv
config.yaml
fastq files

The fastq files for our newly published dataset are available via NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131514). The data will be published together with the publication. Please contact us if you require the access token.

The rest of the required files can be found in the data folder.

Workflow

Note

Even though snakemake workflows are executable locally, we do not advise this due to high memory usage and runtime of some of the processing steps. We ran the workflow on a TORQUE cluster system (powered by bwHPC) and later on a de.NBI cloud instance.

For more information how to run the workflow using your prefered cluster setup, please refer to the snakemake documentation .

To run the provided snakemake workflow, follow the example for pseudomonas_aeroginosa below:

1. Setup the workflow folder and download the workflow:

mkdir pseudomonas_aeroginosa; cd pseudomonas_aeroginosa;
wget https://github.com/RickGelhausen/RiboReport/archive/2021.tar.gz;
tar -xzf 2021.tar.gz; mv RiboReport-2021 RiboReport; rm 2021.tar.gz;

2. Required tools

All required tools are downloaded automatically via conda and docker.

3. Fetch the annotation and genome files:

cp RiboReport/data/pseudomonas_aeroginosa/annotation.gff . ;
cp RiboReport/data/pseudomonas_aeroginosa/genome.fa . ;

4. Fetch the config.yaml and samples.tsv files:

cp RiboReport/data/pseudomonas_aeroginosa/config.yaml RiboReport/config.yaml ;
cp RiboReport/data/pseudomonas_aeroginosa/samples.tsv RiboReport/samples.tsv ;

5. Retrieve the sequencing data:

In order to download the fastq files, we used the SRA Toolkit .

We run fasterq-dump executable to download the according .fastq files, by providing the appropriate SRR ID .

If you do not have the SRA Toolkit , we suggest creating a conda environment:

conda create -n sra-tools -c bioconda -c conda-forge sra-tools pigz
conda activate sra-tools

(use source activate, if conda is not set-up for your bash)

For simplicity, we already collected the required fasterq-dump calls in a file.

cp RiboReport/data/pseudomonas_aeroginosa/download.sh .
bash download.sh

This will download all required fastq files into a fastq folder.

6. Run the snakemake workflow:

In order to run snakemake , the creation of a conda environment is required.

Once miniconda3 is installed. Create a snakemake environment:

conda create -n snakemake -c conda-forge -c bioconda snakemake singularity
conda activate snakemake

Next, we will show how we executed the RiboReport pipeline in both our bwHPC cluster system and on our de.NBI Cloud.

On the cloud or locally (tested on ubuntu)

If you have set up everything correctly, simply run:

snakemake -p -k --use-conda --use-singularity --singularity-args " -c " --greediness 0 -s RiboReport/Snakefile --configfile RiboReport/config.yaml --directory ${PWD} -j 4 --latency-wait 60

You can also run it using nohup, as it might take some time to finish:

nohup snakemake -p -k --use-conda --use-singularity --singularity-args " -c " --greediness 0 -s RiboReport/Snakefile --configfile RiboReport/config.yaml --directory ${PWD} -j 4 --latency-wait 60 &

Depending on your available resources you can also remove --greediness 0 and increase -j, to your preferences in order to boost performance.

On the cluster

Then you can copy and complete one of the provided submission scripts, or create your own.

cp RiboReport/torque.sh .

Example for torque.sh :

#!/bin/bash
#PBS -N benchmark
#PBS -S /bin/bash
#PBS -q "long"
#PBS -d <file path>/benchmark
#PBS -l nodes=1:ppn=1
#PBS -o <file path>/benchmark
#PBS -j oe
cd <file path>/benchmark
export PATH="<file path>/miniconda3/bin/:$PATH"
source activate snakemake
snakemake --latency-wait 600 --use-conda -s RiboReport/Snakefile --configfile RiboReport/config.yaml --directory ${PWD} -j 20 --cluster-config RiboReport/torque.yaml --cluster "qsub -N {cluster.jobname} -S /bin/bash -q {cluster.qname} -d <file path>/benchmark -l {cluster.resources} -o {cluster.logoutputdir} -j oe"

All file path statements have to be replaced by the path to your benchmark folder.

Please note that these scripts might need some extra changes depending on your cluster system. If your cluster system does not run SGE or TORQUE, these files will most likely not work at all. In the case they run SGE or TORQUE, there might be slightly different definitions for the resource statements (here #PBS -l nodes=1:ppn=1 ). This is then also the case for the configuration files sge.yaml and torque.yaml . Please check the snakemake documentation .

Excluding a prediction tool:

Currently, to remove a tool from the analysis simply remove the line from the input and shell part of the mergeConditions rule. You can find it in under rules/postprocessing.smk .

To remove IRSOM change:

rule mergeConditions:
 input:
 ribotish="tracks/{condition}.ribotish.gff",
 reparation="tracks/{condition}.reparation.gff",
 deepribo="tracks/{condition}.deepribo.gff",
 irsom="tracks/{condition}.irsom.gff",
 spectre="tracks/{condition}.spectre.gff",
 smorfer="tracks/{condition}.smorfer.gff",
 ribotricer="tracks/{condition}.ribotricer.gff",
 price="tracks/{condition}.price.gff"
 output:
 "tracks/{condition, [a-zA-Z]+}.merged.gff"
 conda:
 "../envs/bedtools.yaml"
 threads: 1
 shell:
 """
 mkdir -p tracks;
 cat {input.spectre} > {output}.unsorted;
 cat {input.ribotish} >> {output}.unsorted;
 cat {input.reparation} >> {output}.unsorted;
 cat {input.deepribo} >> {output}.unsorted;
 cat {input.irsom} >> {output}.unsorted;
 cat {input.smorfer} >> {output}.unsorted;
 cat {input.ribotricer} >> {output}.unsorted;
 cat {input.price} >> {output}.unsorted;
 bedtools sort -i {output}.unsorted > {output};
 """

rule mergeConditions:
 input:
 ribotish="tracks/{condition}.ribotish.gff",
 reparation="tracks/{condition}.reparation.gff",
 deepribo="tracks/{condition}.deepribo.gff",
 spectre="tracks/{condition}.spectre.gff",
 smorfer="tracks/{condition}.smorfer.gff",
 ribotricer="tracks/{condition}.ribotricer.gff",
 price="tracks/{condition}.price.gff"
 output:
 "tracks/{condition, [a-zA-Z]+}.merged.gff"
 conda:
 "../envs/bedtools.yaml"
 threads: 1
 shell:
 """
 mkdir -p tracks;
 cat {input.spectre} > {output}.unsorted;
 cat {input.ribotish} >> {output}.unsorted;
 cat {input.reparation} >> {output}.unsorted;
 cat {input.deepribo} >> {output}.unsorted;
 cat {input.smorfer} >> {output}.unsorted;
 cat {input.ribotricer} >> {output}.unsorted;
 cat {input.price} >> {output}.unsorted;
 bedtools sort -i {output}.unsorted > {output};
 """

smORFer output

As smORFer is modular and has different optional steps it is hard to add all combinations to the snakemake pipeline.

Some of these steps even require the creation of a calibrated bam file, which involves manual checking of offsets.

Rules for all the steps involving Ribo-seq libraries are included in rules/smorfer.smk . To use the steps involving manual work, simply add the calibrated bam file to a directory called calibrated_bam/method-condition-replicate_calibrated.bam .

By default, the filtering steps involving manual work are skipped. To add them the input/output of the rules in rules/smorfer.smk have to be updated accordingly.

Extract operon regions from the annotation:

This describes how we extracted the operons from the available annotation:

conda create -n operons -c conda-forge -c bioconda interlap
conda activate operons
./RiboReport/scripts/operons.py --in_gff_filepath RiboReport/data/pseudomonas_aeroginosa/annotation.gff > RiboReport/data/pseudomonas_aeroginosa/operons.gff ;

Visualisation and plotting of the Results

All data can be visualised using the following scripts inside the evaluation folder. Further, an evaluation_call.sh script, containing all calls for the plotting pipeline, is provided. This bash script only works if all python scripts are positioned in the same folder and the input gtf-files from the data folder are stored in a "data"-folder in the same location.

Dependencies

numpy =1.16.3
matplotlib =3.0.3
seaborn =0.9.0
pandas =0.24.2
simple_venn =0.1.0
bedtools =v2.28.0

statistics_pos_neg.py

Parameters:

reference_pos_data (-p) path to the .gtf file containing all genes of the investigated genome or the investigated subset of genes
reference_neg_data (-n) path to the .gtf file containing all genes of the investigated genome or the investigated subset of genes
tool_data (-t) path to gtf file containing all predictions for the tools: deepribo, ribotish, reparation and irsom
save_path (-o) path to a directory, where the result tables will be stored
overlap_cutoff (-c) allowed sequence overlap cutoff between gene and prediction
flag_subopt (-s) allowed allow suboptimals in statistical calculation. Set to 1 to enable flag. If not 0
flag_no_gene (-g) add predictions which do not overlap with any genen for the given cutoff. Set to 1 to enable flag. If not 0.

Output:

df_stat.csv -> main result table storing all computed statistical measurements. Needed for the bar plots.
SetX_labels_pos_predictions_overlap_fp.gtf -> all prdictions which do not overlap with the positive or the negative gene set. Therefore, predictions outside of the annotation borders for a given overlap (temporary result).
SetX_labels_pos_predictions_overlap_neg.gtf -> all genes of the negative set overlapping with a given percent with any prediction (temporary result).
SetX_labels_pos_predictions_overlap_pos.gtf -> all genes of the positive set overlapping with a given percent with any prediction (temporary result).
SetX_labels_pos_predictions_overlap_temp.gtf -> all prdictions which do not overlap with the positive gene set (temporary result).
df_venn_genes.csv -> table containing a column for each tool listing the number of genes counted as true positive (TP). Needed for the overlap plots.
toolX_score_list -> For each prediction, correct genen prediction and not predicted gene a tuple of: (score, overlap, label) is saved as a python object serialization (pickle.dump). Needed to compute the ROC and PRC.

This script generates several statistical measurements for a reference and tool prediction .gtf file. First true positives (TP), false positives (FP) and false negatives (FN) are predicted. One prediction will be associated with one gene and counted as one true positve. The association selection is based on the lowest p-value (0.05). All genes fulfilling the overlap cutoff will be counted as suboptimals and not as false positives. False positives are all predictions, where no gene fulfilling the overlap cutoff could be found and the false negatives vice versa. Based on this computations the recall, FNR, precision, FDR and F1 measure are calculated.

plot_barplots.py

Parameters:

input1_df (-i1) df_stat.csv of the statistics.py script for the first overlap cutoff.
input2_df (-i2) df_stat.csv of the statistics.py script for the second overlap cutoff.
save_path (-o) path to directory where the plots will be stored.

Output:

bar_FNR.pdf -> barplot of FNR measures for different tools and two cutoff conditions.
bar_recall.pdf -> barplot of recall measures for different tools and two cutoff conditions.
bar_precision.pdf -> barplot of precision measures for different tools and two cutoff conditions.
bar_FDR.pdf -> barplot of FDR measures for different tools and two cutoff conditions.
bar_F1.pdf -> barplot of F1 measures for different tools and two cutoff conditions.

This script generates barplots of statistical measures for the different tools and two overlap cutoff conditions. The output barplots consist of FNR, recall, precision, FDR and F1 measures. It is based on the statistics.py statistical output table.

venn_diagram.py

Parameters:

input_df (-i) .cvs containing the list of true positive genes for the used tools.
save_path (-o) path where the Venn diagramm will be saved.
name_folder (-n) name of the result folder of the investigated set. It will be used in the file name to make it unique.
coverage_percent (-c) percentage of overlap cutoff that was used to determine the true positves. It will be used in the file name to make it unique.

Output:

venn_diagram.pdf -> a 4 Venn diagram highlighting overlapping predictions of the tool.

This script will generate a 4 Venn diagram for the overlap of true positive predicted genes of the 4 investigated tools.

prc_plotting.py

Parameters:

input_path: path list of triplets (score, overlap, label) for each tool.
species_label: name of the bacterial datasets.
save_path: path to save data to.
percent_overlap: how much of the prediction should overlap with the gene and vice versa.
experiment: name of the subset.

Output:

datesetX_prc_overlapScore_subset_lables_.pdf

This scripts generates Precision-Recall-Curves (PRC) for all different tools and overlaps.

roc_plotting.py

Parameters:

input_path: path list of triplets (score, overlap, label) for each tool.
save_path: path to save data to.
percent_overlap: how much of the prediction should overlap with the gene and vice versa.
experiment: name of the subset.

Output:

datesetX_roc_overlapScore_.pdf

This scripts generates Receiver-Operator-Curves (ROC) for all different tools and overlaps.

Code Snippets

shell:
    "mkdir -p ribotish; RiboReport/scripts/createRiboTISHannotation.py -a {input.annotation} --genome_sizes {input.sizes} --annotation_output {output}"

SnakeMake From line 10 of rules/annotation.smk

shell:
    "mkdir -p ribotish; gff3ToGenePred {input} {output}"

SnakeMake From line 21 of rules/annotation.smk

shell:
    "mkdir -p ribotish; genePredToBed {input} {output}"

SnakeMake From line 32 of rules/annotation.smk

shell:
    "mkdir -p qc/all; RiboReport/scripts/annotation_featurecount.py -a {input.annotation} -o {output};"

SnakeMake From line 43 of rules/annotation.smk

shell:
    "mkdir -p auxiliary; RiboReport/scripts/enrich_annotation.py -a {input.annotation} -o {output}"

SnakeMake From line 54 of rules/annotation.smk

shell:
    """
    mkdir -p auxiliary;
    awk -F'\\t' '/^[^#]/ {{printf "%s\\t%s\\t%s\\t%s\\t%s\\t%s\\t%s\\t%s\\tID=uid%s;\\n", $1, $2, $3, $4, $5, $6, $7, $8, NR-1}}' {input} > {output}
    """

SnakeMake From line 65 of rules/annotation.smk

shell:
    "mkdir -p transcripts; cufflinks {input.bam} -p {threads} -o ./transcripts/{wildcards.condition}-{wildcards.replicate}/ -g {input.annotation}"

SnakeMake Cufflinks From line 83 of rules/annotation.smk

run:
    shell("mkdir -p deepribo; mv {input} deepribo/DeepRibo_model_v1.pt")

SnakeMake From line 18 of rules/deepribo.smk

shell:
    "mkdir -p coverage_deepribo; RiboReport/scripts/coverage_deepribo.py --alignment_file {input.bam} --output_file_prefix coverage_deepribo/{wildcards.condition}-{wildcards.replicate}"

SnakeMake From line 31 of rules/deepribo.smk

shell:
    """
    mkdir -p coverage_deepribo
    bedtools genomecov -bg -ibam {input.bam} -strand + > {output.covfwd}
    bedtools genomecov -bg -ibam {input.bam} -strand - > {output.covrev}
    """

SnakeMake BEDTools From line 44 of rules/deepribo.smk

shell:
    """
    mkdir -p deepribo/{wildcards.condition}-{wildcards.replicate}/0/;
    mkdir -p deepribo/{wildcards.condition}-{wildcards.replicate}/1/;
    DataParser.py {input.covS} {input.covAS} {input.asiteS} {input.asiteAS} {input.genome} deepribo/{wildcards.condition}-{wildcards.replicate} -g {input.annotation}
    """

SnakeMake From line 64 of rules/deepribo.smk

shell:
    "mkdir -p deepribo; Rscript RiboReport/scripts/parameter_estimation.R -f {input} -o {output}"

SnakeMake From line 79 of rules/deepribo.smk

shell:
    """
    mkdir -p deepribo;
    DeepRibo.py predict deepribo/ --pred_data {wildcards.condition}-{wildcards.replicate}/ -r {params.rpkm} -c {params.cov} --model {input.model} --dest {output} --num_workers {threads}
    """

SnakeMake From line 95 of rules/deepribo.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/deepriboGFF.py -c {wildcards.condition}  -i {input} -o {output}"

SnakeMake From line 109 of rules/deepribo.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/concatGFF.py {input} -o {output}"

SnakeMake From line 120 of rules/deepribo.smk

shell:
    "samtools faidx {rules.retrieveGenome.output}"

SnakeMake SAMtools From line 9 of rules/indexing.smk

shell:
    "mkdir -p genomes; cut -f1,2 {input[0]} > genomes/sizes.genome"

SnakeMake From line 21 of rules/indexing.smk

shell:
    "samtools index -@ {threads} maplink/{params.prefix}"

SnakeMake SAMtools From line 35 of rules/indexing.smk

shell:
    "mkdir -p transcripts; RiboReport/scripts/transcriptsToBed.py -i {input} -o {output}"

SnakeMake From line 10 of rules/irsom.smk

shell:
    "mkdir -p transcripts; bedtools getfasta -fi {input.genome} -name -bed {input.transcripts} -fo {output}"

SnakeMake BEDTools From line 22 of rules/irsom.smk

shell:
    """
    mkdir -p irsom/model/Escherichia_coli/
    mv forge.ibisc.univ-evry.fr/lplaton/IRSOM/raw/master/model/Escherichia_coli/SLSOM irsom/model/Escherichia_coli/
    mv forge.ibisc.univ-evry.fr/lplaton/IRSOM/raw/master/model/Escherichia_coli/SOM irsom/model/Escherichia_coli/
    """

SnakeMake From line 38 of rules/irsom.smk

shell:
    "mkdir -p irsom; mv {input} irsom/Featurer; chmod +x irsom/Featurer;"

SnakeMake oschmod From line 50 of rules/irsom.smk

shell:
    """
    mkdir -p irsom;
    predict.py --featurer={input.featurer} --file={input.transcripts} --model=irsom/model/Escherichia_coli/ --output=irsom/{wildcards.condition}-{wildcards.replicate}/
    """

SnakeMake From line 67 of rules/irsom.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/irsomGFF.py -c {wildcards.condition}  -i {input} -o {output}"

SnakeMake From line 81 of rules/irsom.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/concatGFF.py {input} -o {output}"

SnakeMake From line 92 of rules/irsom.smk

shell:
    "mkdir -p maplink; ln -s {params.inlink} {params.outlink}"

SnakeMake From line 10 of rules/maplink.smk

shell:
    "mkdir -p maplink/RIBO/; ln -s {params.inlink} {params.outlink}"

SnakeMake From line 22 of rules/maplink.smk

shell:
    "mkdir -p maplink/RIBO/; ln -s {params.inlink} {params.outlink}"

SnakeMake From line 34 of rules/maplink.smk

shell:
    "mkdir -p maplink/RNA/; ln -s {params.inlink} {params.outlink}"

SnakeMake From line 46 of rules/maplink.smk

shell:
    "mkdir -p maplink/RNA/; ln -s {params.inlink} {params.outlink}"

SnakeMake From line 58 of rules/maplink.smk

shell:
    "mkdir -p maplink/TIS/; ln -s {params.inlink} {params.outlink}"

SnakeMake From line 10 of rules/maplinktis.smk

shell:
    "mkdir -p maplink/TIS/; ln -s {params.inlink} {params.outlink}"

SnakeMake From line 22 of rules/maplinktis.smk

shell:
    "mkdir -p maplink/RNATIS/; ln -s {params.inlink} {params.outlink}"

SnakeMake From line 34 of rules/maplinktis.smk

shell:
    "mkdir -p maplink/RNATIS/; ln -s {params.inlink} {params.outlink}"

SnakeMake From line 46 of rules/maplinktis.smk

shell:
    "mkdir -p genomeSegemehlIndex; echo \"Computing Segemehl index\"; segemehl.x --threads {threads} -x {output.index} -d {input.genome} 2> {log};"

SnakeMake From line 11 of rules/mapping.smk

shell:
    "mkdir -p sammulti; segemehl.x -e -d {input.genome} -i {input.genomeSegemehlIndex} -q {input.fastq} --threads {threads} -o {output.sammulti} 2> {log}"

SnakeMake From line 28 of rules/mapping.smk

shell:
    """
    set +e
    mkdir -p sam
    awk '$2 == "4"' {input.sammulti} > {input.sammulti}.unmapped
    gawk -i inplace '$2 != "4"' {input.sammulti}
    samtools view -H <(cat {input.sammulti}) | grep '@HD' > {output.sam}
    samtools view -H <(cat {input.sammulti}) | grep '@SQ' | sort -t$'\t' -k1,1 -k2,2V >> {output.sam}
    samtools view -H <(cat {input.sammulti}) | grep '@RG' >> {output.sam}
    samtools view -H <(cat {input.sammulti}) | grep '@PG' >> {output.sam}
    cat {input.sammulti} |grep -v '^@' | grep -w 'NH:i:1' >> {output.sam}
    exitcode=$?
    if [ $exitcode -eq 1 ]
    then
        exit 1
    else
        exit 0
    fi
    """

SnakeMake SAMtools From line 39 of rules/mapping.smk

shell: "if [ \"{params.method}\" == \"NOTSET\" ]; then RiboReport/scripts/samstrandinverter.py --sam_in_filepath={input.sam} --sam_out_filepath={output.sam}; else cp {input.sam} {output.sam}; fi"

SnakeMake From line 68 of rules/mapping.smk

shell:
    "mkdir -p bammulti; samtools view -@ {threads} -bh {input.sam} | samtools sort -@ {threads} -o {output} -O bam"

SnakeMake SAMtools From line 78 of rules/mapping.smk

shell:
    "mkdir -p rRNAbam; samtools view -@ {threads} -bh {input.sam} | samtools sort -@ {threads} -o {output} -O bam"

SnakeMake SAMtools From line 89 of rules/mapping.smk

shell:
    """
    mkdir -p tracks;
    cat {input.spectre} > {output}.unsorted;
    cat {input.ribotish} >> {output}.unsorted;
    cat {input.reparation} >> {output}.unsorted;
    cat {input.deepribo} >> {output}.unsorted;
    cat {input.irsom} >> {output}.unsorted;
    cat {input.smorfer} >> {output}.unsorted;
    cat {input.ribotricer} >> {output}.unsorted;
    cat {input.price} >> {output}.unsorted;
    bedtools sort -i {output}.unsorted > {output};
    """

SnakeMake BEDTools From line 17 of rules/postprocessing.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/concatGFF.py {input.mergedGff} -o {output}"

SnakeMake From line 39 of rules/postprocessing.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/allToUniqueGTF.py -i {input} -o {output}"

SnakeMake From line 50 of rules/postprocessing.smk

shell:
    "mkdir -p genomes; mv genome.fa genomes/"

SnakeMake From line 7 of rules/preprocessing.smk

shell:
    "mkdir -p annotation; mv annotation.gff annotation/"

SnakeMake From line 16 of rules/preprocessing.smk

shell:
    "mkdir -p annotation; RiboReport/scripts/gtf2gff3.py -a {input} -o {output}"

SnakeMake From line 25 of rules/preprocessing.smk

shell:
    "mkdir -p annotation; RiboReport/scripts/gff2gtf.py -i {input} -o {output}"

SnakeMake From line 34 of rules/preprocessing.smk

shell:
    """
    mkdir -p price;
    gedi -e IndexGenome -s {input.genome} -a {input.annotation} -nobowtie -nokallisto -nostar -f price/ -o {output} -n price_genome
    """

SnakeMake From line 10 of rules/price.smk

shell:
    """
    mkdir -p price;
    gedi -e Price -reads {input.bam} -genomic {input.genomic} -prefix price/{wildcards.condition}-{wildcards.replicate}/results
    """

SnakeMake From line 27 of rules/price.smk

shell:
    """
    mkdir -p price;
    gedi -e ViewCIT -m bed -name 'd.getType()' {input.cit} > {output.bed}
    """

SnakeMake From line 41 of rules/price.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/priceGFF.py -c {wildcards.condition}  -i {input.filtered} -u {input.unfiltered} -o {output.unfiltered} -f {output.filtered}"

SnakeMake From line 57 of rules/price.smk

shell:
    """
    mkdir -p tracks;
    RiboReport/scripts/concatGFF.py {input.unfiltered} -o {output.unfiltered}
    RiboReport/scripts/concatGFF.py {input.filtered} -o {output.filtered}
    """

SnakeMake From line 70 of rules/price.smk

shell:
    "mkdir -p qc/4unique; fastqc -o qc/4unique -t {threads} -f sam_mapped {input.sam}; mv qc/4unique/{params.prefix}_fastqc.html {output.html}; mv qc/4unique/{params.prefix}_fastqc.zip {output.zip}"

SnakeMake FastQC From line 13 of rules/qc.smk

shell:
    "mkdir -p qc/3mapped; fastqc -o qc/3mapped -t {threads} -f sam_mapped {input.sam}; mv qc/3mapped/{params.prefix}_fastqc.html {output.html}; mv qc/3mapped/{params.prefix}_fastqc.zip {output.zip}"

SnakeMake FastQC From line 28 of rules/qc.smk

shell:
    "mkdir -p qc/1raw; fastqc -o qc/1raw -t {threads} {input.fastq}; mv qc/1raw/{params.prefix}_fastqc.html {output.html}; mv qc/1raw/{params.prefix}_fastqc.zip {output.zip}"

SnakeMake FastQC From line 43 of rules/qc.smk

shell:
    "mkdir -p qc/2trimmed; fastqc -o qc/2trimmed -t {threads} {input}; mv qc/2trimmed/{params.prefix}_fastqc.html {output.html}; mv qc/2trimmed/{params.prefix}_fastqc.zip {output.zip}"

SnakeMake FastQC From line 58 of rules/qc.smk

shell:
    "mkdir -p qc/5removedrRNA; fastqc -o qc/5removedrRNA -t {threads} {input}; mv qc/5removedrRNA/{params.prefix}_fastqc.html {output.html}; mv qc/5removedrRNA/{params.prefix}_fastqc.zip {output.zip}"

SnakeMake FastQC From line 73 of rules/qc.smk

shell:
    """
    mkdir -p qc/all;
    column3=$(cut -f3 auxiliary/unambigous_annotation.gff | sort | uniq)
    if [[ " ${{column3[@]}} " =~ "gene" ]];
    then
        featureCounts -T {threads} -t gene -g ID -a {input.annotation} -o {output.txt} {input.bam};
    else
        touch {output.txt};
    fi
    """

SnakeMake FeatureCounts From line 87 of rules/qc.smk

shell:
    """
    mkdir -p qc/trnainall;
    column3=$(cut -f3 auxiliary/unambigous_annotation.gff | sort | uniq)
    if [[ " ${{column3[@]}} " =~ "tRNA" ]];
    then
        featureCounts -T {threads} -t tRNA -g ID -a {input.annotation} -o {output.txt} {input.bam};
    else
        touch {output.txt};
    fi
    """

SnakeMake FeatureCounts From line 108 of rules/qc.smk

shell:
    """
    mkdir -p qc/rrnainall;
    column3=$(cut -f3 auxiliary/unambigous_annotation.gff | sort | uniq)
    if [[ " ${{column3[@]}} " =~ "rRNA" ]];
    then
        featureCounts -T {threads} -t rRNA -g ID -a {input.annotation} -o {output.txt} {input.bam};
    else
        touch {output.txt};
    fi
    """

SnakeMake FeatureCounts From line 129 of rules/qc.smk

shell:
    """
    mkdir -p qc/rrnainallaligned;
    column3=$(cut -f3 auxiliary/unambigous_annotation.gff | sort | uniq)
    if [[ " ${{column3[@]}} " =~ "rRNA" ]];
    then
        featureCounts -T {threads} -t rRNA -g ID -a {input.annotation} -o {output.txt} {input.bam};
    else
        touch {output.txt};
    fi
    """

SnakeMake FeatureCounts From line 150 of rules/qc.smk

shell:
    """
    mkdir -p qc/rrnainuniquelyaligned;
    column3=$(cut -f3 auxiliary/unambigous_annotation.gff | sort | uniq)
    if [[ " ${{column3[@]}} " =~ "rRNA" ]];
    then
        featureCounts -T {threads} -t rRNA -g ID -a {input.annotation} -o {output.txt} {input.bam};
    else
        touch {output.txt};
    fi
    """

SnakeMake FeatureCounts From line 171 of rules/qc.smk

shell:
    "mkdir -p coverage; bedtools genomecov -ibam {input} -bg > {output}"

SnakeMake BEDTools From line 191 of rules/qc.smk

shell:
    "export LC_ALL=en_US.utf8; export LANG=en_US.utf8; multiqc -f -d --exclude picard --exclude gatk -z -o {params.dir} qc/3mapped qc/1raw qc/2trimmed qc/5removedrRNA qc/4unique qc/all qc/trnainall qc/rrnainallaligned qc/rrnainuniquelyaligned qc/rrnainall trimmed  2> {log}"

SnakeMake gatk MultiQC Picard From line 215 of rules/qc.smk

shell:
    """
    mkdir -p auxiliary
    UNIQUE="$(cut -f3 {input.annotation} | sort | uniq)"
    IDENTIFIER="ID"
    LINE="$(sed '3q;d' {input.annotation})"
    if [[ $LINE == *"gene_id="* ]]; then IDENTIFIER="gene_id"; fi;
    for f in ${{UNIQUE}}
    do
        featureCounts -F GTF -s 1 -g $IDENTIFIER -O -t $f -M --fraction -a {input.annotation} {input.bam} -T {threads} -o auxiliary/annotation_total_reads.raw.tmp
        cat auxiliary/annotation_total_reads.raw.tmp | sed 1,2d | awk -v var=$f -FS'\\t' '{{print $0"\\t"var}}' >> {output}
        rm auxiliary/annotation_total_reads.raw.tmp
    done
    """

SnakeMake FeatureCounts From line 11 of rules/readcounting.smk

shell:
    """
    mkdir -p auxiliary; RiboReport/scripts/map_reads_to_annotation.py -i {input.reads} -a {input.annotation} -o {output}
    """

SnakeMake From line 35 of rules/readcounting.smk

shell:
    "mkdir -p auxiliary; RiboReport/scripts/total_mapped_reads.py -b {input.bam} -m {output.mapped} -l {output.length}"

SnakeMake From line 50 of rules/readcounting.smk

shell:
    "mkdir -p auxiliary; RiboReport/scripts/generate_output_annotation.py -t {input.total} -r {input.reads} -g {input.genome} -o {output.xlsx} --simple {output.annotation} --complete {output.complete}"

SnakeMake From line 65 of rules/readcounting.smk

run:
    outputName = os.path.basename(input[0])
    shell("mkdir -p uniprotDB; mv {input} uniprotDB/{outputName}; gunzip uniprotDB/{outputName}")

SnakeMake From line 10 of rules/reparation.smk

shell:
    "mkdir -p reparation; if [ uniprotDB/uniprot_sprot.fasta.bak does not exist ]; then cp -p uniprotDB/uniprot_sprot.fasta uniprotDB/uniprot_sprot.fasta.bak; fi; mkdir -p {params.prefix}/tmp; reparation.pl -bam {input.bam} -g {input.genome} -gtf {input.gtf} -db {input.db} -out {params.prefix} -threads {threads}; if [ uniprotDB/uniprot_sprot.fasta does not exist ]; then cp -p uniprotDB/uniprot_sprot.fasta.bak uniprotDB/uniprot_sprot.fasta; fi;"

SnakeMake From line 34 of rules/reparation.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/reparationGFF.py -c {wildcards.condition}  -r {wildcards.replicate} -i {input} -o {output}"

SnakeMake From line 45 of rules/reparation.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/concatGFF.py {input} -o {output}"

SnakeMake From line 56 of rules/reparation.smk

shell:
    "mkdir -p ribotish; ribotish quality -v -p {threads} -b {input.fp} -g {input.annotation} -o {params.reporttxt} -f {params.reportpdf} 2> {log} || true; if grep -q \"offdict = {{'m0': {{}}}}\" {params.offsetparameters}; then mv {params.offsetparameters} {params.offsetparameters}.unused; fi; touch {output.offsetdone}"

SnakeMake From line 19 of rules/ribotish.smk

shell:
    "mkdir -p ribotish; ribotish predict -v {params.codons} -p {threads} -b {params.fplist} -g {input.annotation} -f {input.genome} -o {output.filtered} 2> {log}"

SnakeMake From line 43 of rules/ribotish.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/ribotishGFF.py -c {wildcards.condition} -i {input} -o {output}"

SnakeMake From line 54 of rules/ribotish.smk

shell:
    """
    mkdir -p ribotricer;
    ribotricer prepare-orfs --gtf {input.annotation} --fasta {input.genome} --prefix ribotricer/ribotricer --min_orf_length 6 --start_codons ATG,GTG,TTG
    """

SnakeMake From line 16 of rules/ribotricer.smk

shell:
    """
    mkdir -p ribotricer/{wildcards.condition}-{wildcards.replicate};
    ribotricer learn-cutoff --ribo_bams {input.bam_ribo} --rna_bams {input.bam_rna}  --prefix ribotricer/ribotricer --ribotricer_index {input.index} | tail -n 1 | grep -Eo '[+-]?[0-9]+([.][0-9]+)?'> {output}
    """

SnakeMake From line 33 of rules/ribotricer.smk

shell:
    """
    mkdir -p ribotricer/{wildcards.condition}-{wildcards.replicate}
    ribotricer detect-orfs --bam {input.bam_ribo} --ribotricer_index {input.index} --prefix ribotricer/{wildcards.condition}-{wildcards.replicate}/ribotricer --phase_score_cutoff {params.cutoff}
    """

SnakeMake From line 50 of rules/ribotricer.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/ribotricerGFF.py -c {wildcards.condition}  -i {input} -o {output}"

SnakeMake From line 64 of rules/ribotricer.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/concatGFF.py {input} -o {output}"

SnakeMake From line 75 of rules/ribotricer.smk

shell:
    """
    mkdir -p annotation; awk -F'\\t' '$3 == "rRNA" || $3 == "tRNA"' {input.annotation} | awk -F'\\t' '{{print $1 FS $4 FS $5 FS "." FS "." FS $7}}' > {output.annotation}
    """

SnakeMake From line 9 of rules/rrnafiltering.smk

shell:
    "bedtools intersect -v -a {input.mapuniq} -b {input.annotation} > {output.bam}"

SnakeMake BEDTools From line 23 of rules/rrnafiltering.smk

shell:
    """
    mkdir -p smorfer;
    smorfer.sh -s putative_orfs.sh -g {input.genome} -c {params.genome_name} -n smorfer_pORFs -i 9-150 -o smorfer/
    """

SnakeMake From line 20 of rules/smorfer.smk

shell:
    """
    mkdir -p smorfer;
    smorfer.sh -s FT_GCcontent.sh -g {input.genome} -o smorfer/ -b {input.bed}
    """

SnakeMake From line 35 of rules/smorfer.smk

shell:
    """
    mkdir -p smorfer/{wildcards.condition}-{wildcards.replicate};
    smorfer.sh -s count_RPF.sh -b {input.bed} -a {input.bam} -o smorfer/{wildcards.condition}-{wildcards.replicate}/
    """

SnakeMake From line 50 of rules/smorfer.smk

shell:
    """
    mkdir -p smorfer/{wildcards.condition}-{wildcards.replicate};
    smorfer.sh -s FT_RPF.sh high_expressed_ORFs.bed calibrated_RiboSeq.bam -b {input.bed} -a {input.bam} -o smorfer/{wildcards.condition}-{wildcards.replicate}/
    """

SnakeMake From line 65 of rules/smorfer.smk

shell:
    """
    mkdir -p smorfer/{wildcards.condition}-{wildcards.replicate};
    smorfer.sh -s find_best_start.sh -b {input.bed} -a {input.bam} -o smorfer/{wildcards.condition}-{wildcards.replicate}/
    """

SnakeMake From line 80 of rules/smorfer.smk

shell:
    """
    mkdir -p smorfer/{wildcards.condition}-{wildcards.replicate}_noFT;
    smorfer.sh -s count_RPF.sh -b {input.bed} -a {input.bam} -o smorfer/{wildcards.condition}-{wildcards.replicate}_noFT/
    """

SnakeMake From line 96 of rules/smorfer.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/smorferGFF.py -c {wildcards.condition}  -i {input} -o {output}"

SnakeMake From line 111 of rules/smorfer.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/concatGFF.py {input} -o {output}"

SnakeMake From line 122 of rules/smorfer.smk

shell:
    """
    mkdir -p spectre;
    SPECtre.py --input {input.bam} --output {output.results} --log {output.log} --fpkm {input.fpkm} --gtf {input.gtf} --nt {threads}
    """

SnakeMake From line 14 of rules/spectre.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/spectreGFF.py -c {wildcards.condition}  -i {input} -o {output}"

SnakeMake From line 28 of rules/spectre.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/concatGFF.py {input} -o {output}"

SnakeMake From line 39 of rules/spectre.smk

shell:
    "mkdir -p trimlink; ln -s {params.inlink} {params.outlink};"

SnakeMake From line 16 of rules/trimming.smk

shell:
    "mkdir -p trimmed; cutadapt -j {threads} {params.adapter} {params.quality} {params.filtering} -o {output.fastq} {input.fastq}"

SnakeMake Cutadapt From line 31 of rules/trimming.smk

shell:
    "mkdir -p genomes; RiboReport/scripts/reverseComplement.py --input_fasta_filepath genomes/genome.fa --output_fasta_filepath genomes/genome.rev.fa"

SnakeMake From line 11 of rules/visualization.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/motif2GFF3.py --input_genome_fasta_filepath {input.fwd} --input_reverse_genome_fasta_filepath {input.rev} --motif_string ATG --output_gff3_filepath {output}"

SnakeMake From line 23 of rules/visualization.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/motif2GFF3.py --input_genome_fasta_filepath {input.fwd} --input_reverse_genome_fasta_filepath {input.rev} --motif_string GTG,TTG,CTG --output_gff3_filepath {output}"

SnakeMake From line 35 of rules/visualization.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/motif2GFF3.py --input_genome_fasta_filepath {input.fwd} --input_reverse_genome_fasta_filepath {input.rev} --motif_string TAG,TGA,TAA --output_gff3_filepath {output}"

SnakeMake From line 48 of rules/visualization.smk

shell:
    "mkdir -p tracks; RiboReport/scripts/motif2GFF3.py --input_genome_fasta_filepath {input.fwd} --input_reverse_genome_fasta_filepath {input.rev} --motif_string AAGG --output_gff3_filepath {output}"

SnakeMake From line 60 of rules/visualization.smk

shell:
   "mkdir -p globaltracks; mkdir -p globaltracks/raw; mkdir -p globaltracks/mil; mkdir -p globaltracks/min; RiboReport/scripts/mapping.py --mapping_style global --bam_path {input.bam} --wiggle_file_path globaltracks/ --no_of_aligned_reads_file_path {input.stats} --library_name {params.prefix};"

SnakeMake From line 82 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 94 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 106 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 118 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 130 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 142 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 154 of rules/visualization.smk

shell:
     "mkdir -p centeredtracks; mkdir -p centeredtracks/raw; mkdir -p centeredtracks/mil; mkdir -p centeredtracks/min; RiboReport/scripts/mapping.py --mapping_style centered --bam_path {input.bam} --wiggle_file_path centeredtracks/ --no_of_aligned_reads_file_path {input.stats} --library_name {params.prefix};"

SnakeMake From line 176 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 188 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 200 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 212 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 224 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 236 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 248 of rules/visualization.smk

shell:
    "mkdir -p fiveprimetracks; mkdir -p fiveprimetracks/raw; mkdir -p fiveprimetracks/mil; mkdir -p fiveprimetracks/min; RiboReport/scripts/mapping.py --mapping_style first_base_only --bam_path {input.bam} --wiggle_file_path fiveprimetracks/ --no_of_aligned_reads_file_path {input.stats} --library_name {params.prefix};"

SnakeMake From line 270 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 282 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 294 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 306 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 318 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 330 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 342 of rules/visualization.smk

shell:
    "mkdir -p threeprimetracks; mkdir -p threeprimetracks/raw; mkdir -p threeprimetracks/mil; mkdir -p threeprimetracks/min; RiboReport/scripts/mapping.py --mapping_style last_base_only --bam_path {input.bam} --wiggle_file_path threeprimetracks/ --no_of_aligned_reads_file_path {input.stats} --library_name {params.prefix};"

SnakeMake From line 364 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 376 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 388 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 400 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 412 of rules/visualization.smk

shell:
    "wigToBigWig {input.fwd} {input.genomeSize} {output.fwd}"

SnakeMake wigToBigWig From line 424 of rules/visualization.smk

shell:
    "wigToBigWig {input.rev} {input.genomeSize} {output.rev}"

SnakeMake wigToBigWig From line 436 of rules/visualization.smk

shell:
    "mkdir -p tracks; multiBamSummary bins --smartLabels --bamfiles maplink/*.bam -o {output} -p {threads};"

SnakeMake From line 447 of rules/visualization.smk

shell:
    "mkdir -p figures; plotCorrelation -in {input.npz} --corMethod spearman --skipZeros --plotTitle \"Spearman Correlation of Read Counts\" --whatToPlot heatmap --colorMap RdYlBu --plotNumbers -o {output.correlation} --outFileCorMatrix SpearmanCorr_readCounts.tab"

SnakeMake From line 458 of rules/visualization.smk

shell:
    "mkdir -p tracks; cat {input[0]} | grep -v '\tgene\t' > tracks/annotation-woGenes.gtf; gtf2bed < tracks/annotation-woGenes.gtf > tracks/annotation.bed"

SnakeMake GFFutils From line 469 of rules/visualization.smk

shell:
    "mkdir -p tracks; cut -f1-6 {input[0]} > tracks/annotationNScore.bed6;  awk '{{$5=1 ; print ;}}' tracks/annotation.bed6 > tracks/annotation.bed6; bedToBigBed -type=bed6 -tab tracks/annotation.bed6 {input[1]} tracks/annotation.bb"