FilTar: Using RNA-Seq data to improve microRNA target prediction accuracy in animals


FilTar is a tool that integrates RNA-Seq data into pre-existing miRNA target prediction workflows in order to increase prediction accuracy.

It achieves this by:

  1. Removing transcripts which are not expressed, or are only poorly expressed, in a given cell type or tissue (a minimal sketch of this filtering step is given below)

  2. Generating 3'UTR annotations specific to a given cell type or tissue

It also operates as a fully functional wrapper around the pre-existing TargetScan7 and miRanda target prediction workflows.
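To make step 1 concrete, here is a minimal, illustrative sketch of expression-based filtering in Python (the file names quant.tsv and predictions.tsv and the TPM threshold are hypothetical; FilTar itself performs this filtering within its own workflow rules):

import csv

TPM_THRESHOLD = 1.0  # hypothetical cut-off; FilTar exposes an equivalent threshold as a configuration option

# Collect the transcripts whose expression passes the threshold
# (assumes a salmon-style table with 'Name' and 'TPM' columns)
expressed = set()
with open("quant.tsv") as quant:
    for row in csv.DictReader(quant, delimiter="\t"):
        if float(row["TPM"]) >= TPM_THRESHOLD:
            expressed.add(row["Name"])

# Keep only predictions whose target transcript is expressed in this tissue
with open("predictions.tsv") as preds, open("filtered_predictions.tsv", "w", newline="") as out:
    reader = csv.DictReader(preds, delimiter="\t")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, delimiter="\t")
    writer.writeheader()
    for row in reader:
        if row["Gene ID"] in expressed:
            writer.writerow(row)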

Installation

Instructions on how to install FilTar can be found at the following location: https://tbradley27.github.io/FilTar/

Basic Usage

FilTar can be used by following two steps:

  1. Specify the options you would like to use to run FilTar by editing config/basic.yaml.

  2. Run the following command:

snakemake --use-conda --cores $N target_predictions.txt

After running the command (where $N is the number of CPU cores to use), all target predictions are contained inside target_predictions.txt.
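The output is plain tab-separated text, so it can be inspected with standard tools. For example, a minimal pandas sketch for loading it (the Mirbase ID column name follows the TargetScan-style output shown in the snippets below; the exact columns depend on the prediction algorithm configured):

import pandas as pd

# Load the FilTar output table (tab-separated text)
predictions = pd.read_csv("target_predictions.txt", sep="\t")

# Example: count predicted target sites per miRNA
print(predictions.groupby("Mirbase ID").size().sort_values(ascending=False).head())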

The following video presents a concise demonstration of basic FilTar usage:

https://www.youtube.com/watch?v=Xhl-nsg7_xo

More detailed instructions can be found inside the full documentation: https://tbradley27.github.io/FilTar/

Publication

The article describing FilTar was published in Volume 36, Issue 8 (pages 2410-2416) of the journal Bioinformatics (Oxford University Press). An online, open access version of the article is available via the DOI below.

DOI: 10.1093/bioinformatics/btaa007

PMCID: PMC7178423

PMID: 31930382

To cite the article, please use the following:

Thomas Bradley, Simon Moxon, FilTar: using RNA-Seq data to improve microRNA target prediction accuracy in animals, Bioinformatics, Volume 36, Issue 8, 15 April 2020, Pages 2410–2416, https://doi.org/10.1093/bioinformatics/btaa007

Getting Help

To ensure your enquiries are seen by as many people as possible who may share your problem, it is best to raise them publicly. The first port of call is the Biostars bioinformatics forum (https://www.biostars.org/). If using Biostars, please add the 'filtar' tag to your question so that I am notified and can answer promptly.

If this option doesn't work for whatever reason, I also accept correspondence via my academic email address: [email protected]

Reporting bugs, suggested enhancements or any other issues

The issues page of this repository is the best place to post these.

Contributions and Acknowledgements

Simon Moxon came up with the original idea and project proposal for FilTar. The concept was then extended and developed further by Simon Moxon and Thomas Bradley over the course of the latter's BBSRC (Biotechnology and Biological Sciences Research Council) PhD studentship on the Norwich Research Park (NRP) Bioscience Doctoral Training Partnership (DTP) programme. During this time, Thomas Bradley worked under the primary supervision of Simon Moxon, initially predominantly at the Earlham Institute and later predominantly at the School of Biological Sciences, University of East Anglia.

Full acknowledgements can be found within the preprint version of the article associated with FilTar.

Code Snippets
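The snippets below are extracted from the rules and scripts that make up the FilTar Snakemake workflow; where available, the source Snakefile and line number are noted beneath each snippet.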

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"


import os
from snakemake.shell import shell


# Derive samtools' temporary-file prefix from the output BAM name
prefix = os.path.splitext(snakemake.output[0])[0]

shell(
    "samtools sort {snakemake.params} -@ {snakemake.threads} -o {snakemake.output[0]} "
    "-T {prefix} {snakemake.input[0]}")
import os

# ENA stores FASTQ files under two possible directory layouts depending on
# the accession length, so try the nested layout first and fall back to the
# flat one (hence the '||' between the two wget calls).
acc = snakemake.wildcards['accession']

if 'single_end' in snakemake.output[0]:
	os.system("wget -nv --directory-prefix=data/single_end/ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/{0}/00{1}/{2}/{2}.fastq.gz || wget -nv --directory-prefix=data/single_end/ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/{0}/{2}/{2}.fastq.gz".format(acc[0:6], acc[-1], acc))
elif 'paired_end' in snakemake.output[0]:
	mate = snakemake.wildcards['mate_number']
	os.system("wget -nv --directory-prefix=data/paired_end/ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/{0}/00{1}/{2}/{2}_{3}.fastq.gz || wget -nv --directory-prefix=data/paired_end/ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/{0}/{2}/{2}_{3}.fastq.gz".format(acc[0:6], acc[-1], acc, mate))
script: "download_fastq.py"
From line 24 of ENA/Snakefile

script: "download_fastq.py"
From line 30 of ENA/Snakefile
shell: "rsync -av rsync://ftp.ensembl.org/ensembl/pub/release-{config[ensembl_release]}/gtf/{params}/{wildcards.genus_species}.{wildcards.build}.{config[ensembl_release]}.chr.gtf.gz data/ && gunzip data/{wildcards.genus_species}.{wildcards.build}.{config[ensembl_release]}.chr.gtf.gz && sed 's/^chr//g' data/{wildcards.genus_species}.{wildcards.build}.{config[ensembl_release]}.chr.gtf > tmp && mv tmp data/{wildcards.genus_species}.{wildcards.build}.{config[ensembl_release]}.chr.gtf"
        shell: "sed -r 's/^MT//g' {input} | sed '/^\t/d'  > {output}" # delete mitochondiral records and records not assigned to a chromosome

rule get_transcript_ids:
        input:
                gtf=get_gtf_file
        output: "results/bed/{species}_chr{chrom}_all_transcripts.txt"
shell: "scripts/get_all_transcripts.sh {input} {wildcards.chrom} > {output}"
shell: "rsync -av rsync://ftp.ensembl.org/ensembl/pub/release-{config[ensembl_release]}/fasta/{params}/dna/{wildcards.genus_species}.{wildcards.build}.dna.toplevel.fa.gz data/"
shell: "rsync -av rsync://ftp.ensembl.org/ensembl/pub/release-{config[ensembl_release]}/fasta/{params}/dna/{wildcards.genus_species}.{wildcards.build}.dna.primary_assembly.fa.gz data/"
shell: "rsync -av rsync://ftp.ensembl.org/ensembl/pub/release-{config[ensembl_release]}/fasta/{params}/dna/{wildcards.genus_species}.{wildcards.build}.dna.chromosome.{wildcards.chrom}.fa.gz data/"
shell: "pigz -p {threads} -d {input}"
shell:
   "wget --directory-prefix=data/maf_hsa/ -nv http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/maf/chr{wildcards.chrom}.maf.gz && gunzip {output}.gz"
shell:
   "wget --directory-prefix=data/maf_mmu/ -nv http://hgdownload.cse.ucsc.edu/goldenPath/mm10/multiz60way/maf/chr{wildcards.chrom}.maf.gz && gunzip {output}.gz"
shell: "rsync -av rsync://ftp.ensembl.org/ensembl/pub/release-{config[ensembl_release]}/fasta/{params}/cdna/{wildcards.genus_species}.{wildcards.build}.cdna.all.fa.gz data"
shell: "gunzip {input}"
shell: 'wget https://sourceforge.net/projects/apatrap/files/APAtrap_Linux.zip/download && mv download scripts && unzip -d scripts/ scripts/download'
shell: "fasterq-dump -O data/single_end/ --threads {threads} {wildcards.accession}"
shell: "fasterq-dump -O data/paired_end/ --threads {threads} {wildcards.accession}"
shell: "gzip {input}"
shell: "gzip {input.file1} {input.file2}"
library(tidyverse)

# load in target predictions

target_predictions = readr::read_tsv(snakemake@input[['targets']])

utr_genomic_positions = readr::read_tsv(snakemake@input[['bed_file']], col_names=c('chromosome','start','stop','strand','transcript'), 
	col_types='ciicc')

utr_genomic_positions = utr_genomic_positions[!duplicated(utr_genomic_positions$transcript),]

print(target_predictions)
print(utr_genomic_positions)

# Convert 3'UTR-relative site coordinates into genomic coordinates
target_predictions = dplyr::inner_join(target_predictions, utr_genomic_positions, by=c(`Gene ID`='transcript'))
target_predictions$genomic_start = target_predictions$start + target_predictions$`UTR start`
target_predictions$genomic_end = target_predictions$start + target_predictions$`UTR end`

print(target_predictions %>% select(`Gene ID`,`Mirbase ID`,`UTR start`,`UTR end`,start,stop,genomic_start,genomic_end), width=Inf)

write.table(x=target_predictions, file=snakemake@output[[1]], sep='\t', quote=FALSE, col.names=TRUE, row.names=FALSE)
script: 'get_target_genomic_coordinates.R'
import os
import re
from functools import reduce
from Bio import AlignIO
from Bio.AlignIO import MafIO  # ensure the MafIO submodule is loaded

def add_alignment():
           global start_pos
           global end_pos
           if strand == -1:
               start_pos = start_pos[::-1] # reverse the exon order for minus-strand transcripts
               end_pos = end_pos[::-1]

           new_multiple_alignment = idx.get_spliced(start_pos, end_pos, strand) # splice the exon blocks via the MAF index
           for i in range(len(new_multiple_alignment)):
              new_multiple_alignment[i].id = re.sub(r'\..*', '', new_multiple_alignment[i].id) # strip chromosome information from the id
              new_multiple_alignment[i].id = accession + '\t' + new_multiple_alignment[i].id # label each alignment with the relevant transcript accession

           global big_alignment
           big_alignment += new_multiple_alignment.format('fasta')

if os.stat(snakemake.input['bed']).st_size == 0:
     target = open(snakemake.output[0], 'w')
else:
    if snakemake.wildcards['species'] == "hsa":    # map the FilTar species code onto the corresponding UCSC genome build
       build = "hg38"
    elif snakemake.wildcards['species'] == "mmu":
       build = "mm10"
    else:
       build = ''

    idx = AlignIO.MafIO.MafIndex(snakemake.input['maf_index'], snakemake.input['maf'], "{}.chr{}".format(build, snakemake.wildcards['chrom'])  )

    start_pos = []
    end_pos = []
    accession = 'empty'
    with open(snakemake.input['bed'] ) as f:
       big_alignment = ''
       for line in f:    #Open and loop through line-by-line the relevant BED file
          parts = line.split()  

          if accession == 'empty':
             accession = parts[4]   
             start_pos.append(int(parts[1]))
             end_pos.append(int(parts[2]))
             strand = (int(parts[3]))

          elif parts[4] == accession:       # If accession has multiple entries in bed, add additional genome co-ordinates to rel. lists
             start_pos.append(int(parts[1]))
             end_pos.append(int(parts[2]))
             strand = (int(parts[3]))

          else:
             add_alignment()

             start_pos = [] # initialise a new transcript record
             end_pos = []
             start_pos.append(int(parts[1]))
             end_pos.append(int(parts[2]))
             strand = (int(parts[3]))
             accession = parts[4]
       else:
         # for/else: flush the final transcript once the file is exhausted
         add_alignment()

       result = reduce(lambda x, y: x.replace(y, snakemake.config["TaxID"][y]), snakemake.config["TaxID"], big_alignment) # get NCBI taxonomic IDs
       result = result.replace('\n','') # convert from fasta to tsv
       result = result.replace('>','\n') # convert from fasta to tsv
       result = re.sub(r'(\s[0-9]{4,7})', r'\1\t', result) # convert from fasta to tsv
       result = re.sub('\n','',result, count=1) # remove leading empty line

       target = open(snakemake.output[0], 'w')

       for line in iter(result.splitlines()):
          pattern = re.compile(r'\.[0-9][0-9]?\sN+')
          pattern2 = re.compile(r'T[0-9]+\sN+')
          pattern3 = re.compile(r'^N+$') # an unknown reference transcript leaves trailing lines of Ns which need removing
          if 'delete' in line:
              pass
          elif 'unknown' in line:
              pass
          elif pattern.search(line):
              pass
          elif pattern2.search(line):
              pass
          elif pattern3.search(line):
              pass
          else:
              target.write(line + '\n')

       target.close()
from Bio import AlignIO
from Bio.AlignIO import MafIO  # ensure the MafIO submodule is loaded

# Map the FilTar species code onto the corresponding UCSC genome build
if snakemake.wildcards['species'] == "hsa":
	build = "hg38"
elif snakemake.wildcards['species'] == "mmu":
	build = "mm10"
else:
	build = ''

# Index the per-chromosome MAF file so that alignment blocks can later be
# retrieved by genomic coordinate
idx = AlignIO.MafIO.MafIndex(snakemake.output[0], snakemake.input[0], "{}.chr{}".format(build, snakemake.wildcards['chrom']))
script:
   "biopython_maf_processing.py"
script: 
   "biopython_maf_processing2.py"
# An empty input file is copied through unchanged; otherwise the FASTA
# records are merged with filtar::MergeFasta
if ( file.info(snakemake@input[[1]])$size == 0 ) {
	file.copy(from=snakemake@input[[1]], to=snakemake@output[[1]])
} else {
	output = filtar::MergeFasta(snakemake@input[[1]])
	write.table(output, snakemake@output[[1]], quote=FALSE, sep="\t", col.names=FALSE, row.names=FALSE)
}
shell: "scripts/get_bedtools_bed.sh {input} {output}"
shell: "bedtools getfasta -name -s -fi {input.dna} -bed {input.bed} -fo {output}"
script: "merge_fasta.R"
shell: "scripts/convert_fa_to_tsv2.sh {input} {params} {output}"
from Bio import SeqIO

records = list(SeqIO.parse(snakemake.input[0],'fasta'))
filtered_records = []

for record in records:
	if record.id in snakemake.config['mirnas']:
		filtered_records.append(record)
	elif len(snakemake.config['mirnas']) == 0: # use all records if mirna config entry is empty
		filtered_records.append(record)
	else:
		pass

SeqIO.write(filtered_records,snakemake.output[0],'fasta')
shell: "grep -A 1 {wildcards.species} {input} | awk '{{ print $1 }}' | sed 's/--//g' | sed '/^$/d' > {output}"
script: "filter_mirna_fasta.py"
united_quant = readr::read_tsv(
        file=snakemake@input[[1]],
        col_types=readr::cols(.default = 'd', Name = 'c')
)

united_quant2 = filtar::AvgSalmonQuant(united_quant)

avg_quant = united_quant2[,c('Name','avg')]
colnames(avg_quant) = c('Name','TPM')

write.table(
        x=avg_quant,
        file=snakemake@output[[1]],
        row.names=FALSE,
        col.names=TRUE,
        quote=FALSE,
        sep="\t"
)      
united_quant = readr::read_tsv(
        file=snakemake@input[[1]],
        col_types=readr::cols(.default = 'd', Name = 'c')
)

united_quant2 = filtar::AvgSalmonQuant(united_quant)

avg_quant = tibble::tibble(Name=united_quant2$Name, Length=20, EffectiveLength=20.00, TPM=united_quant2$avg, NumReads=20.00)

real_output = paste(snakemake@output[[1]],'quant.sf',sep="/")
dir.create(snakemake@output[[1]])

write.table(
        x=avg_quant,
        file=real_output,
        row.names=FALSE,
        col.names=TRUE,
        quote=FALSE,
        sep="\t"
)      
import os

# Two inputs (index + a single FASTQ) indicate single-end reads; otherwise
# the two FASTQ files are quantified as a paired-end library.
if len(snakemake.input) == 2:
        os.system('salmon quant --validateMappings --rangeFactorizationBins 4 --seqBias --posBias -p {} -i {} -l A -r {} -o {}'.format(snakemake.threads,snakemake.input['index'],snakemake.input['reads'][0],snakemake.output))
else:
        os.system('salmon quant --validateMappings --rangeFactorizationBins 4 --seqBias --posBias -p {} -i {} -l A -1 {} -2 {} -o {}'.format(snakemake.threads,snakemake.input['index'],snakemake.input['reads'][0],snakemake.input['reads'][1],snakemake.output))
shell:
    "salmon index --threads {threads} -t {input} -i {output} --type quasi -k 31"
shell:
    "salmon index --threads {threads} -t {input} -i {output} --type quasi -k 31"
script:
	"quant_salmon.py"
script:
	"quant_salmon.py"
shell: "grep 'expected' {input}/lib_format_counts.json | awk '{{print $2}}' | sed 's/\"//g' | sed 's/,//g' > {output}"
shell: "salmon quantmerge --quants {input} --names {input} -o {output}"
script: "get_average_quant.R"
From line 108 of salmon/Snakefile

shell: "salmon quantmerge --quants {input} --names {input} -o {output}"

script: "get_average_quant2.R"
From line 127 of salmon/Snakefile
data = readr::read_tsv(snakemake@input[[1]], col_names=FALSE)
data = tidyr::separate(data, X5, into=c('X5.1','X5.2'), remove=TRUE)
data = tidyr::separate(data, X6, into=c('X6.1','X6.2'), remove=TRUE)

colnames(data) = c('miRNA_ID','transcript_ID','score','energy(kCal/Mol)','miRNA_start','miRNA_end','3UTR_start','3UTR_end','alignment_length','percent_matches', 'percent_matches_and_wobbles')

write.table(
	x=data,
	file=snakemake@output[[1]],
	quote=FALSE,
	row.names=FALSE,
	col.names=TRUE,
	sep="\t"
)
shell: "sed 's/(+)//g' {input} | sed 's/(-)//g' > {output}"
shell: "miranda {input.mirna} {input.utr} {params} -sc {config[miRanda.minimum_alignment_score]} -en {config[miRanda.minimum_energy_score]} -scale {config[miRanda.5_prime_3_prime_scaling_factor]} -go {config[miRanda.alignment_gap_open_penalty]} -ge {config[miRanda.alignment_gap_extension_penalty]} > {output}"
shell: "scripts/convert_miRanda_to_tsv.sh {input} {output}"
shell: "cat {input} > {output}"
script: "add_miRanda_header.R"
if ( file.info(snakemake@input[['miRanda_scores']])$size == 0  ) {
	file.create(snakemake@output[[1]])	
} else {

	filtered_miRanda_scores = filtar::filter_miRanda_scores(snakemake@input[['miRanda_scores']], snakemake@input[['expression_values']],snakemake@params[['tpm_expression_threshold']])
	write.table(filtered_miRanda_scores, snakemake@output[[1]], row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
}
script: 'filter_miRanda_scores.R'
script: 'filter_miRanda_scores.R'
shell: 'cp {input} {output}'
if ( file.info(snakemake@input[['miRanda_scores']])$size == 0  ) {
	file.create(snakemake@output[[1]])	
} else {

	filtered_miRanda_scores = filtar::filter_miRanda_scores(snakemake@input[['miRanda_scores']], snakemake@input[['expression_values']],snakemake@params[['tpm_expression_threshold']])
	write.table(filtered_miRanda_scores, snakemake@output[[1]], row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
}
script: 'filter_miRanda_scores.R'
script: 'filter_miRanda_scores.R'
filtar::filter_mir_families(snakemake@input[['mature_mirnas']],snakemake@input[['mirna_families']], snakemake@config[["mirnas"]], snakemake@output[[1]])
filtar::filter_mature_mirs(snakemake@input[[1]], snakemake@config[["mirnas"]], snakemake@output[[1]])
species = snakemake@wildcards$species

test = filtar::get_mirna_context(snakemake@input$mirna_seed, snakemake@input$mature_mirnas, snakemake@config$tax_ids[[species]])

readr::write_tsv(test, snakemake@output[[1]], col_names=FALSE)
species = snakemake@wildcards$species

mirna_seeds = filtar::get_mirna_family(snakemake@input[[1]], snakemake@wildcards$species, snakemake@config$tax_ids[[species]])

write.table(mirna_seeds, snakemake@output[[1]], sep="\t", quote=FALSE, col.names=FALSE, row.names=FALSE)
shell:
   "{input.script} {input.data} > {output}"
script:
   "get_mirna_family.R"
shell:
   "{input.script} {input.data} {wildcards.species} {params} > {output}"
script:
   "get_mirna_context.R"
script: "filter_mir_for_context_scores.R"
script: "filter_mir_families.R"
	shell:
           "{input.script} {input.mirna_families} {input.msa} {output}"
shell:
    "{input.script} {input.msa_3UTR} {params.tax_id}  > {output}"
shell:
    "{input.script} {input.mirna_family} {input.mirna_sites} {input.branch_lengths} > {output}"
shell:
    "{input.script} {input.mirna_seeds} {input.CDS} > {output.eightmer_counts}"
shell:
    "{input.script} {input.mirnas} {input.msa} {input.PCTs} {input.CDS_lengths} {input.eightmer_counts} {output} {params.tax_id} {input.AIRs} {params.RNAplfold_dir}"
shell: "cat {input} | sed '1b;/Gene/d' > {output}"
shell: "cat {input} | sed '1b;/Gene/d' > {output}"
shell: "awk '{{ print $5}}' {input} | sort | uniq > {output}"
       shell:
           "cd scripts/targetscan7 && wget http://www.targetscan.org/vert_72/vert_72_data_download/targetscan_70.zip && unzip targetscan_70.zip && rm UTR_Sequences_sample.txt miR_Family_info_sample.txt\
targetscan_70_output.txt README_70.txt targetscan_70.zip"
      shell:
          "wget http://www.targetscan.org/vert_72/vert_72_data_download/targetscan_70_BL_PCT.zip && unzip targetscan_70_BL_PCT.zip && mv\
TargetScan7_BL_PCT/PCT_parameters data/ && mv TargetScan7_BL_PCT/targetscan_70_BL_bins.pl TargetScan7_BL_PCT/targetscan_70_BL_PCT.pl scripts/targetscan7 && rm -rf targetscan_70_BL_PCT.zip TargetScan7_BL_PCT/"
shell:
  "wget http://www.targetscan.org/vert_72/vert_72_data_download/TargetScan7_context_scores.zip && unzip TargetScan7_context_scores.zip && mv TargetScan7_context_scores/Agarwal_2015_parameters.txt TargetScan7_context_scores/TA_SPS_by_seed_region.txt TargetScan7_context_scores/targetscan_count_8mers.pl TargetScan7_context_scores/targetscan_70_context_scores.pl scripts/targetscan7/ && rm -rf TargetScan7_context_scores/ TargetScan7_context_scores.zip"
shell: "sed -e '40 s/9606/$ARGV[1]/g' {input} | sed -e '38 s;PCT_parameters;data\/PCT_parameters;g' > {output} && chmod +x {output}"
shell: "sed -e '55 s;PCT_parameters;data\/PCT_parameters;g' {input} > {output} && chmod +x {output}"
filtered_contextpp_scores = filtar::filter_contextpp_scores(snakemake@input[['contextpp_scores']], snakemake@input[['expression_values']], snakemake@params[['tpm_expression_threshold']])

write.table(filtered_contextpp_scores, snakemake@output[[1]], row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
script: 'filter_contextpp_scores.R'
script: 'filter_contextpp_scores.R'
shell: "cp {input} {output}" 
filtered_contextpp_scores = filtar::filter_contextpp_scores(snakemake@input[['contextpp_scores']], snakemake@input[['expression_values']], snakemake@params[['tpm_expression_threshold']])

write.table(filtered_contextpp_scores, snakemake@output[[1]], row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
script: 'filter_contextpp_scores.R'
script: 'filter_contextpp_scores.R'
shell:
   "trim_galore --output_dir results/trimmed_fastq/  --length {config[trim_galore.length]} --stringency {config[trim_galore.stringency]} {input}"
shell:
   "trim_galore --output_dir results/trimmed_fastq/ --length  {config[trim_galore.length]} --stringency {config[trim_galore.stringency]}  --paired {input[0]} {input[1]}"
input = readr::read_tsv(snakemake@input[[1]], col_names=FALSE, col_types='ciiic')

if (length(snakemake@config[['transcripts']]) == 0) {

	file.copy(from=snakemake@input[[1]],to=snakemake@output[[1]])

} else {

	for (transcript in snakemake@config[['transcripts']]) {
		if (!transcript %in% input$X5) {
			write(stringr::str_interp("The transcript identifier '${transcript}' is not a valid identifier for the selected species in Ensembl release ${snakemake@config[['ensembl_release']]}"), stderr())
		}
	}

	filtered_input = input[input$X5 %in% snakemake@config[['transcripts']],]

	if (dim(filtered_input)[1] == 0) {
		write("No valid transcript identifiers in input. Halting execution.", stderr())
		quit(save="no", status=1)
	}

	write.table(filtered_input,snakemake@output[[1]], col.names=FALSE, row.names=FALSE, quote=FALSE, sep="\t")
}
AIRs = filtar::get_canonical_AIRs(snakemake@input[[1]])

write.table(AIRs, snakemake@output[[1]], quote=FALSE, row.names=FALSE, col.names=FALSE, sep="\t")
utr_lengths = filtar::get_utr_lengths(snakemake@input[[1]]) 

write.table(
	x=utr_lengths,
	file=snakemake@output[[1]],
	sep="\t",
	quote=FALSE,
	row.names=FALSE,
	col.names=TRUE
)
shell:
    "{input.script} {input.gtf} {wildcards.feature} {output}"
script: 'get_utr_lengths.R'
script: 'get_canonical_AIR.R'
script: 'get_canonical_AIR.R'
shell: 'grep -E "^{wildcards.chrom}\s" {input} > {output}'
script: 'filter_bed6.R'
shell: 'grep -E "^{wildcards.chrom}\s" {input} > {output} || true'
if ( file.info(snakemake@input$normal_bed)$size == 0 | file.info(snakemake@input$extended_bed)$size == 0 ) {
	file.copy(from=snakemake@input$normal_bed, to=snakemake@output[[1]])	
} else {
	full_set_sorted = filtar::get_full_bed(snakemake@input$normal_bed, snakemake@input$extended_bed, snakemake@input$all_transcripts, snakemake@input$tx_quant)
        write.table(full_set_sorted, file=snakemake@output[[1]], quote=FALSE, row.names=FALSE, col.names=FALSE, sep="\t")
}
input = readr::read_tsv(snakemake@input[[1]], col_names=FALSE, col_types='ciiciciiiicc')

if (length(snakemake@config[['transcripts']]) == 0) {
        file.copy(from=snakemake@input[[1]],to=snakemake@output[[1]])
} else {
	transcripts = gsub('\\..*','', snakemake@config[['transcripts']])

	filtered_input = input[input$X4 %in% transcripts,]

	write.table(filtered_input,snakemake@output[[1]], col.names=FALSE, row.names=FALSE, quote=FALSE, sep="\t")
}
input = readr::read_tsv(snakemake@input[[1]], col_names=FALSE, col_types='ciiic')

if (length(snakemake@config[['transcripts']]) == 0) {

	file.copy(from=snakemake@input[[1]],to=snakemake@output[[1]])

} else {

	for (transcript in snakemake@config[['transcripts']]) {
		if (!transcript %in% input$X5) {
			write(stringr::str_interp("The transcript identifier '${transcript}' is not a valid identifier for the selected species in Ensembl release ${snakemake@config[['ensembl_release']]}"), stderr())
		}
	}

	filtered_input = input[input$X5 %in% snakemake@config[['transcripts']],]

	if (dim(filtered_input)[1] == 0) {
		write("No valid transcript identifiers in input. Halting execution.", stderr())
		quit(save="no", status=1)
	}

	write.table(filtered_input,snakemake@output[[1]], col.names=FALSE, row.names=FALSE, quote=FALSE, sep="\t")
}
united_bedgraph = readr::read_tsv(
	file=snakemake@input[[1]],
	col_names=FALSE,
	col_types=readr::cols(.default = 'd', X1 = 'c', X2 = 'i', X3 = 'i')
)

united_bedgraph = filtar::AvgBedgraph(united_bedgraph)

united_bedgraph = united_bedgraph[,c('X1','X2','X3','avg')]

write.table(
	x=united_bedgraph,
	file=snakemake@output[[1]],
	row.names=FALSE,
	col.names=FALSE,
	quote=FALSE,
	sep="\t"
)
# If the second input is effectively empty (a single byte), copy the first
# input through unchanged; otherwise build the AIR file from both inputs
if ( file.info(snakemake@input[[2]])$size == 1 ) {
	file.copy(from=snakemake@input[[1]], to=snakemake@output[[1]])
} else {
	AIR_file = filtar::get_AIR_file(snakemake@input[[1]], snakemake@input[[2]])
	write.table(AIR_file, snakemake@output[[1]], sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)
}
utr_lengths = filtar::get_utr_lengths(snakemake@input[[1]]) 

write.table(
	x=utr_lengths,
	file=snakemake@output[[1]],
	sep="\t",
	quote=FALSE,
	row.names=FALSE,
	col.names=TRUE
)
import os

# Recover the index base name that hisat2 expects from the first index file
index_base_name = snakemake.input['index'].replace('.1.ht2','')

# A single FASTQ file is mapped as single-end (-U); two files as paired-end (-1/-2)
if len(snakemake.input['reads']) == 1:
	print('single_end_processing')
	os.system('hisat2 -x {} -p {} -U {} -S {} {}'.format(index_base_name,snakemake.threads,snakemake.input['reads'],snakemake.output,snakemake.params[0]))
else:
	print('paired_end_processing')
	os.system('hisat2 -x {} -p {} -1 {} -2 {} -S {} {}'.format(index_base_name,snakemake.threads,snakemake.input['reads'][0], snakemake.input['reads'][1], snakemake.output, snakemake.params[0]))
shell: "python {input.py_script} {input.gtf} > {output}"
shell: "python {input.py_script} {input.gtf} > {output}"
shell: "hisat2-build -f -p {threads} --ss {input.splice_sites} --exon {input.exons} {input.assembly} data/{wildcards.species}"
shell: "hisat2-build -f -p {threads} {input.assembly} data/{wildcards.species}"
script:
	"map_reads.py"
From line 115 of hisat2/Snakefile

shell: "samtools view -@ {threads} -Sb {input} > {output}"

wrapper:
    "0.27.0/bio/samtools/sort"
From line 134 of hisat2/Snakefile
import os

# Merge replicate bedgraphs into a single multi-sample bedgraph; with only
# one input there is nothing to merge, so copy it through unchanged.
if len(snakemake.input) > 1:
	os.system("bedtools unionbedg -i {} > {}".format(snakemake.input, snakemake.output))
else:
	os.system("cp {} {}".format(snakemake.input, snakemake.output))
shell:
    "{input.script} {input.gtf} {output}"
script: 'filter_bed6.R'
shell: 'grep -E "^{wildcards.chrom}\s" {input} > {output} || true'
shell:
    "gtfToGenePred {input.gtf} {output}"
shell:
    "genePredToBed {input.genepred} {output}"
shell:
    "genomeCoverageBed -bg -ibam {input.sorted_bam} -split > {output}"
shell: 'grep -E "^{wildcards.chrom}\s" {input} > {output}'
script: "merge_bedgraphs.py"
script:
        "get_average_bedgraph.R"
script: "merge_bedgraphs.py"
script:
        "get_average_bedgraph.R"
shell: 'grep -E "^{wildcards.chrom}\s" {input} > {output}'
script: "filter_bed12.R"
shell: "./{input.script} -i {input.bedgraphs} -p {config[APAtrap.min_proportion_of_valid_nucleotides_in_window]} -c {config[APAtrap.min_window_coverage]} -w {config[APAtrap.window_size]} -e {config[APAtrap.utr_extension_size]} -m {input.bed} -o {output}"
script:
    "extend_bed2.R"
shell: "cat {input} > {output}"
shell:
   "./{input.script} -i {input.bedgraphs} -g 1 -n 1 -d {config[APAtrap.min_cov_variation_between_APA_sites]} -c {config[APAtrap.min_average_cov]} -a {config[APAtrap.min_distance_between_APA_sites]} -w {config[APAtrap.predictAPA_window_size]} -u {input.bed}  -o {output}"
shell: "cat {input} | sed '1b;/Gene/d' > {output}"
script: 'get_utr_lengths.R'
script: 'get_utr_lengths.R'
script: "get_tissue_specific_APA_file.R"
shell:
    "{input.script} {input.gtf} CDS {output}"
script: "filter_bed6.R"
shell: 'grep -E "^{wildcards.chrom}\s" {input} > {output} || true'
sed 's/(+)//g' $1 | sed 's/(-)//g' | tr '\n' '\t' | sed 's/>/\n/g' | sed '/^$/d' | awk -v species=$2 '{OFS="\t"}{ print $1,species,$2}' > $3
# Check whether the miRanda output contains any hits (-q suppresses output)
if grep -q 'hit' "$1"
then
	# Extract the identifier lines that follow each hit
	grep -A 1 'hit' "$1" | sed '/--/d' | grep '>' | sed 's/>//' > "$2"
else
	touch "$2" # no hits: create an empty output file
fi
cat $1 | awk '$3 == "transcript"' | awk '{OFS="\t"}{print $1,$14,$16}' | sed 's/"//g' | sed 's/;//g' | sed 's/\t/\./2' | grep -E "^$2	" | awk '{print $2}'
awk '{OFS="\t"}{ print $1,$2,$3,$5,$5,$4}' $1 | sed 's/\t1$/\t+/g' | sed 's/\t-1$/\t-/g' > $2