Automated Transcript Assembly Pipeline with Dependencies


yetAnotherAutoTranscriptAssemblyPipeline

Requirements

  • ffq v0.2.1

  • FastQC v0.11.8

  • BBDuk v35.85

  • Kraken2 v2.1.2

  • ContFree-NGS.py v1.0

  • Trinity v2.8.5

  • CD-HIT-EST v4.8.1

  • BUSCO v5

  • transrate v1.0.3

  • Salmon v1.3.0

  • Python 3.x
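
Before launching the workflow it can help to confirm that the listed tools are reachable. The following is a minimal sketch, not part of the pipeline: the function name `missing_tools` is illustrative, and the executable names in the usage line are assumptions, since the Snakefile refers to each tool through a configured variable such as `{salmon}` rather than a fixed name.

```python
import shutil

def missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# Assumed executable names; adjust to match the local installation.
tools = ["ffq", "fastqc", "kraken2", "Trinity", "cd-hit-est", "salmon"]
print("Missing:", missing_tools(tools))
```

This only checks presence on PATH; it does not verify the versions listed above.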

References

Köster, J., Rahmann, S. (2012). Snakemake: a scalable bioinformatics workflow engine. Bioinformatics, 28(19), 2520–2522. https://doi.org/10.1093/bioinformatics/bts480

Code Snippets

shell:
	"""
	cd MyAssembly_{params.genotype}/1_raw_reads_in_fastq_format && \
	{ffq} --ftp {wildcards.sample} | grep -Eo '\"url\": \"[^\"]*\"' | grep -o '\"[^\"]*\"$' | xargs wget && \
	gzip -dc < {wildcards.sample}_1.fastq.gz > {wildcards.sample}_1.fastq && \
	gzip -dc < {wildcards.sample}_2.fastq.gz > {wildcards.sample}_2.fastq && \
	cd -
	"""
SnakeMake From line 72 of main/Snakefile
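
The grep chain in the download step pulls every "url" value out of the JSON that `ffq --ftp` prints and hands the list to wget. A minimal Python sketch of the same extraction follows, assuming the output is a JSON list of objects that each carry a `url` field. The function name `extract_urls` and the sample records are hypothetical and only illustrate the shape the grep pattern targets.

```python
import json
from typing import List

def extract_urls(ffq_json: str) -> List[str]:
    """Return every "url" value found in ffq-style JSON output, in order."""
    records = json.loads(ffq_json)
    # Accept a single object as well as a list of objects.
    if isinstance(records, dict):
        records = [records]
    return [rec["url"] for rec in records if "url" in rec]

# Hypothetical sample; real ffq output contains more fields per record.
sample = json.dumps([
    {"filetype": "fastq", "url": "ftp://example.org/SRR000001_1.fastq.gz"},
    {"filetype": "fastq", "url": "ftp://example.org/SRR000001_2.fastq.gz"},
])
print(extract_urls(sample))
```

Parsing the JSON avoids depending on the exact spacing and quoting that the regular expressions assume.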
shell:
	"{fastqc} -f fastq {input.R1} -o MyAssembly_{params.genotype}/2_raw_reads_fastqc_reports 2> {log};"
	"{fastqc} -f fastq {input.R2} -o MyAssembly_{params.genotype}/2_raw_reads_fastqc_reports 2>> {log}"
SnakeMake From line 102 of main/Snakefile
shell:
	"""
	/usr/bin/time -v {salmon} index -t {input.transcriptome} -p {threads} -i {output.salmon_index} > {log} 2>&1
	"""
SnakeMake From line 123 of main/Snakefile
shell:
	"""
	/usr/bin/time -v {salmon} quant -p {threads} -i {input.salmon_index} -l A -1 {input.R1} -2 {input.R2} -o datasets_{params.genotype}/3_salmon/quant/{wildcards.sample} > {log} 2>&1
	"""
SnakeMake From line 149 of main/Snakefile
shell:
	"""
	{jq} -r '.library_types[]' {input.meta_info} > MyAssembly_{params.genotype}/3_salmon/quant/lib.txt 2>> {log}
	ls MyAssembly_{wildcards.genotype}/3_salmon/quant/ | grep -E "(SRR|ERR)" > MyAssembly_{params.genotype}/3_salmon/quant/id.txt 2>> {log}
	paste MyAssembly_{params.genotype}/3_salmon/quant/id.txt MyAssembly_{params.genotype}/3_salmon/quant/lib.txt -d, > MyAssembly_{params.genotype}/3_salmon/quant/stranded_status.csv 2>> {log}
	grep .S MyAssembly_{params.genotype}/3_salmon/quant/stranded_status.csv > MyAssembly_{params.genotype}/3_salmon/quant/stranded_samples.csv 2>> {log}
	cut -f1 -d, MyAssembly_{params.genotype}/3_salmon/quant/stranded_samples.csv | paste -s -d, > MyAssembly_{params.genotype}/3_salmon/quant/{params.genotype}_srrlist.csv 2>> {log}
	"""
SnakeMake From line 172 of main/Snakefile
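
The step above pairs each run ID with the library type Salmon detected and keeps only the stranded runs as a comma-separated list. A rough Python sketch of that selection is shown here. It assumes Salmon's library type strings, where the strandedness letter is S (for example ISR or SF) or U (for example IU). The function name `stranded_runs` and the run IDs are made up for illustration.

```python
def stranded_runs(lib_types):
    """Return the run IDs whose library type is stranded, comma-joined.

    `lib_types` maps a run ID (e.g. "SRR000001") to a Salmon library
    type string (e.g. "ISR", "IU").
    """
    # Stranded types contain the letter S; unstranded ones contain U.
    keep = [run for run, lib in lib_types.items() if "S" in lib]
    return ",".join(keep)

# Hypothetical run IDs and library types.
example = {"SRR000001": "ISR", "SRR000002": "IU", "ERR000003": "SF"}
print(stranded_runs(example))  # SRR000001,ERR000003
```

Unlike `grep .S`, which matches against the whole CSV line including the run ID, this sketch tests only the library type field.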
shell:
	"{bbduk} -Xmx40g threads={threads} in1={input.R1} in2={input.R2} "
	"refstats={output.refstats} stats={output.stats} "
	"out1={output.R1} out2={output.R2} "
	"rref=/Storage/progs/bbmap_35.85/resources/adapters.fa "
	"fref=/Storage/progs/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta,"
	"/Storage/progs/sortmerna-2.1b/rRNA_databases/silva-bac-16s-id90.fasta,"
	"/Storage/progs/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta,"
	"/Storage/progs/sortmerna-2.1b/rRNA_databases/silva-bac-23s-id98.fasta,"
	"/Storage/progs/sortmerna-2.1b/rRNA_databases/silva-arc-16s-id95.fasta,"
	"/Storage/progs/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta,"
	"/Storage/progs/sortmerna-2.1b/rRNA_databases/silva-arc-23s-id98.fasta,"
	"/Storage/progs/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta "
	"minlength=75 qtrim=w trimq=20 tpe tbo 2> {log}"
SnakeMake From line 215 of main/Snakefile
shell:
	"{fastqc} -f fastq {input.R1} -o MyAssembly_{params.genotype}/4_trimmed_reads_fastqc_reports 2> {log};"
	"{fastqc} -f fastq {input.R2} -o MyAssembly_{params.genotype}/4_trimmed_reads_fastqc_reports 2>> {log}"
SnakeMake From line 251 of main/Snakefile
shell:
	"{kraken2} --db /Storage/data1/felipe.peres/kraken2/completeDB "
	"--threads {threads} --report-zero-counts --confidence 0.05 --output {output} --paired {input.R1} {input.R2} 2> {log}"
SnakeMake From line 273 of main/Snakefile
shell:
	"split --number=l/10 -d --additional-suffix=.kraken {input} MyAssembly_{params.genotype}/5_trimmed_reads_kraken_reports/parts/{params.identificator}.trimmed_ 2> {log}"
SnakeMake From line 296 of main/Snakefile
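
The `split` call above cuts the Kraken2 output into ten pieces without breaking lines and names them with two-digit numeric suffixes plus `.kraken`. The sketch below illustrates the idea in Python. It balances chunks by line count, which is a simplification of what GNU split does, and the names `split_lines` and `part_name` are illustrative rather than taken from the pipeline.

```python
def split_lines(lines, n_parts=10):
    """Split `lines` into n_parts contiguous chunks of near-equal line count."""
    base, extra = divmod(len(lines), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional line each.
        size = base + (1 if i < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks

def part_name(prefix, index):
    """Build a name in the style of `split -d --additional-suffix=.kraken`."""
    return f"{prefix}{index:02d}.kraken"

chunks = split_lines([f"read{i}\n" for i in range(25)])
print([len(c) for c in chunks], part_name("sample.trimmed_", 0))
```

Keeping each line whole matters because every line of the Kraken2 output describes one classified read.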
shell:
	"{create_index} -R1 {input.R1} -R2 {input.R2} -o MyAssembly_{params.genotype}/6_contamination_removal/index/ 2> {log}"
SnakeMake From line 318 of main/Snakefile
shell:
	"python3.8 {contfree_ngs} --taxonomy {input.kraken_file} --s p --R1 {input.R1} --R2 {input.R2} --taxon Viridiplantae -o MyAssembly_{params.genotype}/6_contamination_removal/parts/ 2> {log};"
SnakeMake From line 343 of main/Snakefile
shell:
	"cat {input.filtered_parts_R1} >> {output.filtered_total_R1};"
	"cat {input.filtered_parts_R2} >> {output.filtered_total_R2};"
	"cat {input.unclassified_parts_R1} >> {output.unclassified_total_R1};"
	"cat {input.unclassified_parts_R2} >> {output.unclassified_total_R2}"
SnakeMake From line 368 of main/Snakefile
shell:
	"/usr/bin/time -v {trinity} --seqType fq --left {params.filtered_total_R1},{params.unclassified_total_R1} --right {params.filtered_total_R2},{params.unclassified_total_R2} --SS_lib_type RF --max_memory 10G --min_contig_length 200 --CPU {threads} --output 7_trinity_assembly/MyAssembly_{params.genotype}_trinity_k25 --full_cleanup --no_normalize_reads --KMER_SIZE 25 2> {log.k25};"
	"/usr/bin/time -v {trinity} --seqType fq --left {params.filtered_total_R1},{params.unclassified_total_R1} --right {params.filtered_total_R2},{params.unclassified_total_R2} --SS_lib_type RF --max_memory 10G --min_contig_length 200 --CPU {threads} --output 7_trinity_assembly/MyAssembly_{params.genotype}_trinity_k31 --full_cleanup --no_normalize_reads --KMER_SIZE 31 2> {log.k31}"
SnakeMake From line 405 of main/Snakefile
shell:
	"sed 's/>/>k25_{params.genotype}_/' {input.k25} > {output.mod_k25};"
	"sed 's/>/>k31_{params.genotype}_/' {input.k31} > {output.mod_k31};"
	"cat {output.mod_k25} {output.mod_k31} > {output.merged_mod};"
	"/usr/bin/time -v {cd_hit_est} -i {output.merged_mod} -o {output.final_cd_hit_est} -c 1 -n 11 -T {threads} -M 0 -d 0 -r 0 -g 1"
SnakeMake From line 430 of main/Snakefile
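
The two `sed` commands above tag every contig header with its k-mer size and genotype before the assemblies are concatenated, so contig IDs from the k25 and k31 runs stay distinct in the merged file. A small Python sketch of the same renaming follows. The function name `prefix_headers` and the example header are illustrative. It rewrites only lines that start with ">", whereas the `sed` expression replaces the first ">" anywhere on a line.

```python
def prefix_headers(fasta_lines, prefix):
    """Yield FASTA lines with `prefix` inserted after '>' on header lines."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            # Sequence lines pass through unchanged.
            yield line

# Hypothetical contig; the genotype label "geno" is a placeholder.
records = [">TRINITY_DN1_c0_g1_i1 len=300\n", "ACGTACGT\n"]
print("".join(prefix_headers(records, "k25_geno_")), end="")
```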
shell:
	"{extract_contigs} -f {input.transcriptome} -m 301 2> {log}"
SnakeMake From line 452 of main/Snakefile
shell:
	"/usr/bin/time -v run_BUSCO.py -i {input.transcriptome} -o {output.busco} -c {threads} -m transcriptome -l /Storage/databases/BUSCO_DBs/embryophyta_odb9/ 2> {log}"
SnakeMake From line 471 of main/Snakefile
shell:
	"/usr/bin/time -v {transrate} --assembly {input.transcriptome} --reference {input.ref} --threads {threads} --output {output.transrate} 2> {log}"
SnakeMake From line 491 of main/Snakefile
shell:
	"/usr/bin/time -v {salmon} index -t {input.transcriptome} -p {threads} -i {output.salmon_index} --gencode 2> {log}"
SnakeMake From line 511 of main/Snakefile
shell:
	"/usr/bin/time -v {salmon} quant -i {input.salmon_index} -l A -1 {params.filtered_total_R1} {params.unclassified_total_R1} -2 {params.filtered_total_R2} {params.unclassified_total_R2} --validateMappings -o {output.salmon_quant} 2> {log}"

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/labbces/YAATAP
Name: yaatap
Version: 1
Copyright: Public Domain
License: The Unlicense