Repository to store the analysis of the paper on phage evolution.


A Snakemake workflow for phage directed evolution analysis.

Code Snippets

shell:
    "NanoPlot -t {threads} --plots dot --fastq {input.reads} -o {output.plot_dir}"
shell:
    'seqkit seq {input.reads} -j {threads} -m {params.min} -M {params.max} | seqkit fq2fa | '
    "sed -r '/>/ s|>|>{wildcards.sample}#1#|' > {output.putative_phage_genomes} "
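The `sed` expression in the snippet above rewrites each FASTA header so that it starts with the sample name followed by `#1#`. A minimal Python sketch of the same renaming is given here for illustration only; the function name `prefix_headers` and the example header are hypothetical and are not part of the workflow.

```python
def prefix_headers(lines, sample):
    """Prefix FASTA header lines with '<sample>#1#'; other lines are unchanged."""
    renamed = []
    for line in lines:
        if ">" in line:
            # Like sed's s|>|>sample#1#| without the g flag:
            # only the first '>' on the line is replaced.
            line = line.replace(">", f">{sample}#1#", 1)
        renamed.append(line)
    return renamed


if __name__ == "__main__":
    # Hypothetical input: one header line and one sequence line.
    for line in prefix_headers([">read1 length=40000", "ACGT"], "S1"):
        print(line)
```

The prefix is what later steps rely on when they select sequence IDs with a pattern such as `^<prefix>#`.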
shell:
    "cat {input.putative_phage_genomes} | bgzip -@ {threads} >{output.all_genomes_merged} && "
    "samtools faidx {output.all_genomes_merged} && "
    "samtools faidx {output.all_genomes_merged} "
    "-r <(wfmash {input.target} {output.all_genomes_merged} -s {params.segment_length} -l {params.block_length} -p {params.map_pct_id} -t {threads} | "
    "awk -v min_qcov={params.min_qcov} '/E_coli/ {{ qcov=$11/$2; if ( !(qcov >= min_qcov) ) print $1; }}' | sort -u | tee {output.ids_to_keep} ) > "
    "{output.all_genomes_merged_filtered}"
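The `awk` filter in the snippet above reads the PAF mappings written by wfmash. In PAF, column 2 is the query length and column 11 is the alignment block length, so `$11/$2` is the fraction of the query covered by a mapping. The sketch below restates that filter in Python under a few assumptions: the function name `ids_below_qcov` and the sample records are hypothetical, and the output is sorted by code point, which may order IDs differently from `sort -u` under some locales.

```python
def ids_below_qcov(paf_lines, min_qcov, pattern="E_coli"):
    """Return sorted unique query names from lines containing `pattern`
    whose query coverage (PAF column 11 / column 2) is below min_qcov."""
    keep = set()
    for line in paf_lines:
        if pattern not in line:
            continue  # awk /E_coli/ only processes lines containing the pattern
        fields = line.split()
        qcov = float(fields[10]) / float(fields[1])
        if not (qcov >= min_qcov):
            keep.add(fields[0])
    return sorted(keep)


if __name__ == "__main__":
    # Hypothetical PAF records (12 mandatory columns, tab separated).
    paf = [
        "S1#1#r1\t1000\t0\t100\t+\tE_coli_K12\t4600000\t0\t100\t95\t100\t60",
        "S1#1#r2\t1000\t0\t900\t+\tE_coli_K12\t4600000\t0\t900\t880\t900\t60",
        "S1#1#r3\t1000\t0\t100\t+\tother\t5000\t0\t100\t95\t100\t60",
    ]
    print(ids_below_qcov(paf, 0.5))
```

As written, both the `awk` command and this sketch only emit query names that appear in at least one mapping line containing the pattern.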
shell:
    'exec &> >( tee {params.log_dir}/{rule}_{wildcards.experiment}_$(date +%Y_%m_%d_-_%H_%M_%S).log ) && '
    "awk -F$'\\t' '/^{wildcards.experiment}/ {{print $3}}' {input.codes}  | "
    'while read f; do grep -P "^${{f}}#" {input.ids_to_keep} | shuf -n {params.sample_size}; done | tee {output.sample_ids} && '
    "samtools faidx {input.all_genomes_merged_filtered} -r {output.sample_ids} | "
    'bgzip -@ {threads} > {output.pggb_input} '
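The snippet above takes the third column of every row in the codes table that starts with the experiment name, then draws up to `sample_size` random IDs per prefix with `grep` and `shuf`. A rough Python sketch of that selection follows. The function name `sample_ids` and the example data are hypothetical, the prefix is matched literally with `str.startswith` (the shell version treats it as a regular expression), and rows with fewer than three columns are skipped.

```python
import random


def sample_ids(codes_rows, ids, experiment, sample_size, rng=None):
    """For each matching codes row, pick up to sample_size IDs with that row's prefix."""
    rng = rng or random.Random()
    picked = []
    for row in codes_rows:
        if not row.startswith(experiment):
            continue  # awk '/^experiment/' keeps rows starting with the name
        cols = row.split("\t")
        if len(cols) < 3:
            continue  # simplification: ignore malformed rows
        prefix = cols[2] + "#"
        matches = [i for i in ids if i.startswith(prefix)]
        # shuf -n N returns at most N lines, fewer if the input is shorter.
        picked.extend(rng.sample(matches, min(sample_size, len(matches))))
    return picked


if __name__ == "__main__":
    codes = ["exp1\tx\tS1", "exp1\ty\tS2", "exp2\tz\tS3"]
    ids = ["S1#1#a", "S1#1#b", "S1#1#c", "S2#1#d", "S3#1#e"]
    print(sample_ids(codes, ids, "exp1", 2, random.Random(0)))
```

Passing a seeded `random.Random` makes the draw reproducible, which the `shuf` call in the snippet does not do by default.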
shell:
    'exec &> >( tee {params.log_dir}/{rule}_{wildcards.experiment}_$(date +%Y_%m_%d_-_%H_%M_%S).log ) && '
    'seqkit split -O {output.split_fastas} --by-id {input.pggb_input} && '
    'gunzip {output.split_fastas}/*.fa.gz && '
    "find {output.split_fastas} -name '*.fa' -exec readlink -f {{}} \\; > {output.list_of_files} && "
    'fastANI -t {threads} --fragLen {params.frag_lenght} --ql {output.list_of_files} --rl {output.list_of_files} -o /dev/stdout  | '
    "perl -pe 's|/.*?id_||g;s|\\.fa||g' | awk -v OFS='\\t' '{{print $1,$2,$3}}' >{output.fastani_distance_matrix}"
shell:
    'python3 {input.script_fix_id} {input.fastani_distance_matrix} {input.codes} > {output.fastani_distance_matrix_id_fixed} && '
    'Rscript {input.script_phylogeny_fastani} {output.fastani_distance_matrix_id_fixed} {input.codes} {output.rectangular} {params.title}'
shell:
    'exec &> >( tee {params.log_dir}/{rule}_{wildcards.experiment}_$(date +%Y_%m_%d_-_%H_%M_%S).log ) && '
    'python3 {input.phanotate_runner} --input_file_list {input.list_of_files} '
    ' --threads {threads} --out_format {params.out_format} --output_dir {output.phanotate_dir} && '
    '>{output.finished} '
shell:
    'exec &> >( tee {params.log_dir}/{rule}_{wildcards.experiment}_$(date +%Y_%m_%d_-_%H_%M_%S).log ) && '
    'python3 {input.prokka_runner} --input_file_list {input.list_of_files} '
    '--output_dir {output.prokka_dir} --proteins {input.mmseqs_phrogs_db} '
    '--threads {threads} --prokka_threads 4 && '
    'touch {output.finished}'
shell:
    'exec &> >( tee {params.log_dir}/{rule}_$(date +%Y_%m_%d_-_%H_%M_%S).log ) && '
    'wget -O {output.phrogs_tar} {params.mmseqs_phrogs_url}'
shell:
    'exec &> >( tee {params.log_dir}/{rule}_$(date +%Y_%m_%d_-_%H_%M_%S).log ) && '
    'tar -xf {input.phrogs_tar} -C {params.phrogs_db_dir} && '
    'cat {params.mmseqs_multifasta_dir}/*.faa | python3 {input.format_phrogs_headers} '
    '> {params.phrogs_db_dir}/multifasta.faa && '
    'mmseqs easy-cluster {params.phrogs_db_dir}/multifasta.faa {params.phrogs_db_dir}/phrogs {params.phrogs_db_dir}/tmp --threads {threads}'
shell:
    'find {input.prokka_dir} -name "*.gff" -exec readlink -f {{}} \\; > {output.list_of_gff_files} && '
    'panaroo -i {output.list_of_gff_files} -o {output.panaroo_dir} --clean-mode strict --threads {threads} '
shell:
    "n_mappings=$( zgrep -c '>' {input.pggb_input} ) && "
    "pggb -m -p {params.map_pct_id} -n $n_mappings -s {params.segment_length} -l {params.block_length} -k {params.min_match_len} -B {params.transclose_batch} -t {threads} -o {output.pggb_out} -i {input.pggb_input}"

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/pangenome/phage-evo-paper
Name: phage-evo-paper
Version: 1
Downloaded: 0
Copyright: Public Domain
License: None