Phaeocystis Viral Elements Extraction Workflow for Polinton-like Virus Study

Phaeocystis viral elements

The repository contains the bioinformatic workflow used to extract viral elements from Phaeocystis genomes for the paper Roitman et al. (2023), "Infection cycle and phylogeny of a Polinton-like virus with a virophage lifestyle infecting Phaeocystis globosa".

The workflow is built with Snakemake. Dependencies are managed by conda (see --use-conda). Run it as snakemake --cores 10 --use-conda.

Code Snippets

shell:
    "trim_galore -a ' {params.truseq} -a {params.nextera} -n 3' --cores {threads} --trim-n --paired --retain_unpaired --output_dir {params.outdir} --length {params.min_read} {input.r1} {input.r2}"
shell:
    "trim_galore -a ' {params.truseq} -a {params.nextera} -n 3' --cores {threads} --trim-n --length {params.min_read} --output_dir {params.outdir} {input.reads}"
shell:
    "seqkit faidx {input}"
shell:
    "megahit {params.reads} -f -o {output.out_dir} --k-list {params.k} --min-contig-len {params.min_contig_len} -t {threads} &> {log}"
shell:
    "bowtie2-build {input} {wildcards.prefix}"
shell:
    "cutadapt -o /dev/null --info-file /dev/stdout --quiet -b {params.old_linker} {input} | python workflow/utilities/info2fastq.py {params.new_linker} | gzip > {output}"
shell:
    "nxtrim -1 {input.r1} -2 {input.r2} -O {params.prefix} --separate --rf"
shell:
    "trim_galore -a ' {params.truseq} -a {params.nextera} -n 3' --cores {threads} --trim-n --paired --retain_unpaired --output_dir {params.outdir} --length {params.min_read} {input.r1} {input.r2}"
shell:
    "trim_galore -a ' {params.truseq} -a {params.nextera} -n 3' --cores {threads} --trim-n --output_dir {params.outdir} --length {params.min_read} {input}"
shell:
    "seqkit seq -rp {input} | gzip > {output}"
shell:
    "echo '{params.config}' > {output}"
shell:
    "SOAPdenovo-fusion -D -s {input.config} -p {threads} -K {wildcards.K} -g {params.prefix} -c {input.contigs}"
shell:
    "SOAPdenovo-127mer map -s {input.config} -p {threads} -g {params.prefix}"
shell:
    "SOAPdenovo-127mer scaff -p {threads} -g {params.prefix}"
shell:
    "getorf -minsize {params.minsize} -filter -sformat pearson {input.fasta} | hmmsearch -E {params.e_value} -o /dev/null --tblout {output} {input.hmm} -"
shell:
    "grep -hv '^#' {input.MCP} | cut -f1 -d' ' | sed 's/_[0-9]*$//' | sort -u | xargs seqkit faidx {input.fasta} | seqkit seq -m {params.min_len} -o {output}"
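The grep/cut/sed pipeline in the preceding snippet turns the hmmsearch table into a list of contig names: comment lines are dropped, the first space-delimited column (the ORF name) is kept, and the trailing `_<number>` that getorf appends to each ORF is stripped so that the parent contig can be fetched with seqkit faidx. A minimal Python sketch of that name handling is given below; the function name and the example rows (including the `k141_…` contig names) are illustrative assumptions, not taken from the workflow.

```python
import re

def contigs_with_hits(tblout_lines):
    """Return sorted unique contig names from hmmsearch --tblout lines."""
    names = set()
    for line in tblout_lines:
        # Equivalent of `grep -v '^#'`; empty lines are skipped as well,
        # which is an extra safeguard not present in the shell pipeline.
        if line.startswith("#") or not line.strip():
            continue
        # Equivalent of `cut -f1 -d' '`: first space-delimited field.
        orf_name = line.split(" ", 1)[0]
        # Equivalent of `sed 's/_[0-9]*$//'`: drop the ORF index suffix.
        names.add(re.sub(r"_[0-9]*$", "", orf_name))
    # Equivalent of `sort -u` (ordering may differ with non-C locales).
    return sorted(names)

# Hypothetical table rows, truncated after the first columns.
example = [
    "# target name        accession  query name",
    "k141_12_3            -          MCP",
    "k141_12_7            -          MCP",
    "k141_40_1            -          MCP",
]
print(contigs_with_hits(example))
```

Two ORFs from the same contig collapse into one name, so each contig is extracted only once.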
shell:
    "dust {input} {params.cutoff} > {output}"
shell:
    "bowtie2 --no-unal --threads {threads} --{params.mode} -x {input.fasta} -U {input.reads} 2> {log} | samtools sort -o {output}"
shell:
    "bowtie2 --threads {threads} --fr --{params.mode} -x {input.fasta} -1 {input.r1} -2 {input.r2} 2> {log} | awk '/^@/||!and($2,4)||!and($2,8)' | samtools sort -o {output}"
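The awk filter in the paired-end bowtie2 snippets tests two SAM FLAG bits: 4 (read unmapped) and 8 (mate unmapped). Header lines always pass, and an alignment record is dropped only when both bits are set, that is, when neither mate of the pair mapped. The same predicate can be sketched in Python as follows; the function name and the example records are illustrative assumptions.

```python
READ_UNMAPPED = 0x4   # SAM FLAG bit: this read is unmapped
MATE_UNMAPPED = 0x8   # SAM FLAG bit: the mate is unmapped

def keep_sam_line(line):
    """Mirror of awk '/^@/ || !and($2,4) || !and($2,8)'."""
    if line.startswith("@"):
        return True  # header lines are always kept
    flag = int(line.split("\t")[1])
    # Keep the record if the read itself or its mate is mapped.
    return not (flag & READ_UNMAPPED) or not (flag & MATE_UNMAPPED)

# Hypothetical records: only the first two SAM columns matter here.
records = {
    "header": "@HD\tVN:1.6",
    "both_mapped": "r1\t99\tctg1",
    "mate_unmapped": "r2\t73\tctg1",
    "read_unmapped": "r2\t133\tctg1",
    "both_unmapped": "r3\t77\t*",
}
for label, record in records.items():
    print(label, keep_sam_line(record))
```

Keeping pairs with a single mapped mate retains the unmapped partner alongside it in the sorted BAM.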
shell:
    "seqkit replace -p _pilon -o {output} {input}"
shell:
    "bowtie2 --no-unal --threads {threads} --{params.mode} -x {input.fasta} -U {input.reads} 2> {log} | samtools sort -o {output}"
shell:
    "bowtie2 --threads {threads} --fr --{params.mode} -x {input.fasta} -1 {input.r1} -2 {input.r2} 2> {log} | awk '/^@/||!and($2,4)||!and($2,8)' | samtools sort -o {output}"
shell:
    "samtools index {input}"
shell:
    "samtools cat {input} | samtools fastq | gzip > {output}"
shell:
    "pilon -Xmx{resources.mem_mb}M --genome {input.fasta} {params.bams} --outdir {output.outdir} --fix all"
shell:
    "pilon -Xmx{resources.mem_mb}M --genome {input.fasta} {params.bams} --outdir {output.outdir} --fix all"
shell:
    """
    docker run --user {params.user} --rm -v {params.basedir}:/app/mnt --workdir /app/mnt {params.container} \
        --threads {threads} --out {output.outdir} --min-overlap-length {params.min_overlap} --branch-limit {params.branch} {input.fasta} {input.reads}
    """
shell:
    "seqkit sort -l {input} | seqkit replace -p '_[0-9]+\\b' | seqkit rmdup | seqkit replace -sp '[^ATGCatgc]' -r N -o {output}"
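The four-step seqkit pipeline in the preceding snippet can be read as: sort records by length, strip `_<digits>` suffixes from the names, drop records whose ID has already been seen, and mask every non-ACGT character with N. The Python sketch below mirrors those steps on in-memory records. It assumes seqkit's default behaviour (ascending length order, duplicates detected by sequence ID, first occurrence kept); the function name and the example records are illustrative.

```python
import re

def clean_records(records):
    """records: list of (name, sequence) tuples; returns the cleaned list."""
    # Step 1: shortest first (assumed default order of `seqkit sort -l`).
    ordered = sorted(records, key=lambda rec: len(rec[1]))
    seen = set()
    cleaned = []
    for name, seq in ordered:
        # Step 2: remove "_<digits>" where the digits end at a word boundary.
        name = re.sub(r"_[0-9]+\b", "", name)
        # Step 3: keep only the first record per ID (first word of the name).
        seq_id = name.split()[0]
        if seq_id in seen:
            continue
        seen.add(seq_id)
        # Step 4: mask anything that is not A, C, G or T (either case) with N.
        seq = re.sub(r"[^ATGCatgc]", "N", seq)
        cleaned.append((name, seq))
    return cleaned

# Hypothetical input: two records that share an ID once the suffix is removed.
example = [
    ("ctg1_2", "ACGTRYACGT"),
    ("ctg1_1", "ACGT"),
    ("ctg2_1", "acgtnn"),
]
print(clean_records(example))
```

Under these assumptions, when several records collapse to the same ID the shortest one survives, because sorting happens before deduplication.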
shell:
    "seqkit seq -gM{params.lim} -o {output} {input}"
shell:
    "seqkit seq -gm{params.lim} -o {output} {input}"
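The two seqkit seq snippets above differ only in the length flag: -M keeps sequences no longer than the limit and -m keeps sequences no shorter than it, with -g removing gap characters. A small Python sketch of this split is shown below. It assumes that gaps are removed before the length is measured and that "-" and "." are the gap characters; the function name and the records are illustrative.

```python
GAP_CHARS = "-."  # assumed gap characters
GAP_TABLE = str.maketrans("", "", GAP_CHARS)

def split_by_length(records, lim):
    """Return (short, long) lists of (name, sequence) tuples."""
    short, long_ = [], []
    for name, seq in records:
        seq = seq.translate(GAP_TABLE)   # like -g: drop gap characters
        if len(seq) <= lim:              # like -M lim
            short.append((name, seq))
        if len(seq) >= lim:              # like -m lim
            long_.append((name, seq))
    return short, long_

# Hypothetical records and limit.
example = [("a", "ACGT"), ("b", "ACG-TACG"), ("c", "ACGTACGTAC")]
short, long_ = split_by_length(example, 7)
print(short)
print(long_)
```

Both bounds are inclusive, so if the two rules are given the same limit, a sequence of exactly that length ends up in both outputs.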
shell:
    """
    echo "
        project = mira
        job = denovo,clustering,accurate
        parameters = --noclipping
        parameters = TEXT_SETTINGS -AS:epoq=no
        readgroup
        technology = text
        data = fna::{input}
    " > {output}
    """
shell:
    "mira -t {threads} {input.manifest} &> {log} && mv mira_assembly/* {params.outdir}/"
shell:
    "cat {input} > {output}"
from sys import stdin, argv

new_adapt = argv[1]
new_adapt_len = len(new_adapt)

for line in stdin:
    read_name, error, *rest = line.rstrip('\n').split('\t')
    if int(error) < 0:
        read_seq, read_qual, *other = rest
        print('@%s\n%s\n+\n%s' % (read_name, read_seq, read_qual))
    else:
        start, end, seq_left, seq_adapt, seq_right, adapter_name, qual_left, qual_adapt, qual_right, *other = rest
        adapt_len = adapt_len_top = len(seq_adapt)
        if adapt_len_top > new_adapt_len:
            adapt_len_top = new_adapt_len
        start_offset = len(seq_left)
        end_offset = len(seq_right)
        if start_offset == 0:
            seq_adapt = new_adapt[-adapt_len_top:]
            qual_adapt = qual_adapt[-adapt_len_top:]
        elif end_offset == 0:
            seq_adapt = new_adapt[:adapt_len_top]
            qual_adapt = qual_adapt[:adapt_len_top]
        else:
            seq_adapt = new_adapt
            qual_adapt = qual_adapt[0:new_adapt_len] + qual_adapt[-1] * (new_adapt_len - adapt_len)
        print('@%s\n%s%s%s\n+\n%s%s%s' % (read_name, seq_left, seq_adapt, seq_right, qual_left, qual_adapt, qual_right))
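To make the behaviour of the script above easier to follow, the sketch below wraps the same logic in a function that takes one line of cutadapt's --info-file output and returns the FASTQ record with the matched linker replaced. The function name, the linker sequence CCGGTT, and the three input lines are hypothetical; the column layout follows the unpacking used in the script.

```python
def relink(info_line, new_adapt):
    """Replace the matched linker in one cutadapt info-file line."""
    new_len = len(new_adapt)
    read_name, error, *rest = info_line.rstrip("\n").split("\t")
    if int(error) < 0:
        # No linker found: the read is passed through unchanged.
        seq, qual, *_ = rest
        return "@%s\n%s\n+\n%s" % (read_name, seq, qual)
    (_start, _end, seq_left, seq_adapt, seq_right,
     _adapter_name, qual_left, qual_adapt, qual_right, *_) = rest
    adapt_len = len(seq_adapt)
    top = min(adapt_len, new_len)
    if not seq_left:
        # Linker at the read start: keep the tail of the new linker.
        seq_adapt, qual_adapt = new_adapt[-top:], qual_adapt[-top:]
    elif not seq_right:
        # Linker at the read end: keep the head of the new linker.
        seq_adapt, qual_adapt = new_adapt[:top], qual_adapt[:top]
    else:
        # Internal linker: insert the full new linker and pad or trim
        # the qualities to its length.
        seq_adapt = new_adapt
        qual_adapt = qual_adapt[:new_len] + qual_adapt[-1] * (new_len - adapt_len)
    return "@%s\n%s%s%s\n+\n%s%s%s" % (
        read_name, seq_left, seq_adapt, seq_right,
        qual_left, qual_adapt, qual_right)

internal = "read1\t0\t4\t8\tACGT\tAAAA\tTTGG\tlinker\tIIII\tFFFF\tHHHH"
at_start = "read3\t0\t0\t3\t\tAAA\tTTGG\tlinker\t\tFFF\tHHHH"
no_match = "read2\t-1\tACGTACGT\tIIIIIIII"
for line in (internal, at_start, no_match):
    print(relink(line, "CCGGTT"))
```

In the internal case the 4-base match is replaced by the full 6-base linker and the last matched quality character is repeated to fill the two extra positions, so sequence and quality strings stay the same length.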

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/BejaLab/phaeocystis-viral-elements
Name: phaeocystis-viral-elements
Version: 1
Copyright: Public Domain
License: None