Workflow Steps and Code Snippets

176 tagged steps and code snippets that match the keyword seqtk

Snakemake-based workflow for the assembly of chloroplast genomes

shell:
    """
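    # Rotate each contig so that it begins at the query start of its longest
    # nucmer alignment to the reference (show-coords -T: column 3 = query
    # start, column 6 = alignment length on the query).
    if [ -s {input.genome} ]
    then
        mkdir -p {params.dir}
        for contig in `grep '^>' {input.genome} | sed -e 's/>//g'`
        do
            # extract the current contig into its own FASTA file
            echo $contig > {params.dir}/tmp
            seqtk subseq {input.genome} {params.dir}/tmp > {params.dir}/$contig.fasta

            # align it to the reference and take the start of the longest hit
            nucmer --maxmatch {input.index} {params.dir}/$contig.fasta -p {params.dir}/out
            show-coords -THrd {params.dir}/out.delta > {params.dir}/out.coords
            start=`sort -k6,6hr {params.dir}/out.coords | head -n 1 | cut -f3`
            echo ">$contig" >> {output}
            if [ "$start" = "1" ]
            then
                # already starts at the anchor: copy the sequence unchanged
                grep -v '^>' {params.dir}/$contig.fasta | tr -d '\n' >> {output}
                echo "" >> {output}
            elif [ -n "$start" ]
            then
                # rotate: bases start..end followed by bases 1..start-1
                grep -v '^>' {params.dir}/$contig.fasta | tr -d '\n' > {params.dir}/temp.fasta
                cut -c ${{start}}- {params.dir}/temp.fasta > {params.dir}/start.fasta
                cut -c -$((start-1)) {params.dir}/temp.fasta > {params.dir}/end.fasta
                cat {params.dir}/start.fasta {params.dir}/end.fasta | tr -d '\n' >> {output}
                echo "" >> {output}
            else
                # no alignment to the reference: copy the sequence unchanged
                grep -v '^>' {params.dir}/$contig.fasta | tr -d '\n' >> {output}
                echo "" >> {output}
            fi
            rm -rf {params.dir}/*
        done
        rm -rf {params.dir}
    else
        # empty assembly: create an empty output so the rule still succeeds
        touch {output}
    fi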
    """
shell:
    """
    echo {params.random_seed}
    seqtk sample -s {params.random_seed} {input} {config[number_reads]} > {output}
    """
shell:
    """
    samtools view {input.bam} | cut -f1 | sort | uniq > {output.list}
    seqtk subseq {input.fastFile} {output.list} \
        | bioawk -c fastx \
            'length($seq) > {config[read_min_length]} && length($seq) < {config[chloroplast_size]} \
            {{print \">\"$name\"\\n\"$seq}}' > {output.fastFile}
    """
shell:
    """
    awk 'NR == 1 {{print substr($1,2,length($1)), \"0\", \"10000\"}}' {input} > chloro_assembly/reference/index.bed
    seqtk subseq {input} chloro_assembly/reference/index.bed > {output}
    rm chloro_assembly/reference/index.bed
    """

A repository to conduct experiments with omnitig-related models for genome assembly. (v0.4.3)

shell:  "${{CONDA_PREFIX}}/bin/time -v seqtk seq -AU '{input.reads}' > '{output.reads}'"

A pipeline for lightweight screening of Eukaryotic genomes and transcriptomes for recent HGT (v1.0.0)

shell:'''
seqtk subseq {input.fa} {input.gene_lst} > {output}
'''
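`seqtk subseq` given a list of names prints the records whose names appear in that list. A Python sketch of the selection, assuming one name per line in the list and matching on the first word of each header; the sketch keeps records in the order of the FASTA file and preserves their line wrapping. The function name is hypothetical.

```python
from typing import Iterable, List


def subseq_by_names(fasta_text: str, names: Iterable[str]) -> str:
    """Return the FASTA records whose first header word is in `names`."""
    wanted = {n.strip() for n in names if n.strip()}
    out: List[str] = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            out.append(line)  # header and sequence lines are copied as-is
    return "\n".join(out) + "\n" if out else ""
```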

Modular Shotgun Sequence Analysis Workflow: Oecophylla - Harnessing Snakemake

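run:
    import tempfile  # used for the scratch directory below

    # aligner -> file extension of the alignment written by `shogun align`
    aln2ext = {'utree': 'tsv', 'burst': 'b6', 'bowtie2': 'sam'}
    ext = aln2ext[params.aligner]
    with tempfile.TemporaryDirectory(dir=find_local_scratch(TMP_DIR_ROOT)) as temp_dir: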
        shell("""
              set +u; {params.env}; set -u

              # get stem file path
              stem={output.profile}
              stem=${{stem%.profile.txt}}

              # interleave paired fastq's and convert to fasta
              seqtk mergepe {input.forward} {input.reverse} | \
              seqtk seq -A > {temp_dir}/{wildcards.sample}.fna

              # map reads to reference database
              shogun align \
              --aligner {params.aligner} \
              --threads {threads} \
              --database {params.db} \
              --input {temp_dir}/{wildcards.sample}.fna \
              --output {temp_dir} \
              2> {log} 1>&2

              # build taxonomic profile based on read map
              shogun assign_taxonomy \
              --aligner {params.aligner} \
              --database {params.db} \
              --input {temp_dir}/alignment.{params.aligner}.{ext} \
              --output {output.profile} \
              2>> {log} 1>&2  # append so the alignment log is kept

              # keep mapping file
              if [[ "{params.map}" == "True" ]]
              then
                gzip -c {temp_dir}/alignment.{params.aligner}.{ext} > $stem.{params.aligner}.{ext}.gz
              fi

              # redistribute reads to given taxonomic ranks
              if [[ -n "{params.levels}" ]]
              then
                IFS=',' read -r -a levels <<< "{params.levels}"
                for level in "${{levels[@]}}"
                do
                  shogun redistribute \
                  --database {params.db} \
                  --level $level \
                  --input {output.profile} \
                  --output $stem.redist.$level.txt \
                  2>> {log} 1>&2
                done
              fi
              """)

HIV Drug Resistance Profiling Pipeline using Bowtie2, Lofreq, and SierraPy

shell:
    """
    seqtk trimfq {input.reads1} > {output.trim1}
    seqtk trimfq {input.reads2} > {output.trim2}
    """
shell:
    """
    seqtk sample -s {params.seed} {input.trim1} {params.n} > {output.sub1}
    seqtk sample -s {params.seed} {input.trim2} {params.n} > {output.sub2}
    """

Repository for the Microbiology Resource Announcements paper on five complete Streptococcus suis genomes

seqtk comp $1 | awk '{gc += ($4 + $5)} {at += ($3 + $6)} END {print gc/(gc + at)}'
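`seqtk comp` reports per-sequence base counts, with columns 3 to 6 holding A, C, G and T, so the awk expression computes overall GC content as (C + G) / (A + C + G + T), ignoring N and other ambiguity codes. The same calculation done directly on sequences in Python, as a sketch; `gc_fraction` is a hypothetical name.

```python
from typing import Iterable


def gc_fraction(seqs: Iterable[str]) -> float:
    """GC / (GC + AT) over all sequences; N and other symbols are ignored."""
    gc = at = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no A/C/G/T bases found")
    return gc / (gc + at)
```

Unlike the awk one-liner, the sketch raises an error instead of dividing by zero when there are no A/C/G/T bases.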

seqtk

A tool for processing sequences in the FASTA or FASTQ format. It parses both FASTA and FASTQ files, which may optionally be compressed with gzip.
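A sketch of that kind of input handling in Python: detect gzip by its two-byte magic number (`1f 8b`), then parse either FASTA (records start with `>`) or FASTQ (records start with `@`). This is not seqtk's parser; for brevity it reads the whole file into memory and assumes four-line FASTQ records, and the function names are hypothetical.

```python
import gzip
from typing import Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, str, Optional[str]]  # (header, sequence, quality or None)


def open_maybe_gzip(path: str) -> TextIO:
    """Open a text file, transparently decompressing it if it is gzip."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_records(path: str) -> Iterator[Record]:
    """Yield records from a FASTA or FASTQ file, gzipped or not."""
    with open_maybe_gzip(path) as fh:
        lines: List[str] = [l.rstrip("\r\n") for l in fh if l.strip()]
    if not lines:
        return
    if lines[0].startswith(">"):  # FASTA: sequence may span several lines
        header: Optional[str] = None
        parts: List[str] = []
        for line in lines:
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts), None
                header, parts = line[1:], []
            else:
                parts.append(line.strip())
        if header is not None:
            yield header, "".join(parts), None
    elif lines[0].startswith("@"):  # FASTQ: four lines per record
        if len(lines) % 4:
            raise ValueError("expected four lines per FASTQ record")
        for i in range(0, len(lines), 4):
            yield lines[i][1:], lines[i + 1], lines[i + 3]
    else:
        raise ValueError("input is neither FASTA nor FASTQ")
```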