V-pipe (main multi-virus version): a workflow designed for the analysis of next-generation sequencing (NGS) data from viral pathogens.


V-pipe is a workflow designed for the analysis of next generation sequencing (NGS) data from viral pathogens. It produces a number of results in a curated format (e.g., consensus sequences, SNV calls, local/global haplotypes). V-pipe is written using the Snakemake workflow management system.

Usage

Different ways of initializing V-pipe are presented below. We strongly encourage you to deploy it using the quick install script, as this is our preferred method.

To configure V-pipe, refer to the documentation in config/README.md.

V-pipe expects the input samples to be organized in a two-level directory hierarchy, and the sequencing reads must be provided in a sub-folder named raw_data. Further details can be found on the website. Check the utils subdirectory for mass-importer tools that can assist you in generating this hierarchy.
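
For illustration, such a hierarchy could look as follows (sample and date names here are hypothetical):

samples
├── patient1
│   └── 20200405
│       └── raw_data
│           ├── reads_R1.fastq
│           └── reads_R2.fastq
└── patient2
    └── 20200410
        └── raw_data
            └── reads_R1.fastq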

We provide virus-specific base configuration files which contain handy defaults for, e.g., HIV and SARS-CoV-2. Set the virus in the general section of the configuration file:

general:
  virus_base_config: hiv

Also see Snakemake's documentation to learn more about the command-line options available when executing the workflow.

Tutorials introducing usage of V-pipe are available in the docs/ subdirectory.

Using quick install script

To deploy V-pipe, use the installation script with the following parameters:

curl -O 'https://raw.githubusercontent.com/cbg-ethz/V-pipe/master/utils/quick_install.sh'
./quick_install.sh -w work

This script will download and install Miniconda, check out the V-pipe git repository (use -b to specify which branch/tag), and set up a work directory (specified with -w) containing an executable script that will run the workflow:

cd work
# edit config.yaml and provide samples/ directory
./vpipe --jobs 4 --printshellcmds --dry-run
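
To pin a specific branch or release instead of the default, pass -b to the installer; for example (the tag shown here is merely illustrative):

./quick_install.sh -b v2.99.3 -w work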

Using Docker

Note: the Docker image is only set up with the components needed to run the workflow with the HIV and SARS-CoV-2 virus base configurations. Using V-pipe with other viruses or configurations might require internet connectivity to fetch additional software components.

Create config.yaml or vpipe.config and then populate the samples/ directory. For example, the following config file could be used:

general:
  virus_base_config: hiv

output:
  snv: true
  local: true
  global: false
  visualization: true
  QA: true

Then execute:

docker run --rm -it -v $PWD:/work ghcr.io/cbg-ethz/v-pipe:master --jobs 4 --printshellcmds --dry-run

Using Snakedeploy

First install mamba, then create and activate an environment with Snakemake and Snakedeploy:

mamba create -c bioconda -c conda-forge --name snakemake snakemake snakedeploy
conda activate snakemake

Snakemake's official workflow installer Snakedeploy can now be used:

snakedeploy deploy-workflow https://github.com/cbg-ethz/V-pipe --tag master .
# edit config/config.yaml and provide samples/ directory
snakemake --use-conda --jobs 4 --printshellcmds --dry-run
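
Rather than copying the whole repository, Snakedeploy generates a thin wrapper that imports V-pipe via Snakemake's module system. The generated workflow/Snakefile is roughly of the following shape (an illustrative sketch; the exact content depends on the Snakedeploy version):

configfile: "config/config.yaml"

# import the remote workflow as a module and re-use all of its rules
module V_pipe:
    snakefile:
        github("cbg-ethz/V-pipe", path="workflow/Snakefile", tag="master")
    config:
        config

use rule * from V_pipe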

Code Snippets

shell:
    """
    CONSENSUS_NAME={wildcards.dataset}
    CONSENSUS_NAME="${{CONSENSUS_NAME#*/}}"
    CONSENSUS_NAME="${{CONSENSUS_NAME//\//-}}"

    source {params.FUNCTIONS}

    ERRFILE=$(basename {log.errfile})
    OUTFILE=$(basename {log.outfile})

    # 1. copy initial reference for bwa
    rm -rf {params.WORK_DIR}/
    mkdir -p {params.WORK_DIR}/
    cp {input.global_ref} {params.WORK_DIR}/consensus.fasta
    cd {params.WORK_DIR}

    # 2. create bwa index
    {params.BWA} index consensus.fasta 2> >(tee $ERRFILE >&2)

    # 3. create initial alignment
    if [[ {params.PAIRED_BOOL} == "true" ]]; then
        {params.BWA} mem -t {threads} consensus.fasta ../preprocessed_data/R{{1,2}}.fastq > first_aln.sam 2> >(tee -a $ERRFILE >&2)
    else
        {params.BWA} mem -t {threads} consensus.fasta ../preprocessed_data/R1.fastq > first_aln.sam 2> >(tee -a $ERRFILE >&2)
    fi
    rm consensus.fasta.*

    # 4. remove unmapped reads
    {params.SAMTOOLS} view -b -F 4 first_aln.sam > mapped.bam 2> >(tee -a $ERRFILE >&2)
    rm first_aln.sam

    # 5. extract reads
    mkdir -p cleaned
    SamToFastq {params.PICARD} I=mapped.bam FASTQ=cleaned/R1.fastq {params.PAIRED} VALIDATION_STRINGENCY=SILENT 2> >(tee -a $ERRFILE >&2)
    rm mapped.bam

    # 6. create config file
    # NOTE: Tabs are required below
    if [[ {params.PAIRED_BOOL} == "true" ]]; then
        cat > vicuna_config.txt <<- _EOF_
            minMSize    9
            maxOverhangSize    2
            Divergence    8
            max_read_overhang    2
            max_contig_overhang    10
            pFqDir    cleaned/
            batchSize    100000
            LibSizeLowerBound    100
            LibSizeUpperBound    800
            min_output_contig_len    1000
            outputDIR    ./
        _EOF_
    else
        cat > vicuna_config.txt <<- _EOF_
            minMSize    9
            maxOverhangSize    2
            Divergence    8
            max_read_overhang    2
            max_contig_overhang    10
            npFqDir    cleaned/
            batchSize    100000
            min_output_contig_len    1000
            outputDIR    ./
        _EOF_
    fi

    # 7. VICUNA
    OMP_NUM_THREADS={threads} {params.VICUNA} vicuna_config.txt > $OUTFILE 2> >(tee -a $ERRFILE >&2)
    rm vicuna_config.txt
    rm -r cleaned/

    # 8. fix broken header
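    # (keeps only the '>dg-<number>' token of each contig header, dropping the trailing description)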
    sed -e 's:>dg-\([[:digit:]]\+\)\s.*:>dg-\1:g' contig.fasta > contig_clean.fasta

    # 9. InDelFixer + ConsensusFixer to polish up consensus
    for i in {{1..3}}
    do
            mv consensus.fasta old_consensus.fasta
            indelFixer {params.INDELFIXER} -i contig_clean.fasta -g old_consensus.fasta >> $OUTFILE 2> >(tee -a $ERRFILE >&2)
            sam2bam {params.SAMTOOLS} reads.sam >> $OUTFILE 2> >(tee -a $ERRFILE >&2)
            consensusFixer {params.CONSENSUSFIXER} -i reads.bam -r old_consensus.fasta -mcc 1 -mic 1 -d -pluralityN 0.01 >> $OUTFILE 2> >(tee -a $ERRFILE >&2)
    done

    sed -i -e "s/>.*/>${{CONSENSUS_NAME}}/" consensus.fasta
    echo "" >> consensus.fasta

    # 10. finally, move into place
    mkdir -p ../references
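    # (brace expansion: mv consensus.fasta ../references/vicuna_consensus.fasta)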
    mv {{,../references/vicuna_}}consensus.fasta
    """
shell:
    """
    cat {input} > initial_ALL.fasta
    {params.MAFFT} --nuc --preservecase --maxiterate 1000 --localpair --thread {threads} initial_ALL.fasta > references/initial_aln.fasta 2> >(tee {log.errfile} >&2)
    rm initial_ALL.fasta

    {params.REMOVE_GAPS} references/initial_aln.fasta -o {output} -p 0.5 > {log.outfile} 2> >(tee -a {log.errfile} >&2)
    """
From line 158 of rules/align.smk
shell:
    """
    CONSENSUS_NAME={wildcards.dataset}
    CONSENSUS_NAME="${{CONSENSUS_NAME#*/}}"
    CONSENSUS_NAME="${{CONSENSUS_NAME//\//-}}"

    mkdir -p {wildcards.dataset}/references/
    {params.EXTRACT_SEQ} {input} -o {output} -s "${{CONSENSUS_NAME}}"
    """
From line 181 of rules/align.smk
shell:
    """
    CONSENSUS_NAME={wildcards.dataset}
    CONSENSUS_NAME="${{CONSENSUS_NAME#*/}}"
    CONSENSUS_NAME="${{CONSENSUS_NAME//\//-}}"

    mkdir -p {wildcards.dataset}/references/
    cp {input} {output}
    sed -i -e "s/>.*/>${{CONSENSUS_NAME}}/" {output}
    """
From line 201 of rules/align.smk
shell:
    """
    CONSENSUS_NAME={wildcards.dataset}
    CONSENSUS_NAME="${{CONSENSUS_NAME#*/}}"
    CONSENSUS_NAME="${{CONSENSUS_NAME//\//-}}"

    mkdir -p {wildcards.dataset}/references/
    cp {input} {output}
    sed -i -e "s/>.*/>${{CONSENSUS_NAME}}/" {output}
    """
From line 222 of rules/align.smk
shell:
    """
    {params.GUNZIP} -c {input} > {output} 2> >(tee {log.errfile} >&2)
    """
From line 281 of rules/align.smk
shell:
    """
    CONSENSUS_NAME={wildcards.dataset}
    CONSENSUS_NAME="${{CONSENSUS_NAME#*/}}"
    CONSENSUS_NAME="${{CONSENSUS_NAME//\//-}}"

    # 1. clean previous run
    rm -rf   {wildcards.dataset}/alignments
    rm -f    {wildcards.dataset}/references/ref_ambig.fasta
    rm -f    {wildcards.dataset}/references/ref_majority.fasta
    mkdir -p {wildcards.dataset}/alignments
    mkdir -p {wildcards.dataset}/references

    # 2. perform alignment # -l = leave temps
    {params.NGSHMMALIGN} -v {params.EXTRA} -R {input.initial_ref} -o {output.good_aln} -w {output.reject_aln} -t {threads} -N "${{CONSENSUS_NAME}}" {params.LEAVE_TEMP} {input.FASTQ} > {log.outfile} 2> >(tee {log.errfile} >&2)

    # 3. move references into place
    mv {wildcards.dataset}/{{alignments,references}}/ref_ambig.fasta
    mv {wildcards.dataset}/{{alignments,references}}/ref_majority.fasta
    """
From line 314 of rules/align.smk
shell:
    """
    cat {input} > ALL_{wildcards.kind}.fasta
    {params.MAFFT} --nuc --preservecase --maxiterate 1000 --localpair --thread {threads} ALL_{wildcards.kind}.fasta > {output} 2> >(tee {log.errfile} >&2)
    rm ALL_{wildcards.kind}.fasta
    """
From line 365 of rules/align.smk
shell:
    """
    {params.CONVERT_REFERENCE} -t {params.REF_NAME} -m {input.REF_ambig} -i {input.BAM} -o {output} > {log.outfile} 2> >(tee {log.errfile} >&2)
    """
From line 407 of rules/align.smk
shell:
    """
    echo "Writing BAM file"
    rm -f '{params.sort_tmp}'.[0-9]*.bam
    {params.SAMTOOLS} sort -T "{params.sort_tmp}" -o "{output.BAM}" "{input}" > {log.outfile} 2> >(tee {log.errfile} >&2)
    {params.SAMTOOLS} index "{output.BAM}" >> {log.outfile} 2> >(tee -a {log.errfile} >&2)
    """
From line 441 of rules/align.smk
shell:
    """
    {params.BWA} index {input} 2> >(tee {log.errfile} >&2)
    """
From line 475 of rules/align.smk
shell:
    """
    {params.BWA} mem -t {threads} {params.EXTRA} -o "{output.TMP_SAM}" "{input.REF}" {input.FASTQ} 2> >(tee {log.errfile} >&2)
    # Filter alignments: (1) remove unmapped reads (single-end) or keep only reads mapped in proper pairs (paired-end), (2) remove supplementary alignments
    {params.SAMTOOLS} view -h {params.FILTER} -F 2048 -o "{output.REF}" "{output.TMP_SAM}" 2> >(tee -a {log.errfile} >&2)
    """
From line 521 of rules/align.smk
shell:
    """
    {params.BOWTIE} {input} {input} 2> >(tee {log.errfile} >&2)
    """
From line 551 of rules/align.smk
shell:
    """
    {params.BOWTIE} -x {input.REF} -1 {input.R1} -2 {input.R2} {params.PHRED} {params.PRESET} -X {params.MAXINS} {params.EXTRA} -p {threads} -S {output.TMP_SAM} 2> >(tee {log.errfile} >&2)
    # Filter alignments: (1) keep only reads mapped in proper pairs, and (2) remove supplementary alignments
    {params.SAMTOOLS} view -h -f 2 -F 2048 -o "{output.REF}" "{output.TMP_SAM}" 2> >(tee -a {log.errfile} >&2)
    """
From line 588 of rules/align.smk
shell:
    """
    {params.BOWTIE} -x {input.REF} -U {input.R1} {params.PHRED} {params.PRESET} {params.EXTRA} -p {threads} -S {output.TMP_SAM} 2> >(tee {log.errfile} >&2)
    # Filter alignments: (1) remove unmapped reads, and (2) remove supplementary alignments
    {params.SAMTOOLS} view -h -F 4 -F 2048 -o "{output.REF}" "{output.TMP_SAM}" 2> >(tee -a {log.errfile} >&2)
    """
From line 625 of rules/align.smk
shell:
    """
    rm -rf {params.DIR}/*/*/extracted_data
    """
shell:
    """
    rm -rf {params.DIR}/*/*/preprocessed_data
    """
shell:
    """
    rm -rf {params.DIR}/*/*/initial_consensus
    rm -rf {params.DIR}/*/*/references/vicuna_consensus.fasta
    rm -rf {params.DIR}/*/*/references/initial_consensus.fasta
    rm -rf references/initial_aln.fasta
    rm -rf references/initial_aln_gap_removed.fasta
    rm -rf references/MAFFT_initial_aln.*
    """
shell:
    """
    rm -rf references/ALL_aln_*.fasta
    rm -rf references/MAFFT_*_cohort.*
    """
shell:
    """
    rm -rf {params.DIR}/*/*/alignments
    rm -rf {params.DIR}/*/*/QA_alignments
    rm -rf {params.DIR}/*/*/references/ref_ambig.fasta
    rm -rf {params.DIR}/*/*/references/ref_majority.fasta
    rm -rf {params.DIR}/*/*/references/initial_consensus.fasta
    """
shell:
    """
    rm -f {input}
    rm -rf {params.DIR}/*/*/alignments
    rm -rf {params.DIR}/*/*/references/ref_ambig*.fasta
    rm -rf {params.DIR}/*/*/references/ref_majority*.fasta
    """
shell:
    """
    rm -f {input}
    rm -rf {params.DIR}/*/*/alignments
    rm -rf {params.DIR}/*/*/references/ref_ambig*.fasta
    rm -rf {params.DIR}/*/*/references/ref_majority*.fasta
    """
shell:
    """
    rm -rf {params.DIR}/*/*/variants/SNVs
    """
shell:
    """
    rm -rf {params.DIR}/*/*/variants/global/contigs_stage_?.fasta
    rm -rf {params.DIR}/*/*/variants/global/stage_?
    """
From line 101 of rules/clean.smk
shell:
    """
    rm {params.DIR}/*/*/variants/global/quasispecies.*
    """
From line 111 of rules/clean.smk
shell:
    """
    rm -rf {params.DIR}/*/*/variants/global/predicthaplo/
    """
From line 120 of rules/clean.smk
shell:
    """
    rm -rf {params.DIR}/*/*/visualization
    """
From line 129 of rules/clean.smk
shell:
    """
    {params.bcftools} mpileup \
        --threads {threads} \
        -Ou \
        -f {input.fname_ref} \
        --max-depth {params.max_coverage} \
        --max-idepth {params.max_coverage} \
        --annotate FORMAT/AD,FORMAT/DP,INFO/AD \
        {input.fname_bam} \
    | {params.bcftools} call \
        --threads {threads} \
        -Ou \
        -mv \
        --keep-alts \
    | {params.bcftools} norm \
        --threads {threads} \
        -Ou \
        -f {input.fname_ref} \
    | {params.bcftools} filter \
        --threads {threads} \
        -e 'TYPE="INDEL" & INFO/AD[1]<INFO/AD[0]' \
        -Ob \
        --output {output.fname_temp_bcf} \
        2> >(tee {log.errfile} >&2)
    #bcftools csq -f {input.fname_ref} -g wheretogetthis.gff3.gz in.vcf -Ob -o out.bcf

    {params.gunzip} -c {input.fname_cov} | tail -n +2 \
    | awk -v base={params.tsvbased} \
        '$3 < {params.mask_coverage_threshold} {{printf "%s\\t%d\\t%d\\n", $1, $2 - base, $2 - base + 1}}' \
    > {output.fname_mask_lowcoverage} \
        2> >(tee -a {log.errfile} >&2)

    # preparations
    {params.enhance_bcf} \
       {output.fname_temp_bcf} \
       {output.fname_bcf} \
       {params.ambiguous_base_coverage_threshold} \
       2> >(tee -a {log.errfile} >&2)

    {params.bcftools} index {output.fname_bcf} 2> >(tee -a {log.errfile} >&2)

    common_consensus_params="--fasta-ref {input.fname_ref} --mark-del - --mask {output.fname_mask_lowcoverage} --mask-with n"

    # majority bases
    {params.bcftools} consensus \
        --output {output.fname_fasta} \
        --chain {output.fname_chain} \
        $common_consensus_params \
        -H A \
        -i "INFO/AD[0]<INFO/AD[*]" \
        {output.fname_bcf} \
        2> >(tee -a {log.errfile} >&2)

    # ambiguous bases
    {params.bcftools} consensus \
        --output {output.fname_fasta_ambig} \
        --chain {output.fname_chain_ambig} \
        $common_consensus_params \
        -H I --iupac-codes \
        {output.fname_bcf} \
        2> >(tee -a {log.errfile} >&2)
    """
shell:
    """
    CONSENSUS_NAME={wildcards.dataset}
    CONSENSUS_NAME="${{CONSENSUS_NAME#*/}}"
    CONSENSUS_NAME="${{CONSENSUS_NAME//\//-}}"

    {params.EXTRACT_CONSENSUS} -i {input.BAM} -f {input.REF} -c {params.MIN_COVERAGE} -n {params.N_COVERAGE} -q {params.QUAL_THRD} -a {params.MIN_FREQ} -N "${{CONSENSUS_NAME}}" -o {params.OUTDIR} > {log.outfile} 2> >(tee -a {log.errfile} >&2)
    """
shell:
    """
    if tail -n +2 {input.REF_majority_dels} | grep -qE '[^n]'; then
        {params.MATCHER} -asequence {input.REF} -bsequence {input.REF_majority_dels} -outfile {output.REF_matcher} 2> >(tee {log.errfile} >&2)
    else
        touch {output.REF_matcher}
        echo "pure 'nnnn...' consensus, no possible alignement" | tee {log.outfile}
    fi
    """
shell:
    """
    {params.FRAMESHIFT_DEL_CHECKS} -i {input.BAM} -c {input.REF_majority_dels} -f {input.REF_NAME} -g {input.GENES_GFF} --english=true -o {output.FRAMESHIFT_DEL_CHECK_TSV} 2> >(tee {log.errfile} >&2)
    """
shell:
    """
    {params.ALN2BASECNT} --first "{params.ARRAYBASED}" --basecnt "{output.BASECNT}" --coverage "{output.COVERAGE}" --name "{params.NAME}" --stats "{output.STATS}" "{input.BAM}" > {log.outfile} 2> >(tee {log.errfile} >&2)
    """
From line 34 of rules/mafs.smk
run:
    with open(output[0], "w") as out:
        out.write("\n".join(input))
From line 56 of rules/mafs.smk
shell:
    """
    {params.GATHER_COVERAGE} --output {output.COVERAGE} --stats {output.COVSTATS} --threads {threads} @{input.COVLIST} > >(tee {log.outfile}) 2> >(tee {log.errfile} >&2)
    """
From line 101 of rules/mafs.smk
shell:
    """
    {params.MINORITY_CALLER} -r {input.REF} -c {params.MIN_COVERAGE} -N {params.NAMES} -t {threads} -o {params.OUTDIR} {params.FREQUENCIES} {input.BAM} > >(tee {log.outfile}) 2> >(tee {log.errfile} >&2)
    """
From line 155 of rules/mafs.smk
shell:
    """
    {params.GUNZIP} -c {input} > {output}
    """
shell:
    """
    echo "The length cutoff is: {params.LEN_CUTOFF}" > {log.outfile}

    {params.PRINSEQ} -fastq {input.R1} -fastq2 {input.R2} {params.EXTRA} -out_format 3 -out_good {wildcards.dataset}/preprocessed_data/R -out_bad null -min_len {params.LEN_CUTOFF} -log {log.outfile} 2> >(tee {log.errfile} >&2)

    # make sure that the lock held by prinseq has been effectively released and propagated;
    # on some networked shares this could otherwise lead to confusion or corruption
    if [[ "$OSTYPE" =~ ^linux ]]; then
        echo "Waiting for unlocks" >&2
        for U in {wildcards.dataset}/preprocessed_data/R_{{1,2}}.fastq; do
            flock -x -o ${{U}} -c "echo ${{U}} unlocked >&2"
        done
    fi

    mv {wildcards.dataset}/preprocessed_data/R{{_,}}1.fastq
    mv {wildcards.dataset}/preprocessed_data/R{{_,}}2.fastq
    rm -f {wildcards.dataset}/preprocessed_data/R_?_singletons.fastq

    gzip {wildcards.dataset}/preprocessed_data/R1.fastq
    gzip {wildcards.dataset}/preprocessed_data/R2.fastq
    """
shell:
    """
    echo "The length cutoff is: {params.LEN_CUTOFF}" > {log.outfile}

    {params.PRINSEQ} -fastq {input.R1} {params.EXTRA} -out_format 3 -out_good {wildcards.dataset}/preprocessed_data/R -out_bad null -min_len {params.LEN_CUTOFF} -log {log.outfile} 2> >(tee {log.errfile} >&2)

    # make sure that the lock held by prinseq has been effectively released and propagated;
    # on some network shares this could otherwise lead to confusion or corruption
    if [[ "$OSTYPE" =~ ^linux ]]; then
        echo "Waiting for unlocks" >&2
        for U in {wildcards.dataset}/preprocessed_data/R.fastq; do
            flock -x -o ${{U}} -c "echo ${{U}} unlocked >&2"
        done
    fi

    mv {wildcards.dataset}/preprocessed_data/R{{,1}}.fastq

    gzip {wildcards.dataset}/preprocessed_data/R1.fastq
    """
shell:
    """
    {params.FASTQC} -o {params.OUTDIR} -t {threads} {params.NOGROUP} {input} 2> >(tee {log.errfile} >&2)
    """
shell:
    """
    {params.EXTRACT_COVERAGE_INTERVALS} -b {params.ARRAYBASED} -cf {input.TSV} -c {params.COVERAGE} --no-shorah -N {params.NAMES} -o {output} {input.BAM} > >(tee {log.outfile}) 2>&1
    """
shell:
    """
    SAMPLE_ID="{params.ID}"

    # Number of input reads
    LINECOUNT=$( cat {input.R1} | wc -l )
    let "INPUT=LINECOUNT / {params.FACTOR}"

    # Number of reads after QC
    # For portability reasons, not using zcat
    {params.GUNZIP} -c {input.R1_QC} > {params.R1_temp}
    LINECOUNT=$( cat {params.R1_temp} | wc -l )
    let "READCOUNT=LINECOUNT / {params.FACTOR}"
    rm {params.R1_temp}

    # Number of aligned reads
    ALNCOUNT=$( {params.SAMTOOLS} view {input.BAM} | wc -l )

    echo -e "${{SAMPLE_ID}}\t${{INPUT}}\t${{READCOUNT}}\t${{ALNCOUNT}}" > {output}
    """
shell:
    """
    cat {input} > stats/temp
    echo -e "ID\tInput\tQC\tAlignments" | cat - stats/temp > {output}
    rm stats/temp
    """
URL: https://github.com/cbg-ethz/V-pipe.git
Version: v2.99.3