Snakemake Pipeline for processing BioMob WP2 partial genome sequencing data


Pipeline for processing Illumina sequencing data generated by target enrichment via hybrid capture experiments. Heavily follows the Phyluce methodology outlined in Tutorial I: UCE Phylogenomics.

  1. Trims Illumina adapters and merges reads together (BBDuk, BBMerge)

  2. Assembles trimmed and merged reads (Abyss, SPAdes, rnaSPAdes)

  3. Detects and extracts target contigs (Phyluce)

  4. Generates summary statistics on targets and assemblies (BBTools Stats)

  5. Provides optional scripts and starting points to perform phylogenetic inference

Prerequisites

  • Miniconda with Snakemake installed:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda install -c bioconda -c conda-forge snakemake

  • Git

Getting Started

Within a working directory:

git clone https://github.com/AAFC-BICoE/snakemake-partial-genome-pipeline.git .
  • Create a folder named "fastq" that contains Illumina based raw reads in fastq.gz format. Fastq files should not begin with numbers, or contain a mix of "_" and "-" characters.

  • Create a folder named "probes" that contains a probe fasta file with fasta headers in Phyluce UCE format

>uce-1_p1
GCTGGTTATC...
>uce-1_p2
TAACAATA....
>uce-2_p1
AAGCATCT...
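
The fastq naming rules above matter because Phyluce derives taxon names from file names (see Known Issues). Below is a minimal pre-flight check, a sketch only; the helper name and messages are illustrative and not part of the pipeline:

import glob
import os

def check_fastq_names(fastq_dir="fastq"):
    # Flag fastq.gz files whose names will break Phyluce downstream
    problems = []
    for path in glob.glob(os.path.join(fastq_dir, "*.fastq.gz")):
        name = os.path.basename(path)
        if name[0].isdigit():
            problems.append("{}: begins with a number".format(name))
        if "_" in name and "-" in name:
            problems.append('{}: mixes "_" and "-"'.format(name))
    for problem in problems:
        print("WARNING:", problem)
    return not problems

if __name__ == "__main__":
    check_fastq_names()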

Dry-run to check that everything is prepared correctly:

snakemake --use-conda -n

To run the pipeline with 32 cores and continue even if some samples fail:

snakemake --use-conda -k --cores 32

To save time on future runs, a central folder of conda environments can be specified so they don't need to be repeatedly rebuilt. There is a path-length limit to this feature, so ensure the central folder is located in the home directory:

snakemake --use-conda --conda-prefix <Path To Snakemake Conda Envs> --cores 32

Pipeline Overview

[Pipeline overview diagram]

Pipeline Summary

This pipeline was heavily inspired by and closely follows protocols developed by Dr. Brant Faircloth and prescribed in Tutorial I: UCE Phylogenomics. Software versions are listed in the Conda yml environment files, and the specific parameters and commands are in the Snakefile.

Illumina paired-end reads from target enrichment sequencing are trimmed of adapters using BBDuk. A copy of the trimmed fastq reads is merged using BBMerge. The unmerged reads are assembled using SPAdes, rnaSPAdes and Abyss. Merging paired-end reads prior to assembly with Abyss demonstrated a noticeable impact on the number of targets detected by Phyluce, whereas merging had a negligible impact with SPAdes and rnaSPAdes; therefore the merged reads were assembled using Abyss.

Phyluce, along with the probe set used in the target enrichment experiment, processes each assembly independently. This generates four separate Phyluce databases of probe hits and UCE target contigs. Because target detection varies heavily with assembly method, all detected targets are combined into a unique set per sample: the custom script merge_uces.py examines every UCE detected across the four assemblies of a sample, combines all targets, and keeps only the longest copy of any target found in multiple assemblies. This unique set of merged targets dramatically increases the amount of data available for phylogenetic inference, while the unmodified assemblies remain available for processing if required.
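
The heart of merge_uces.py (shown in full under Code Snippets) is a longest-wins selection per UCE locus. The sketch below condenses that step; the per-assembly file names are only illustrating the -S/-R/-A/-AU suffix convention used by the pipeline:

from Bio import SeqIO

# Illustrative inputs following the pipeline's suffix convention:
# SPAdes (-S), rnaSPAdes (-R), Abyss merged (-A), Abyss unmerged (-AU)
per_assembly_fastas = ["sample-S.unaligned.fasta", "sample-R.unaligned.fasta",
                       "sample-A.unaligned.fasta", "sample-AU.unaligned.fasta"]

uce_variants = {}  # UCE locus id -> every sequence recovered for it
for fasta in per_assembly_fastas:
    for seq in SeqIO.parse(fasta, "fasta"):
        uce = seq.description.split("|")[-1]  # Phyluce appends "|uce-N"
        uce_variants.setdefault(uce, []).append(seq)

# Longest-wins: keep a single representative sequence per locus
merged = [max(variants, key=len) for variants in uce_variants.values()]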

The merged targets are concatenated into a single file that substitutes for the Phyluce-generated all-taxa-incomplete.fasta, the entry point of the Phyluce phylogeny workflow. A rapid phylogeny is generated for quality-control examination; example commands are provided in the script phylogeny.sh. Phyluce aligns all UCE targets using Mafft, trims the alignments using Gblocks, and removes any targets not present in 50% or more of samples. The resulting phylip file serves as the entry point for RAxML or IQ-TREE, which produces a rapid phylogeny for quality control and for detecting sample or sequencing errors.
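
The 50% rule amounts to keeping a locus only when at least half of the samples contribute a sequence to its alignment. A minimal sketch of that filter, assuming one fasta alignment per locus (the directory name and sample count are placeholders; the pipeline itself delegates this step to Phyluce):

import glob
import os
from Bio import SeqIO

alignment_dir = "mafft-gblocks-clean"  # placeholder path
n_samples = 40                         # placeholder total taxon count

kept = []
for aln in glob.glob(os.path.join(alignment_dir, "*.fasta")):
    taxa = sum(1 for _ in SeqIO.parse(aln, "fasta"))
    if taxa >= 0.5 * n_samples:  # locus present in >= 50% of samples
        kept.append(aln)
print("Loci passing the 50% completeness filter: {}".format(len(kept)))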

Author

Jackson Eyres
Bioinformatics Programmer
Agriculture & Agri-Food Canada
[email protected]

Copyright

Government of Canada, Agriculture & Agri-Food Canada

License

This project is licensed under the MIT License - see the LICENSE file for details

Publications & Additional Resources

  1. Brunke, A. J., Hansen, A. K., Salnitska, M., Kypke, J. L., Escalona, H., Chapados, J. T., Eyres, J., Richter, R., Smetana, A., Ślipiński, A., Zwick, A., Hájek, J., Leschen, R., Solodovnikov, A. and Dettman, J. R. The limits of Quediini at last (Coleoptera: Staphylinidae: Staphylininae): a rove beetle mega-radiation resolved by comprehensive sampling and anchored phylogenomics. Systematic Entomology. Accepted. 1–36.

  2. Dr. Adam Brunke provides some further custom phylogeny instructions

  3. Douglas HB, Kundrata R, Brunke AJ, Escalona HE, Chapados JT, Eyres J, Richter R, Savard K, Ślipiński A, McKenna D, Dettman JR. Anchored Phylogenomics, Evolution and Systematics of Elateridae: Are All Bioluminescent Elateroidea Derived Click Beetles? Biology. 2021; 10(6):451. https://doi.org/10.3390/biology10060451

  4. Hai D. T. Nguyen, Wayne McCormick, Jackson Eyres, Quinn Eggertson, Sarah Hambleton & Jeremy R. Dettman (2021) Development and evaluation of a target enrichment bait set for phylogenetic analysis of oomycetes, Mycologia, 113:4, 856-867, DOI: https://doi.org/10.1080/00275514.2021.1889276

Known Issues

  • Fastq files that start with numbers fail with Phyluce

  • rnaSPAdes 3.13.1 sometimes randomly fails to generate a transcripts.fasta for a sample after completing K127. A workaround is to choose one of the K*** assemblies and copy and rename it to transcripts.fasta in the higher-level directory (a scripted sketch follows this list). Snakemake requires a transcripts.fasta for each rnaSPAdes assembly to progress to Phyluce.

  • AAFC-specific: due to an incorrect and difficult-to-fix server-wide implementation of OpenMPI, qsub commands should be run with "qsub -pe smp 1", which prevents Abyss from starting in parallel mode and crashing. However, SPAdes and rnaSPAdes appear to still use multiple cores as assigned via Snakemake jobs.
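
The transcripts.fasta workaround mentioned above can be scripted. A minimal sketch, assuming the affected sample sits under rnaspades_assemblies/ and that each K directory contains a final_contigs.fasta (that file name is an assumption and may differ between rnaSPAdes versions):

import glob
import os
import shutil

sample_dir = "rnaspades_assemblies/SAMPLE"  # placeholder sample path
target = os.path.join(sample_dir, "transcripts.fasta")

if not os.path.exists(target):
    # Fall back to the highest-K intermediate assembly (e.g. K127)
    k_dirs = glob.glob(os.path.join(sample_dir, "K*"))
    k_dirs.sort(key=lambda d: int(os.path.basename(d)[1:]))
    for k_dir in reversed(k_dirs):
        fallback = os.path.join(k_dir, "final_contigs.fasta")
        if os.path.exists(fallback):
            shutil.copyfile(fallback, target)  # rename into place for Snakemake
            break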

Citations

  • BioPython - Tools for biological computation
    Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009 Jun 1; 25(11) 1422-3 http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878

  • Snakemake - Workflow management system
    Köster, Johannes and Rahmann, Sven. “Snakemake - A scalable bioinformatics workflow engine”. Bioinformatics 2012.

  • SPAdes
    Nurk S. et al. (2013) Assembling Genomes and Mini-metagenomes from Highly Chimeric Reads. In: Deng M., Jiang R., Sun F., Zhang X. (eds) Research in Computational Molecular Biology. RECOMB 2013. Lecture Notes in Computer Science, vol 7821. Springer, Berlin, Heidelberg

  • BBTools
    Brian-JGI (2018) BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data.

  • FASTQC
    Andrews S. (2018). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

  • Phyluce - Target enrichment data analysis
    Faircloth BC. 2016. PHYLUCE is a software package for the analysis of conserved genomic loci. Bioinformatics 32:786-788. doi:10.1093/bioinformatics/btv646.

  • Ultraconserved elements
    Faircloth BC, McCormack JE, Crawford NG, Harvey MG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Systematic Biology 61: 717–726. doi:10.1093/sysbio/sys004.

  • Abyss
    Shaun D Jackman, Benjamin P Vandervalk, Hamid Mohamadi, Justin Chu, Sarah Yeo, S Austin Hammond, Golnaz Jahesh, Hamza Khan, Lauren Coombe, René L Warren, and Inanc Birol (2017). ABySS 2.0: Resource-efficient assembly of large genomes using a Bloom filter. Genome research, 27(5), 768-777. doi:10.1101/gr.214346.116

Code Snippets

pipeline_files/count_uces.py (lines 7-64):
from Bio import SeqIO
import os
import glob
import argparse


def main():
    parser = argparse.ArgumentParser(description='Counts merged Phyluce UCE targets into a summary CSV')
    parser.add_argument('-o', type=str,
                        help='Output Folder', required=True)
    parser.add_argument('-i', type=str,
                        help='Input folder of merged fastas', required=True)
    args = parser.parse_args()
    print("Counts merged_uces into a summary file in {} directory".format(args.o))

    count_uces(args.o, args.i)


def count_uces(output_directory, input_directory):
    # Gather each specimen file produced by Phyluce
    merged_fastas = glob.glob(os.path.join(input_directory, "*_merged.fasta"))

    # Tally total and per-assembler target counts for each specimen
    specimen_dict = {}
    for fasta in merged_fastas:
        specimen = os.path.basename(fasta)
        specimen_name = specimen.replace("_merged.fasta", "").replace("-", "_")
        count = 0
        abyss_count = 0
        spades_count = 0
        rnaspades_count = 0
        abyss_u_count = 0
        with open(fasta) as f:
            for seq in SeqIO.parse(f, 'fasta'):
                # The assembler of origin is encoded as a suffix
                # (_A, _AU, _S or _R) on the sequence id
                if "_AU" in seq.id[-3:]:
                    abyss_u_count += 1
                elif "_A" in seq.id[-2:]:
                    abyss_count += 1
                elif "_R" in seq.id[-2:]:
                    rnaspades_count += 1
                elif "_S" in seq.id[-2:]:
                    spades_count += 1
                count += 1
        specimen_dict[specimen_name] = [count, abyss_count, abyss_u_count, spades_count, rnaspades_count]

    output_file = os.path.join(output_directory, "merged_uce_summary.csv")
    with open(output_file, "w") as g:
        g.write("Specimen, Merged Targets, Abyss Contribution, Abyss Unmerged Contribution, SPAdes Contribution, rnaSPAdes Contribution\n")
        for key, value in specimen_dict.items():
            g.write("{},{},{},{},{},{}\n".format(key, value[0],value[1],value[2],value[3],value[4]))


if __name__ == "__main__":
    main()
pipeline_files/evaluate.py (lines 7-75):
import os
import argparse


def main():
    parser = argparse.ArgumentParser(description='Combines various log files into a CSV')
    parser.add_argument('-i', type=str,
                        help='UCE Log Input', required=True)
    parser.add_argument('-f', type=str,
                        help='Fastq Metrics from statswrapper.sh', required=True)
    parser.add_argument('-o', type=str,
                        help='UCE Output', required=True)

    args = parser.parse_args()
    summarize_uces(args.i, args.f, args.o)


def summarize_uces(input_path, fastq_metrics, output_path):
    with open(output_path, "w") as g:
        reads = {}

        with open(fastq_metrics) as f:
            lines = f.readlines()
            lines.pop(0)
            for line in lines:
                split = line.rstrip().split("\t")
                read_count = split[0]
                file_name = split[-1]
                sample_name = os.path.basename(file_name).\
                    replace("_L001_R1_001.fastq.gz", "").replace("_L001_R2_001.fastq.gz", "")
                reads[sample_name] = read_count

        with open(input_path) as f:

            # Locate the block of per-specimen summary lines, which the
            # Phyluce log brackets with "INFO - ---" divider lines
            index = 0
            index_start = 0
            index_end = 0
            lines = f.readlines()
            for line in lines:
                if "INFO - ---" in line:
                    if index_start > 0:
                        index_end = index
                    else:
                        index_start = index
                index += 1

            specimen_lines = lines[index_start+1: index_end]
            g.write("Species, Reads, Targets, Contigs, Dupes, Targets Filtered, Contigs Filtered\n")
            for line in specimen_lines:
                if "Writing" in line:
                    continue
                # Drop the fixed-width log timestamp/module prefix before
                # splitting the summary sentence on spaces
                sliced = line[76:]
                split = sliced.split(" ")
                species = split[0].replace(":", "")
                species_name = split[0].replace("_A:", "").replace("_S:", "").replace("_R:", "").replace("_AU:", "")
                read_count = 0
                if species_name in reads:
                    read_count = reads[species_name]
                uniques = split[1]
                contigs = split[5]
                dupes = split[7]
                removed = split[11]
                match = split[19]

                g.write("{},{},{},{},{},{},{}\n".format(species, read_count, uniques, contigs, dupes, removed, match))


if __name__ == "__main__":
    main()
pipeline_files/merge_uces.py (lines 9-126):
from Bio import SeqIO
import os
import glob
import argparse


def main():
    parser = argparse.ArgumentParser(description='Merges Phyluce UCEs from SPAdes, rnaSPAdes and Abyss assemblies')
    parser.add_argument('-o', type=str,
                        help='Output Folder', required=True)
    parser.add_argument('-s', type=str,
                        help='SPAdes exploded-fastas folder', required=True)
    parser.add_argument('-r', type=str,
                        help='rnaSPAdes exploded-fastas folder', required=True)
    parser.add_argument('-a', type=str,
                        help='Abyss exploded-fastas folder', required=True)
    parser.add_argument('-u', type=str,
                        help='Abyss Unmerged exploded-fastas folder', required=True)
    args = parser.parse_args()
    print("Merging SPAdes and rnaSPAdes UCEs together into {} directory".format(args.o))

    combine_uces(args.o, args.s, args.r, args.a, args.u)


def combine_uces(output_directory, spades_directory, rnaspades_directory, abyss_directory, abyss_u_directory):
    """
    Takes the UCEs from the four assembly runs and creates a separate file keeping only the best (longest) sequence per UCE
    :return:
    """

    # Verify all four input folders exist
    if not all(os.path.isdir(d) for d in
               (spades_directory, rnaspades_directory, abyss_directory, abyss_u_directory)):
        print("Missing one of {}, {}, {} or {}".format(
            spades_directory, rnaspades_directory, abyss_directory, abyss_u_directory))
        return

    # Gather each specimen file produced from the Phyluce
    spades_fastas = glob.glob(os.path.join(spades_directory, "*.fasta"))
    rnaspades_fastas = glob.glob(os.path.join(rnaspades_directory, "*.fasta"))
    abyss_fastas = glob.glob(os.path.join(abyss_directory, "*.fasta"))
    abyss_u_fastas = glob.glob(os.path.join(abyss_u_directory, "*.fasta"))
    # Put all the contigs into a single dictionary
    specimen_dict = {}
    for fasta in spades_fastas:
        specimen = os.path.basename(fasta)
        specimen_name = specimen.replace("-S.unaligned.fasta", "")
        specimen_dict[specimen_name] = [fasta]

    for fasta in rnaspades_fastas:
        specimen = os.path.basename(fasta)
        specimen_name = specimen.replace("-R.unaligned.fasta", "")
        if specimen_name in specimen_dict:
            specimen_dict[specimen_name].append(fasta)

    for fasta in abyss_fastas:
        specimen = os.path.basename(fasta)
        specimen_name = specimen.replace("-A.unaligned.fasta", "")
        if specimen_name in specimen_dict:
            specimen_dict[specimen_name].append(fasta)

    for fasta in abyss_u_fastas:
        specimen = os.path.basename(fasta)
        specimen_name = specimen.replace("-AU.unaligned.fasta", "")
        if specimen_name in specimen_dict:
            specimen_dict[specimen_name].append(fasta)

    # For each specimen, add all the UCES to a single dictionary from every file, then examine each UCE sequence and
    # choose the one with the greatest length. Write all filtered UCEs to both a merged file, and monolithic file
    for key, value in specimen_dict.items():
        all_uces = {}
        for fasta in value:
            for seq in SeqIO.parse(fasta, 'fasta'):
                uce = seq.description.split("|")[-1]
                if uce in all_uces:
                    all_uces[uce].append(seq)
                else:
                    all_uces[uce] = [seq]
        print(key, len(all_uces))

        # Keep only the longest sequence recovered for each UCE locus
        final_uces = []
        for k, v in all_uces.items():
            max_uce = None
            max_length = 0
            for seq in v:
                if len(seq.seq) > max_length:
                    max_uce = seq
                    max_length = len(seq.seq)
            final_uces.append(max_uce)

        # Write Final UCES to merged file
        if not os.path.exists(output_directory):
            os.makedirs(output_directory)

        file_name = str(key) + "_merged.fasta"
        file_path = os.path.join(output_directory, file_name)
        with open(file_path, "w") as f:
            for seq in final_uces:
                SeqIO.write(seq, handle=f, format="fasta")

        file_name = "all-taxa-incomplete-merged-renamed.fasta"
        file_path = os.path.join(output_directory, file_name)
        with open(file_path, "a") as f:
            for seq in final_uces:
                uce = str(seq.id).split("_")[0]
                specimen = key
                seq.description = "|" + uce
                seq.id = uce + "_" + specimen
                SeqIO.write(seq, handle=f, format="fasta")

        # # Log all the changes made to the SPAdes UCE file to create the merged file
        # file_name = "UCE_Change_Log.txt"
        # file_path = os.path.join(new_directory, file_name)
        # with open(file_path, "a") as f:
        #     f.writelines(uce_change_log)


if __name__ == "__main__":
    main()
pipeline_files/rename_abyss_contigs.py (lines 6-38):
from Bio import SeqIO
import os
import glob
import argparse

def main():
    parser = argparse.ArgumentParser(description='Renames Abyss contigs to more closely match SPAdes')
    parser.add_argument("input", type=str,
                        help='Input File')
    parser.add_argument('output', type=str,
                        help='Output File')
    args = parser.parse_args()
    print("Renaming Contigs in {}".format(args.input))

    rename_contigs(args.input, args.output)

def rename_contigs(input, output):
    seqs = []
    with open(input, "r") as f:
        for seq in SeqIO.parse(f, 'fasta'):
            seq.name = ""
            # ABySS headers are whitespace-delimited (id, length, coverage);
            # reshape them into SPAdes-style NODE ids
            split = seq.description.split(" ")
            header = "NODE_{}_length_{}_cov_{}".format(split[0], split[1], split[2])
            seq.id = header
            seq.description = ""
            seqs.append(seq)

    with open(output, "w") as g:
        SeqIO.write(seqs, handle=g, format="fasta")


if __name__ == "__main__":
    main()
From line 123 of master/Snakefile:

shell: "statswrapper.sh {input.r1} {input.r2} > {output}"

From lines 135-136 of master/Snakefile:

shell:
    "fastqc -o fastqc {input.r1} {input.r2}"

From line 150 of master/Snakefile:

shell: "bbduk.sh in1={input.r1} out1={output.out1} in2={input.r2} out2={output.out2} ref={adaptors} ktrim=r k=23 mink=11 hdist=1 tpe tbo &>{log}; touch {output.out1} {output.out2}"

From line 163 of master/Snakefile:

shell: "bbmerge.sh in1={input.r1} in2={input.r2} out={output.out_merged} outu={output.out_unmerged} ihist={output.ihist} &>{log}"

From lines 175-176 of master/Snakefile:

shell:
    "fastqc -o fastqc_trimmed {input.i1} {input.i2} &>{log}"

From lines 189-190 of master/Snakefile:

shell:
    "multiqc -n multiqc_report.html -o multiqc fastqc; multiqc -n multiqc_report_trimmed.html -o multiqc fastqc_trimmed;"

From lines 205-206 of master/Snakefile:

shell:
    "spades.py -t {threads} -1 {input.r1} -2 {input.r2} -o spades_assemblies/{wildcards.sample} &>{log}"
From lines 215-221 of master/Snakefile:

run:
    if os.path.exists(input.assembly):
        if not os.path.exists("phyluce-spades/assemblies"):
            os.mkdir("phyluce-spades/assemblies")
        copyfile(input.assembly, output.renamed_assembly)
From lines 226-230 of master/Snakefile:

run:
    with open(output.w1, "w") as f:
        f.write("[all]\n")
        for item in SAMPLES:
            f.write(item + "_S\n")

From line 237 of master/Snakefile:

shell: "statswrapper.sh phyluce-spades/assemblies/*.fasta > {output}"

From line 249 of master/Snakefile:

shell: "rm -r phyluce-spades/uce-search-results; cd phyluce-spades; phyluce_assembly_match_contigs_to_probes --keep-duplicates KEEP_DUPLICATES --contigs assemblies --output uce-search-results --probes ../probes/*.fasta"

From line 256 of master/Snakefile:

shell: "cd phyluce-spades; phyluce_assembly_get_match_counts --locus-db uce-search-results/probe.matches.sqlite --taxon-list-config taxon.conf --taxon-group 'all' --incomplete-matrix --output taxon-sets/all/all-taxa-incomplete.conf"

From line 265 of master/Snakefile:

shell: "cd phyluce-spades/taxon-sets/all; mkdir log; phyluce_assembly_get_fastas_from_match_counts --contigs ../../assemblies --locus-db ../../uce-search-results/probe.matches.sqlite --match-count-output all-taxa-incomplete.conf --output all-taxa-incomplete.fasta --incomplete-matrix all-taxa-incomplete.incomplete --log-path log"

From line 274 of master/Snakefile:

shell: "cd phyluce-spades/taxon-sets/all; rm -r exploded-fastas; phyluce_assembly_explode_get_fastas_file --input all-taxa-incomplete.fasta --output exploded-fastas --by-taxon; phyluce_assembly_explode_get_fastas_file --input all-taxa-incomplete.fasta --output exploded-locus; cd ../../../; touch {output.exploded_fastas}"

From line 281 of master/Snakefile:

shell: "statswrapper.sh {input} > {output}"

From line 288 of master/Snakefile:

shell: "python pipeline_files/evaluate.py -i {input.r1} -f {input.f1} -o {output.r2}"

From lines 305-306 of master/Snakefile:

shell:
    "rnaspades.py -t {threads} -1 {input.r1} -2 {input.r2} -o rnaspades_assemblies/{wildcards.sample} &>{log}"
From lines 314-320 of master/Snakefile:

run:
    if os.path.exists(input.assembly):
        if not os.path.exists("phyluce-rnaspades/assemblies"):
            os.mkdir("phyluce-rnaspades/assemblies")
        copyfile(input.assembly, output.renamed_assembly)
From lines 325-329 of master/Snakefile:

run:
    with open(output.w2, "w") as f:
        f.write("[all]\n")
        for item in SAMPLES:
            f.write(item + "_R\n")

From line 339 of master/Snakefile:

shell: "rm -r phyluce-rnaspades/uce-search-results; cd phyluce-rnaspades; phyluce_assembly_match_contigs_to_probes --keep-duplicates KEEP_DUPLICATES --contigs assemblies --output uce-search-results --probes ../probes/*.fasta"

From line 345 of master/Snakefile:

shell: "cd phyluce-rnaspades; phyluce_assembly_get_match_counts --locus-db uce-search-results/probe.matches.sqlite --taxon-list-config taxon.conf --taxon-group 'all' --incomplete-matrix --output taxon-sets/all/all-taxa-incomplete.conf"

From line 354 of master/Snakefile:

shell: "cd phyluce-rnaspades/taxon-sets/all; mkdir log; phyluce_assembly_get_fastas_from_match_counts --contigs ../../assemblies --locus-db ../../uce-search-results/probe.matches.sqlite --match-count-output all-taxa-incomplete.conf --output all-taxa-incomplete.fasta --incomplete-matrix all-taxa-incomplete.incomplete --log-path log"

From line 363 of master/Snakefile:

shell: "cd phyluce-rnaspades/taxon-sets/all; rm -r exploded-fastas; phyluce_assembly_explode_get_fastas_file --input all-taxa-incomplete.fasta --output exploded-fastas --by-taxon; phyluce_assembly_explode_get_fastas_file --input all-taxa-incomplete.fasta --output exploded-locus; cd ../../../; touch {output.exploded_fastas}"

From line 370 of master/Snakefile:

shell: "statswrapper.sh phyluce-rnaspades/assemblies/*.fasta > {output}"

From line 377 of master/Snakefile:

shell: "statswrapper.sh {input} > {output}"

From line 384 of master/Snakefile:

shell: "python pipeline_files/evaluate.py -i {input.r1} -f {input.f1} -o {output.r2}"

From lines 401-402 of master/Snakefile:

shell:
    "abyss-pe --directory=abyss_assemblies/{wildcards.sample} name={wildcards.sample} k=31 in=../../{input.i2} se=../../{input.i1} &>{log}"

From lines 410-411 of master/Snakefile:

shell:
    "python pipeline_files/rename_abyss_contigs.py {input} {output}"

From lines 420-421 of master/Snakefile:

shell:
    "sed -e '/^[^>]/s/[^ATGCatgc]/N/g' {input.assembly} >> {output.renamed_assembly}"

From line 428 of master/Snakefile:

shell: "statswrapper.sh {input} > {output}"

From lines 433-437 of master/Snakefile:

run:
    with open(output.w1, "w") as f:
        f.write("[all]\n")
        for item in SAMPLES:
            f.write(item + "_A\n")

From line 446 of master/Snakefile:

shell: "rm -r phyluce-abyss/uce-search-results; cd phyluce-abyss; phyluce_assembly_match_contigs_to_probes --keep-duplicates KEEP_DUPLICATES --contigs assemblies --output uce-search-results --probes ../probes/*.fasta"

From line 452 of master/Snakefile:

shell: "cd phyluce-abyss; phyluce_assembly_get_match_counts --locus-db uce-search-results/probe.matches.sqlite --taxon-list-config taxon.conf --taxon-group 'all' --incomplete-matrix --output taxon-sets/all/all-taxa-incomplete.conf"

From line 460 of master/Snakefile:

shell: "cd phyluce-abyss/taxon-sets/all; mkdir log; phyluce_assembly_get_fastas_from_match_counts --contigs ../../assemblies --locus-db ../../uce-search-results/probe.matches.sqlite --match-count-output all-taxa-incomplete.conf --output all-taxa-incomplete.fasta --incomplete-matrix all-taxa-incomplete.incomplete --log-path log"

From line 469 of master/Snakefile:

shell: "cd phyluce-abyss/taxon-sets/all; rm -r exploded-fastas; phyluce_assembly_explode_get_fastas_file --input all-taxa-incomplete.fasta --output exploded-fastas --by-taxon; phyluce_assembly_explode_get_fastas_file --input all-taxa-incomplete.fasta --output exploded-locus; cd ../../../; touch {output.exploded_fastas}"

From line 476 of master/Snakefile:

shell: "statswrapper.sh {input} > {output}"

From line 482 of master/Snakefile:

shell: "python pipeline_files/evaluate.py -i {input.r1} -f {input.f1} -o {output.r2}"
From lines 499-500 of master/Snakefile:

shell:
    "abyss-pe --directory=abyss_u_assemblies/{wildcards.sample} name={wildcards.sample} k=31 in='../../{input.r1} ../../{input.r2}' &>{log}"

From lines 508-509 of master/Snakefile:

shell:
    "python pipeline_files/rename_abyss_contigs.py {input} {output}"

From lines 518-519 of master/Snakefile:

shell:
    "sed -e '/^[^>]/s/[^ATGCatgc]/N/g' {input.assembly} >> {output.renamed_assembly}"

From line 526 of master/Snakefile:

shell: "statswrapper.sh {input} > {output}"

From lines 531-535 of master/Snakefile:

run:
    with open(output.w1, "w") as f:
        f.write("[all]\n")
        for item in SAMPLES:
            f.write(item + "_AU\n")
From line 544 of master/Snakefile:

shell: "rm -r phyluce-abyss_u/uce-search-results; cd phyluce-abyss_u; phyluce_assembly_match_contigs_to_probes --keep-duplicates KEEP_DUPLICATES --contigs assemblies --output uce-search-results --probes ../probes/*.fasta"

From line 550 of master/Snakefile:

shell: "cd phyluce-abyss_u; phyluce_assembly_get_match_counts --locus-db uce-search-results/probe.matches.sqlite --taxon-list-config taxon.conf --taxon-group 'all' --incomplete-matrix --output taxon-sets/all/all-taxa-incomplete.conf"

From line 558 of master/Snakefile:

shell: "cd phyluce-abyss_u/taxon-sets/all; mkdir log; phyluce_assembly_get_fastas_from_match_counts --contigs ../../assemblies --locus-db ../../uce-search-results/probe.matches.sqlite --match-count-output all-taxa-incomplete.conf --output all-taxa-incomplete.fasta --incomplete-matrix all-taxa-incomplete.incomplete --log-path log"

From line 567 of master/Snakefile:

shell: "cd phyluce-abyss_u/taxon-sets/all; rm -r exploded-fastas; phyluce_assembly_explode_get_fastas_file --input all-taxa-incomplete.fasta --output exploded-fastas --by-taxon; phyluce_assembly_explode_get_fastas_file --input all-taxa-incomplete.fasta --output exploded-locus; cd ../../../; touch {output.exploded_fastas}"

From line 574 of master/Snakefile:

shell: "statswrapper.sh {input} > {output}"

From line 580 of master/Snakefile:

shell: "python pipeline_files/evaluate.py -i {input.r1} -f {input.f1} -o {output.r2}"

From line 599 of master/Snakefile:

shell: "python pipeline_files/merge_uces.py -o merged_uces -s phyluce-spades/taxon-sets/all/exploded-fastas/ -r phyluce-rnaspades/taxon-sets/all/exploded-fastas/ -a phyluce-abyss/taxon-sets/all/exploded-fastas/ -u phyluce-abyss_u/taxon-sets/all/exploded-fastas/"

From line 606 of master/Snakefile:

shell: "python pipeline_files/count_uces.py -o summaries -i merged_uces"

From line 612 of master/Snakefile:

shell: "cat {input} >> {output}"