A repository to conduct experiments with omnitig-related models for genome assembly.

Version: v0.4.3

This is a repository to conduct experiments with omnitig-related models in the context of genome assembly. The algorithms are implemented in Rust, and around them we built a snakemake toolchain to run the experiments. To ensure reproducibility, we wrapped everything in a conda environment.

Usage

Required Software

  • conda >= 4.8.3 (lower might be possible, but has not been tested)

Setup

First, set up the conda environment of this project.

conda env create -f environment.yml

Then, activate the environment.

source activate practical-omnitigs

Running Experiments

Make sure that you are in the right conda environment (it should be practical-omnitigs).

conda info
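The same check can be scripted. The following is a minimal sketch, assuming that conda exports the name of the active environment in the CONDA_DEFAULT_ENV variable when an environment is activated; the function names are hypothetical and not part of this project.

```python
import os

# Environment name taken from the setup step above.
EXPECTED_ENV = "practical-omnitigs"


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is set."""
    # Assumption: activating an environment exports CONDA_DEFAULT_ENV.
    return environ.get("CONDA_DEFAULT_ENV")


def in_expected_env(environ=os.environ, expected=EXPECTED_ENV):
    """True if the active conda environment is the one this project expects."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if in_expected_env():
        print(f"OK: running inside '{EXPECTED_ENV}'")
    else:
        print(f"Active environment is {active_conda_env()!r}, expected '{EXPECTED_ENV}'")
```

Passing the environment as a parameter keeps the check easy to test without touching the real shell environment.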

Subsequently, experiments can be run using snakemake.

snakemake --cores all <experiment>

Valid experiments are:

  • selftest: Check if you have conda set up correctly. It prints the versions of snakemake, conda, and wget. The versions of snakemake and conda should match the definition in /environment.yaml and the version of wget should match the definition in /config/conda-selftest-env.yaml.

  • test: Execute all integration tests of this project on a single small sample genome.

  • test_all: Execute all integration tests of this project on all defined genomes (potentially very large).

The experiments are run inside a conda environment that is set up by snakemake. This ensures reproducibility of the results and automates the installation of required tools.

Using the Implementation Directly

The Rust code written for this project includes a command line interface that can be used directly. For information on how to use it, please refer to the documentation of the cli crate.

Troubleshooting

If you have problems with using this software package, take a look at our troubleshooting page. If that does not solve your issue, do not hesitate to file a bug report.

Technical Information

Directory Structure

  • .github: GitHub workflows for continuous testing.

  • .idea: Configuration for the IntelliJ IDEA integrated development environment.

  • config: All config files related to the experiments, including conda environments and experiment declarations.

  • data: Data used and produced by the experiments.

  • external-software: Location to install external software required by the experiment pipeline.

  • implementation: The algorithms that we are testing. Everything is written in Rust.

Implementation

The algorithms of this project are implemented in Rust. We split the implementation into multiple library crates to increase the reusability of our code. On top of that, the cli crate provides all implemented functionality via a command line interface. Refer to its documentation for more information.

Except for cli, all crates are published on crates.io.

License

This project is licensed under the terms of the BSD 2-Clause license. See LICENSE.md for more information.

How to Cite

If you use this code in your research project, please cite it as "Safe and Complete Genome Assembly in Practice", DOI: 10.5281/zenodo.4335367.

Code Snippets

shell:  "echo 'No target specified'"
SnakeMake From line 319 of master/Snakefile
shell: """
    mkdir -p '{params.hashdir}'
    echo '{wildcards.report_name} {params.genome_name} {wildcards.report_file_name}' > '{params.name_file}'
    python3 '{input.script}' '{params.hashdir}' '{params.name_file}' 'none' 'none' '{input.combined_eaxmax_plot}' '{output}' {params.script_column_arguments}
    """
SnakeMake From line 523 of master/Snakefile
shell: """
    python3 '{input.script}' --source-reports '{params.source_reports_arg}' --source-report-names '{params.source_report_names_arg}' --output '{output.file}'
    """
SnakeMake From line 588 of master/Snakefile
shell: """
    mkdir -p "$(dirname '{output}')"
    python3 '{input.script}' '{params.input_quast_csvs}' '{output}'
    """
SnakeMake From line 601 of master/Snakefile
shell: "convert {input} {output}"
SnakeMake From line 611 of master/Snakefile
shell: """
    tectonic '{input}'
    """
SnakeMake From line 618 of master/Snakefile
shell: "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' {params.command} --output-as-wtdbg2-node-ids --file-format wtdbg2 --input '{input.nodes}' --input '{input.reads}' --input '{input.raw_reads}' --input '{input.dot}' --output '{output.file}' --latex '{output.latex}' 2>&1 | tee '{log.log}'"
SnakeMake From line 685 of master/Snakefile
shell: "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' {params.command} --file-format dot --input '{input.dot}' --output '{output.file}' --latex '{output.latex}' 2>&1 | tee '{log.log}'"
SnakeMake From line 712 of master/Snakefile
shell: "ln -sr '{input.raw_assembly_from_assembler}' '{output.raw_assembly}'"
SnakeMake From line 745 of master/Snakefile
shell: "ln -sr '{input.raw_assembly_from_assembler}' '{output.raw_assembly}'"
SnakeMake From line 768 of master/Snakefile
shell: "ln -sr '{input}' '{output}'"
SnakeMake From line 794 of master/Snakefile
shell: "'{input.script}' --threads {threads} --input-contigs '{input.contigs}' --input-reads '{input.reads}' --output-contigs '{output.broken_contigs}'"
SnakeMake From line 920 of master/Snakefile
shell: "'{input.binary}' compute-trivial-omnitigs --non-scc --file-format hifiasm --input '{input.contigs}' --output '{output.trivial_omnitigs}' 2>&1 | tee '{log.log}'"
SnakeMake From line 934 of master/Snakefile
shell: """
    read -r REFERENCE_LENGTH < '{input.reference_length}'
    ${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -x {wildcards.wtdbg2_mode} -g $REFERENCE_LENGTH -i '{input.reads}' -t {threads} -fo '{params.output_prefix}' --dump-kbm '{output.kbm}' {params.fragment_correction_steps} 2>&1 | tee '{log.log}'
"""
SnakeMake From line 968 of master/Snakefile
shell: """
    read -r REFERENCE_LENGTH < '{input.reference_length}'
    ${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -x {wildcards.wtdbg2_mode} -g $REFERENCE_LENGTH -i '{input.reads}' -t {threads} -fo '{params.output_prefix}' --dump-kbm '{output.kbm}' --skip-fragment-assembly {params.fragment_correction_steps} 2>&1 | tee '{log.log}'
"""
SnakeMake From line 999 of master/Snakefile
shell: """
    read -r REFERENCE_LENGTH < '{input.reference_length}'
    read -r EDGE_COV < '{input.edge_cov}'
    ${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -x {wildcards.wtdbg2_mode} -g $REFERENCE_LENGTH -e $EDGE_COV -i '{input.reads}' -t {threads} -fo '{params.output_prefix}' --load-nodes '{input.cached_nodes}' --load-clips '{input.cached_clips}' --load-kbm '{input.cached_kbm}' --inject-unitigs '{input.contigs}' {params.skip_fragment_assembly} {params.fragment_correction_steps} 2>&1 | tee '{log.log}'
"""
SnakeMake From line 1029 of master/Snakefile
shell: """
    read -r REFERENCE_LENGTH < '{input.reference_length}'
    read -r EDGE_COV < '{input.edge_cov}'
    ${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -x {wildcards.wtdbg2_mode} -g $REFERENCE_LENGTH -e $EDGE_COV -i '{input.reads}' -t {threads} -fo '{params.output_prefix}' --load-nodes '{input.cached_nodes}' --load-clips '{input.cached_clips}' --load-kbm '{input.cached_kbm}' --inject-fragment-unitigs '{input.fragment_contigs}' 2>&1 | tee '{log.log}'
"""
SnakeMake From line 1059 of master/Snakefile
shell:  """
    cd '{params.working_directory}'
    ${{CONDA_PREFIX}}/bin/time -v gunzip -k wtdbg2.{wildcards.subfile}.gz 2>&1 | tee '{params.abslog}'
    """
SnakeMake From line 1079 of master/Snakefile
shell: "${{CONDA_PREFIX}}/bin/time -v {input.binary} -t {threads} -i '{input.contigs}' -fo '{output.consensus}' 2>&1 | tee '{log.log}'"
SnakeMake From line 1095 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v {input.binary} --input {input.contigs} --output {output.contigs} --normal-reads {input.normal_reads} --compute-threads {threads} 2>&1 | tee '{log.log}'"
SnakeMake From line 1114 of master/Snakefile
shell:  "ln -sr -T '{input.contigs}' '{output.contigs}'"
SnakeMake From line 1123 of master/Snakefile
shell:  """
    grep 'Set --edge-cov to ' '{input}' | sed 's/.*Set --edge-cov to //g' > '{output}'
    """
SnakeMake From line 1128 of master/Snakefile
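For readers less familiar with sed, the extraction in the rule above can be sketched in Python. This is an illustrative equivalent, not code from the project; only the marker text comes from the rule, and the function name and the sample log lines in the test are made up.

```python
# Marker text taken from the grep/sed pipeline above.
EDGE_COV_MARKER = "Set --edge-cov to "


def extract_edge_cov(log_lines):
    """Return the text that follows the marker on every line containing it."""
    values = []
    for line in log_lines:
        if EDGE_COV_MARKER in line:
            # sed's greedy '.*' removes everything up to and including the
            # last occurrence of the marker, so keep only what follows it.
            values.append(line.rstrip("\n").rpartition(EDGE_COV_MARKER)[2])
    return values
```

Like the shell pipeline, the sketch returns the value as text and leaves any conversion to a number to the caller.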
shell:  """
    read -r REFERENCE_LENGTH < '{input.reference_length}'
    ${{CONDA_PREFIX}}/bin/time -v '{input.script}' -g $REFERENCE_LENGTH -t {threads} -o '{params.output_directory}' --{wildcards.flye_mode} '{input.reads}' 2>&1 | tee '{log.log}'
    """
SnakeMake From line 1151 of master/Snakefile
shell:  "ln -sr -T '{input.contigs}' '{output.contigs}'"
SnakeMake From line 1162 of master/Snakefile
shell: "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' --primary -t {threads} -o '{params.output_prefix}' '{input.reads}' 2>&1 | tee '{log}'"
SnakeMake From line 1188 of master/Snakefile
run:
        with open(input.gfa, 'r') as input_file, open(input.alternate_gfa, 'r') as alternate_input_file, open(output.fa, 'w') as output_file:
            for line in itertools.chain(input_file, alternate_input_file):
                if line[0] != "S":
                    continue

                columns = line.split("\t")
                print(f"Writing contig {columns[1]}...")
                output_file.write(">{}\n{}\n".format(columns[1], columns[2]))
            print(f"Wrote all contigs")
SnakeMake From line 1198 of master/Snakefile
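Outside of Snakemake, the GFA-to-FASTA conversion in the run block above can be tried as a standalone function. The sketch below follows the same idea (keep only segment records and write their name and sequence), but it is a hypothetical rewrite: it strips the trailing newline before splitting so that a sequence in the last column does not carry its line break into the output, and the sample records are invented.

```python
import io
import itertools


def gfa_segments_to_fasta(gfa_streams, fasta_out):
    """Write every GFA segment ('S') record from the given streams as FASTA."""
    count = 0
    for line in itertools.chain(*gfa_streams):
        if not line.startswith("S"):
            continue  # skip headers, links, and all other record types
        columns = line.rstrip("\n").split("\t")
        name, sequence = columns[1], columns[2]
        fasta_out.write(f">{name}\n{sequence}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Made-up records standing in for the primary and alternate GFA files.
    primary = io.StringIO("H\tVN:Z:1.0\nS\tutg1\tACGT\nL\tutg1\t+\tutg2\t+\t0M\n")
    alternate = io.StringIO("S\tutg2\tGGCC\tLN:i:4\n")
    out = io.StringIO()
    gfa_segments_to_fasta([primary, alternate], out)
    print(out.getvalue(), end="")
```

Taking open streams instead of file names lets the same function work on real files and on in-memory test data.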
shell: "${{CONDA_PREFIX}}/bin/time -v '{input.rust_binary}' compute-trivial-omnitigs --file-format hifiasm --input '{input.unitigs}' --output '{output.contigs}' --latex '{output.latex}' --non-scc 2>&1 | tee '{log.log}'"
SnakeMake From line 1217 of master/Snakefile
shell: "${{CONDA_PREFIX}}/bin/time -v '{input.rust_binary}' compute-omnitigs --file-format hifiasm --input '{input.unitigs}' --output '{output.contigs}' --latex '{output.latex}' --linear-reduction 2>&1 | tee '{log.log}'"
SnakeMake From line 1227 of master/Snakefile
shell:  """
    RUST_BACKTRACE=full ${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reads}' '{params.output_prefix}' {threads} 2>&1 | tee '{log.log}'
    ln -sr -T '{params.original_contigs}' '{output.contigs}'
    """
SnakeMake From line 1250 of master/Snakefile
shell:  """
    RUST_BACKTRACE=full ${{CONDA_PREFIX}}/bin/time -v '{input.binary}' '{input.reads}' -k 35 -l 12 --density 0.002 --threads {threads} --prefix '{params.output_prefix}' 2>&1 | tee '{log.log}'
    ${{CONDA_PREFIX}}/bin/time -v '{input.simplify_script}' '{params.output_prefix}' 2>&1 | tee -a '{log.log}'
    ln -sr -T '{params.original_contigs}' '{output.contigs}'
    """
SnakeMake From line 1272 of master/Snakefile
shell:  """
    RUST_BACKTRACE=full ${{CONDA_PREFIX}}/bin/time -v '{input.binary}' '{input.reads}' -k 21 -l 14 --density 0.003 --threads {threads} --prefix '{params.output_prefix}' 2>&1 | tee '{log.log}'
    ${{CONDA_PREFIX}}/bin/time -v '{input.simplify_script}' '{params.output_prefix}' 2>&1 | tee -a '{log.log}'
    ln -sr -T '{params.original_contigs}' '{output.contigs}'
    """
SnakeMake From line 1295 of master/Snakefile
shell:  """
    mkdir -p '{params.output_dir}'
    ${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -t {threads} -o '{params.output_dir}' --reads '{input.reads}' 2>&1 | tee '{log.log}'
    ln -sr -T '{params.original_contigs}' '{output.contigs}'
    """
SnakeMake From line 1318 of master/Snakefile
shell:  """
    read -r REFERENCE_LENGTH < '{input.reference_length}'
    mkdir -p '{params.output_dir}'
    ${{CONDA_PREFIX}}/bin/time -v canu -assemble -p assembly -d '{params.output_dir}' genomeSize=$REFERENCE_LENGTH useGrid=false -pacbio-hifi '{input.reads}' 2>&1 | tee '{log.log}'
    ln -sr -T '{params.original_contigs}' '{output.contigs}'
    """
SnakeMake From line 1342 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' --threads {params.threads} '{input.reads}' '{output.reads}' 2>&1 | tee '{log}'"
SnakeMake From line 1827 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' --threads {params.threads} '{input.reference}' '{output.reference}' 2>&1 | tee '{log}'"
SnakeMake From line 1845 of master/Snakefile
shell: "ln -sr -T '{input.reads}' '{output.reads}'"
SnakeMake From line 1857 of master/Snakefile
shell:  """
    '{input.binary}' -v -T{threads} -P'{params.tmp_dir}' -t1 -p -k{wildcards.fastk_k} '{input.reads}' 2>&1 | tee '{log}'
    """
SnakeMake From line 1878 of master/Snakefile
shell:  """
    '{input.binary}' -h10000 '{input.hist}' 2>&1 | tee '{log.histogram}'
    '{input.binary}' -k -h10000 '{input.hist}' 2>&1 | tee -a '{log.histogram}'
    """
SnakeMake From line 1897 of master/Snakefile
shell:  """
    '{input.binary}' -v -T{threads} -P'{params.tmp_dir}' '{input.table}' '{output.table}' 2>&1 | tee '{log}'
    """
SnakeMake From line 1920 of master/Snakefile
shell:  """
    ln -sr -T '{input.table}' '{output.table}'

    for INPUT_FILE_NAME in $(ls '{params.input_dirname}/.{params.input_filename}.'* | xargs -n 1 basename); do
        INPUT_FILE='{params.input_dirname}'/"${{INPUT_FILE_NAME}}"
        OUTPUT_FILE_NAME=${{INPUT_FILE_NAME/{params.input_filename}/{params.output_filename}}}
        OUTPUT_FILE='{params.output_dirname}'/"${{OUTPUT_FILE_NAME}}"
        ln -sr -T "${{INPUT_FILE}}" "${{OUTPUT_FILE}}"
    done
    """
SnakeMake From line 1934 of master/Snakefile
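The loop in the rule above links every hidden companion file of the form '.<input_filename>.*' under a new name. A rough Python equivalent is sketched below to make the renaming and relative-link logic explicit. It is an illustration under assumptions, not project code: the function name and the demo file names are hypothetical, the replacement is a literal string substitution rather than a bash pattern, and file names containing glob metacharacters are not handled.

```python
import os
import tempfile
from pathlib import Path


def link_hidden_companions(input_dir, input_name, output_dir, output_name):
    """Create relative symlinks in output_dir for each '.<input_name>.*' file."""
    created = []
    for source in sorted(Path(input_dir).glob(f".{input_name}.*")):
        # Rename only the first occurrence, like bash's ${VAR/pattern/replacement}.
        target = Path(output_dir) / source.name.replace(input_name, output_name, 1)
        # Store a path relative to the link's own directory, like `ln -sr`.
        os.symlink(os.path.relpath(source, start=target.parent), target)
        created.append(target)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        in_dir, out_dir = Path(tmp, "in"), Path(tmp, "out")
        in_dir.mkdir()
        out_dir.mkdir()
        # Hypothetical hidden companion files, named only for this demo.
        (in_dir / ".reads.ktab.1").write_text("part 1")
        (in_dir / ".reads.ktab.2").write_text("part 2")
        for link in link_hidden_companions(in_dir, "reads", out_dir, "renamed"):
            print(link.name, "->", os.readlink(link))
```

Relative links keep working when the whole data directory is moved, which is presumably why the rule uses `ln -sr` rather than absolute paths.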
shell:  """
    ln -sr -T '{input.profile}' '{output.profile}'

    for INPUT_FILE_NAME in $(ls '{params.input_dirname}/.{params.input_filename}.'* | xargs -n 1 basename); do
        INPUT_FILE='{params.input_dirname}'/"${{INPUT_FILE_NAME}}"
        OUTPUT_FILE_NAME=${{INPUT_FILE_NAME/{params.input_filename}/{params.output_filename}}}
        OUTPUT_FILE='{params.output_dirname}'/"${{OUTPUT_FILE_NAME}}"
        ln -sr -T "${{INPUT_FILE}}" "${{OUTPUT_FILE}}"
    done
    for INPUT_FILE_NAME in $(ls '{params.input_dirname}/.{params.input_filename_pidx}.'* | xargs -n 1 basename); do
        INPUT_FILE='{params.input_dirname}'/"${{INPUT_FILE_NAME}}"
        OUTPUT_FILE_NAME=${{INPUT_FILE_NAME/{params.input_filename_pidx}/{params.output_filename_pidx}}}
        OUTPUT_FILE='{params.output_dirname}'/"${{OUTPUT_FILE_NAME}}"
        ln -sr -T "${{INPUT_FILE}}" "${{OUTPUT_FILE}}"
    done
    """
SnakeMake From line 1957 of master/Snakefile
shell:  """
    cd '{params.working_directory}'
    '{params.input_binary}' -v -T{threads} -g{wildcards.himodel_min_valid}:{wildcards.himodel_max_valid} -e{wildcards.himodel_kmer_threshold} '{params.input_prefix}' 2>&1 | tee '{params.log}'
    """
SnakeMake From line 2010 of master/Snakefile
shell:  """
    '{input.binary}' -v '{input.reference}' '{input.model}' -o'{params.output_prefix}' {params.sim_params} -p{params.ploidy_tree} -fh -r3541529 -U 2>&1 | tee '{log}'
    ln -sr -T '{params.output_prefix}.fasta' '{output.reads}'
    """
SnakeMake From line 2150 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reads}' '{output.reads}' {wildcards.read_downsampling_factor}"
SnakeMake From line 2167 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v seqtk seq -AU '{input.reads}' > '{output.reads}'"
SnakeMake From line 2177 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reference}' '{output.reference}'"
SnakeMake From line 2192 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reference}' '{output.reference}'"
SnakeMake From line 2207 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reference}' '{output.reference}'"
SnakeMake From line 2223 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reference}' '{output.reference_length}'"
SnakeMake From line 2234 of master/Snakefile
shell: "cp '{input.filtered}' '{output.linear}'; data/target/release/cli circularise-genome --input '{input.filtered}' 2>&1 --output '{output.circular}' | tee '{output.log}'"
SnakeMake From line 2245 of master/Snakefile
shell: "${{CONDA_PREFIX}}/bin/time -v {input.script} {params.extra_arguments} -t {threads} --no-html --large -o '{output.directory}' {params.references} '{input.contigs}'"
SnakeMake From line 2327 of master/Snakefile
run:
    result = {}
    for key, input_file_name in params.file_map.items():
        with open(input_file_name, 'r') as input_file:
            values = {}
            for line in input_file:
                if "Elapsed (wall clock) time (h:mm:ss or m:ss):" in line:
                    line = line.replace("Elapsed (wall clock) time (h:mm:ss or m:ss):", "").strip()
                    values["time"] = decode_time(line) + values.setdefault("time", 0)
                elif "Maximum resident set size" in line:
                    values["mem"] = max(int(line.split(':')[1].strip()), values.setdefault("mem", 0))

            assert "time" in values, f"No time found in {input_file_name}"
            assert "mem" in values, f"No mem found in {input_file_name}"
            result[key] = values

    sum_time = sum([values["time"] for values in result.values()])
    max_mem = max([values["mem"] for values in result.values()])
    result["total"] = {
        "time": sum_time,
        "mem": max_mem,
    }

    with open(output.file, 'w') as output_file:
        json.dump(result, output_file)
SnakeMake From line 2407 of master/Snakefile
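The run block above calls a decode_time helper that is not part of the snippet. The following is a hypothetical sketch of what such a helper could look like, assuming it only needs to turn the "h:mm:ss or m:ss" value reported by GNU time into seconds; the project's actual helper may differ.

```python
def decode_time(text):
    """Convert an 'h:mm:ss' or 'm:ss' elapsed-time string to seconds."""
    parts = text.strip().split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"Unexpected elapsed time format: {text!r}")
    seconds = float(parts[-1])  # may carry a fractional part, e.g. '01.50'
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds
```

Returning plain seconds keeps the aggregation above simple: times from several log entries can be added directly.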
shell:
    """
    cd '{input.contig_validator_dir}'
    # The abundance-min here has nothing to do with the abundance_min from bcalm2
    bash run.sh -suffixsave 0 -abundance-min 1 -kmer-size {wildcards.k} -r '../../{input.reference}' -a '../../{output.result}' -i '../../{input.reads}'
    """
SnakeMake From line 2458 of master/Snakefile
shell:  "${{CONDA_PREFIX}}/bin/time -v '{input.converter}' {input.fa} {output.gfa} {wildcards.k}"
SnakeMake From line 2474 of master/Snakefile
shell: "Bandage image {input} {output} --width 1000 --height 1000"
SnakeMake From line 2481 of master/Snakefile
shell:  """
    wget --progress=dot:mega -O '{output.file}' '{params.url}'
    wget --progress=dot:mega -O '{output.checksum_file}' '{params.checksum_url}'

    CHECKSUM=$(md5sum '{output.file}' | cut -f1 -d' ' | sed 's/[\]//g')
    cat '{output.checksum_file}' | grep "$CHECKSUM"
"""
SnakeMake From line 2549 of master/Snakefile
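The download rule above accepts a file when its MD5 digest appears somewhere in the downloaded checksum file. The same check can be sketched in Python with hashlib, for instance to re-verify a file without re-running the rule. The function and file names below are hypothetical. Unlike the shell rule, which fails when grep finds no match, the sketch returns a boolean.

```python
import hashlib
import os
import tempfile


def md5_hex(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_listed(path, checksum_path):
    """True if the file's digest occurs anywhere in the checksum file (like grep)."""
    with open(checksum_path, "r") as handle:
        return md5_hex(path) in handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, used only for this demo.
        data_path = os.path.join(tmp, "genome.fa")
        sums_path = os.path.join(tmp, "md5checksums.txt")
        with open(data_path, "wb") as handle:
            handle.write(b">chr1\nACGT\n")
        with open(sums_path, "w") as handle:
            handle.write(f"{md5_hex(data_path)}  ./genome.fa\n")
        print(checksum_listed(data_path, sums_path))
```

Reading in chunks avoids loading a multi-gigabyte genome or read file into memory at once.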
shell:  """
    wget --progress=dot:mega -O '{output.file}' '{params.url}'
    wget --progress=dot:mega -O '{output.checksum_file}' '{params.checksum_url}'

    CHECKSUM=$(md5sum '{output.file}' | cut -f1 -d' ' | sed 's/[\]//g')
    echo $CHECKSUM
    cat '{output.checksum_file}' | grep "$CHECKSUM"
"""
SnakeMake From line 2570 of master/Snakefile
shell:  """
    wget --progress=dot:mega -O '{output.file}' '{params.url}'
"""
SnakeMake From line 2587 of master/Snakefile
shell:  """
    wget --progress=dot:mega -O '{output.file}' '{params.url}'
"""
SnakeMake From line 2597 of master/Snakefile
shell:  """
    bioawk -c fastx '{{ print ">" $name "\\n" $seq }}' '{input.file}' > '{output.file}'
"""
SnakeMake From line 2607 of master/Snakefile
shell:  "fastq-dump --stdout --fasta default '{input.file}' > '{output.file}'"
SnakeMake From line 2615 of master/Snakefile
shell:  "cd '{params.working_directory}'; gunzip -k {wildcards.file}.gz"
SnakeMake From line 2624 of master/Snakefile
shell: "ln -sr -T '{input.file}' '{output.file}'"
SnakeMake From line 2671 of master/Snakefile
shell:  "cat {params.input_files} > '{output.file}'"
SnakeMake From line 2689 of master/Snakefile
shell: "python3 '{input.script}' '{input.reads}' '{output.reads}' 2>&1 | tee '{log.log}'"
SnakeMake From line 2702 of master/Snakefile
shell: "cargo fetch --manifest-path 'implementation/Cargo.toml' 2>&1 | tee '{log.log}'"
SnakeMake From line 2715 of master/Snakefile
shell: "cargo test -j {threads} --target-dir '{params.rust_dir}' --manifest-path 'implementation/Cargo.toml' --offline 2>&1 | tee '{log.log}'"
SnakeMake From line 2727 of master/Snakefile
shell: "cargo build -j {threads} --release --target-dir '{params.rust_dir}' --manifest-path 'implementation/Cargo.toml' --offline 2>&1 | tee '{log.log}'"
SnakeMake From line 2739 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_scripts_dir}'
    cd '{params.external_software_scripts_dir}'

    rm -rf convertToGFA.py
    wget https://raw.githubusercontent.com/GATB/bcalm/v2.2.3/scripts/convertToGFA.py
    chmod u+x convertToGFA.py
    """
SnakeMake From line 2747 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf ContigValidator
    git clone --recursive https://github.com/mayankpahadia1993/ContigValidator.git
    cd ContigValidator/src
    echo 'count_kmers: count_kmers_kmc' >> Makefile
    sed -i 's\\count_kmers: count_kmers_kmc.cpp KMC/kmc_api/kmc_file.o\\count_kmers_kmc: count_kmers_kmc.cpp KMC/kmc_api/kmc_file.o\\g' Makefile
    LIBRARY_PATH="../../sdsl-lite/lib" CPATH="../../sdsl-lite/include" make -j {threads}
    """
SnakeMake From line 2763 of master/Snakefile
shell: """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf quast
    git clone https://github.com/sebschmi/quast
    cd quast
    git checkout cf1461f48e937488928b094946bb591cd5b325a3
"""
SnakeMake From line 2780 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf sdsl-lite
    git clone https://github.com/simongog/sdsl-lite.git
    cd sdsl-lite
    git checkout v2.1.1
    HOME=`pwd` ./install.sh
    """
SnakeMake From line 2796 of master/Snakefile
shell: """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf Ratatosk
    git clone --recursive https://github.com/GuillaumeHolley/Ratatosk.git
    cd Ratatosk
    git checkout --recurse-submodules 74ca617afb20a7c24d73d20f2dcdf223db303496

    mkdir build
    cd build
    cmake ..
    make -j {threads}
    """
SnakeMake From line 2813 of master/Snakefile
shell: """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf wtdbg2
    git clone https://github.com/sebschmi/wtdbg2.git
    cd wtdbg2
    git checkout 78c3077b713aaee48b6c0835105ce6c666f6e796

    sed -i 's:CFLAGS=:CFLAGS=-I${{CONDA_PREFIX}}/include -L${{CONDA_PREFIX}}/lib :g' Makefile
    """
SnakeMake From line 2834 of master/Snakefile
shell: """
    cd '{params.wtdbg2_dir}'
    make CC=x86_64-conda-linux-gnu-gcc -j {threads}
    """
SnakeMake From line 2858 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    wget -O sim-it.tar.gz https://github.com/ndierckx/Sim-it/archive/refs/tags/Sim-it1.2.tar.gz

    rm -rf Sim-it-Sim-it1.2
    rm -rf sim-it
    tar -xf sim-it.tar.gz
    mv Sim-it-Sim-it1.2/ sim-it/
    mv sim-it/Sim-it1.2.pl sim-it/sim-it.pl
"""
SnakeMake From line 2867 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf Flye
    git clone https://github.com/sebschmi/Flye
    cd Flye
    git checkout 38921327d6c5e57a59e71a7181995f2f0c04be75

    mv bin/flye bin/flye.disabled # rename such that snakemake does not delete it
    """
SnakeMake From line 2884 of master/Snakefile
shell:  """
    cd '{params.flye_directory}'

    export CXX=x86_64-conda-linux-gnu-g++
    export CC=x86_64-conda-linux-gnu-gcc
    # export INCLUDES=-I/usr/include/ # Somehow this is not seen by minimap's Makefile, so we had to change it in our custom version of Flye
    # The following also doesn't seem to work when building minimap, so again we had to modify minimap's Makefile
    # export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:${{LD_LIBRARY_PATH:=''}} # Redirect library path to include conda libraries
    # make # This does not create the python script anymore

    /usr/bin/env python3 setup.py install

    mv bin/flye.disabled bin/flye # was renamed such that snakemake does not delete it
    """
SnakeMake From line 2906 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf rust-mdbg
    git clone https://github.com/sebschmi/rust-mdbg
    cd rust-mdbg
    git checkout 4ff0122a8c63210820ba0341fa7365d6ac216612

    cargo fetch

    # rename such that snakemake does not delete them
    mv utils/magic_simplify utils/magic_simplify.original
    mv utils/multik utils/multik.original
    """
SnakeMake From line 2928 of master/Snakefile
shell:  """
    cd '{params.mdbg_directory}'
    cargo --offline build --release -j {threads} --target-dir '{params.mdbg_target_directory}'

    # were renamed such that snakemake does not delete them
    cp utils/magic_simplify.original utils/magic_simplify
    cp utils/multik.original utils/multik

    # use built binaries instead of rerunning cargo
    sed -i 's:cargo run --manifest-path .DIR/../Cargo.toml --release:'"'"'{params.rust_mdbg}'"'"':g' utils/multik
    sed -i 's:cargo run --manifest-path .DIR/../Cargo.toml --release --bin to_basespace --:'"'"'{params.to_basespace}'"'"':g' utils/magic_simplify
    """
SnakeMake From line 2957 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf LJA
    git clone https://github.com/AntonBankevich/LJA
    cd LJA
    git checkout 99f93262c50ff269ee28707f7c3bb77ea00eb576

    #sed -i 's/find_package(OpenMP)//g' CMakeLists.txt
    #sed -i "s:\${{OpenMP_CXX_FLAGS}}:-L${{CONDA_PREFIX}}/lib -lgomp :g" CMakeLists.txt
    #sed -i "s:\${{OpenMP_C_FLAGS}}:-L${{CONDA_PREFIX}}/lib -lgomp :g" CMakeLists.txt
    #sed -i "s:\${{OpenMP_EXE_LINKER_FLAGS}}:-L${{CONDA_PREFIX}}/lib -lgomp :g" CMakeLists.txt
    """
SnakeMake From line 2975 of master/Snakefile
shell:  """
    cd '{params.lja_directory}'

    export CXX=x86_64-conda-linux-gnu-g++
    export CC=x86_64-conda-linux-gnu-gcc

    cmake .
    make -j {threads}
    """
SnakeMake From line 2998 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf hifiasm
    git clone https://github.com/sebschmi/hifiasm
    cd hifiasm
    git checkout c914c80547d8cdcfef392291831d6b2fb3b011f5
    """
SnakeMake From line 3013 of master/Snakefile
shell:  """
    cd '{params.hifiasm_directory}'

    make CXX=x86_64-conda-linux-gnu-g++ CC=x86_64-conda-linux-gnu-gcc CXXFLAGS=-I${{CONDA_PREFIX}}/include -j {threads}
    #make -j {threads}
    """
SnakeMake From line 3031 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf homopolymer-compress-rs
    git clone https://github.com/sebschmi/homopolymer-compress-rs.git
    cd homopolymer-compress-rs
    git checkout d94145fb8fa2868876bccb46dd80c12d3b17c724

    cargo fetch
    """
SnakeMake From line 3045 of master/Snakefile
shell:  """
    cd '{params.homopolymer_compress_rs_dir}'
    cargo build --offline --release -j {threads}
    """
SnakeMake From line 3066 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf wtdbg2-homopolymer-decompression
    git clone https://github.com/sebschmi/wtdbg2-homopolymer-decompression.git
    cd wtdbg2-homopolymer-decompression
    git checkout 3bec6c0b751a70d53312b359171b9a576f67ebb6

    cargo fetch
    """
SnakeMake From line 3078 of master/Snakefile
shell:  """
    cd '{params.wtdbg2_homopolymer_decompression_dir}'
    cargo build --offline --release -j {threads}
    """
SnakeMake From line 3099 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf HI.SIM
    git clone https://github.com/sebschmi/HI.SIM.git
    cd HI.SIM
    git checkout 734c25c4df3775761ca8920a7d2d57dc44cac09c

    sed -i 's:CFLAGS = :CFLAGS = -I${{CONDA_PREFIX}}/include -L${{CONDA_PREFIX}}/lib :g' Makefile
    """
SnakeMake From line 3111 of master/Snakefile
shell:  """
    cd '{params.hisim_dir}'
    make CC=x86_64-conda-linux-gnu-gcc -j {threads} all
    """
SnakeMake From line 3132 of master/Snakefile
shell:  """
    mkdir -p '{params.external_software_dir}'
    cd '{params.external_software_dir}'

    rm -rf FASTK
    git clone https://github.com/thegenemyers/FASTK.git
    cd FASTK
    git checkout 4604bfcdfd9251d05b27fbd5aef38187e9a9c9ad

    sed -i 's:CFLAGS = :CFLAGS = -I${{CONDA_PREFIX}}/include -L${{CONDA_PREFIX}}/lib :g' Makefile
    sed -i 's:CFLAGS   = :CFLAGS = -I${{CONDA_PREFIX}}/include :g' HTSLIB/Makefile
    sed -i 's:LDFLAGS  = :LDFLAGS = -L${{CONDA_PREFIX}}/lib :g' HTSLIB/Makefile
    """
SnakeMake From line 3144 of master/Snakefile
shell:  """
    cd '{params.fastk_dir}'
    make CC=x86_64-conda-linux-gnu-gcc -j {threads} deflate.lib
    make CC=x86_64-conda-linux-gnu-gcc -j {threads} libhts.a
    make CC=x86_64-conda-linux-gnu-gcc -j {threads} all
    """
SnakeMake From line 3167 of master/Snakefile
shell: """
    mkdir -p data/reports
    rsync --verbose --recursive --no-relative --include="*/" --include="report.pdf" --include="aggregated-report.pdf" --exclude="*" turso:'/proj/sebschmi/git/practical-omnitigs/data/reports/' data/reports
    """
SnakeMake From line 3204 of master/Snakefile
shell: """
    mkdir -p data/reports
    rsync --verbose --recursive --no-relative --include="*/" --include="report.pdf" --include="aggregated-report.pdf" --exclude="*" tammi:'/abga/work/sebschmi/practical-omnitigs/reports/' data/reports
    """
SnakeMake From line 3211 of master/Snakefile
URL: https://github.com/algbio/practical-omnitigs