nomis_pipeline


About

  • Repository containing workflows for IMP3 downstream analyses

  • Related project(s): NOMIS

Setup

Conda

Conda user guide

# install miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh # follow the instructions

Getting the repository including sub-modules

git clone --recurse-submodules ssh://[email protected]:8022/susheel.busi/nomis_pipeline.git

Create the main snakemake environment

# create the conda environment
conda env create -f requirements.yml -n "snakemake"
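
After creating the environment, activate it and check that Snakemake is available (a minimal sketch; the environment name matches the command above):

# activate the environment and verify the Snakemake install
conda activate snakemake
snakemake --version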

Dependencies

Successful completion of the pipeline requires the following tools created by others:

  1. MANTIS

  2. MAGICCAVE/MAGICLAMP

Notes:

  • Dependencies are included as submodules where possible (see the sketch after these notes for fetching them)

  • However, installation issues may persist

  • If so, check the respective repositories listed
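
If the repository was cloned without --recurse-submodules, the bundled submodules can still be fetched afterwards; a minimal sketch using standard git commands:

# fetch/refresh the bundled submodules after a plain clone
git submodule update --init --recursive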

How to run

The workflow can be launched using one of the following options:

./config/sbatch.sh

(or)

CORES=48
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn

(or)

Note: for running on esb-compute-01 or litcrit, adjust CORES as needed to prevent MANTIS from spawning too many workers, and launch as below:

CORES=24
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn
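
Both invocations end in -rpn; the -n flag makes Snakemake perform a dry run only. A minimal sketch of an actual launch, assuming the snakemake environment created during setup:

conda activate snakemake
CORES=48
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rp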

Configs

All config files are stored in the folder config/ :

Workflows

  1. imp workflow : sets up the folders required for running IMP3 on each sample

  2. viruses workflow : runs VIBRANT and vCONTACT2 on assemblies, including CheckV on vibrant_output

  3. eukaryotes workflow : runs EUKulele on assemblies

  4. bins workflow : collects all bins together for taxonomy analyses

  5. taxonomy workflow : runs GTDBtk and CheckM on the bins

  6. functions workflow : runs METABOLIC, MAGICCAVE and FUNCS analyses

  7. mantis workflow : runs MANTIS on the bins explicitly

  8. euk_bin workflow : performs coassembly specifically for eukaryotes (EukRep) and runs binning with CONCOCT

  9. coassembly_binning : performs coassembly for all samples and subsequent binning

  10. misc workflow : runs gRodon and antiSMASH; PopCOGent and potentially anvi'o coassembly/binning are yet to be implemented

Relevant parameters which have to be changed are listed for each workflow and config file. Parameters defining system-relevant settings are not listed but should also be changed if required, e.g. the number of threads used by certain tools.

STEPS

The workflow is set up in multiple steps. Prior to running, change the following:

  • config: config/

    • config.yaml :

      • change steps

Options:

  1. imp

  2. viruses

  3. eukaryotes

  4. bins

  5. taxonomy

  6. functions

  7. mantis

  8. euk_bin

  9. coassembly_binning

  10. misc

IMPORTANT NOTE: the imp step must be run first, followed by launching IMP3 outside of this pipeline; only after that should the other STEPS be run.

Launching IMP3

Per-sample IMP3 can be launched as follows:

chmod -R 775 ${SAMPLE} # adding permissions
cd ${SAMPLE}
sbatch ./launchIMP.sh # on IRIS
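
To launch IMP3 for several samples in one go, a loop along these lines can be used (a sketch only: the sample names are placeholders and the per-sample folders are assumed to have been created by the imp workflow):

for SAMPLE in sample_A sample_B; do          # hypothetical sample list
    chmod -R 775 ${SAMPLE}                   # adding permissions
    (cd ${SAMPLE} && sbatch ./launchIMP.sh)  # on IRIS
done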

imp workflow

Download raw data required for the analysis.

  • config: config/

    • config.yaml :

      • change work_dir
    • sbatch.sh

      • change SMK_ENV

      • if not using slurm to submit jobs, remove --cluster-config and --cluster from the snakemake command (see the sketch after this list)

    • slurm.yaml (only relevant if using slurm for job submission)

  • workflow: workflow/
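
For reference, the two launch modes differ only in the cluster flags; a minimal sketch (the sbatch submission string is a placeholder and the flag-to-file mapping is an assumption; config/sbatch.sh and config/slurm.yaml are authoritative):

# with slurm job submission
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --cluster-config config/slurm.yaml --cluster "sbatch ..." --cores 48 -rpn
# local execution: the same command with --cluster-config and --cluster removed
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --cores 48 -rpn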

Prior to running the imp workflow, make the following adjustments (a sketch of applying them follows the list):

  • IMP_config.yaml:

    • workflow/notes/IMP_config.yaml

      • change Metagenomics
  • run_threads:

    • workflow/notes/runIMP.sh

      • change: threads
  • launch_threads:

    • workflow/notes/runIMP.sh

      • change: -n8
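
A sketch of applying these adjustments from the command line (the -n8 token comes from the list above; the replacement value is hypothetical, and editing the files by hand works just as well):

# hypothetical example: reduce the IMP3 launch parallelism from 8 to 4
sed -i 's/-n8/-n4/' workflow/notes/runIMP.sh
# the threads value in the same file and the Metagenomics entry in
# workflow/notes/IMP_config.yaml are adjusted the same way or edited by hand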

IMPORTANT Note:

The above workflow must be run first, followed by launching IMP3 outside of this pipeline; the subsequent STEPS can then be run.

Main workflow

Main analysis workflow: given short-read (SR) FASTQ files, run all the steps to generate the required output. This includes:

  • setting up folders for IMP

  • viral and eukaryotic annotations

  • functional analyses and

  • taxonomic analyses (optional)

The workflow is run per sample and might require a couple of days to complete, depending on the sample, the configuration used and the available computational resources. Note that the workflow will create additional output files that are not necessarily required to re-create the figures shown in the manuscript.

  • config:

    • per sample

    • config/<sample>/config.yaml

      • change all path parameters (not all databases are required, see above)
    • config/<sample>/sbatch.yaml

      • change SMK_ENV

      • if not using slurm to submit jobs, remove --cluster-config and --cluster from the snakemake command

    • config/<sample>/slurm.yaml (only relevant if using slurm for job submission)

  • workflow: workflow/

Report workflow (2021-05-26 15:54:59: NOT implemented)

This workflow creates various summary files, plots and an HTML report for a sample using the output of the main workflow.

Note: How the metaP peptide/protein reports were generated from raw metaP data is described in notes/gdb_metap.md .

  • config:

    • sample configs used for the main workflow
  • workflow: workflow_report/

To execute this workflow for all samples:

./config/reports.sh "YourEnvName" "WhereToCreateCondEnvs"
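
An example invocation, assuming the snakemake environment created during setup and the conda prefix used for the main workflow:

./config/reports.sh "snakemake" "${CONDA_PREFIX}/pipeline"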

Figures workflow (2021-05-26 15:54:52: NOT implemented)

Re-create figures (and tables) used in the manuscript. This workflow should only be run after the main workflow and the report workflow have been run for all samples.

  • config: config/fig.yaml

    • change work_dir

    • change paths for all samples in samples

  • workflow: workflow_figures/

conda activate "YourEnvName"
snakemake -s workflow_figures/Snakefile --cores 1 --configfile config/fig.yaml --use-conda --conda-prefix "WhereToCreateCondEnvs" -rpn # dry-run
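
Once the dry run completes without errors, drop the trailing n from -rpn to execute the workflow:

snakemake -s workflow_figures/Snakefile --cores 1 --configfile config/fig.yaml --use-conda --conda-prefix "WhereToCreateCondEnvs" -rp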

Notes

Notes for manual/additional analyses done using the generated data.

Code Snippets

Selected shell and run directives extracted from the workflow's rule files:

shell:
    "ln -vs {input} {output}"
SnakeMake From line 24 of rules/bins.smk
shell:
    "ln -vs {input} {output}"
SnakeMake From line 35 of rules/bins.smk
shell:
    "for fname in {input} ; do echo $(basename -s \".contigs.fa\" \"${{fname}}\") ; done > {output}"
SnakeMake From line 51 of rules/bins.smk
shell:
    "cat {input} > {output[0]}"
SnakeMake From line 61 of rules/bins.smk
shell:
    "(date && cat {input.read1} > {output.or1} && cat {input.read2} > {output.or2} && date) &> >(tee {log})"
shell:
    "(date && clumpify.sh in={input.r1} in2={input.r2} out={output.odup1} out2={output.odup2} dupedist={config[clumpify][dupedist]} dedupe=t optical=t threads={threads} groups={config[clumpify][groups]} -Xmx{config[clumpify][memory]} && date) &> >(tee {log})"
shell:
    "(date && megahit -1 {input.sr1} -2 {input.sr2} --kmin-1pass -m 0.9 --k-list 27,37,47,57,67,77,87 --min-contig-len 1000 -t {threads} -o $(dirname {output})/tmp && "
    "cd $(dirname {output}) && "
    "rsync -avP tmp/ . && "
    "ln -sf final.contigs.fa $(basename {output}) && "
    "rm -rf tmp/ && "
    "date) &> >(tee {log})"
shell:
    "(date && coverm contig -1 {input.r1} -2 {input.r2} --reference {input.fa} --output-file {output} -t {threads} && date) &> {log}"
shell:
    "(date && tail -n +2 {input} > {output} && date) &> {log}"
shell:
    "(date && bwa index {input} -p {params.idx_prefix} && date) &> {log}"
shell:
    "(date && "
    "bwa mem -t {threads} {params.idx_prefix} {input.r1} {input.r2} | "
    "samtools view -@ {threads} -SbT {input.asm} | "
    "samtools sort -@ {threads} -m {params.chunk_size} -T {params.bam_prefix} -o {output} && "
    "date) &> {log}"
shell:
    "(date && "
    "jgi_summarize_bam_contig_depths --outputDepth {output.depth} --pairedContigs {output.paired} {input} && date) &> {log}"
shell:
    "(date && export PATH=$PATH:{config[maxbin2][perl]} && "
    "run_MaxBin.pl -thread {threads} -contig {input.fa} -out $(dirname {output})/coassembly -abund {input.cov} -min_contig_length {config[maxbin2][min_length]} && date) &> {log}"
shell:
    "(date && metabat2 -i {input.fa} -a {input.cov} -o $(dirname {output})/coassembly -t {threads} -m {config[metabat2][min_length]} -v --unbinned --cvExt && date) &> {log}"
shell:
    "(date && scripts/Fasta_to_Scaffolds2Bin.sh -i $(dirname {input.max}) -e fa > {output.maxscaf} && "
    "scripts/Fasta_to_Scaffolds2Bin.sh -i $(dirname {input.met}) -e fa > {output.metscaf} && date) &> {log}"
shell:
    "(date && export PATH=$PATH:{config[dastool][path]} && "
    "export PATH=$PATH:{config[dastool][src]} && "
    "DAS_Tool -i {input.max},{input.met} -c {input.fa} -o $(dirname {output.DIR}) --score_threshold {config[dastool][score]} --search_engine diamond -l maxbin2,metabat2 --write_bins 1 --write_bin_evals 1 --threads {threads} --db_directory {params.db} --create_plots 1 && "
    "touch {output.DUMMY} && date) &> {log}"
shell:
    "(date && ln -vs {input} {output} && date) &> >(tee {log})"
shell:
    "(date && EUKulele --sample_dir $(dirname {input}) -o {output[0]} -m mets && date) &> >(tee {log})"
run:
    tax=pd.read_csv(params.tax, header=0, sep="\t", index_col=0)
    cov=pd.read_csv(input.cov, header=None, sep="\t")
    cov.columns=['transcript_name', 'coverage']

    # keeping only rows that contain 'Eukary' in the 'full_classification' column
    euks=tax[tax['full_classification'].str.contains("Eukary", na=False)].drop(['counts'], axis=1)
    # keeping only rows that have at least 70% 'max_pid'
    filt_euks=euks.query("max_pid >=70")

    # merging taxonomy with coverage
    merged=filt_euks.merge(cov, how="left", on="transcript_name")
    filt_merged=merged[['full_classification','classification', 'coverage']]

    # Grouping same taxonomy and getting sum of coverage
    final=filt_merged.groupby(['full_classification','classification'], as_index=False)['coverage'].sum()

    # writing to file
    final.to_csv(output[0], index=None, sep="\t")
run:
    # Collecting all files in folder 
    directory=os.path.dirname(input[0])
    os.chdir(directory)

    # verify the path using getcwd() 
    cwd = os.getcwd() 

    # print the current directory 
    print("Current working directory is:", cwd) 

    mylist=[f for f in glob.glob("*.txt")]
    mylist

    # making individual dataframes for each file
    dataframes= [ pd.read_csv( f, header=0, sep="\t", usecols=["full_classification", "coverage"]) for f in mylist ] # add arguments as necessary to the read_csv method

    # Merging all files based on common column
    merged=reduce(lambda left,right: pd.merge(left,right,on='full_classification', how='outer'), dataframes)

    # Giving appropriate column names
    names=['full_classification']+mylist
    new_cols=list(map(lambda x: x.replace('_eukaryotes.txt',''),names))
    merged.columns=new_cols

    # checking if any values are "NA"
    merged.isnull().values.any()
    # if "NA" run the following
    merged.fillna('', inplace=True)

    # Removing rows with all zeroes (0 or 0.0)
    merged.set_index('full_classification', inplace=True)  # first to make first column as rownames
    edited=merged.loc[~(merged==0).all(axis=1)]

    # Writing file without zeroes
    edited.to_csv(output[0], sep='\t', index=True, header=True)
run:
    tax=pd.read_csv(input.tax, header=0, sep="\t", index_col=0)
    cov=pd.read_csv(input.cov, header=None, sep="\t")
    cov.columns=['transcript_name', 'coverage']

    # keeping only rows that contain 'Eukary' in the 'full_classification' column
    euks=tax.drop(['counts'], axis=1)

    # merging taxonomy with coverage
    merged=euks.merge(cov, how="left", on="transcript_name")
    filt_merged=merged[['full_classification','classification', 'coverage']]

    # Grouping same taxonomy and getting sum of coverage
    final=filt_merged.groupby(['full_classification','classification'], as_index=False)['coverage'].sum()

    # writing to file
    final.to_csv(output[0], index=None, sep="\t")
run:
    # Collecting all files in folder
    directory=os.path.dirname(input[0])
    os.chdir(directory)

    # verify the path using getcwd()
    cwd = os.getcwd()

    # print the current directory
    print("Current working directory is:", cwd)

    mylist=[f for f in glob.glob("*ALL.txt")]
    mylist

    # making individual dataframes for each file
    dataframes= [ pd.read_csv( f, sep="\t", usecols=['full_classification', 'coverage']) for f in mylist ] # add arguments as necessary to the read_csv method

    # Merging all files based on common column
    merged=reduce(lambda left,right: pd.merge(left,right,on='full_classification', how='outer'), dataframes)

    # Giving appropriate column names
    names=['full_classification']+mylist
    new_cols=list(map(lambda x: x.replace('_eukulele_all.txt',''),names))
    merged.columns=new_cols

    # checking if any values are "NA"
    merged.isnull().values.any()
    # if "NA" run the following
    merged.fillna('', inplace=True)

    # Removing rows with all zeroes (0 or 0.0)
    merged.set_index('full_classification', inplace=True)  # first to make first column as rownames
    edited=merged.loc[~(merged==0).all(axis=1)]

    # Writing file without zeroes
    edited.to_csv(output[0], sep='\t', index=True, header=True)
shell:
    "(date && clumpify.sh in={input.r1} in2={input.r2} out={output.odup1} out2={output.odup2} dupedist={config[clumpify][dupedist]} dedupe=t optical=t threads={threads} groups={config[clumpify][groups]} -Xmx{config[clumpify][memory]} && date) &> >(tee {log})"
shell:
    "(date && kraken2 --threads {threads} --db {input.database} --use-names --confidence 0.5 --paired {input.dedup1} {input.dedup2} --gzip-compressed --output {output.summary} --report {output.rep} && date) &> >(tee {log})"
shell:
    "(date && awk '{{if ($3 ~ /unclassified/ || $3 ~ /Eukaryota/) print $2}}' {input} > {output} && date) &> >(tee {log})"
shell:
    "(date && seqtk subseq {input.dedup1} {input.ids} > {output.ex1} && seqtk subseq {input.dedup2} {input.ids} > {output.ex2} && date) &> >(tee {log})"
shell:
    "(date && cat {input.read1} > {output.or1} && cat {input.read2} > {output.or2} && date) &> >(tee {log})"
shell:
    "(date && clumpify.sh in={input.r1} in2={input.r2} out={output.odup1} out2={output.odup2} dupedist={config[clumpify][dupedist]} dedupe=t optical=t threads={threads} groups={config[clumpify][groups]} -Xmx{config[clumpify][memory]} && date) &> >(tee {log})"
shell:
    "(date && megahit -1 {input.sr1} -2 {input.sr2} --kmin-1pass -m 0.9 --k-list 27,37,47,57,67,77,87 --min-contig-len 1000 -t {threads} -o $(dirname {output})/tmp && "
    "cd $(dirname {output}) && "
    "rsync -avP tmp/ . && "
    "ln -sf final.contigs.fa $(basename {output}) && "
    "rm -rf tmp/ && "
    "date) &> >(tee {log})"
shell:
    "(date && EukRep -i {input} -o {output} --min 2000 -m strict && date)"
shell:
    "(date && TMPDIR={RESULTS_DIR} coverm make -o {output} -t {threads} -r {input.ref} -c {input.read1} {input.read2} && date) &> >(tee {log})"
shell:
    """
    cut_up_fasta.py {input.contigs} -c 10000 -o 0 --merge_last -b contigs_10K.bed > {output.contigs_cut}
    concoct_coverage_table.py contigs_10K.bed {input.bam}/*bam > {output.coverage}
    """
shell:
    """
    concoct --coverage_file {input.coverage} --composition_file {input.contigs_cut} -t {threads} -b {output}
    """
shell:
    """
    merge_cutup_clustering.py {input.clustering}/clustering_gt1000.csv > {input.clustering}/clustering_merged.csv
    extract_fasta_bins.py {input.contigs} {input.clustering}/clustering_merged.csv --output_path {output}        
    """
    shell:
        "(date && "
        "while read -r line; do ls $(dirname {input})/*.fa | grep -o \"$line\" ; done < {output.sample} > {output.tmpfile} && "
        "sed 's@^@/work/projects/nomis/metaG_JULY_2020/IMP3/@g' {output.tmpfile} | "
        "sed 's@$@/run1/Preprocessing/mg.r1.preprocessed.fq@g' | "
        "awk -F, '{{print $0=$1\",\"$1}}' | awk 'BEGIN{{FS=OFS=\",\"}} {{gsub(\"r1\", \"r2\", $2)}} 1' | "
        "sed $'1 i\\\\\\n# Read pairs:' {output.reads}"     # using forward-slashes to get `\\\n`

rule metabolic:
    input:
        fa=os.path.join(RESULTS_DIR, "bins/bin_collection.done"),
        reads=rules.prep_metabolic.output
    output:
        directory(os.path.join(RESULTS_DIR, "metabolic_output"))
    log:
        os.path.join(RESULTS_DIR, "logs/metabolic.log")
    conda:
        os.path.join(ENV_DIR, "metabolic.yaml")
    params:
        gtdbtk=config["metabolic"]["db"],
        metabolic=config["metabolic"]["directory"]
    threads:
        config["metabolic"]["threads"]
    message:
        "Running metabolic for all MAGs"
shell:
    "(date && "
    "export GTDBTK_DATA_PATH={params.gtdbtk} && "
    "export PERL5LIB && export PERL_LOCAL_LIB_ROOT && export PERL_MB_OPT && export PERL_MM_OPT && "
    """env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Array::Split && """
    "perl {params.metabolic}/METABOLIC-C.pl -t {threads} -in-gn $(dirname {input.fa}) -r {input.reads} -o {output} && "
    "date) &> >(tee {log})"
run:
    bin=pd.read_csv(input[0], sep="\t")
    bin_edited=bin[['selected_by_DASTool', 'classification']]       # selecting columns
run:
    kegg=pd.read_csv(input[0], sep="\t", skiprows=1)
    kegg_edited=kegg[['Geneid', 'Chr']]
    kegg_edited.rename(columns = {'Geneid': 'KEGG', 'Chr': 'Contig'}, inplace=True)
    kegg_contigs=(kegg_edited.assign(Contig = kegg_edited['Contig'].str.split(';')).explode('Contig').reset_index(drop=True))
    kegg_contigs=kegg_contigs.reindex(['Contig','KEGG'], axis=1)
    kegg_contigs.to_csv(output[0], sep="\t", index=False)
run:
    kegg=pd.read_csv(input[0], sep="\t", header=0)
    cov=pd.read_csv(input[1], sep="\t", header=None)
    cov.rename(columns={0: 'Contig', 1: 'Coverage'}, inplace=True)

    length=pd.read_csv(input[2], sep="\t", header=None)
    length.rename(columns={0: 'Contig', 1: 'Length'}, inplace=True)

    tmp=pd.merge(kegg, cov, on='Contig')
    all_merged=pd.merge(tmp, length, on='Contig')
    all_merged.to_csv(output[0], sep="\t", index=False)
run:
    scaffold=pd.read_csv(input[0], sep="\t", header=None)
    scaffold.columns=['Contig', 'Bin']
    scaffold.to_csv(output[0], sep="\t", index=False)
run:
    opened=[]
    for ifile in input:
      df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
      df['Sample']=re.sub("_gtdbtk.txt", "", os.path.basename(ifile))
      df = df.reindex(['Sample','Bin','Taxa'], axis=1)
      opened.append(df)

    frame=pd.concat(opened, axis=0, ignore_index=True)
    frame.to_csv(output[0], sep="\t", index=False)
run:
    opened=[]
    for ifile in input:
      df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
run:
    opened=[]
    for ifile in input:
      df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
      opened.append(df)

    frame=pd.concat(opened, axis=0, ignore_index=True)
    frame.to_csv(output[0], sep="\t", index=False)
run:
    opened=[]
    for ifile in input:
      df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
script:
    "merge_funcs.R"
shell:
    "cat {params} | awk '{{print $1\",\"$2}}' | sed '@^user@d' | sed '[email protected]@@g' | sed '[email protected]_sub.contigs@@g' | awk '!visited[$0]++' > {output}"
shell:
    "(date && summarize-metabolism --input $(dirname {input.bins}) --output {output.sum} --metadata {input.meta} --summary {output.sum}/results/summarize_metabolism.csv --heatmap {output.sum}/results/summarize_metabolism.pdf --aggregate ON --plotting ON && summarize-metabolism --input $(dirname {input.bins}) --output {output.indiv} --metadata {input.meta} --summary {output.indiv}/results/individual_metabolism.csv --heatmap {output.indiv}/results/individual_metabolism.pdf --plotting ON && date) &> >(tee {log})"
shell:
    "script=$(realpath {params.script}) && cd {params.path} && ${{script}}"
shell:
    "(date && export PATH=$PATH:{params.path} && "
    "MagicLamp.py LithoGenie -bin_dir $(dirname {input.bins}) -bin_ext fa -out {output} -t {threads} --norm && date) &> >(tee {log})" 
shell:
    "(date && ln -vs {input.in1} {output.fout1} && "
    "ln -vs {input.in2} {output.fout2} && date) &> >(tee {log})"
SnakeMake From line 29 of rules/imp.smk
shell:
    "(date && cp -v {input.config} {output.tout1} && "
    "cp -v {input.launcher} {output.tout2} && "
    "cp -v {input.runfile} {output.tout3} && "
    "sed -i 's/\"\$sample\"/{wildcards.sample}/g' {output.tout1} && "
    "sed -i 's/\"\$sample\"/{wildcards.sample}/g' {output.tout2} && "
    "sed -i 's/\"\$sample\"/{wildcards.sample}/g' {output.tout3} && date) &> >(tee {log})"
SnakeMake From line 42 of rules/imp.smk
shell:
    "ln -vs {input} {output}"
shell:
    "ln -vs {input} {output}"
shell:
    "(date && prokka --outdir $(dirname {output.FAA}) {input} --cpus {threads} --force && date) &> >(tee {log})"
shell:
    "for fname in {input.txt} ; do echo echo \"${{fname}}\"\"    \"$(echo {input.FAA}) ; done > {output}"
run:
    with open(output[0], "w") as ofile:
        # default HMMs
        for hmm_name, hmm_path in config["mantis"]["default"].items():
            ofile.write("%s=%s\n" % (hmm_name, hmm_path))
        # custom HMMs
        for hmm_path in config["mantis"]["custom"]:
            ofile.write("custom_hmm=%s\n" % hmm_path)
        # weights
        for weights_name, weights_value in config["mantis"]["weights"].items():
            ofile.write("%s=%f\n" % (weights_name, weights_value))
shell:
    "(date && python {config[mantis][path]}/ run_mantis -t {input.FAA} --output_folder $(dirname {output}) --mantis_config {input.config} --hmmer_threads {params.cores} --cores {threads} --memory {config[mantis][single_mem]} --kegg_matrix && date) &> >(tee {log})"
shell:
    "(date && antismash --cpus {threads} --genefinding-tool prodigal --fullhmmer --pfam2go --asf --cb-knownclusters --clusterhmmer --cf-create-clusters {input} --output-dir $(dirname {output}) && date) &> >(tee {log})"
shell:
    """(date && sed -n '/##FASTA/q;p' {input} | awk '$3=="CDS"' | awk '{{print $9}}' | awk 'gsub(";.*","")' | awk 'gsub("ID=","")' > {output} && date) &> >(tee {log})"""
SnakeMake From line 65 of rules/misc.smk
shell:
    "(date && export GTDBTK_DATA_PATH={params} && gtdbtk classify_wf --cpus {threads} -x fa --genome_dir $(dirname {input}) --out_dir {output} && date) &> >(tee {log})"
shell:
    "(date && checkm lineage_wf -r -t {threads} -x fa $(dirname {input}) {output} && date) &> >(tee {log})"
shell:
    "(date && python3 ./vibrant/VIBRANT/VIBRANT_run.py -t {threads} -i {input} -folder $(dirname $(dirname {output.viout1})) && date) &> >(tee {log})"
shell:
    "(date && python3 {config[convert_files][simplify]} {input} && "
    "export PATH='/scratch/users/sbusi/tools/miniconda3/envs/vcontact2/bin:$PATH' && "
    "vcontact2_gene2genome -p {output.tout1} -o {output.tout2} -s '{config[convert_files][type]}') &> >(tee {log})"
shell:
    "(date && export PATH=$PATH:'/scratch/users/sbusi/tools/miniconda3/envs/vcontact2/bin' && "
    "/scratch/users/sbusi/tools/miniconda3/envs/vcontact2/bin/vcontact2 --force-overwrite --raw-proteins {input.v1} --rel-mode 'Diamond' --proteins-fp {input.v2} --db 'ProkaryoticViralRefSeq94-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /home/users/sbusi/apps/vcontact2/cluster_one-1.0.jar --output-dir {output.cout5} && date) &> >(tee {log})"
shell:
    "(date && kraken2 --threads {threads} --db {config[kraken2][db]} --confidence 0.75 {input} --output {output.summary} --report {output.report} && date) &> >(tee {log})"
shell:
    "(date && kaiju -z {threads} -t {config[kaiju][db]}/{params.nodes} -f {config[kaiju][db]}/{params.fmi} -i {input.fasta} -o {output} && date) &> >(tee {log})"
shell:
    "(date && kaiju2table -e -t {config[kaiju][db]}/{params.nodes} -n {config[kaiju][db]}/{params.names} -r {config[kaiju][rank]} -o {output} {input.files} && date) &> >(tee {log})"
shell:
    "(date && checkv end_to_end -d {config[checkv][db]} {input} $(dirname {output}) -t {threads} && date) &> >(tee {log})"
shell:
    "(date && antismash --cpus {threads} --genefinding-tool none --genefinding-gff3 {input.GFF} --fullhmmer --pfam2go --asf --cb-knownclusters --clusterhmmer --cf-create-clusters {input.FA} --output-dir $(dirname {output}) && date) &> >(tee {log})"