nomis_pipeline


About

  • Repository containing workflows for IMP3 downstream analyses

  • Related project(s): NOMIS

Setup

Conda

Conda user guide

# install miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh # follow the instructions

Getting the repository including sub-modules

git clone --recurse-submodules ssh://[email protected]:8022/susheel.busi/nomis_pipeline.git

Create the main snakemake environment

# create the conda environment
conda env create -f requirements.yml -n "snakemake"
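
After creating the environment, activate it and check that Snakemake is available (a minimal sketch; the environment name matches the command above):

# activate the environment and verify the Snakemake install
conda activate snakemake
snakemake --version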

Dependencies

Successful completion of the pipeline requires the following tools created by others:

  1. MANTIS

  2. MAGICCAVE/MAGICLAMP

Notes:

  • Dependencies are included as submodules where possible (see the sketch after these notes for fetching them)

  • However, installation issues may persist

  • If so, check the respective repositories listed
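
If the repository was cloned without --recurse-submodules, the bundled submodules can still be fetched afterwards; a minimal sketch using standard git commands:

# fetch/refresh the bundled submodules after a plain clone
git submodule update --init --recursive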

How to run

The workflow can be launched using one of the following options:

./config/sbatch.sh

(or)

CORES=48
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn

(or)

Note: for running on esb-compute-01 or litcrit, adjust CORES as needed to prevent MANTIS from spawning too many workers, and launch as below:

CORES=24
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn
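
Both invocations end in -rpn; the -n flag makes Snakemake perform a dry run only. A minimal sketch of an actual launch, assuming the snakemake environment created during setup:

conda activate snakemake
CORES=48
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rp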

Configs

All config files are stored in the folder config/ :

Workflows

  1. imp workflow : sets up the folders required for running IMP3 on each sample

  2. viruses workflow : runs VIBRANT and vCONTACT2 on assemblies, including CheckV on vibrant_output

  3. eukaryotes workflow : runs EUKulele on assemblies

  4. bins workflow : collects all bins together for taxonomy analyses

  5. taxonomy workflow : runs GTDBtk and CheckM on the bins

  6. functions workflow : runs METABOLIC, MAGICCAVE and FUNCS analyses

  7. mantis workflow : runs MANTIS on the bins explicitly

  8. euk_bin workflow : performs coassembly specifically for eukaryotes (EukRep) and runs binning with CONCOCT

  9. coassembly_binning : performs coassembly for all samples and subsequent binning

  10. misc workflow : runs gRodon and antiSMASH; PopCOGent and potentially anvi'o coassembly/binning are yet to be implemented

Relevant parameters which have to be changed are listed for each workflow and config file. Parameters defining system-relevant settings are not listed but should also be changed if required, e.g. the number of threads used by certain tools.

STEPS

The workflow is set up in multiple steps. Prior to running, change the following:

  • config: config/

    • config.yaml :

      • change steps

Options:

  1. imp

  2. viruses

  3. eukaryotes

  4. bins

  5. taxonomy

  6. functions

  7. mantis

  8. euk_bin

  9. coassembly_binning

  10. misc

IMPORTANT NOTE: the imp step must be run first, followed by launching IMP3 outside of this pipeline; only after that should the other STEPS be run.

Launching IMP3

Per-sample IMP3 can be launched as follows:

chmod -R 775 ${SAMPLE} # adding permissions
cd ${SAMPLE}
sbatch ./launchIMP.sh # on IRIS
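
To launch IMP3 for several samples in one go, a loop along these lines can be used (a sketch only: the sample names are placeholders and the per-sample folders are assumed to have been created by the imp workflow):

for SAMPLE in sample_A sample_B; do          # hypothetical sample list
    chmod -R 775 ${SAMPLE}                   # adding permissions
    (cd ${SAMPLE} && sbatch ./launchIMP.sh)  # on IRIS
done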

imp workflow

Download raw data required for the analysis.

  • config: config/

    • config.yaml :

      • change work_dir
    • sbatch.sh

      • change SMK_ENV

      • if not using slurm to submit jobs, remove --cluster-config and --cluster from the snakemake command (see the sketch after this list)

    • slurm.yaml (only relevant if using slurm for job submission)

  • workflow: workflow/
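
For reference, the two launch modes differ only in the cluster flags; a minimal sketch (the sbatch submission string is a placeholder and the flag-to-file mapping is an assumption; config/sbatch.sh and config/slurm.yaml are authoritative):

# with slurm job submission
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --cluster-config config/slurm.yaml --cluster "sbatch ..." --cores 48 -rpn
# local execution: the same command with --cluster-config and --cluster removed
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --cores 48 -rpn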

Prior to running the imp workflow, make the following adjustments (a sketch of applying them follows the list):

  • IMP_config.yaml:

    • workflow/notes/IMP_config.yaml

      • change Metagenomics
  • run_threads:

    • workflow/notes/runIMP.sh

      • change: threads
  • launch_threads:

    • workflow/notes/runIMP.sh

      • change: -n8
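
A sketch of applying these adjustments from the command line (the -n8 token comes from the list above; the replacement value is hypothetical, and editing the files by hand works just as well):

# hypothetical example: reduce the IMP3 launch parallelism from 8 to 4
sed -i 's/-n8/-n4/' workflow/notes/runIMP.sh
# the threads value in the same file and the Metagenomics entry in
# workflow/notes/IMP_config.yaml are adjusted the same way or edited by hand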

IMPORTANT Note:

The above workflow must be run first, followed by launching IMP3 outside of this pipeline; the subsequent STEPS can then be run.

Main workflow

Main analysis workflow: given short-read (SR) FASTQ files, run all the steps to generate the required output. This includes:

  • setting up folders for IMP

  • viral and eukaryotic annotations

  • functional analyses and

  • taxonomic analyses (optional)

The workflow is run per sample and might require a couple of days to complete, depending on the sample, the configuration used and the available computational resources. Note that the workflow will create additional output files that are not necessarily required to re-create the figures shown in the manuscript.

  • config:

    • per sample

    • config/<sample>/config.yaml

      • change all path parameters (not all databases are required, see above)
    • config/<sample>/sbatch.yaml

      • change SMK_ENV

      • if not using slurm to submit jobs, remove --cluster-config and --cluster from the snakemake command

    • config/<sample>/slurm.yaml (only relevant if using slurm for job submission)

  • workflow: workflow/

Report workflow (2021-05-26 15:54:59: NOT implemented)

This workflow creates various summary files, plots and an HTML report for a sample using the output of the main workflow.

Note: How the metaP peptide/protein reports were generated from raw metaP data is described in notes/gdb_metap.md .

  • config:

    • sample configs used for the main workflow
  • workflow: workflow_report/

To execute this workflow for all samples:

./config/reports.sh "YourEnvName" "WhereToCreateCondEnvs"
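
An example invocation, assuming the snakemake environment created during setup and the conda prefix used for the main workflow:

./config/reports.sh "snakemake" "${CONDA_PREFIX}/pipeline"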

Figures workflow (2021-05-26 15:54:52: NOT implemented)

Re-create figures (and tables) used in the manuscript. This workflow should only be run after the main workflow and the report workflow have been run for all samples.

  • config: config/fig.yaml

    • change work_dir

    • change paths for all samples in samples

  • workflow: workflow_figures/

conda activate "YourEnvName"
snakemake -s workflow_figures/Snakefile --cores 1 --configfile config/fig.yaml --use-conda --conda-prefix "WhereToCreateCondEnvs" -rpn # dry-run
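
Once the dry run completes without errors, drop the trailing n from -rpn to execute the workflow:

snakemake -s workflow_figures/Snakefile --cores 1 --configfile config/fig.yaml --use-conda --conda-prefix "WhereToCreateCondEnvs" -rp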

Notes

Notes for manual/additional analyses done using the generated data.

Code Snippets

Selected shell and run directives extracted from the workflow's rule files:

shell:
    "ln -vs {input} {output}"
SnakeMake From line 24 of rules/bins.smk
shell:
    "ln -vs {input} {output}"
SnakeMake From line 35 of rules/bins.smk
shell:
    "for fname in {input} ; do echo $(basename -s \".contigs.fa\" \"${{fname}}\") ; done > {output}"
SnakeMake From line 51 of rules/bins.smk
shell:
    "cat {input} > {output[0]}"
SnakeMake From line 61 of rules/bins.smk
shell:
    "(date && cat {input.read1} > {output.or1} && cat {input.read2} > {output.or2} && date) &> >(tee {log})"
shell:
    "(date && clumpify.sh in={input.r1} in2={input.r2} out={output.odup1} out2={output.odup2} dupedist={config[clumpify][dupedist]} dedupe=t optical=t threads={threads} groups={config[clumpify][groups]} -Xmx{config[clumpify][memory]} && date) &> >(tee {log})"
shell:
    "(date && megahit -1 {input.sr1} -2 {input.sr2} --kmin-1pass -m 0.9 --k-list 27,37,47,57,67,77,87 --min-contig-len 1000 -t {threads} -o $(dirname {output})/tmp && "
    "cd $(dirname {output}) && "
    "rsync -avP tmp/ . && "
    "ln -sf final.contigs.fa $(basename {output}) && "
    "rm -rf tmp/ && "
    "date) &> >(tee {log})"
shell:
    "(date && coverm contig -1 {input.r1} -2 {input.r2} --reference {input.fa} --output-file {output} -t {threads} && date) &> {log}"
shell:
    "(date && tail -n +2 {input} > {output} && date) &> {log}"
shell:
    "(date && bwa index {input} -p {params.idx_prefix} && date) &> {log}"
shell:
    "(date && "
    "bwa mem -t {threads} {params.idx_prefix} {input.r1} {input.r2} | "
    "samtools view -@ {threads} -SbT {input.asm} | "
    "samtools sort -@ {threads} -m {params.chunk_size} -T {params.bam_prefix} -o {output} && "
    "date) &> {log}"
shell:
    "(date && "
    "jgi_summarize_bam_contig_depths --outputDepth {output.depth} --pairedContigs {output.paired} {input} && date) &> {log}"
shell:
    "(date && export PATH=$PATH:{config[maxbin2][perl]} && "
    "run_MaxBin.pl -thread {threads} -contig {input.fa} -out $(dirname {output})/coassembly -abund {input.cov} -min_contig_length {config[maxbin2][min_length]} && date) &> {log}"
shell:
    "(date && metabat2 -i {input.fa} -a {input.cov} -o $(dirname {output})/coassembly -t {threads} -m {config[metabat2][min_length]} -v --unbinned --cvExt && date) &> {log}"
shell:
    "(date && scripts/Fasta_to_Scaffolds2Bin.sh -i $(dirname {input.max}) -e fa > {output.maxscaf} && "
    "scripts/Fasta_to_Scaffolds2Bin.sh -i $(dirname {input.met}) -e fa > {output.metscaf} && date) &> {log}"
shell:
    "(date && export PATH=$PATH:{config[dastool][path]} && "
    "export PATH=$PATH:{config[dastool][src]} && "
    "DAS_Tool -i {input.max},{input.met} -c {input.fa} -o $(dirname {output.DIR}) --score_threshold {config[dastool][score]} --search_engine diamond -l maxbin2,metabat2 --write_bins 1 --write_bin_evals 1 --threads {threads} --db_directory {params.db} --create_plots 1 && "
    "touch {output.DUMMY} && date) &> {log}"
shell:
    "(date && ln -vs {input} {output} && date) &> >(tee {log})"
shell:
    "(date && EUKulele --sample_dir $(dirname {input}) -o {output[0]} -m mets && date) &> >(tee {log})"
run:
    tax=pd.read_csv(params.tax, header=0, sep="\t", index_col=0)
    cov=pd.read_csv(input.cov, header=None, sep="\t")
    cov.columns=['transcript_name', 'coverage']

    # keeping only rows that contain 'Eukary' in the 'full_classification' column
    euks=tax[tax['full_classification'].str.contains("Eukary", na=False)].drop(['counts'], axis=1)
    # keeping only rows that have at least 70% 'max_pid'
    filt_euks=euks.query("max_pid >=70")

    # merging taxonomy with coverage
    merged=filt_euks.merge(cov, how="left", on="transcript_name")
    filt_merged=merged[['full_classification','classification', 'coverage']]

    # Grouping same taxonomy and getting sum of coverage
    final=filt_merged.groupby(['full_classification','classification'], as_index=False)['coverage'].sum()

    # writing to file
    final.to_csv(output[0], index=None, sep="\t")
run:
    # Collecting all files in folder 
    directory=os.path.dirname(input[0])
    os.chdir(directory)

    # verify the path using getcwd() 
    cwd = os.getcwd() 

    # print the current directory 
    print("Current working directory is:", cwd) 

    mylist=[f for f in glob.glob("*.txt")]
    mylist

    # making individual dataframes for each file
    dataframes= [ pd.read_csv( f, header=0, sep="\t", usecols=["full_classification", "coverage"]) for f in mylist ] # add arguments as necessary to the read_csv method

    # Merging all files based on common column
    merged=reduce(lambda left,right: pd.merge(left,right,on='full_classification', how='outer'), dataframes)

    # Giving appropriate column names
    names=['full_classification']+mylist
    new_cols=list(map(lambda x: x.replace('_eukaryotes.txt',''),names))
    merged.columns=new_cols

    # checking if any values are "NA"
    merged.isnull().values.any()
    # if "NA" run the following
    merged.fillna('', inplace=True)

    # Removing rows with all zeroes (0 or 0.0)
    merged.set_index('full_classification', inplace=True)  # first to make first column as rownames
    edited=merged.loc[~(merged==0).all(axis=1)]

    # Writing file without zeroes
    edited.to_csv(output[0], sep='\t', index=True, header=True)
run:
    tax=pd.read_csv(input.tax, header=0, sep="\t", index_col=0)
    cov=pd.read_csv(input.cov, header=None, sep="\t")
    cov.columns=['transcript_name', 'coverage']

    # keeping only rows that contain 'Eukary' in the 'full_classification' column
    euks=tax.drop(['counts'], axis=1)

    # merging taxonomy with coverage
    merged=euks.merge(cov, how="left", on="transcript_name")
    filt_merged=merged[['full_classification','classification', 'coverage']]

    # Grouping same taxonomy and getting sum of coverage
    final=filt_merged.groupby(['full_classification','classification'], as_index=False)['coverage'].sum()

    # writing to file
    final.to_csv(output[0], index=None, sep="\t")
run:
    # Collecting all files in folder
    directory=os.path.dirname(input[0])
    os.chdir(directory)

    # verify the path using getcwd()
    cwd = os.getcwd()

    # print the current directory
    print("Current working directory is:", cwd)

    mylist=[f for f in glob.glob("*ALL.txt")]
    mylist

    # making individual dataframes for each file
    dataframes= [ pd.read_csv( f, sep="\t", usecols=['full_classification', 'coverage']) for f in mylist ] # add arguments as necessary to the read_csv method

    # Merging all files based on common column
    merged=reduce(lambda left,right: pd.merge(left,right,on='full_classification', how='outer'), dataframes)

    # Giving appropriate column names
    names=['full_classification']+mylist
    new_cols=list(map(lambda x: x.replace('_eukulele_all.txt',''),names))
    merged.columns=new_cols

    # checking if any values are "NA"
    merged.isnull().values.any()
    # if "NA" run the following
    merged.fillna('', inplace=True)

    # Removing rows with all zeroes (0 or 0.0)
    merged.set_index('full_classification', inplace=True)  # first to make first column as rownames
    edited=merged.loc[~(merged==0).all(axis=1)]

    # Writing file without zeroes
    edited.to_csv(output[0], sep='\t', index=True, header=True)
shell:
    "(date && clumpify.sh in={input.r1} in2={input.r2} out={output.odup1} out2={output.odup2} dupedist={config[clumpify][dupedist]} dedupe=t optical=t threads={threads} groups={config[clumpify][groups]} -Xmx{config[clumpify][memory]} && date) &> >(tee {log})"
shell:
    "(date && kraken2 --threads {threads} --db {input.database} --use-names --confidence 0.5 --paired {input.dedup1} {input.dedup2} --gzip-compressed --output {output.summary} --report {output.rep} && date) &> >(tee {log})"
shell:
    "(date && awk '{{if ($3 ~ /unclassified/ || $3 ~ /Eukaryota/) print $2}}' {input} > {output} && date) &> >(tee {log})"
shell:
    "(date && seqtk subseq {input.dedup1} {input.ids} > {output.ex1} && seqtk subseq {input.dedup2} {input.ids} > {output.ex2} && date) &> >(tee {log})"
shell:
    "(date && cat {input.read1} > {output.or1} && cat {input.read2} > {output.or2} && date) &> >(tee {log})"
shell:
    "(date && clumpify.sh in={input.r1} in2={input.r2} out={output.odup1} out2={output.odup2} dupedist={config[clumpify][dupedist]} dedupe=t optical=t threads={threads} groups={config[clumpify][groups]} -Xmx{config[clumpify][memory]} && date) &> >(tee {log})"
shell:
    "(date && megahit -1 {input.sr1} -2 {input.sr2} --kmin-1pass -m 0.9 --k-list 27,37,47,57,67,77,87 --min-contig-len 1000 -t {threads} -o $(dirname {output})/tmp && "
    "cd $(dirname {output}) && "
    "rsync -avP tmp/ . && "
    "ln -sf final.contigs.fa $(basename {output}) && "
    "rm -rf tmp/ && "
    "date) &> >(tee {log})"
shell:
    "(date && EukRep -i {input} -o {output} --min 2000 -m strict && date)"
shell:
    "(date && TMPDIR={RESULTS_DIR} coverm make -o {output} -t {threads} -r {input.ref} -c {input.read1} {input.read2} && date) &> >(tee {log})"
shell:
    """
    cut_up_fasta.py {input.contigs} -c 10000 -o 0 --merge_last -b contigs_10K.bed > {output.contigs_cut}
    concoct_coverage_table.py contigs_10K.bed {input.bam}/*bam > {output.coverage}
    """
shell:
    """
    concoct --coverage_file {input.coverage} --composition_file {input.contigs_cut} -t {threads} -b {output}
    """
shell:
    """
    merge_cutup_clustering.py {input.clustering}/clustering_gt1000.csv > {input.clustering}/clustering_merged.csv
    extract_fasta_bins.py {input.contigs} {input.clustering}/clustering_merged.csv --output_path {output}        
    """
    shell:
        "(date && "
        "while read -r line; do ls $(dirname {input})/*.fa | grep -o \"$line\" ; done < {output.sample} > {output.tmpfile} && "
        "sed 's@^@/work/projects/nomis/metaG_JULY_2020/IMP3/@g' {output.tmpfile} | "
        "sed 's@$@/run1/Preprocessing/mg.r1.preprocessed.fq@g' | "
        "awk -F, '{{print $0=$1\",\"$1}}' | awk 'BEGIN{{FS=OFS=\",\"}} {{gsub(\"r1\", \"r2\", $2)}} 1' | "
        "sed $'1 i\\\\\\n# Read pairs:' {output.reads}"     # using forward-slashes to get `\\\n`

rule metabolic:
    input:
        fa=os.path.join(RESULTS_DIR, "bins/bin_collection.done"),
        reads=rules.prep_metabolic.output
    output:
        directory(os.path.join(RESULTS_DIR, "metabolic_output"))
    log:
        os.path.join(RESULTS_DIR, "logs/metabolic.log")
    conda:
        os.path.join(ENV_DIR, "metabolic.yaml")
    params:
        gtdbtk=config["metabolic"]["db"],
        metabolic=config["metabolic"]["directory"]
    threads:
        config["metabolic"]["threads"]
    message:
        "Running metabolic for all MAGs"
shell:
    "(date && "
    "export GTDBTK_DATA_PATH={params.gtdbtk} && "
    "export PERL5LIB && export PERL_LOCAL_LIB_ROOT && export PERL_MB_OPT && export PERL_MM_OPT && "
    """env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Array::Split && """
    "perl {params.metabolic}/METABOLIC-C.pl -t {threads} -in-gn $(dirname {input.fa}) -r {input.reads} -o {output} && "
    "date) &> >(tee {log})"
run:
    bin=pd.read_csv(input[0], sep="\t")
    bin_edited=bin[['selected_by_DASTool', 'classification']]       # selecting columns
run:
    kegg=pd.read_csv(input[0], sep="\t", skiprows=1)
    kegg_edited=kegg[['Geneid', 'Chr']]
    kegg_edited.rename(columns = {'Geneid': 'KEGG', 'Chr': 'Contig'}, inplace=True)
    kegg_contigs=(kegg_edited.assign(Contig = kegg_edited['Contig'].str.split(';')).explode('Contig').reset_index(drop=True))
    kegg_contigs=kegg_contigs.reindex(['Contig','KEGG'], axis=1)
    kegg_contigs.to_csv(output[0], sep="\t", index=False)
run:
    kegg=pd.read_csv(input[0], sep="\t", header=0)
    cov=pd.read_csv(input[1], sep="\t", header=None)
    cov.rename(columns={0: 'Contig', 1: 'Coverage'}, inplace=True)

    length=pd.read_csv(input[2], sep="\t", header=None)
    length.rename(columns={0: 'Contig', 1: 'Length'}, inplace=True)

    tmp=pd.merge(kegg, cov, on='Contig')
    all_merged=pd.merge(tmp, length, on='Contig')
    all_merged.to_csv(output[0], sep="\t", index=False)
run:
    scaffold=pd.read_csv(input[0], sep="\t", header=None)
    scaffold.columns=['Contig', 'Bin']
    scaffold.to_csv(output[0], sep="\t", index=False)
run:
    opened=[]
    for ifile in input:
      df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
      df['Sample']=re.sub("_gtdbtk.txt", "", os.path.basename(ifile))
      df = df.reindex(['Sample','Bin','Taxa'], axis=1)
      opened.append(df)

    frame=pd.concat(opened, axis=0, ignore_index=True)
    frame.to_csv(output[0], sep="\t", index=False)
run:
    opened=[]
    for ifile in input:
      df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
run:
    opened=[]
    for ifile in input:
      df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
      opened.append(df)

    frame=pd.concat(opened, axis=0, ignore_index=True)
    frame.to_csv(output[0], sep="\t", index=False)
run:
    opened=[]
    for ifile in input:
      df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
script:
    "merge_funcs.R"
shell:
    "cat {params} | awk '{{print $1\",\"$2}}' | sed '@^user@d' | sed '[email protected]@@g' | sed '[email protected]_sub.contigs@@g' | awk '!visited[$0]++' > {output}"
shell:
    "(date && summarize-metabolism --input $(dirname {input.bins}) --output {output.sum} --metadata {input.meta} --summary {output.sum}/results/summarize_metabolism.csv --heatmap {output.sum}/results/summarize_metabolism.pdf --aggregate ON --plotting ON && summarize-metabolism --input $(dirname {input.bins}) --output {output.indiv} --metadata {input.meta} --summary {output.indiv}/results/individual_metabolism.csv --heatmap {output.indiv}/results/individual_metabolism.pdf --plotting ON && date) &> >(tee {log})"
shell:
    "script=$(realpath {params.script}) && cd {params.path} && ${{script}}"
shell:
    "(date && export PATH=$PATH:{params.path} && "
    "MagicLamp.py LithoGenie -bin_dir $(dirname {input.bins}) -bin_ext fa -out {output} -t {threads} --norm && date) &> >(tee {log})" 
shell:
    "(date && ln -vs {input.in1} {output.fout1} && "
    "ln -vs {input.in2} {output.fout2} && date) &> >(tee {log})"
SnakeMake From line 29 of rules/imp.smk
shell:
    "(date && cp -v {input.config} {output.tout1} && "
    "cp -v {input.launcher} {output.tout2} && "
    "cp -v {input.runfile} {output.tout3} && "
    "sed -i 's/\"\$sample\"/{wildcards.sample}/g' {output.tout1} && "
    "sed -i 's/\"\$sample\"/{wildcards.sample}/g' {output.tout2} && "
    "sed -i 's/\"\$sample\"/{wildcards.sample}/g' {output.tout3} && date) &> >(tee {log})"
SnakeMake From line 42 of rules/imp.smk
shell:
    "ln -vs {input} {output}"
shell:
    "ln -vs {input} {output}"
shell:
    "(date && prokka --outdir $(dirname {output.FAA}) {input} --cpus {threads} --force && date) &> >(tee {log})"
shell:
    "for fname in {input.txt} ; do echo echo \"${{fname}}\"\"    \"$(echo {input.FAA}) ; done > {output}"
run:
    with open(output[0], "w") as ofile:
        # default HMMs
        for hmm_name, hmm_path in config["mantis"]["default"].items():
            ofile.write("%s=%s\n" % (hmm_name, hmm_path))
        # custom HMMs
        for hmm_path in config["mantis"]["custom"]:
            ofile.write("custom_hmm=%s\n" % hmm_path)
        # weights
        for weights_name, weights_value in config["mantis"]["weights"].items():
            ofile.write("%s=%f\n" % (weights_name, weights_value))
shell:
    "(date && python {config[mantis][path]}/ run_mantis -t {input.FAA} --output_folder $(dirname {output}) --mantis_config {input.config} --hmmer_threads {params.cores} --cores {threads} --memory {config[mantis][single_mem]} --kegg_matrix && date) &> >(tee {log})"
shell:
    "(date && antismash --cpus {threads} --genefinding-tool prodigal --fullhmmer --pfam2go --asf --cb-knownclusters --clusterhmmer --cf-create-clusters {input} --output-dir $(dirname {output}) && date) &> >(tee {log})"
shell:
    """(date && sed -n '/##FASTA/q;p' {input} | awk '$3=="CDS"' | awk '{{print $9}}' | awk 'gsub(";.*","")' | awk 'gsub("ID=","")' > {output} && date) &> >(tee {log})"""
SnakeMake From line 65 of rules/misc.smk
shell:
    "(date && export GTDBTK_DATA_PATH={params} && gtdbtk classify_wf --cpus {threads} -x fa --genome_dir $(dirname {input}) --out_dir {output} && date) &> >(tee {log})"
shell:
    "(date && checkm lineage_wf -r -t {threads} -x fa $(dirname {input}) {output} && date) &> >(tee {log})"
shell:
    "(date && python3 ./vibrant/VIBRANT/VIBRANT_run.py -t {threads} -i {input} -folder $(dirname $(dirname {output.viout1})) && date) &> >(tee {log})"
shell:
    "(date && python3 {config[convert_files][simplify]} {input} && "
    "export PATH='/scratch/users/sbusi/tools/miniconda3/envs/vcontact2/bin:$PATH' && "
    "vcontact2_gene2genome -p {output.tout1} -o {output.tout2} -s '{config[convert_files][type]}') &> >(tee {log})"
shell:
    "(date && export PATH=$PATH:'/scratch/users/sbusi/tools/miniconda3/envs/vcontact2/bin' && "
    "/scratch/users/sbusi/tools/miniconda3/envs/vcontact2/bin/vcontact2 --force-overwrite --raw-proteins {input.v1} --rel-mode 'Diamond' --proteins-fp {input.v2} --db 'ProkaryoticViralRefSeq94-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /home/users/sbusi/apps/vcontact2/cluster_one-1.0.jar --output-dir {output.cout5} && date) &> >(tee {log})"
shell:
    "(date && kraken2 --threads {threads} --db {config[kraken2][db]} --confidence 0.75 {input} --output {output.summary} --report {output.report} && date) &> >(tee {log})"
shell:
    "(date && kaiju -z {threads} -t {config[kaiju][db]}/{params.nodes} -f {config[kaiju][db]}/{params.fmi} -i {input.fasta} -o {output} && date) &> >(tee {log})"
shell:
    "(date && kaiju2table -e -t {config[kaiju][db]}/{params.nodes} -n {config[kaiju][db]}/{params.names} -r {config[kaiju][rank]} -o {output} {input.files} && date) &> >(tee {log})"
shell:
    "(date && checkv end_to_end -d {config[checkv][db]} {input} $(dirname {output}) -t {threads} && date) &> >(tee {log})"
shell:
    "(date && antismash --cpus {threads} --genefinding-tool none --genefinding-gff3 {input.GFF} --fullhmmer --pfam2go --asf --cb-knownclusters --clusterhmmer --cf-create-clusters {input.FA} --output-dir $(dirname {output}) && date) &> >(tee {log})"