pipeline to find unique primer for genome bins in metagenomes
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
This pipeline designs primers that are unique to a genome/genome bin after metagenome assemlby and binning, which can be further used to verify the genome/genome bin are real by running PCR using the primers in the same DNA extract as the metagenome are sequenced from. Another use case is to match isolates from culturing to genome bins.
Authors
- Jiarong Guo (@jiarong)
Usage
Step 1: Install workflow
If you simply want to use this workflow, download and extract the latest release . If you intend to modify and further develop this workflow, fork this repository. Please consider providing any generally applicable modifications via a pull request.
In any case, if you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository and, if available, its DOI (see above).
Step 2: Configure workflow
Configure the workflow according to your needs via editing the file
config.yaml
.
Step 3: Execute workflow
Test your configuration by performing a dry-run via
snakemake -n
Execute the workflow locally via
snakemake --cores $N
using
$N
cores or run it in a cluster environment via
snakemake --cluster qsub --jobs 100
or
snakemake --drmaa --jobs 100
See the Snakemake documentation for further details.
Testing
Code Snippets
38 39 40 41 42 43 44 45 46 | shell: """ cd {wildcards.sample} \ && mkdir -p maxbin \ && run_MaxBin.pl -thread {threads} \ -max_iteration 100 \ -contig {wildcards.pid}_contigs.fasta.gz \ -abund {wildcards.pid}_contigcoverage.tsv \ -out maxbin/maxbin """ |
59 60 61 62 63 64 65 66 67 68 69 | shell: """ cd {wildcards.sample} rm -rf mb.cm.tmpdir checkm lineage_wf -x fasta \ -f mb.cm.out \ -t {threads} \ maxbin \ mb.cm.tmpdir \ || rm -rf mb.cm.tmpdir checkm tree_qa mb.cm.tmpdir -o 1 -f mb.cm.tax --tab_table """ |
79 80 81 82 83 84 85 86 | shell: """ cd {wildcards.sample} ## gather mash screen -p {threads} {DBDIR}/refseq.genomes.k21s1000.msh {input} \ | sort -g -r > mb.mashscreen.tsv mash screen -w -p {threads} {DBDIR}/refseq.genomes.k21s1000.msh {input} \ | sort -g -r > mb.mashscreenw.tsv """ |
95 96 97 98 99 100 101 102 103 104 105 106 107 | shell: """ # only pick top 10 most abundant to analyze if [ {wildcards.idx} -gt {TOPN} ]; then exit 0; fi cd {wildcards.sample}/maxbin ## search # -m: minimum kmer count, 1 for genome or bins, >2 for reads f=maxbin.{wildcards.idx}.fasta mash sketch -p {threads} -k 21 -m 1 $f # sort -k 3 --> pick 3rd col as key for sorting mash dist -p {threads} {DBDIR}/refseq.genomes.k21s1000.msh $f.msh \ | sort -g -k 3 \ > $f.mashdist.tsv """ |
117 118 119 120 121 122 123 124 125 | shell: """ cd {wildcards.sample} ## gather refseq_masher contains {input} \ -o mb.rmcontain.csv \ --output-type csv \ -i 0.5 \ -p {threads} """ |
134 135 136 137 138 139 140 141 142 143 144 145 | shell: """ # only pick top 10 most abundant to analyze if [ {wildcards.idx} -gt {TOPN} ]; then exit 0; fi cd {wildcards.sample}/maxbin ## search refseq_masher matches maxbin.{wildcards.idx}.fasta \ -o maxbin.{wildcards.idx}.fasta.rmmatch.csv \ --output-type csv \ -p {threads} \ -n 10 \ top n matches -m 1 # minimum kmer count, 1 for genome or bins, >2 for reads """ |
153 154 155 156 157 158 159 160 161 162 163 164 165 166 | shell: """ if [ {wildcards.idx} -gt {TOPN} ]; then exit 0; fi cd {wildcards.sample}/maxbin Seqfile_list=(maxbin.???.fasta) f=maxbin.{wildcards.idx}.fasta Other=${{Seqfile_list[@]/${{f}}}} python {SCRIPTDIR}/get-uniqkmer.py $f \ --ref $Other \ -k {K} \ -x 4e9 -N 4 \ --x2 1e8 --N2 4 \ > $f.uniqkmer \ || rm $f.uniqkmer """ |
177 178 179 180 181 182 183 184 185 | shell: """ cat {input} \ | python {SCRIPTDIR}/get-uniqkmer-nothread.py - \ --load {DBDIR}/refseq.20.bf \ -k {K} --x2 1e6 --N2 4 \ > {wildcards.sample}/maxbin/combined_uniqkmer.uniq2ref \ || rm {wildcards.sample}/maxbin/combined_uniqkmer.uniq2ref """ |
194 195 196 197 198 199 200 201 202 203 | shell: """ cd {wildcards.sample}/maxbin f=maxbin.{wildcards.idx}.fasta.uniqkmer python {SCRIPTDIR}/get-uniqkmer-nothread.py $f \ --ref combined_uniqkmer.uniq2ref \ -k {K} -x 1e6 -N 4 \ --x2 1e6 --N2 4 \ > $f.uniq2ref \ || rm $f.uniq2ref """ |
213 214 215 216 217 218 219 220 221 222 223 | shell: """ cd {wildcards.sample}/maxbin f=maxbin.{wildcards.idx}.fasta echo "*** output results for $f" python $Scriptdir/kmer2primer.py $Scriptdir/params.config \ $f $f.uniqkmer.uniq2ref \ | python $Scriptdir/filter-primer-individual.py \ $Scriptdir/params.config - \ | python $Scriptdir/filter-primer-pair.py $Scriptdir/params.config - \ | tee $f.uniqkmer.uniq2ref.primer """ |
Support
- Future updates