Coprolite host Identification pipeline


A fully reproducible pipeline for COPROlite and paleofeces host IDentification

coproID helps you identify the "true maker" of Illumina-sequenced coprolites and paleofeces by checking the microbiome composition and the endogenous DNA.

It combines the analysis of putative host ancient DNA with a machine learning prediction of the feces source based on microbiome taxonomic composition:

  • (A) First, coproID performs a comparative mapping of all reads against two (or three) target genomes (genome1, genome2, and optionally genome3) and computes a host-DNA species ratio (NormalizedRatio)

  • (B) Then coproID performs metagenomic taxonomic profiling and compares the obtained profiles to modern reference metagenomes of the target species. Using machine learning, coproID estimates the host source from the metagenomic taxonomic composition (prop_microbiome).

  • Finally, coproID combines A and B to predict the likely host of the metagenomic sample (see the sketch below).
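
To make the two quantities concrete, here is a minimal, illustrative Python sketch of how a host-DNA ratio and a microbiome-derived proportion could be combined. The normalization by genome size, the function names, and the multiplicative combination are assumptions made for this sketch only; the exact definitions of NormalizedRatio and prop_microbiome are given in the PeerJ article.

import math

# Illustrative sketch only -- not the published coproID formulas.

def normalized_bp(aligned_bases, genome_size):
    """Step (A): host-DNA signal for one target genome, scaled by genome size."""
    return aligned_bases / genome_size

def normalized_ratio(nbp1, nbp2):
    """log2 ratio of the two host-DNA signals: > 0 favors genome1, < 0 favors genome2."""
    return math.log2(nbp1 / nbp2)

def combined_score(nbp1, nbp2, prop_microbiome1):
    """Toy combination of (A) and (B): host-DNA proportion times microbiome proportion."""
    host_dna_prop1 = nbp1 / (nbp1 + nbp2)
    return host_dna_prop1 * prop_microbiome1

# Hypothetical numbers for one sample mapped against human (genome1) and dog (genome2)
nbp_human = normalized_bp(aligned_bases=2_500_000, genome_size=3_100_000_000)
nbp_dog = normalized_bp(aligned_bases=150_000, genome_size=2_400_000_000)
print("NormalizedRatio (log2):", round(normalized_ratio(nbp_human, nbp_dog), 2))
print("Combined human score:", round(combined_score(nbp_human, nbp_dog, prop_microbiome1=0.8), 2))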

The coproID pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.

A detailed description of coproID can be found in the article published in PeerJ.

Quick Start

i. Install nextflow

ii. Install either Docker or Singularity for full pipeline reproducibility (please only use Conda as a last resort; see docs)

iii. Download the pipeline and test it on a minimal dataset with a single command

nextflow run nf-core/coproid -profile test,<docker/singularity/conda/institute>

Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile institute in your command. This will enable either Docker or Singularity and set the appropriate execution settings for your local compute environment.

iv. Start running your own analysis!

nextflow run maxibor/coproid --genome1 'GRCh37' --genome2 'CanFam3.1' --name1 'Homo_sapiens' --name2 'Canis_familiaris' --reads '*_R{1,2}.fastq.gz' --krakendb 'path/to/minikraken_db' -profile docker

This command runs coproID to estimate whether the test samples (--reads '*_R{1,2}.fastq.gz') come from a human (--genome1 'GRCh37' --name1 'Homo_sapiens') or a dog (--genome2 'CanFam3.1' --name2 'Canis_familiaris'), and specifies the path to the minikraken database (--krakendb 'path/to/minikraken_db').

NB: The example above assumes access to iGenomes.

See usage docs for all of the available options when running the pipeline.

Documentation

The nf-core/coproid pipeline comes with documentation about the pipeline, found in the docs/ directory and at the following address: coproid.readthedocs.io

  1. Installation

  2. Pipeline configuration

  3. Running the pipeline

  4. Output and how to interpret the results

  5. Troubleshooting

Credits

nf-core/coproid was written by Maxime Borry.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).

Citing

coproID has been published in PeerJ. The BibTeX citation is available below:

@article{borry_coproid_2020,
 title = {{CoproID} predicts the source of coprolites and paleofeces using microbiome composition and host {DNA} content},
 volume = {8},
 issn = {2167-8359},
 url = {https://peerj.com/articles/9001},
 doi = {10.7717/peerj.9001},
 language = {en},
 urldate = {2020-04-20},
 journal = {PeerJ},
 author = {Borry, Maxime and Cordova, Bryan and Perri, Angela and Wibowo, Marsha and Honap, Tanvi Prasad and Ko, Jada and Yu, Jie and Britton, Kate and Girdland-Flink, Linus and Power, Robert C. and Stuijts, Ingelise and Salazar-García, Domingo C. and Hofman, Courtney and Hagan, Richard and Kagoné, Thérèse Samdapawindé and Meda, Nicolas and Carabin, Helene and Jacobson, David and Reinhard, Karl and Lewis, Cecil and Kostic, Aleksandar and Jeong, Choongwon and Herbig, Alexander and Hübner, Alexander and Warinner, Christina},
 month = apr,
 year = {2020},
 note = {Publisher: PeerJ Inc.},
 pages = {e9001}
}

Contributors

James A. Fellows Yates

Code Snippets

"""
tar xvzf $ckdb
"""
NextFlow From line 328 of master/main.nf
"""
fastqc -q $reads
"""
"""
mv $genome $outname
"""
NextFlow From line 429 of master/main.nf
"""
mv $genome $outname
"""
NextFlow From line 442 of master/main.nf
"""
mv $genome $outname
"""
NextFlow From line 456 of master/main.nf
"""
AdapterRemoval --basename $name \\
               --file1 ${reads[0]} \\
               --file2 ${reads[1]} \\
               --trimns \\
               --trimqualities \\
               --collapse \\
               --minquality 20 \\
               --minlength 30 \\
               --output1 $out1 \\
               --output2 $out2 \\
               --outputcollapsed $col_out \\
               --threads ${task.cpus} \\
               --qualitybase ${params.phred} \\
               --settings $settings
"""
"""
AdapterRemoval --basename $name \\
               --file1 ${reads[0]} \\
               --file2 ${reads[1]} \\
               --trimns \\
               --trimqualities \\
               --minquality 20 \\
               --minlength 30 \\
               --output1 $out1 \\
               --output2 $out2 \\
               --threads ${task.cpus} \\
               --qualitybase ${params.phred} \\
               --settings $settings
"""
"""
AdapterRemoval --basename $name \\
               --file1 ${reads[0]} \\
               --trimns \\
               --trimqualities \\
               --minquality 20 \\
               --minlength 30 \\
               --output1 $se_out \\
               --threads ${task.cpus} \\
               --qualitybase ${params.phred} \\
               --settings $settings
"""
"""
bowtie2-build $fasta ${bt1_index}
"""
"""
bowtie2 -x $bt1_index -U ${reads[0]} $bowtie_setting --threads ${task.cpus} > $samfile 2> $fstat
samtools view -S -b -F 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile
samtools view -S -b -f 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile_unalign
"""
"""
bowtie2 -x $bt1_index -1 ${reads[0]} -2 ${reads[1]} $bowtie_setting --threads ${task.cpus} > $samfile 2> $fstat
samtools view -S -b -F 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile
samtools view -S -b -f 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile_unalign
"""
"""
samtools fastq -1 $out1 -2 $out2 -0 /dev/null -s /dev/null -n -F 0x900 $bam
"""
"""
samtools fastq $bam > $out
"""
"""
bowtie2-build $fasta ${bt2_index}
"""
"""
bowtie2-build $fasta ${bt3_index}
"""
"""
bowtie2 -x $bt2_index -U ${reads[0]} $bowtie_setting --threads ${task.cpus} > $samfile 2> $fstat
samtools view -S -b -F 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile
samtools view -S -b -f 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile_unalign
"""
"""
bowtie2 -x $bt2_index -1 ${reads[0]} -2 ${reads[1]} $bowtie_setting --threads ${task.cpus} > $samfile 2> $fstat
samtools view -S -b -F 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile
samtools view -S -b -f 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile_unalign
"""
"""
bowtie2 -x $bt3_index -U ${reads[0]} $bowtie_setting --threads ${task.cpus} > $samfile 2> $fstat
samtools view -S -b -F 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile
samtools view -S -b -f 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile_unalign
"""
"""
bowtie2 -x $bt3_index -1 ${reads[0]} -2 ${reads[1]} $bowtie_setting --threads ${task.cpus} > $samfile 2> $fstat
samtools view -S -b -F 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile
samtools view -S -b -f 4 -@ ${task.cpus} $samfile | samtools sort -@ ${task.cpus} -o $outfile_unalign
"""
"""
samtools view -h -F 4 $bam1 | pmdtools -t ${params.pmdscore} --header $library | samtools view -Sb - > $outfile
"""
"""
samtools view -h -F 4 $bam2 | pmdtools -t ${params.pmdscore} --header $library | samtools view -Sb - > $outfile
"""
"""
samtools view -h -F 4 $bam3 | pmdtools -t ${params.pmdscore} --header $library | samtools view -Sb - > $outfile
"""
"""
kraken2 --db ${krakendb} \\
        --threads ${task.cpus} \\
        --output $out \\
        --report $kreport \\
        --paired ${reads[0]} ${reads[1]}
"""    
"""
kraken2 --db ${krakendb} \\
        --threads ${task.cpus} \\
        --output $out \\
        --report $kreport ${reads[0]}
"""
"""
kraken_parse.py -c ${params.minKraken} $kraken_r
"""    
NextFlow From line 844 of master/main.nf
"""
merge_kraken_res.py -o $out
"""    
NextFlow From line 861 of master/main.nf
"""
sourcepredict -di ${params.sp_dim} \\
              -kne ${params.sp_neighbors} \\
              -me ${params.sp_embed} \\
              -n ${params.sp_norm} \\
              -l ${sp_labels} \\
              -s ${sp_sources} \\
              -t ${task.cpus} \\
              -o $outfile \\
              -e $embed_out $otu_table 
"""
"""
samtools index $bam1
samtools index $bam2
samtools index $abam1
samtools index $abam2
normalizedReadCount -n $name \\
                    -b1 $bam1 \\
                    -ab1 $abam1 \\
                    -b2 $bam2 \\
                    -ab2 $abam2 \\
                    -g1 $genome1 \\
                    -g2 $genome2 \\
                    -r1 $organame1 \\
                    -r2 $organame2 \\
                    -i ${params.identity} \\
                    -o $outfile \\
                    -ob1 $obam1 \\
                    -aob1 $aobam1 \\
                    -ob2 $obam2 \\
                    -aob2 $aobam2 \\
                    -ed1 ${params.endo1} \\
                    -ed2 ${params.endo2} \\
                    -p ${task.cpus}
"""
"""
samtools index $bam1
samtools index $bam2
normalizedReadCount -n $name \\
                    -b1 $bam1 \\
                    -b2 $bam2 \\
                    -g1 $genome1 \\
                    -g2 $genome2 \\
                    -r1 $organame1 \\
                    -r2 $organame2 \\
                    -i ${params.identity} \\
                    -o $outfile \\
                    -ob1 $obam1 \\
                    -ob2 $obam2 \\
                    -ed1 ${params.endo1} \\
                    -ed2 ${params.endo2} \\
                    -p ${task.cpus}
"""
"""
samtools index $bam1
samtools index $bam2
samtools index $bam3
samtools index $abam1
samtools index $abam2
samtools index $abam3
normalizedReadCount -n $name \\
                    -b1 $bam1 \\
                    -ab1 $abam1 \\
                    -b2 $bam2 \\
                    -ab2 $abam2 \\
                    -b3 $bam3 \\
                    -ab3 $abam3 \\
                    -g1 $genome1 \\
                    -g2 $genome2 \\
                    -g3 $genome3 \\
                    -r1 $organame1 \\
                    -r2 $organame2 \\
                    -r3 $organame3 \\
                    -i ${params.identity} \\
                    -o $outfile \\
                    -ob1 $obam1 \\
                    -aob1 $aobam1 \\
                    -ob2 $obam2 \\
                    -aob2 $aobam2 \\
                    -ob3 $obam3 \\
                    -aob3 $aobam3 \\
                    -ed1 ${params.endo1} \\
                    -ed2 ${params.endo2} \\
                    -ed3 ${params.endo3} \\
                    -p ${task.cpus}
"""
"""
samtools index $bam1
samtools index $bam2
samtools index $bam3
normalizedReadCount -n $name \\
                    -b1 $bam1 \\
                    -b2 $bam2 \\
                    -b3 $bam3 \\
                    -g1 $genome1 \\
                    -g2 $genome2 \\
                    -g3 $genome3 \\
                    -r1 $organame1 \\
                    -r2 $organame2 \\
                    -r3 $organame3 \\
                    -i ${params.identity} \\
                    -o $outfile \\
                    -ob1 $obam1 \\
                    -ob2 $obam2 \\
                    -ob3 $obam3 \\
                    -ed1 ${params.endo1} \\
                    -ed2 ${params.endo2} \\
                    -ed3 ${params.endo3} \\
                    -p ${task.cpus}
"""
"""
mv $align $bam_name
damageprofiler -i $bam_name -r $fasta -o tmp
mv tmp/${smp_name}/5pCtoT_freq.txt $fwd_name
mv tmp/${smp_name}/3pGtoA_freq.txt $rev_name
mv tmp/${smp_name}/dmgprof.json ${smp_name}.dmgprof.json
"""
NextFlow From line 1082 of master/main.nf
"""
mv $align $bam_name
damageprofiler -i $bam_name -r $fasta -o tmp
mv tmp/${smp_name}/5pCtoT_freq.txt $fwd_name
mv tmp/${smp_name}/3pGtoA_freq.txt $rev_name
mv tmp/${smp_name}/dmgprof.json ${smp_name}.dmgprof.json
"""
NextFlow From line 1107 of master/main.nf
"""
mv $align $bam_name
damageprofiler -i $bam_name -r $fasta -o tmp
mv tmp/${smp_name}/5pCtoT_freq.txt $fwd_name
mv tmp/${smp_name}/3pGtoA_freq.txt $rev_name
mv tmp/${smp_name}/dmgprof.json ${smp_name}.dmgprof.json
"""
NextFlow From line 1133 of master/main.nf
"""
ls -1 *.bpc.csv | head -1 | xargs head -1 > coproID_bp.csv
tail -q -n +2 *.bpc.csv >> coproID_bp.csv
merge_bp_sp.py -c coproID_bp.csv -s $sp -o $outfile
"""
NextFlow From line 1159 of master/main.nf
"""
echo ${workflow.manifest.version} > version.txt
jupyter nbconvert \\
        --TagRemovePreprocessor.remove_input_tags='{"remove_cell"}' \\
        --TagRemovePreprocessor.remove_all_outputs_tags='{"remove_output"}' \\
        --TemplateExporter.exclude_input_prompt=True \\
        --TemplateExporter.exclude_output_prompt=True \\
        --ExecutePreprocessor.timeout=200 \\
        --execute \\
        --to html_embed $report
"""
NextFlow From line 1184 of master/main.nf
"""
echo ${workflow.manifest.version} > version.txt
jupyter nbconvert \\
        --TagRemovePreprocessor.remove_input_tags='{"remove_cell"}' \\
        --TagRemovePreprocessor.remove_all_outputs_tags='{"remove_output"}' \\
        --TemplateExporter.exclude_input_prompt=True \\
        --TemplateExporter.exclude_output_prompt=True \\
        --ExecutePreprocessor.timeout=200 \\
        --execute \\
        --to html_embed $report
"""
NextFlow From line 1211 of master/main.nf
"""
echo ${workflow.manifest.version} > version.txt
jupyter nbconvert \\
        --TagRemovePreprocessor.remove_input_tags='{"remove_cell"}' \\
        --TagRemovePreprocessor.remove_all_outputs_tags='{"remove_output"}' \\
        --TemplateExporter.exclude_input_prompt=True \\
        --TemplateExporter.exclude_output_prompt=True \\
        --ExecutePreprocessor.timeout=200 \\
        --execute \\
        --to html_embed $report
"""
NextFlow From line 1236 of master/main.nf
"""
echo $workflow.manifest.version > v_pipeline.txt
echo $workflow.nextflow.version > v_nextflow.txt
fastqc --version > v_fastqc.txt
multiqc --version > v_multiqc.txt
sourcepredict -h  > v_sourcepredict.txt
samtools --version > v_samtools.txt
kraken2 --version > v_kraken2.txt
bowtie2 --version > v_bowtie2.txt
python --version > v_python.txt
AdapterRemoval --version 2> v_adapterremoval.txt
scrape_software_versions.py &> software_versions_mqc.yaml
"""
"""
multiqc -f -d adapter_removal alignment fastqc DamageProfiler software_versions software_versions -c $multiqc_conf
"""
"""
markdown_to_html.py $output_docs -o results_description.html
"""
NextFlow From line 1337 of master/main.nf