Quantitative shotgun MS proteomics


Quantitative shotgun MS proteomics as done in Lehtio lab

This pipeline is no longer maintained.

Please see nf-core/quantms for a more up-to-date pipeline that covers much of the same functionality.

Introduction

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It comes with Docker/Singularity containers, which simplify installation and make results reproducible.

Documentation

The nf-core/ddamsproteomics pipeline comes with documentation about the pipeline, found in the docs/ directory:

- Installation
- Pipeline configuration
  - Local installation
  - Adding your own system
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting

Code Snippets

"""
echo $workflow.manifest.version > v_pipeline.txt
echo $workflow.nextflow.version > v_nextflow.txt
msgf_plus | head -n1 > v_msgf.txt
hardklor | head -n1 > v_hk.txt || true
kronik | head -n2 > v_kr.txt
percolator -h |& head -n1 > v_perco.txt || true
msspsmtable --version > v_mss.txt
source activate openms-2.4.0
IsobaricAnalyzer |& grep Version > v_openms.txt || true
scrape_software_versions.py > software_versions_mqc.yaml
"""

NextFlow OpenMS msgf_plus From line 235 of master/main.nf

"""
# Run hardklor on config file with added line for in/out files
# then run kronik on hardklor and quant isobaric labels if necessary
hardklor <(cat $hkconf <(echo "$infile" hardklor.out))
kronik -c 5 -d 3 -g 1 -m 8000 -n 600 -p 10 hardklor.out ${sample}.kr
source activate openms-2.4.0
${params.isobaric ? "IsobaricAnalyzer  -type $params.isobaric -in $infile -out \"${infile}.consensusXML\" -extraction:select_activation \"$activationtype\" -extraction:reporter_mass_shift $massshift -extraction:min_precursor_intensity 1.0 -extraction:keep_unannotated_precursor true -quantification:isotope_correction true" : ''}
"""

NextFlow Quant OpenMS From line 313 of master/main.nf

"""
msslookup spectra -i ${mzmlfiles.join(' ')} --setnames ${setnames.join(' ')}
"""

NextFlow Spectra From line 343 of master/main.nf

"""
# SQLite lookup needs copying to not modify the input file which would mess up a rerun with -resume
cat $lookup > db.sqlite
msslookup ms1quant --dbfile db.sqlite -i ${krfns.join(' ')} --spectra ${mzmls.join(' ')} --quanttype kronik --mztol 20.0 --mztoltype ppm --rttol 5.0 
msslookup isoquant --dbfile db.sqlite -i ${isofns.join(' ')} --spectra ${isosamples.collect{ x -> x + '.mzML' }.join(' ')}
"""

NextFlow From line 397 of master/main.nf

"""
# SQLite lookup needs copying to not modify the input file which would mess up a rerun with -resume
cat $lookup > db.sqlite
msslookup ms1quant --dbfile db.sqlite -i ${krfns.join(' ')} --spectra ${mzmls.join(' ')} --quanttype kronik --mztol 20.0 --mztoltype ppm --rttol 5.0 
"""

NextFlow From line 404 of master/main.nf
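
The comment in the two snippets above explains why the lookup is copied first: writing into the staged input file would change it and interfere with a rerun using -resume. A minimal Python sketch of the same copy-then-modify idea follows. The table name `ms1quant_demo` and the helper names are hypothetical and only illustrate the pattern; they are not part of the pipeline or of msslookup.

```python
import shutil
import sqlite3
from pathlib import Path


def write_to_copy(lookup: Path, workdir: Path) -> Path:
    """Copy the lookup into workdir and modify only the copy."""
    db_copy = workdir / "db.sqlite"
    # Same effect as `cat $lookup > db.sqlite` in the snippets above
    shutil.copyfile(lookup, db_copy)
    con = sqlite3.connect(db_copy)
    try:
        with con:  # commits the transaction on success
            # Hypothetical table, standing in for whatever the real tool stores
            con.execute("CREATE TABLE IF NOT EXISTS ms1quant_demo (feature TEXT)")
            con.execute("INSERT INTO ms1quant_demo VALUES ('hypothetical feature')")
    finally:
        con.close()
    return db_copy


def table_names(db: Path) -> set:
    """Return the names of all tables in an SQLite file."""
    con = sqlite3.connect(db)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        ).fetchall()
    finally:
        con.close()
    return {row[0] for row in rows}
```

After `write_to_copy` has run, the original lookup still contains only its original tables, so a task that takes it as input sees the same file on a later run.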

"""
sqlite3 $speclookup "SELECT mzmlfilename, COUNT(*) FROM mzml JOIN mzmlfiles USING(mzmlfile_id) JOIN biosets USING(set_id) GROUP BY mzmlfilename" > amount_spectra_files
"""

NextFlow From line 431 of master/main.nf
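
The query above counts spectra per mzML file by joining three tables. The sketch below runs the same query from Python against an in-memory database. The schema is an assumption that contains only the columns the query touches (plus hypothetical `set_name` and `spectra_id` columns); the real lookup schema is not shown in this document.

```python
import sqlite3

# Assumed minimal schema, reduced to what the query needs
SCHEMA = """
CREATE TABLE biosets (set_id INTEGER PRIMARY KEY, set_name TEXT);
CREATE TABLE mzmlfiles (mzmlfile_id INTEGER PRIMARY KEY, mzmlfilename TEXT, set_id INTEGER);
CREATE TABLE mzml (spectra_id TEXT, mzmlfile_id INTEGER);
"""

# Same statement as in the snippet above
QUERY = """
SELECT mzmlfilename, COUNT(*)
FROM mzml
JOIN mzmlfiles USING(mzmlfile_id)
JOIN biosets USING(set_id)
GROUP BY mzmlfilename
"""


def count_spectra(con):
    """Return {mzML file name: number of spectra}."""
    return dict(con.execute(QUERY).fetchall())


def demo():
    """Build a tiny example database and count its spectra."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO biosets VALUES (1, 'setA')")
    con.executemany(
        "INSERT INTO mzmlfiles VALUES (?, ?, ?)",
        [(1, "a.mzML", 1), (2, "b.mzML", 1)],
    )
    con.executemany(
        "INSERT INTO mzml VALUES (?, ?)",
        [("s1", 1), ("s2", 1), ("s3", 2)],
    )
    counts = count_spectra(con)
    con.close()
    return counts
```

The sqlite3 command-line tool separates output columns with `|` by default, which matches the `split('|')` in the Python snippet that follows.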

"""
#!/usr/bin/env python
platesets = [\"${splates.join('", "')}\"]
platescans = {p: 0 for p in platesets}
fileplates = {fn: p for fn, p in zip([\"${mzmlfiles.join('", "')}\"], platesets)}
with open('nr_spec_per_file') as fp:
    for line in fp:
        fn, scans = line.strip('\\n').split('|')
        platescans[fileplates[fn]] += int(scans)
with open('scans_per_plate', 'w') as fp:
    for plate, scans in platescans.items():
        fp.write('{}\\t{}\\n'.format(plate, scans))
"""

NextFlow From line 459 of master/main.nf
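
The script above is a Nextflow template, so the Groovy interpolation and doubled escapes make it hard to read. A plain-Python sketch of the same aggregation is shown below, assuming the input lines have the form `filename|count` and that `plates` lists one plate name per file in the same order as `mzmlfiles`. The function name is hypothetical.

```python
def scans_per_plate(count_lines, mzmlfiles, plates):
    """Sum per-file scan counts into per-plate totals.

    count_lines: iterable of 'filename|count' strings
    mzmlfiles:   file names, in the same order as plates
    plates:      plate name for each file (repeats allowed)
    """
    fileplates = dict(zip(mzmlfiles, plates))
    totals = {plate: 0 for plate in plates}
    for line in count_lines:
        filename, scans = line.rstrip("\n").split("|")
        totals[fileplates[filename]] += int(scans)
    return totals


def format_table(totals):
    """Render the totals as tab-separated lines, one plate per line."""
    return "".join("{}\t{}\n".format(plate, scans) for plate, scans in totals.items())
```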

"""
tryprev.py $tdb
cat $tdb decoy_${tdb} > db.fa
"""

NextFlow From line 492 of master/main.nf
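
The snippet above builds a concatenated target-decoy database: a decoy FASTA is produced and appended to the target FASTA. The source of `tryprev.py` is not shown here, so the sketch below is not its implementation. It illustrates one common way to build decoys, pseudo-reversal of tryptic peptides, in which each peptide is reversed while its C-terminal K or R stays in place. The cleavage rule (cut after every K or R) and the function names are assumptions; the `decoy_` header prefix matches the pattern passed to msgf2pin further down.

```python
import re


def pseudo_reverse(sequence):
    """Reverse each tryptic peptide, keeping a C-terminal K/R in place."""
    # Simplified cleavage rule: cut after every K or R (assumption)
    peptides = re.findall(r"[^KR]*[KR]|[^KR]+$", sequence)
    reversed_peptides = []
    for pep in peptides:
        if pep[-1] in "KR":
            reversed_peptides.append(pep[:-1][::-1] + pep[-1])
        else:
            reversed_peptides.append(pep[::-1])
    return "".join(reversed_peptides)


def target_decoy(records, prefix="decoy_"):
    """Return target records followed by their decoys.

    records: list of (header, sequence) tuples
    """
    decoys = [(prefix + header, pseudo_reverse(seq)) for header, seq in records]
    return list(records) + decoys
```

Keeping the cleavage sites fixed means decoy peptides have the same length and mass distribution as target peptides.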

"""
# Search with MSGF+; -tda 0 because the database already contains the decoys
msgf_plus -Xmx16G -d $db -s $x -o "${sample}.mzid" -thread 12 -mod $mods -tda 0 -t 10.0ppm -ti -1,2 -m 0 -inst ${msgfinstrument} -e 1 -protocol ${msgfprotocol} -ntt 2 -minLength 7 -maxLength 50 -minCharge 2 -maxCharge 6 -n 1 -addFeatures 1
# Convert the mzIdentML result to TSV
msgf_plus -Xmx3500M edu.ucsd.msjava.ui.MzIDToTsv -i "${sample}.mzid" -o out.mzid.tsv
# Remove the database index files that MSGF+ created next to the FASTA
rm ${db.baseName.replaceFirst(/\.fasta/, "")}.c*
"""

NextFlow msgf_plus From line 513 of master/main.nf

"""
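echo $samples
mkdir mzids
# Link each numbered mzid input to a file named after its sample and list the links in a metafile
count=1;for sam in ${samples.join(' ')}; do ln -s `pwd`/mzid\$count mzids/\${sam}.mzid; echo mzids/\${sam}.mzid >> metafile; ((count++));done
# Convert to Percolator input; decoy proteins are recognized by their "decoy_" prefix
msgf2pin -o percoin.xml -e trypsin -P "decoy_" metafile
# Rescore the PSMs with Percolator
percolator -j percoin.xml -X perco.xml -N 500000 --decoy-xml-output -y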
"""

NextFlow From line 533 of master/main.nf

"""
perco_to_tsv.py -p $perco --plates ${platenames.join(' ')} --fractions ${fractions.join(' ')}
"""

NextFlow From line 559 of master/main.nf

"""
msspsmtable merge -i psms* -o psms.txt
msspsmtable conffilt -i psms.txt -o filtpsm --confidence-better lower --confidence-lvl 0.01 --confcolpattern 'PSM q-value'
msspsmtable conffilt -i filtpsm -o filtpep --confidence-better lower --confidence-lvl 0.01 --confcolpattern 'peptide q-value'
# SQLite lookup needs copying to not modify the input file which would mess up a rerun with -resume
cat lookup > $psmlookup
msslookup psms -i filtpep --dbfile $psmlookup ${params.onlypeptides ? '' : "--fasta ${td == 'target' ? tdb : "${ddb} --decoy"}"} ${params.martmap ? "--map ${martmap}" : ''}
msspsmtable specdata -i filtpep --dbfile $psmlookup -o prepsms.txt
${!params.noquant ? "msspsmtable quant -i prepsms.txt -o qpsms.txt --dbfile $psmlookup --precursor ${params.isobaric && td=='target' ? '--isobaric' : ''}" : 'mv prepsms.txt qpsms.txt'}
sed 's/\\#SpecFile/SpectraFile/' -i qpsms.txt
${!params.onlypeptides ? "msspsmtable genes -i qpsms.txt -o gpsms --dbfile $psmlookup" : ''}
${!params.onlypeptides ? "msslookup proteingroup -i qpsms.txt --dbfile $psmlookup" : ''}
${!params.onlypeptides ? "msspsmtable proteingroup -i gpsms -o ${params.hirief ? "pgpsms" : "$outpsms"} --dbfile $psmlookup" : 'mv qpsms.txt pgpsms'}
${params.hirief ? "peptide_pi_annotator.py -i $trainingpep -p pgpsms --o $outpsms --stripcolpattern Strip --pepcolpattern Peptide --fraccolpattern Fraction --strippatterns ${allstrips.join(' ')} --intercepts ${allstrips.collect() { params.strips[it].intercept}.join(' ')} --widths ${allstrips.collect() { params.strips[it].fr_width}.join(' ')} --ignoremods \'*\'" : ''}
msspsmtable split -i ${outpsms} --bioset
"""

NextFlow Quant From line 604 of master/main.nf
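
The first steps of the snippet above filter the merged PSM table twice at a confidence level of 0.01, first on a column matching 'PSM q-value' and then on one matching 'peptide q-value', where lower values are better. A minimal Python sketch of that kind of filter follows. It is not msspsmtable's implementation: the substring column matching, the inclusive threshold, and the function name are assumptions.

```python
def conffilt(rows, colpattern, level=0.01):
    """Keep rows whose confidence value is at or below `level`.

    rows:       list of dicts, one per PSM, keyed by column header
    colpattern: substring identifying the confidence column (assumption:
                the first header containing it is used)
    """
    if not rows:
        return []
    matching = [col for col in rows[0] if colpattern in col]
    if not matching:
        raise KeyError("no column matches {!r}".format(colpattern))
    col = matching[0]
    # Lower is better for q-values; the threshold is treated as inclusive here
    return [row for row in rows if float(row[col]) <= level]


def filter_psms(rows, level=0.01):
    """Apply the PSM-level and then the peptide-level filter."""
    psm_filtered = conffilt(rows, "PSM q-value", level)
    return conffilt(psm_filtered, "peptide q-value", level)
```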

"""
# Create peptide table from PSM table, picking best scoring unique peptides
msspeptable psm2pep -i psms -o peptides --scorecolpattern svm --spectracol 1 ${!params.noquant && params.isobaric && td == 'target' ? "--isobquantcolpattern plex" : "" } ${!params.noquant ? "--ms1quantcolpattern area" : ""}
# Move peptide sequence to first column
paste <( cut -f ${col} peptides) <( cut -f 1-${col-1},${col+1}-500 peptides) > peptide_table.txt
# Create empty protein/gene/gene-symbol tables with only the identified accessions, will be filled later
echo Protein accession |tee proteins genes symbols
tail -n+2 psms|cut -f ${accolmap.proteins}|grep -v '\\;'|grep -v "^\$"|sort|uniq >> proteins
tail -n+2 psms|cut -f ${accolmap.genes}|grep -v '\\;'|grep -v "^\$"|sort|uniq >> genes
tail -n+2 psms|cut -f ${accolmap.assoc}|grep -v '\\;'|grep -v "^\$"|sort|uniq >> symbols
# Do isobaric quantification if necessary
${normalize && td == 'target' ? "msspsmtable isoratio -i psms -o proteinratios --protcol ${accolmap.proteins} --targettable proteins --isobquantcolpattern plex --minint 0.1 --denompatterns ${setdenoms[setname].join(' ')}" : 'touch proteinratios'}
${isoquant ? "msspsmtable isoratio -i psms -o pepisoquant --targettable peptide_table.txt --protcol ${accolmap.peptides} --isobquantcolpattern plex --minint 0.1 --denompatterns ${setdenoms[setname].join(' ')} ${normalize ? '--normalize median --norm-ratios proteinratios' : ''} > normratiosused" : ''}
${isoquant ? "mv pepisoquant peptide_table.txt" : ''}
# Create linear modeled q-values of peptides (modeled svm scores vs q-values) for more protein-FDR precision.
msspeptable modelqvals -i peptide_table.txt -o ${setname}_linmod --scorecolpattern svm --fdrcolpattern '^q-value'
"""

NextFlow From line 652 of master/main.nf
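
The first comment in the snippet above describes creating a peptide table by picking the best scoring PSM for each unique peptide, with the score column selected by the pattern `svm`. The sketch below shows that selection in plain Python. The column names and the assumption that a higher SVM score is better are illustrative; this is not the msspeptable code.

```python
def best_psm_per_peptide(psms, peptide_col="Peptide", score_col="svm score"):
    """Return one PSM per peptide sequence: the one with the highest score.

    psms: list of dicts keyed by column header (hypothetical column names)
    """
    best = {}
    for psm in psms:
        peptide = psm[peptide_col]
        score = float(psm[score_col])
        # Assumption: higher SVM score means a better match
        if peptide not in best or score > float(best[peptide][score_col]):
            best[peptide] = psm
    return list(best.values())
```

Peptides appear in the output in the order in which they were first seen in the PSM table.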

"""
mssprottable ms1quant -i proteins -o protms1 --psmtable psms --protcol ${accolmap[acctype]}
msspsmtable isoratio -i psms -o proteintable --protcol ${accolmap[acctype]} --targettable protms1 --isobquantcolpattern plex --minint 0.1 --denompatterns ${setdenoms[setname].join(' ')} ${normalize && td == 'target' ? '--norm-ratios pratios --normalize median': ''}
mssprottable bestpeptide -i proteintable -o bestpeptides --peptable peplinmod --scorecolpattern ${acctype == 'proteins' ? '\'^q-value\'' : '\'linear model\''} --logscore --protcol ${accolmap[acctype] + 1}
"""

NextFlow From line 716 of master/main.nf
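
The `isoratio` calls above take denominator channel patterns, a minimum intensity of 0.1, and optionally median normalization. The exact calculation inside msspsmtable is not shown in this document, so the following is only a sketch of one plausible scheme: each PSM's channel intensities are divided by the mean of its denominator channels, the PSM ratios are summarized per feature by their median, and each channel is then divided by its median ratio across features. All function names and the handling of intensities below the minimum are assumptions.

```python
from statistics import mean, median


def psm_ratios(intensities, denom_channels, minint=0.1):
    """Ratio of each channel to the mean of the denominator channels."""
    denoms = [intensities[ch] for ch in denom_channels if intensities[ch] >= minint]
    if not denoms:
        return {ch: None for ch in intensities}
    denominator = mean(denoms)
    # Intensities below minint are treated as missing (assumption)
    return {
        ch: (value / denominator if value >= minint else None)
        for ch, value in intensities.items()
    }


def summarize(psms, denom_channels, minint=0.1):
    """Median PSM ratio per feature and channel.

    psms: list of (feature accession, {channel: intensity}) tuples
    """
    collected = {}
    for feature, intensities in psms:
        for ch, ratio in psm_ratios(intensities, denom_channels, minint).items():
            if ratio is not None:
                collected.setdefault(feature, {}).setdefault(ch, []).append(ratio)
    return {
        feature: {ch: median(ratios) for ch, ratios in channels.items()}
        for feature, channels in collected.items()
    }


def median_normalize(table):
    """Divide each channel by its median ratio across all features."""
    channels = {ch for row in table.values() for ch in row}
    factors = {
        ch: median(row[ch] for row in table.values() if ch in row)
        for ch in channels
    }
    return {
        feature: {ch: ratio / factors[ch] for ch, ratio in row.items()}
        for feature, row in table.items()
    }
```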

"""
${td == 'target' && !params.noquant ? "mssprottable ms1quant -i proteins -o proteintable --psmtable psms --protcol ${accolmap[acctype]}" : 'mv proteins proteintable'}
mssprottable bestpeptide -i proteintable -o bestpeptides --peptable peplinmod --scorecolpattern ${acctype == 'proteins' ? '\'^q-value\'' : '\'linear model\''} --logscore --protcol ${accolmap[acctype] + 1}
"""

NextFlow From line 722 of master/main.nf

"""
mssprottable pickedfdr --picktype fasta --targetfasta $tfasta --decoyfasta $dfasta ${params.fastadelim ? "--fastadelim \'${params.fastadelim}\' --genefield ${params.genefield}" : ''} -i tbestpep --decoyfn dbestpep -o ${setname}_protfdr
"""

NextFlow From line 748 of master/main.nf

"""
mssprottable ${acctype == 'proteins' ? 'protfdr' : 'pickedfdr --picktype result'} -i tbestpep --decoyfn dbestpep -o ${setname}_protfdr
"""

NextFlow From line 752 of master/main.nf
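
The two snippets above compute protein and gene FDR with `pickedfdr`. In the picked target-decoy approach, each target competes with its paired decoy, only the better scoring of the two is kept, and q-values are then estimated from the surviving entries. The sketch below illustrates that idea under stated assumptions: pairs are matched through a `decoy_` prefix, higher scores are better, and FDR is estimated as decoys divided by targets. It is not the mssprottable implementation.

```python
def picked_fdr(target_scores, decoy_scores, decoy_prefix="decoy_"):
    """Return {target accession: q-value} using picked target-decoy competition.

    target_scores, decoy_scores: {accession: score}, higher is better
    """
    picked = []  # (score, is_decoy, accession)
    for acc, tscore in target_scores.items():
        dscore = decoy_scores.get(decoy_prefix + acc)
        if dscore is None or tscore >= dscore:
            picked.append((tscore, False, acc))
        else:
            picked.append((dscore, True, decoy_prefix + acc))
    # Decoys without a target partner compete with nothing and are kept
    for dacc, dscore in decoy_scores.items():
        if dacc[len(decoy_prefix):] not in target_scores:
            picked.append((dscore, True, dacc))

    picked.sort(key=lambda entry: -entry[0])

    # Running FDR estimate from best to worst score
    fdrs = []
    targets = decoys = 0
    for _, is_decoy, _ in picked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: lowest FDR at this or any lower-scoring position
    qvalues = {}
    running_min = float("inf")
    for (_, is_decoy, acc), fdr in zip(reversed(picked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        if not is_decoy:
            qvalues[acc] = running_min
    return qvalues
```

A target that loses against its decoy does not receive a q-value in this sketch, since only the winner of each pair enters the ranking.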

"""
# SQLite lookup needs copying to not modify the input file which would mess up a rerun with -resume
cat $lookup > db.sqlite
msslookup ${acctype == 'peptides' ? 'peptides --fdrcolpattern \'^q-value\' --peptidecol' : 'proteins --fdrcolpattern \'q-value\' --protcol'} 1 --dbfile db.sqlite -i ${tables.join(' ')} --setnames ${setnames.join(' ')} ${!params.noquant ? "--ms1quantcolpattern area" : ""}  ${!params.noquant && params.isobaric ? '--psmnrcolpattern quanted --isobquantcolpattern plex' : ''} ${acctype in ['genes', 'assoc'] ? "--genecentric ${acctype}" : ''}
${acctype == 'peptides' ? 'msspeptable build' : 'mssprottable build --mergecutoff 0.01'} --dbfile db.sqlite -o proteintable ${!params.noquant && params.isobaric ? '--isobaric' : ''} ${!params.noquant ? "--precursor": ""} --fdr ${acctype in ['genes', 'assoc'] ? "--genecentric ${acctype}" : ''} ${params.onlypeptides ? "--noncentric" : ''}
sed -i 's/\\#/Amount/g' proteintable
"""

NextFlow From line 798 of master/main.nf

"""
qc_psms.R ${setnames[0].size()} ${fractionation ? 'TRUE' : 'FALSE'} ${plates.join(' ')}
echo "<html><body>" > psmqc.html
for graph in psm-scans missing-tmt miscleav
  do
  [[ -e \$graph ]] && paste -d \\\\0  <(echo "<div class=\\"chunk\\" id=\\"\${graph}\\"><img src=\\"data:image/png;base64,") <(base64 -w 0 \$graph) <(echo '"></div>') >> psmqc.html
  done 
for graph in retentiontime precerror fryield msgfscore
  do
  for plateid in ${plates.join(' ')}
    do
    plate="PLATE___\${plateid}___\${graph}"
    [[ -e \$plate ]] && paste -d \\\\0  <(echo "<div class=\\"chunk \$plateid\\" id=\\"\${graph}\\"><img src=\\"data:image/png;base64,") <(base64 -w 0 \$plate) <(echo '"></div>') >> psmqc.html
    done 
  done
echo "</body></html>" >> psmqc.html
"""

NextFlow From line 823 of master/main.nf
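
The shell loop above assembles a self-contained QC page by base64-encoding each plot that exists and embedding it as a data URI inside a `div`. The same technique is sketched below in Python. The function names are hypothetical; the markup mirrors what the snippet writes.

```python
import base64
from pathlib import Path


def image_div(png_path, graph_id, css_class="chunk"):
    """Return a div embedding the image file as a base64 data URI."""
    encoded = base64.b64encode(Path(png_path).read_bytes()).decode("ascii")
    return (
        '<div class="{cls}" id="{gid}">'
        '<img src="data:image/png;base64,{data}"></div>'
    ).format(cls=css_class, gid=graph_id, data=encoded)


def build_report(graphs, directory="."):
    """Build an HTML page from the graph files that exist in `directory`."""
    parts = ["<html><body>"]
    for graph in graphs:
        path = Path(directory) / graph
        if path.exists():  # same role as the `[[ -e $graph ]]` test
            parts.append(image_div(path, graph))
    parts.append("</body></html>")
    return "\n".join(parts)
```

Embedding the images keeps the report as a single file that can be moved or shared without the separate plot files.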

"""
${normalize ? "count=1;for setn in ${setnames.join(' ')}; do echo '' >> norm\${count} ; tail -n+2 norm\${count} | sed \$'s/ - /\t'\${setn}\$'\t/'; ((count++)); done >> normtable" : ''}
qc_protein.R ${setnames.size()} ${acctype} $peptable ${normalize ? 'normtable' : ''}
echo "<html><body>" > featqc.html
for graph in featyield precursorarea coverage isobaric nrpsms nrpsmsoverlapping percentage_onepsm normfac ms1nrpeps;
  do
  [ -e \$graph ] && paste -d \\\\0  <(echo "<div class=\\"chunk\\" id=\\"\${graph}\\"><img src=\\"data:image/png;base64,") <(base64 -w 0 \$graph) <(echo '"></div>') >> featqc.html
  done 
echo "</body></html>" >> featqc.html
"""

NextFlow From line 867 of master/main.nf

"""
count=1; for ac in ${acctypes.join(' ')}; do mv feat\$count \$ac.html; ((count++)); done
qc_collect.py $params.name ${params.hirief ? "hirief" : "nofrac"} ${plates.join(' ')}
"""

NextFlow From line 898 of master/main.nf

"""
markdown_to_html.r $output_docs results_description.html
"""

NextFlow From line 920 of master/main.nf


Created: 1yr ago

Updated: 1yr ago

Maintainers: public

URL: https://nf-co.re/ddamsproteomics

Name: ddamsproteomics

Version: dev


License: None

Keywords:

msgf_plus Spectra Nextflow OpenMS Quant Protein expression Protein interactions Protein structure analysis

Refs:

https://nf-co.re/ddamsproteomics
