Houses a snakemake workflow used for multiple sequence alignment and maximum likelihood phylogeny construction of genes or proteins. Originally used to study sex determination genes in mosquitoes.

Snakemake file

This Snakemake file contains a workflow for building multiple sequence alignments with MAFFT, trimming them with trimAl and inferring maximum likelihood phylogenies with RAxML. Inputs can be either amino acid or nucleotide sequences. It is written for use with a Snakemake configuration by RomainFeron, so that it runs on a computing cluster managed by SLURM. Note that the outgroups in the ProteinTree and GeneTree rules need to be changed to accommodate your dataset.
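Both tree rules hard-code the outgroup as Dmel_Female. One way to make it configurable is to build the RAxML call with a small Python helper, since a Snakefile accepts plain Python. The sketch below is hypothetical: `SETTINGS`, `raxml_args` and the `base` parameter are illustrative names that are not part of the workflow. The flags, models and folder names are the ones used in the snippets.

```python
import os

# Hypothetical per-sequence-type settings, copied from the GeneTree and
# ProteinTree snippets: file tag, RAxML substitution model, output folder.
SETTINGS = {
    "nucleotide": {"tag": "G", "model": "GTRGAMMA", "tree_dir": "GenesTrees"},
    "protein": {"tag": "P", "model": "PROTGAMMAJTTF", "tree_dir": "ProteinsTrees"},
}


def raxml_args(gene, seq_type, outgroups, base, seed=10):
    """Build the raxmlHPC argument list for one gene (hypothetical helper).

    `outgroups` replaces the hard-coded Dmel_Female. `base` is the project
    directory; it is made absolute because RAxML's -w option expects a
    full path.
    """
    s = SETTINGS[seq_type]
    run_name = f"{gene}{s['tag']}Tree"
    workdir = os.path.join(os.path.abspath(base), s["tree_dir"], run_name) + "/"
    return [
        "raxmlHPC", "-d",
        "-s", f"{gene}{s['tag']}trimmedfas.aln",  # relative to the trimmed-alignment folder
        "-n", run_name,
        "-m", s["model"],
        "-x", str(seed), "-p", str(seed),
        "-#", "autoMR",
        "-f", "a",
        "-o", ",".join(outgroups),  # RAxML accepts a comma-separated outgroup list
        "-w", workdir,
    ]
```

For example, `raxml_args("dsx", "protein", ["Dmel_Female"], "/scratch/dsx")` reproduces the ProteinTree call for a hypothetical project directory `/scratch/dsx`. The `-s` path stays relative because the rules `cd` into the trimmed-alignment folder before calling RAxML.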

Dataset

Contains the amino acid and CDS sequences of the sex determination gene doublesex (dsx) in Aedes aegypti, Drosophila melanogaster and 13 Anopheles species. Both male and female transcripts are included for each species. This is the original dataset I tested the workflow on when I wrote it.
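RAxML's `-o` option must name taxa that are present in the alignment, so it can be worth checking the FASTA headers against the chosen outgroups before launching the workflow. This is a minimal sketch. It assumes each taxon name is the first whitespace-delimited word of a header, as with Dmel_Female in the snippets. The function names are hypothetical.

```python
def fasta_ids(lines):
    """Yield the first word of each FASTA header line ('>name ...')."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip malformed, empty headers
                yield fields[0]


def missing_outgroups(lines, outgroups):
    """Return the requested outgroup names that have no matching header."""
    present = set(fasta_ids(lines))
    return [name for name in outgroups if name not in present]
```

Pass it an open file handle (for instance, for a hypothetical `dsx_protein.fasta`) together with the outgroup list. An empty result means every outgroup was found.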

Code Snippets

shell:
    "source /dcsrsoft/spack/bin/setup_dcsrsoft;"
    "module load gcc/8.3.0;"
    "module load mafft/7.453;"
    "mafft --maxiterate 100 --globalpair --clustalout {input} > GenesAlignments/{wildcards.gene}Gclus.aln;"
shell:
    "source /dcsrsoft/spack/bin/setup_dcsrsoft;"
    "module load gcc/8.3.0;"
    "module load mafft/7.453;"
    "mafft --maxiterate 100 --globalpair {input} > GenesAlignments/{wildcards.gene}Gfas.aln;"
# Write trimAl output straight into the trimmed-alignment folder (paths relative to the
# project directory, as in the MAFFT rules): moving *trimmed* afterwards can also grab
# files from other jobs running in parallel.
shell:
    "module load Bioinformatics/Software/vital-it;"
    "module load SequenceAnalysis/Filtering/trimAl/1.4.1;"
    "trimal -in {input} -out GenesAlignmentsTrimmed/{wildcards.gene}Gtrimmedfas.aln -htmlout GenesAlignmentsTrimmed/{wildcards.gene}Gtrimmedfas.html -automated1;"
# RAxML will not overwrite output from an earlier run with the same -n name, so the
# per-gene tree directory is recreated first (-w must be an absolute path).
shell:
    "module load Bioinformatics/Software/vital-it;"
    "module load Phylogeny/raxml/8.2.12;"
    "cd GenesAlignmentsTrimmed;"
    "rm -rf /users/jtan/scratch/jtan/1ststep/dsx/GenesTrees/{wildcards.gene}GTree/;"
    "mkdir -p /users/jtan/scratch/jtan/1ststep/dsx/GenesTrees/{wildcards.gene}GTree/;"
    "raxmlHPC -d -s {wildcards.gene}Gtrimmedfas.aln -n {wildcards.gene}GTree -m GTRGAMMA -x 10 -p 10 -# autoMR -f a -o Dmel_Female -w /users/jtan/scratch/jtan/1ststep/dsx/GenesTrees/{wildcards.gene}GTree/"
shell:
    "source /dcsrsoft/spack/bin/setup_dcsrsoft;"
    "module load gcc/8.3.0;"
    "module load mafft/7.453;"
    "mafft --maxiterate 100 --globalpair --clustalout {input} > GenesAlignments/{wildcards.gene}Pclus.aln;"
shell:
    "source /dcsrsoft/spack/bin/setup_dcsrsoft;"
    "module load gcc/8.3.0;"
    "module load mafft/7.453;"
    "mafft --maxiterate 100 --globalpair {input} > GenesAlignments/{wildcards.gene}Pfas.aln;"
# Write trimAl output straight into the trimmed-alignment folder (paths relative to the
# project directory, as in the MAFFT rules): moving *trimmed* afterwards can also grab
# files from other jobs running in parallel.
shell:
    "module load Bioinformatics/Software/vital-it;"
    "module load SequenceAnalysis/Filtering/trimAl/1.4.1;"
    "trimal -in {input} -out ProteinsAlignmentsTrimmed/{wildcards.gene}Ptrimmedfas.aln -htmlout ProteinsAlignmentsTrimmed/{wildcards.gene}Ptrimmedfas.html -automated1;"
# RAxML will not overwrite output from an earlier run with the same -n name, so the
# per-gene tree directory is recreated first (-w must be an absolute path).
shell:
    "cd ProteinsAlignmentsTrimmed;"
    "module load Bioinformatics/Software/vital-it;"
    "module load Phylogeny/raxml/8.2.12;"
    "rm -rf /users/jtan/scratch/jtan/1ststep/dsx/ProteinsTrees/{wildcards.gene}PTree/;"
    "mkdir -p /users/jtan/scratch/jtan/1ststep/dsx/ProteinsTrees/{wildcards.gene}PTree/;"
    "raxmlHPC -d -s {wildcards.gene}Ptrimmedfas.aln -n {wildcards.gene}PTree -m PROTGAMMAJTTF -x 10 -p 10 -# autoMR -f a -o Dmel_Female -w /users/jtan/scratch/jtan/1ststep/dsx/ProteinsTrees/{wildcards.gene}PTree/;"
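The trees end up in one folder per gene and sequence type. If a `rule all` is used to request them, a plain-Python helper can list the targets. This sketch rests on two assumptions. First, Snakemake runs from the project directory, so the relative paths match the absolute ones in the rules. Second, the target is `RAxML_bipartitions.<run name>`, the best-scoring ML tree annotated with bootstrap support that `raxmlHPC -f a` writes. `TREE_DIRS` and `tree_targets` are hypothetical names.

```python
# Hypothetical mapping from file tag to tree folder, as used in the snippets.
TREE_DIRS = {"G": "GenesTrees", "P": "ProteinsTrees"}


def tree_targets(genes, tags=("G", "P")):
    """List the RAxML_bipartitions files expected for each gene and tag."""
    targets = []
    for gene in genes:
        for tag in tags:
            run_name = f"{gene}{tag}Tree"  # matches the -n value in the rules
            targets.append(f"{TREE_DIRS[tag]}/{run_name}/RAxML_bipartitions.{run_name}")
    return targets
```

With a hypothetical `GENES` list, `rule all` could then take `input: tree_targets(GENES)`. Snakemake's built-in `expand()` can produce the same list.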

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/JamesTanShengYi/Snakemake-Alignment_Phylogeny
Name: snakemake-alignment_phylogeny
Version: 1
Copyright: Public Domain
License: MIT License