Snakemake workflow for maximum likelihood (ML) phylogenetic analysis using RAxML-NG and associated tools. It gives more accurate results than IQ-TREE-based phylogenies.


Scalable RAxML-NG-based phylogenetic analysis using Snakemake

This is a Snakemake workflow for scalable maximum likelihood (ML) phylogenetic analysis using RAxML-NG and associated tools (Pythia, ModelTest-NG). It is considerably slower than the IQ-TREE-based workflow but more accurate, especially on difficult-to-analyze datasets.

The workflow runs all steps in sequence: multiple sequence alignment (MSA), alignment trimming, difficulty prediction, model selection, and phylogeny inference.

Usage:

snakemake --cores 10 --snakefile Snakefile

Dependencies:

MAFFT https://github.com/GSLBiotech/mafft

trimAl https://github.com/inab/trimal

Pythia https://github.com/tschuelia/PyPythia

ModelTest-NG https://github.com/ddarriba/modeltest

RAxML-NG https://github.com/amkozlov/raxml-ng

ETE3 http://etetoolkit.org/
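
Before launching the workflow it can help to confirm that the command-line tools are installed. The following is a minimal sketch, assuming the executables are on PATH under the names used in the workflow's shell commands (the `REQUIRED_TOOLS` list and the `missing_tools` helper are illustrative, not part of the repository). ETE3 is a Python package rather than an executable, so it is not covered by this check.

```python
import shutil

# Executable names as they appear in the workflow's shell commands
# (assumption: all of them are expected to be on PATH).
REQUIRED_TOOLS = ["snakemake", "mafft", "trimal", "pythia", "modeltest-ng", "raxml-ng-mpi"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that `which` cannot locate."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.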

Code Snippets

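#!/usr/bin/env bash
# aln2phylip.sh: convert an aligned FASTA file to sequential PHYLIP on stdout.

if [ $# -ne 1 ]; then
    echo "USAGE: ./script <fasta-file>" >&2
    exit 1
fi

# Number of sequences = number of FASTA header lines.
numSpec=$(grep -c "^>" "$1")
# Collapse the file into "name sequence;name sequence;..." on a single line.
tmp=$(sed "s/>[ ]*\(\w*\).*/;\1</" "$1" | tr -d "\n" | tr -d ' ' | sed 's/^;//' | tr "<" " ")
# Alignment length = length of the first sequence (wc -m also counts the trailing newline).
length=$(($(echo "$tmp" | sed 's/[^ ]* \([^;]*\);.*/\1/' | wc -m) - 1))

echo "$numSpec $length"
echo "$tmp" | tr ";" "\n"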
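
The same FASTA-to-PHYLIP conversion can be sketched in Python, which may be easier to follow than the sed/tr pipeline. This is an illustrative equivalent, not code from the repository: the function name `fasta_to_phylip` is hypothetical, it keeps the first whitespace-delimited token of each header (the shell script keeps only the leading word characters), and it additionally checks that all sequences have the same length.

```python
def fasta_to_phylip(fasta_text):
    """Convert aligned FASTA text to sequential PHYLIP text."""
    records = []  # (name, sequence) pairs in input order
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped; join them without spaces.
            chunks.append(line.replace(" ", ""))
    if name is not None:
        records.append((name, "".join(chunks)))

    if not records:
        raise ValueError("no FASTA records found")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input must be aligned")

    # PHYLIP header: number of sequences, then alignment length.
    header = f"{len(records)} {lengths.pop()}"
    body = [f"{rec_name} {seq}" for rec_name, seq in records]
    return "\n".join([header] + body) + "\n"
```

For example, `fasta_to_phylip(">seqA desc\nAC-G\nTT\n>seqB\nACGGTA\n")` yields a header line `2 6` followed by one `name sequence` line per record.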
# midpoint.py: midpoint-root a Newick tree with ETE3.
import argparse
from ete3 import Tree


def midpoint_root(tree_file):
    """Midpoint-root the tree in tree_file and write it to <tree_file>.midpoint_rooted."""
    tree = Tree(tree_file, format=1)

    # Find the outgroup that places the root at the midpoint of the tree
    midpoint = tree.get_midpoint_outgroup()

    # Re-root the tree on that outgroup
    tree.set_outgroup(midpoint)

    tree.write(format=1, outfile=tree_file + ".midpoint_rooted")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('tree_newick')
    args = parser.parse_args()
    midpoint_root(args.tree_newick)
# Shell directives from main/Snakefile, in rule order.
# Copy the input sequences into the workflow
shell: "cat {input[0]} > {output[0]}"
# Align the sequences with MAFFT
shell: "mafft --auto {input[0]} > {output[0]}"
# Remove alignment columns with gaps in more than 50% of the sequences
shell: "trimal -in {input} -out {output} -fasta -gt 0.50"
# Convert the trimmed alignment from FASTA to PHYLIP
shell: "./aln2phylip.sh {input} > {output}"
# Predict the difficulty of the alignment with Pythia
shell: "pythia --msa {input} -r raxml-ng/bin/raxml-ng --removeDuplicates -o {output}"
# Select a protein substitution model with ModelTest-NG
shell: "modeltest-ng -i {input[0]} -d aa -t ml -c -T raxml"
# Read the BIC-selected model from the log file, then run RAxML-NG
# (--all: ML search plus bootstrapping; 10 random and 90 parsimony starting trees)
shell: r"""MODEL=$(grep -P "\sBIC" all_genes.pep.aln.trimmed.phy.log | grep -v "Best" | sed 's/.*               //g' | sed 's/ .*//g') &&
       raxml-ng-mpi --all --msa {input.msa} --model "$MODEL" --tree rand{{10}},pars{{90}} --threads 40"""
# Midpoint-root the resulting tree
shell: "python3 midpoint.py {input[0]}"
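
The RAxML-NG step picks its model by filtering the log file with grep and sed. The logic of that pipeline can be sketched in Python as below. This is an illustration only: the function name `best_bic_model` is hypothetical, the sample log line in the usage note is a made-up layout rather than real ModelTest-NG output, and unlike the shell pipeline the sketch returns only the first matching line.

```python
import re


def best_bic_model(log_text):
    """Mimic: grep -P '\\sBIC' | grep -v 'Best' | sed 's/.*<15 spaces>//' | sed 's/ .*//'."""
    for line in log_text.splitlines():
        # Keep lines with whitespace followed by "BIC", excluding "Best ..." lines.
        if re.search(r"\sBIC", line) and "Best" not in line:
            # Drop everything up to and including the last run of 15 spaces.
            rest = re.sub(r".* {15}", "", line, count=1)
            # Keep only the first space-delimited token, i.e. the model name.
            return re.sub(r" .*", "", rest, count=1)
    return None
```

With a hypothetical log containing the line `"       BIC" + " " * 15 + "LG+G4    1000.0000        0.9000"`, the function returns `LG+G4`. If the real log uses a different column layout, the 15-space pattern would need to be adjusted in both the shell command and this sketch.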
Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/JasonCharamis/Snakemake-workflow-for-RAxML-NG-based-phylogenetic-analysis
Name: snakemake-workflow-for-raxml-ng-based-phylogenetic
Version: 1
Copyright: Public Domain
License: None