
A Nextflow pipeline to generate a phylogenetic tree from an SKA alignment




SKA Phylogeny Pipeline


The SKA Phylogeny Pipeline is a Nextflow-based workflow for phylogenetic analysis using Split Kmer Analysis (SKA). It validates the input samples, builds an SKA alignment, and can optionally run SNP-sites and infer a phylogenetic tree with one of several tree-building tools.


Features

  • Flexible input handling with CSV file support
  • Optional SNP-sites analysis
  • Multiple tree-building methods (IQ-TREE, RapidNJ, RAxML, FastME, FastTree)
  • Scalable execution on various computing environments


Requirements

  • Nextflow (version 20.04.0 or later)
  • Java 8 or later
  • Docker or Singularity (for containerized execution)


nextflow run [options]


Parameters

--input: Input CSV file with sample information (required)
--outdir: Output directory (default: './results')
--run_snpsites: Run SNP-sites (default: false)
--run_tree_building: Run tree building (default: false)
--tree_method: Tree building method (default: 'iqtree'; options: 'iqtree', 'rapidnj', 'raxml', 'fastme', 'fasttree')

--help: Display help message
--version: Display version information
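
Because --tree_method accepts only the five values listed above, a wrapper script could reject a mistyped value before launching Nextflow. The following is a minimal sketch of such a pre-flight check. The function name is_valid_tree_method and the variable VALID_TREE_METHODS are hypothetical helpers introduced for illustration; they are not part of the pipeline.

```shell
#!/usr/bin/env bash
# Illustrative pre-flight check; NOT part of the pipeline itself.
# The accepted values are the ones documented for --tree_method.
VALID_TREE_METHODS="iqtree rapidnj raxml fastme fasttree"

# Returns 0 if $1 is a documented --tree_method value, 1 otherwise.
is_valid_tree_method() {
  local method="$1"
  local candidate
  for candidate in $VALID_TREE_METHODS; do
    if [ "$candidate" = "$method" ]; then
      return 0
    fi
  done
  return 1
}

# Example: check a value before passing it to --tree_method.
# Falls back to the documented default when no argument is given.
method="${1:-iqtree}"
if is_valid_tree_method "$method"; then
  echo "ok: $method"
else
  echo "unsupported tree method: $method" >&2
fi
```

The check compares whole words, so a value such as 'iqtree2' or 'RAxML' is rejected rather than partially matched.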


The pipeline expects an input CSV file with the following format:



The pipeline generates the following outputs in the specified output directory:

  • SKA alignment files
  • SNP-sites output (if enabled)
  • Phylogenetic tree files (if tree building is enabled)


Basic run with default settings:

nextflow run --input samples.csv -profile conda

Run with SNP-sites and IQ-TREE:

nextflow run --input samples.csv --run_snpsites --run_tree_building --tree_method iqtree -profile conda

Run with RAxML tree building:

nextflow run --input samples.csv --run_tree_building --tree_method raxml -profile conda

Run without tree building:

nextflow run --input samples.csv --run_snpsites -profile conda
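
To compare trees from several methods, the same input can be run once per method with a separate output directory for each. The sketch below only prints the commands (a dry run) so they can be inspected first. It uses only the flags documented above; the PIPELINE variable and its '<pipeline>' placeholder are assumptions standing in for the pipeline's repository name or local path, and the results_<method> directory names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder: substitute the pipeline's repository name or local path.
PIPELINE="${PIPELINE:-<pipeline>}"

# Prints (does not execute) a nextflow command for one tree method,
# built only from the flags documented in this README.
print_run_cmd() {
  local method="$1"
  echo "nextflow run $PIPELINE --input samples.csv --run_tree_building --tree_method $method --outdir results_$method -profile conda"
}

# One command per documented tree method; each writes to its own directory
# so that one run does not overwrite another's results.
for method in iqtree rapidnj raxml fastme fasttree; do
  print_run_cmd "$method"
done
```

Piping the output to a shell would execute the commands, but reviewing them first is the point of the dry run.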

Pipeline Steps

  • Input validation
  • SKA alignment
  • SNP-sites analysis (optional)
  • Tree building (optional)
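
The way the two optional flags select steps can be sketched as a small shell function. This is an illustration of the documented behaviour only: the function name planned_steps is hypothetical, the pipeline's real control flow lives in its Nextflow code, and the sketch assumes the two optional steps are toggled independently, as the usage examples suggest.

```shell
#!/usr/bin/env bash
# Illustrative only: mirrors the documented step order and the two
# optional flags (--run_snpsites, --run_tree_building).
planned_steps() {
  local run_snpsites="${1:-false}"
  local run_tree_building="${2:-false}"
  # These two steps are always listed.
  echo "Input validation"
  echo "SKA alignment"
  # Optional steps, assumed to be independent of each other.
  if [ "$run_snpsites" = "true" ]; then
    echo "SNP-sites analysis"
  fi
  if [ "$run_tree_building" = "true" ]; then
    echo "Tree building"
  fi
}

# Defaults (both flags false): only the two mandatory steps are listed.
planned_steps false false
# With both flags enabled: all four steps are listed.
planned_steps true true
```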


If you use this pipeline in your research, please cite:

Nextflow: Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319.

SKA: Harris SR. 2018. SKA: Split Kmer Analysis Toolkit for Bacterial Genomic Epidemiology. bioRxiv 453142 doi:

SNP-sites: "SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments", Andrew J. Page, Ben Taylor, Aidan J. Delaney, Jorge Soares, Torsten Seemann, Jacqueline A. Keane, Simon R. Harris, Microbial Genomics 2(4), (2016)

IQ-TREE: L. Nguyen, H.A. Schmidt, A. von Haeseler, B.Q. Minh (2015) IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol. Biol. and Evol., 32:268-274.

FastME: Lefort V, Desper R, Gascuel O. FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program. Mol Biol Evol. 2015 Oct;32(10):2798-800. doi: 10.1093/molbev/msv150. Epub 2015 Jun 30. PMID: 26130081; PMCID: PMC4576710.

FastTree: Price, M.N., Dehal, P.S., and Arkin, A.P. (2009) FastTree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix. Molecular Biology and Evolution 26:1641-1650, doi:10.1093/molbev/msp077

RapidNJ: Rapid Neighbour Joining. Martin Simonsen, Thomas Mailund and Christian N. S. Pedersen. In Proceedings of the 8th Workshop in Algorithms in Bioinformatics (WABI), LNBI 5251, 113-122, Springer Verlag, October 2008. doi:10.1007/978-3-540-87361-7_10

RAxML-NG: Alexey M. Kozlov, Diego Darriba, Tomáš Flouri, Benoit Morel, and Alexandros Stamatakis (2019) RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, 35 (21), 4453-4455 doi:10.1093/bioinformatics/btz305

Afolayan et al. (2024). SKA Phylogeny Pipeline. GitHub repository:

Credits and Acknowledgements

This is an ongoing project at the Microbial Genome Analysis Group, Institute for Infection Prevention and Hospital Epidemiology, Universitätsklinikum, Freiburg. The TAPIR (Tracking the Acquisition of Pathogens in Real-Time) project is funded by BMBF, Germany, and is led by Dr. Sandra Reuter.