LUP Student Papers

LUND UNIVERSITY LIBRARIES

TaxSeedo: An energy-saving ANI alternative for rapid classification of bacteria

Kennedy, Ryan J. (2020) BINP50 20201
Degree Projects in Bioinformatics
Popular Abstract
An energy-saving tool for rapid classification of bacteria

When attempting to describe the characteristics of specific bacteria, it is important to know which species is under investigation. The 16S rRNA marker gene is generally used to classify bacteria into existing taxonomic hierarchies. However, a large percentage of the bacterial genome is left unconsidered, as the ~1,500 bp of the 16S rRNA marker gene constitutes only ~0.04% of the entire genome. Furthermore, as 97% sequence identity is considered the threshold for species boundaries, fewer than 50 bp of the 1,500 bp molecule contribute to the taxonomic resolution.
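The figures in this paragraph can be checked with quick arithmetic. The sketch below assumes a genome size of 3.75 Mbp, which is simply the size implied by the ~0.04% figure and not a value stated in the abstract.

```python
# Back-of-the-envelope check of the 16S rRNA figures quoted above.
GENE_LENGTH_BP = 1_500          # approximate length of the 16S rRNA gene
SPECIES_IDENTITY = 0.97         # 97% identity species threshold
ASSUMED_GENOME_BP = 3_750_000   # assumption: genome size implied by ~0.04%

# Share of the genome covered by the marker gene, as a percentage.
genome_share_pct = 100 * GENE_LENGTH_BP / ASSUMED_GENOME_BP

# Number of positions that may differ while staying above the threshold.
max_divergent_bp = round(GENE_LENGTH_BP * (1 - SPECIES_IDENTITY))

print(f"16S share of genome: {genome_share_pct:.2f}%")
print(f"Positions that can differ within a species: {max_divergent_bp} bp")
```

At a 97% threshold, 3% of 1,500 bp is 45 bp, consistent with the "fewer than 50 bp" stated above.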

Average nucleotide identity (ANI) is regarded as the gold standard for classification of assembled genomes, with a similarity score (identity) of ~95% representing the species boundary. The ANI algorithm computes whole-genome comparisons, resulting in greater resolution than that of the 16S rRNA marker gene. However, the computationally intensive nature of ANI is of particular concern for laboratories with limited computational resources. A further consideration is the energy consumption required to execute ANI, which translates to significant economic and environmental costs.

This study designed an algorithm, TaxSeedo, which aims to provide the same taxonomic resolution as ANI at a fraction of the computational cost. To this end, a database was generated containing unique 27 bp markers prefixed with ATG codons (seeds), extracted from the 5’ end of open reading frames. This database was used to classify a dataset of 70 PacBio-sequenced, Illumina-polished bacterial isolates from air. These samples are part of an ongoing project to map the microbiome of air. Measured by the time taken to analyse one isolate, TaxSeedo is approximately 8,400 times faster than ANI, providing an energy-saving alternative at no cost to relative precision.
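The seed database described above might be built along the following lines. This is a minimal sketch and not the thesis implementation. It assumes that ORFs have already been predicted, that a seed is the ATG codon followed by 27 bp (30 bp in total), and that "unique" means the seed occurs in only one species. The function names and toy sequences are hypothetical.

```python
from collections import defaultdict

START_CODON = "ATG"
MARKER_LEN = 27  # assumption: 27 bp follow the ATG, giving a 30 bp seed


def extract_seed(orf):
    """Return the seed at the 5' end of an ORF, or None if it does not qualify."""
    orf = orf.upper()
    seed_len = len(START_CODON) + MARKER_LEN
    if not orf.startswith(START_CODON) or len(orf) < seed_len:
        return None
    seed = orf[:seed_len]
    # Skip seeds containing ambiguous bases such as N.
    return seed if set(seed) <= set("ACGT") else None


def build_seed_database(orfs_by_species):
    """Map each seed to its species, keeping only seeds seen in one species."""
    owners = defaultdict(set)
    for species, orfs in orfs_by_species.items():
        for orf in orfs:
            seed = extract_seed(orf)
            if seed is not None:
                owners[seed].add(species)
    return {seed: next(iter(sp)) for seed, sp in owners.items() if len(sp) == 1}


# Toy input: one ORF start is shared by both species and is therefore dropped.
shared = "ATG" + "A" * 27 + "TTT"
toy = {
    "species_a": [shared, "ATG" + "C" * 27 + "GGG"],
    "species_b": [shared, "ATG" + "G" * 27],
}
db = build_seed_database(toy)
```

Discarding seeds that occur in more than one species means each remaining seed points to exactly one species, so a lookup is a single dictionary access.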

TaxSeedo’s ulterior motives
When culturing bacteria, it is difficult to confirm that isolates are pure. In addition, de novo assembly of genomes can be a time-consuming and computationally intensive task. Because TaxSeedo accepts both fasta and fastq input, it was run directly on the Illumina reads of the 70 isolates, classifying 56 isolates with the same taxonomic precision as ANI. Furthermore, TaxSeedo detected 12 potentially impure isolates containing at least two species above the threshold. This suggests that TaxSeedo is an effective pre-screener for reference assembly and is capable of identifying potentially impure isolates.
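The idea of flagging an isolate when two or more species exceed a threshold can be illustrated as follows. This is a hedged sketch only: the abstract does not state the threshold or the scoring scheme, so the `min_fraction` value, the 30 bp seed length, the function names and the toy reads are all assumptions. For brevity, only the forward strand of each read is scanned.

```python
from collections import Counter

SEED_LEN = 30  # assumption: ATG plus a 27 bp marker


def count_seed_hits(reads, seed_db):
    """Count, per species, how many seed occurrences are found in the reads."""
    hits = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - SEED_LEN + 1):
            species = seed_db.get(read[i:i + SEED_LEN])
            if species is not None:
                hits[species] += 1
    return hits


def species_above_threshold(hits, min_fraction=0.1):
    """Return species whose share of all seed hits reaches min_fraction."""
    total = sum(hits.values())
    if total == 0:
        return []
    return sorted(s for s, n in hits.items() if n / total >= min_fraction)


# Toy database and reads; real input would come from fasta or fastq files.
seed_db = {"ATG" + "C" * 27: "species_a", "ATG" + "G" * 27: "species_b"}
reads = [
    "TT" + "ATG" + "C" * 27 + "AA",
    "ATG" + "C" * 27,
    "GA" + "ATG" + "G" * 27,
]
hits = count_seed_hits(reads, seed_db)
called = species_above_threshold(hits)
possibly_impure = len(called) >= 2  # two or more species above the threshold
```

In this toy case both species pass the assumed threshold, so the isolate would be flagged as potentially impure.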

The NCBI database is plagued by incorrect taxonomic assignments, leading to ambiguous classification results. In addition, genome sequences that belong to the same species but are assigned different names, known as heterotypic synonyms, create further ambiguity. Interestingly, TaxSeedo identified four groups of differently named species that shared identity values greater than 95% (i.e. they should be classified as the same species).

The taxonomic classification of microbes remains a desirable task when investigating a microbiome. As TaxSeedo was able to identify potentially mixed isolates with confidence, it is plausible that this can be extrapolated to the classification of species in a meta-analysis of a sample containing multiple mixed microbes. Future work aims to extend TaxSeedo from a purely bacterial classifier to one that also covers fungal species, as the two often co-exist in microbiomes.

Master’s Degree Project in Bioinformatics 30 credits 2020
Department of Biology, Lund University

Advisor: Stephan Schuster
Nanyang Technological University, Singapore.
author
Kennedy, Ryan J.
course
BINP50 20201
year
2020
type
H2 - Master's Degree (Two Years)
language
English
id
9029804
date added to LUP
2020-09-23 10:35:12
date last changed
2020-09-23 10:35:12
@misc{9029804,
  author       = {{Kennedy, Ryan J.}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{TaxSeedo: An energy-saving ANI alternative for rapid classification of bacteria}},
  year         = {{2020}},
}