Lund University Publications

A data-driven early warning system for Escherichia coli in water based on microbial community analysis using flow cytometry 2D histograms

Erb, Isabel K. (LU); Gador, Niklas; Jinbäck, Moa; Lindberg, Elisabet and Paul, Catherine J. (LU) (2025) In Water Research X 29.
Abstract

Traditional methods for microbial water quality testing take up to two days to produce results, leaving people exposed to potentially contaminated water in the meantime. Flow cytometry, including online flow cytometry, is a fast and efficient way to profile microbes in water. In this study, Escherichia coli concentrations determined by Colilert18 and flow cytometry profiles were obtained from the same water samples, collected at sixteen bathing locations in Southern Sweden. Applying machine learning algorithms confirmed correlations and identified patterns in the microbial community, as described by the flow cytometry 2D histograms, that were associated with the presence of E. coli. A Random Forest algorithm outperformed logistic regression and support vector machines in discriminating between water containing > 100 CFU/100 mL and water containing < 100 CFU/100 mL E. coli, improving prediction accuracy from 55 % with a baseline approach to 80 % when using optimised parameters. Introducing a two-threshold model, which only considered safe predictions, further improved accuracy to 87 % by using the prediction probabilities from the random forest. This approach, however, could only produce predictions for 65 % of the samples. A feature importance ranking using random forest identified the most important region within the flow cytometric 2D histogram for classification. This study suggests that machine learning can leverage microbial community information from flow cytometry which, when combined with established methods for quantifying indicators, can rapidly assess microbial water quality as an early warning system that complements traditional approaches.
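
The two-threshold idea in the abstract can be illustrated with a minimal sketch. It assumes that "safe predictions" means predictions whose class probability falls outside an uncertainty band, and that samples inside the band are left unclassified. The function names, the thresholds (0.3 and 0.7) and the toy probabilities below are hypothetical and are not taken from the paper; the sketch only shows why accuracy can rise while the share of samples receiving a prediction falls.

```python
from typing import List, Optional, Tuple


def two_threshold_predict(
    probs: List[float], lower: float = 0.3, upper: float = 0.7
) -> List[Optional[int]]:
    """Turn P(E. coli > 100 CFU/100 mL) into 1, 0, or None (no prediction).

    The thresholds are illustrative assumptions, not values from the study.
    """
    labels: List[Optional[int]] = []
    for p in probs:
        if p >= upper:
            labels.append(1)      # confidently above the limit
        elif p <= lower:
            labels.append(0)      # confidently below the limit
        else:
            labels.append(None)   # too uncertain: abstain
    return labels


def accuracy_and_coverage(
    predicted: List[Optional[int]], actual: List[int]
) -> Tuple[Optional[float], float]:
    """Accuracy over the samples that received a prediction, plus coverage."""
    decided = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    coverage = len(decided) / len(predicted) if predicted else 0.0
    if not decided:
        return None, coverage
    accuracy = sum(1 for p, a in decided if p == a) / len(decided)
    return accuracy, coverage


# Toy probabilities standing in for a classifier's output (made-up values).
probs = [0.95, 0.10, 0.55, 0.80, 0.20, 0.45, 0.05, 0.75]
truth = [1, 0, 1, 0, 0, 0, 0, 1]

pred = two_threshold_predict(probs)
acc, cov = accuracy_and_coverage(pred, truth)
print(pred, acc, cov)
```

In this toy run two of the eight samples fall inside the uncertainty band and receive no label, so accuracy is computed over the remaining six. Widening the band would typically trade coverage for accuracy, which mirrors the trade-off the abstract reports.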

(Less)
Please use this url to cite or link to this publication:
type
Contribution to journal
publication status
published
keywords
E. coli, Early warning system, Flow cytometry, Machine learning, Microbial water quality
in
Water Research X
volume
29
article number
100404
publisher
Elsevier
external identifiers
  • scopus:105015038737
ISSN
2589-9147
DOI
10.1016/j.wroa.2025.100404
language
English
LU publication?
yes
id
b16e3f03-3f63-4545-a7ea-d1a50ca1e00a
date added to LUP
2025-10-02 14:52:32
date last changed
2025-10-02 14:52:49
@article{b16e3f03-3f63-4545-a7ea-d1a50ca1e00a,
  abstract     = {{Traditional methods for microbial water quality testing take up to two days to produce results, putting humans in contact with this water risk during this period. Flow cytometry, including with online capacity, is a fast and efficient way to profile microbes in water. In this study, Escherichia coli concentrations determined by Colilert18 and flow cytometry profiles from the same water samples were taken from sixteen bathing locations in Southern Sweden. Applying machine learning algorithms confirmed correlations and identified patterns in the microbial community described by the flow cytometry 2D histograms associated with the presence of E. coli. A Random Forest algorithm was best in discriminating between water containing > 100 CFU/100 mL and water containing < 100 CFU/100 mL E. coli when compared to logistic regression and support vector machines, improving prediction accuracy to 80 % from a baseline approach of 55 % when using optimised parameters. The introduction of a two-threshold model, which only considered safe predictions, further improved accuracy to 87 % by utilizing the prediction probability information in random forest. This approach, however, could only predict 65 % of the samples. A feature importance ranking using random forest identified the most important region within the flow cytometric 2D histogram for classification. This study suggests machine learning can leverage microbial community information from flow cytometry, that when combined with established methods quantifying indicators, can rapidly assess microbial water quality as an early warning system that complements traditional approaches.}},
  author       = {{Erb, Isabel K. and Gador, Niklas and Jinbäck, Moa and Lindberg, Elisabet and Paul, Catherine J.}},
  issn         = {{2589-9147}},
  keywords     = {{E. coli; Early warning system; Flow cytometry; Machine learning; Microbial water quality}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  series       = {{Water Research X}},
  title        = {{A data-driven early warning system for Escherichia coli in water based on microbial community analysis using flow cytometry 2D histograms}},
  url          = {{http://dx.doi.org/10.1016/j.wroa.2025.100404}},
  doi          = {{10.1016/j.wroa.2025.100404}},
  volume       = {{29}},
  year         = {{2025}},
}