A data-driven early warning system for Escherichia coli in water based on microbial community analysis using flow cytometry 2D histograms
(2025) In Water Research X 29.
- Abstract
Traditional methods for microbial water quality testing take up to two days to produce results, putting humans in contact with this water risk during this period. Flow cytometry, including with online capacity, is a fast and efficient way to profile microbes in water. In this study, Escherichia coli concentrations determined by Colilert18 and flow cytometry profiles from the same water samples were taken from sixteen bathing locations in Southern Sweden. Applying machine learning algorithms confirmed correlations and identified patterns in the microbial community described by the flow cytometry 2D histograms associated with the presence of E. coli. A Random Forest algorithm was best in discriminating between water containing > 100 CFU/100 mL and water containing < 100 CFU/100 mL E. coli when compared to logistic regression and support vector machines, improving prediction accuracy to 80 % from a baseline approach of 55 % when using optimised parameters. The introduction of a two-threshold model, which only considered safe predictions, further improved accuracy to 87 % by utilizing the prediction probability information in random forest. This approach, however, could only predict 65 % of the samples. A feature importance ranking using random forest identified the most important region within the flow cytometric 2D histogram for classification. This study suggests machine learning can leverage microbial community information from flow cytometry, that when combined with established methods quantifying indicators, can rapidly assess microbial water quality as an early warning system that complements traditional approaches.
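The two-threshold idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a classifier (for example a random forest) has already produced, for each sample, a probability that E. coli exceeds 100 CFU/100 mL. The function names, the thresholds 0.3 and 0.7, and the toy numbers are hypothetical and are not taken from the study; the sketch only shows how abstaining on uncertain samples trades coverage for accuracy.

```python
from typing import List, Optional, Sequence, Tuple


def two_threshold_predict(p_exceed: float, lower: float = 0.3,
                          upper: float = 0.7) -> Optional[int]:
    """Return 1 (above limit), 0 (below limit) or None (no prediction).

    p_exceed is the predicted probability that the sample exceeds the limit.
    The lower/upper thresholds are illustrative assumptions.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if p_exceed >= upper:
        return 1
    if p_exceed <= lower:
        return 0
    return None  # probability too close to 0.5: abstain


def evaluate(probs: Sequence[float], labels: Sequence[int],
             lower: float = 0.3, upper: float = 0.7) -> Tuple[float, float]:
    """Return (accuracy on predicted samples, fraction of samples predicted)."""
    preds: List[Optional[int]] = [
        two_threshold_predict(p, lower, upper) for p in probs
    ]
    decided = [(pred, y) for pred, y in zip(preds, labels) if pred is not None]
    coverage = len(decided) / len(labels) if labels else 0.0
    accuracy = (sum(pred == y for pred, y in decided) / len(decided)
                if decided else float("nan"))
    return accuracy, coverage


# Made-up toy probabilities and true labels (1 = above limit), not study data.
toy_probs = [0.05, 0.20, 0.45, 0.55, 0.80, 0.95, 0.75, 0.25]
toy_labels = [0, 0, 1, 0, 1, 1, 0, 0]
accuracy, coverage = evaluate(toy_probs, toy_labels)
print(f"accuracy on predicted samples: {accuracy:.2f}, coverage: {coverage:.2f}")
```

Widening the gap between the two thresholds would typically raise accuracy on the samples that do receive a prediction while lowering the share of samples predicted. This is the kind of trade-off the abstract reports (87 % accuracy on 65 % of samples).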
- author
- Erb, Isabel K. (LU); Gador, Niklas; Jinbäck, Moa; Lindberg, Elisabet and Paul, Catherine J. (LU)
- organization
- publishing date
- 2025-12
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- E. coli, Early warning system, Flow cytometry, Machine learning, Microbial water quality
- in
- Water Research X
- volume
- 29
- article number
- 100404
- publisher
- Elsevier
- external identifiers
- scopus:105015038737
- ISSN
- 2589-9147
- DOI
- 10.1016/j.wroa.2025.100404
- language
- English
- LU publication?
- yes
- id
- b16e3f03-3f63-4545-a7ea-d1a50ca1e00a
- date added to LUP
- 2025-10-02 14:52:32
- date last changed
- 2025-10-02 14:52:49
@article{b16e3f03-3f63-4545-a7ea-d1a50ca1e00a,
  abstract  = {Traditional methods for microbial water quality testing take up to two days to produce results, putting humans in contact with this water risk during this period. Flow cytometry, including with online capacity, is a fast and efficient way to profile microbes in water. In this study, Escherichia coli concentrations determined by Colilert18 and flow cytometry profiles from the same water samples were taken from sixteen bathing locations in Southern Sweden. Applying machine learning algorithms confirmed correlations and identified patterns in the microbial community described by the flow cytometry 2D histograms associated with the presence of E. coli. A Random Forest algorithm was best in discriminating between water containing > 100 CFU/100 mL and water containing < 100 CFU/100 mL E. coli when compared to logistic regression and support vector machines, improving prediction accuracy to 80 % from a baseline approach of 55 % when using optimised parameters. The introduction of a two-threshold model, which only considered safe predictions, further improved accuracy to 87 % by utilizing the prediction probability information in random forest. This approach, however, could only predict 65 % of the samples. A feature importance ranking using random forest identified the most important region within the flow cytometric 2D histogram for classification. This study suggests machine learning can leverage microbial community information from flow cytometry, that when combined with established methods quantifying indicators, can rapidly assess microbial water quality as an early warning system that complements traditional approaches.},
  author    = {Erb, Isabel K. and Gador, Niklas and Jinbäck, Moa and Lindberg, Elisabet and Paul, Catherine J.},
  issn      = {2589-9147},
  keywords  = {E. coli; Early warning system; Flow cytometry; Machine learning; Microbial water quality},
  language  = {eng},
  publisher = {Elsevier},
  journal   = {Water Research X},
  title     = {{A data-driven early warning system for Escherichia coli in water based on microbial community analysis using flow cytometry 2D histograms}},
  url       = {http://dx.doi.org/10.1016/j.wroa.2025.100404},
  doi       = {10.1016/j.wroa.2025.100404},
  volume    = {29},
  year      = {2025},
}