
Lund University Publications


A machine learning framework for spatio-temporal vulnerability mapping of groundwaters to nitrate in a data scarce region in Lenjanat Plain, Iran

Jalali, Reza; Tishehzan, Parvaneh and Hashemi, Hossein (2024) In Environmental Science and Pollution Research 31(29). p.42088-42110
Abstract

The temporal aspect of groundwater vulnerability to contaminants such as nitrate is often overlooked because vulnerability is assumed to be static. This study addresses this gap by combining machine learning with the Detecting Breakpoints and Estimating Segments in Trend (DBEST) algorithm to reveal the underlying relationships among nitrate, water table, vegetation cover, and precipitation time series, which are linked to agricultural activity and groundwater demand in a semi-arid region. The contamination probability of the Lenjanat Plain was mapped by comparing random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models fed with 32 input variables (DEM-derived factors, physiography, distance and density maps, and time-series data). Imbalanced-learning and feature selection techniques were also investigated as supplementary methods, yielding four scenarios in total. Results showed that the RF model combined with forward sequential feature selection (SFS) and the SMOTE-Tomek resampling method outperformed the other models (F1-score: 0.94, MCC: 0.83). The SFS techniques improved model accuracy more than the other feature selection methods, at the cost of higher computational expense, and the cost-sensitive function proved more effective at handling imbalanced data than the other methods investigated. The DBEST method identified significant breakpoints in each time series, revealing a clear association between agricultural practices along the Zayandehrood River and substantial nitrate contamination in the Lenjanat region. Additionally, the groundwater vulnerability maps produced with the selected RF model and with an ensemble of the best RF, SVM, and KNN models predicted moderate to high vulnerability in the central parts of the plain and on the downhill slopes in the southwest.
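The abstract reports model skill as an F1-score and a Matthews correlation coefficient (MCC). As a hedged illustration of what those two numbers measure, and not the authors' evaluation code, the sketch below computes both from the four cells of a binary confusion matrix using their standard definitions. The function names and the counts in the usage block are hypothetical and are not data from the study.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN).

    True negatives do not appear in the formula at all.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in [-1, 1].

    Uses all four confusion-matrix cells, so errors on the minority
    class are penalized even when the majority class is predicted well.
    """
    num = tp * tn - fp * fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any row or column total is zero.
    return num / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts: "contaminated" wells are the positive class.
    tp, tn, fp, fn = 40, 10, 2, 3
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")
```

With these hypothetical counts F1 comes out well above MCC. F1 ignores true negatives, whereas MCC also reflects errors on the smaller class, which is one reason imbalanced-learning studies often report both.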

author
Jalali, Reza; Tishehzan, Parvaneh and Hashemi, Hossein
publishing date
2024
type
Contribution to journal
publication status
published
subject
keywords
Breakpoint analysis, DBEST, Imbalanced learning, Random Forest, Resampling, Sustainable groundwater management
in
Environmental Science and Pollution Research
volume
31
issue
29
pages
23 pages
publisher
Springer
external identifiers
  • scopus:85195694426
  • pmid:38862797
ISSN
0944-1344
DOI
10.1007/s11356-024-33920-8
language
English
LU publication?
yes
id
8cddbb41-cef1-4f11-b497-a0d1f3e4f123
date added to LUP
2024-09-16 10:31:38
date last changed
2025-07-08 13:48:19
@article{8cddbb41-cef1-4f11-b497-a0d1f3e4f123,
  abstract     = {{The temporal aspect of groundwater vulnerability to contaminants such as nitrate is often overlooked, assuming vulnerability has a static nature. This study bridges this gap by employing machine learning with Detecting Breakpoints and Estimating Segments in Trend (DBEST) algorithm to reveal the underlying relationship between nitrate, water table, vegetation cover, and precipitation time series, that are related to agricultural activities and groundwater demand in a semi-arid region. The contamination probability of Lenjanat Plain has been mapped by comparing random forest (RF), support vector machine (SVM), and K-nearest-neighbors (KNN) models, fed with 32 input variables (dem-derived factors, physiography, distance and density maps, time series data). Also, imbalanced learning and feature selection techniques were investigated as supplementary methods, adding up to four scenarios. Results showed that the RF model, integrated with forward sequential feature selection (SFS) and SMOTE-Tomek resampling method, outperformed the other models (F$_1$-score: 0.94, MCC: 0.83). The SFS techniques outperformed other feature selection methods in enhancing the accuracy of the models with the cost of computational expenses, and the cost-sensitive function proved more efficient in tackling imbalanced data issues than the other investigated methods. The DBEST method identified significant breakpoints within each time series dataset, revealing a clear association between agricultural practices along the Zayandehrood River and substantial nitrate contamination within the Lenjanat region. Additionally, the groundwater vulnerability maps created using the candid RF model and an ensemble of the best RF, SVM, and KNN models predicted mid to high levels of vulnerability in the central parts and the downhills in the southwest.}},
  author       = {{Jalali, Reza and Tishehzan, Parvaneh and Hashemi, Hossein}},
  issn         = {{0944-1344}},
  keywords     = {{Breakpoint analysis; DBEST; Imbalanced learning; Random Forest; Resampling; Sustainable groundwater management}},
  language     = {{eng}},
  number       = {{29}},
  pages        = {{42088--42110}},
  publisher    = {{Springer}},
  journal      = {{Environmental Science and Pollution Research}},
  title        = {{A machine learning framework for spatio-temporal vulnerability mapping of groundwaters to nitrate in a data scarce region in Lenjanat Plain, Iran}},
  url          = {{https://doi.org/10.1007/s11356-024-33920-8}},
  doi          = {{10.1007/s11356-024-33920-8}},
  volume       = {{31}},
  year         = {{2024}},
}