Lund University Publications

Decision Tree-Based Data Mining and Rule Induction for Identifying High Quality Groundwater Zones to Water Supply Management : a Novel Hybrid Use of Data Mining and GIS

Jeihouni, Mehrdad; Toomanian, Ara (LU) and Mansourian, Ali (LU) (2020). In Water Resources Management 34(1), p. 139-154
Abstract

Groundwater is an important source of drinking water in both arid and semi-arid regions. Nevertheless, locating high quality drinking water is a major challenge in such areas. Against this background, this study applies and compares five decision tree-based data mining algorithms, namely Ordinary Decision Tree (ODT), Random Forest (RF), Random Tree (RT), Chi-square Automatic Interaction Detector (CHAID), and Iterative Dichotomiser 3 (ID3), for rule induction in order to identify high quality groundwater zones for drinking purposes. The proposed methodology first extracts the key variables affecting water quality (electrical conductivity, pH, hardness and chloride) from a total of eight available parameters and uses them as inputs for the rule induction process. The algorithms were evaluated on both continuous and discrete datasets. The findings indicated that rule induction performed better on the continuous dataset than on the discrete dataset. Based on the validation results for the continuous dataset, RF and ODT showed higher performance and RT showed acceptable performance. The groundwater quality maps were generated in a GIS environment by combining the distribution maps of the effective parameters using the rules induced from RF, ODT, and RT. The generated maps show a decline in groundwater quality from south to north as well as from east to west in the study area. RF showed the highest performance among the algorithms (accuracy of 97.10%), so the map generated from the RF-induced rules is more reliable. The RF and ODT methods are more suitable for continuous datasets and can be applied for rule induction to determine water quality with higher accuracy than the other tested algorithms.
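As a rough illustration of what "rule induction" from a decision tree on the four selected variables can look like, the following minimal Python sketch grows a small Gini-based tree on continuous inputs and reads each root-to-leaf path as an IF-THEN rule. It is not the authors' implementation: the sample values, the class labels "high"/"low", the depth limit, and all function names are hypothetical and chosen only to make the example self-contained.

```python
from collections import Counter

# Hypothetical toy data, NOT taken from the paper:
# (electrical conductivity, pH, hardness, chloride) -> quality class
FEATURES = ["EC", "pH", "hardness", "chloride"]
SAMPLES = [
    ((450.0, 7.2, 180.0, 60.0), "high"),
    ((520.0, 7.5, 210.0, 80.0), "high"),
    ((600.0, 7.0, 250.0, 95.0), "high"),
    ((700.0, 7.8, 280.0, 120.0), "high"),
    ((650.0, 7.3, 260.0, 420.0), "low"),
    ((1500.0, 8.1, 520.0, 400.0), "low"),
    ((1800.0, 7.9, 610.0, 480.0), "low"),
    ((2200.0, 8.4, 700.0, 650.0), "low"),
]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows):
    """Return (feature_index, threshold) that lowers weighted Gini, else None."""
    best, best_score = None, gini([y for _, y in rows])
    for f in range(len(FEATURES)):
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build(rows, depth=0, max_depth=3):
    """Grow a tree: a leaf is a label string, a node is (feature, threshold, left, right)."""
    split = best_split(rows) if depth < max_depth else None
    if split is None:
        return Counter(y for _, y in rows).most_common(1)[0][0]
    f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t, build(left, depth + 1, max_depth), build(right, depth + 1, max_depth))


def rules(node, conds=()):
    """Flatten the tree into (conditions, label) pairs, one per leaf."""
    if isinstance(node, str):
        return [(conds, node)]
    f, t, left, right = node
    return (rules(left, conds + (f"{FEATURES[f]} <= {t:g}",))
            + rules(right, conds + (f"{FEATURES[f]} > {t:g}",)))


def predict(node, x):
    """Classify one sample by walking from the root to a leaf."""
    while not isinstance(node, str):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


if __name__ == "__main__":
    tree = build(SAMPLES)
    for conds, label in rules(tree):
        print("IF " + " AND ".join(conds) + " THEN quality = " + label)
```

In a GIS workflow of the kind the abstract describes, rules of this form would be evaluated cell by cell against the interpolated parameter layers; here `predict` plays that role for a single tuple of values.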

author: Jeihouni, Mehrdad; Toomanian, Ara and Mansourian, Ali
publishing date: 2020
type: Contribution to journal
publication status: published
keywords: Decision tree, Geostatistics, Random forest, Random tree, Water quality, Machine Learning (ML), Artificial Intelligence (AI)
in: Water Resources Management
volume: 34
issue: 1
pages: 16 pages
publisher: Springer
external identifiers: scopus:85076533051
ISSN: 0920-4741
DOI: 10.1007/s11269-019-02447-w
language: English
LU publication?: yes
id: 93786bba-514d-4631-ae64-1fad6593ce15
date added to LUP: 2020-01-07 14:53:05
date last changed: 2023-08-30 12:55:00

@article{93786bba-514d-4631-ae64-1fad6593ce15,
  abstract     = {{Groundwater is an important source of drinking water in both arid and semi-arid regions. Nevertheless, locating high quality drinking water is a major challenge in such areas. Against this background, this study applies and compares five decision tree-based data mining algorithms, namely Ordinary Decision Tree (ODT), Random Forest (RF), Random Tree (RT), Chi-square Automatic Interaction Detector (CHAID), and Iterative Dichotomiser 3 (ID3), for rule induction in order to identify high quality groundwater zones for drinking purposes. The proposed methodology first extracts the key variables affecting water quality (electrical conductivity, pH, hardness and chloride) from a total of eight available parameters and uses them as inputs for the rule induction process. The algorithms were evaluated on both continuous and discrete datasets. The findings indicated that rule induction performed better on the continuous dataset than on the discrete dataset. Based on the validation results for the continuous dataset, RF and ODT showed higher performance and RT showed acceptable performance. The groundwater quality maps were generated in a GIS environment by combining the distribution maps of the effective parameters using the rules induced from RF, ODT, and RT. The generated maps show a decline in groundwater quality from south to north as well as from east to west in the study area. RF showed the highest performance among the algorithms (accuracy of 97.10%), so the map generated from the RF-induced rules is more reliable. The RF and ODT methods are more suitable for continuous datasets and can be applied for rule induction to determine water quality with higher accuracy than the other tested algorithms.}},
  author       = {{Jeihouni, Mehrdad and Toomanian, Ara and Mansourian, Ali}},
  issn         = {{0920-4741}},
  keywords     = {{Decision tree; Geostatistics; Random forest; Random tree; Water quality; Machine Learning (ML); Artificial Intelligence (AI)}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{139--154}},
  publisher    = {{Springer}},
  journal      = {{Water Resources Management}},
  title        = {{Decision Tree-Based Data Mining and Rule Induction for Identifying High Quality Groundwater Zones to Water Supply Management : a Novel Hybrid Use of Data Mining and GIS}},
  url          = {{http://dx.doi.org/10.1007/s11269-019-02447-w}},
  doi          = {{10.1007/s11269-019-02447-w}},
  volume       = {{34}},
  year         = {{2020}},
}