How do data-mining models consider arsenic contamination in sediments and variables importance?

Mirchooli, Fahimeh; Motevalli, Alireza (LU); Pourghasemi, Hamid Reza; Mohammadi, Maziar; Bhattacharya, Prosun; Maghsood, Fatemeh Fadia (LU) and Tiefenbacher, John P. (2019) In Environmental Monitoring and Assessment 191(12).
Abstract

Arsenic (As) is one of the most dangerous elements, as more than 100 million people are exposed to risk globally. The permissible threshold of As for drinking water is 10 μg/L according to both the WHO’s drinking water guidelines and the Iranian national standard. However, several studies have indicated that As concentrations exceed this threshold value in several regions of Iran. This research evaluates an As-susceptible region, the Tajan River watershed, using the following data-mining models: multivariate adaptive regression splines (MARS), functional data analysis (FDA), support vector machine (SVM), generalized linear model (GLM), multivariate discriminant analysis (MDA), and gradient boosting machine (GBM). This study considers 12 factors for elevated As concentrations: land use, drainage density, profile curvature, plan curvature, slope length, slope degree, topographic wetness index, erosion, village density, distance from villages, precipitation, and lithology. The susceptibility mapping was conducted using training (70%) and validation (30%) subsets. The results of As contamination in sediment showed that classifications into 4 levels of concentration are very similar for the GLM and FDA models. The GBM calculated the areas of highest arsenic contamination risk by MARS and SVM with percentages of 30.0% and 28.7%, respectively. FDA, GLM, MARS, and MDA models calculated the areas of lowest risk to be 3.3%, 23.0%, 72.0%, 25.2%, and 26.1%, respectively. The ROC curve results reveal that the MARS, SVM, and MDA models had the highest accuracies, with area under the ROC curve values of 84.6%, 78.9%, and 79.5%, respectively. Land use, lithology, erosion, and elevation were the most important predictors of contamination potential, with values of 0.6, 0.59, 0.57, and 0.56, respectively. Finally, these data-mining methods can be used as appropriate, inexpensive, and feasible options to identify As-susceptible areas and can guide managers to reduce contamination in sediment of the environment and the food chain.

author
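Mirchooli, Fahimeh; Motevalli, Alireza; Pourghasemi, Hamid Reza; Mohammadi, Maziar; Bhattacharya, Prosun; Maghsood, Fatemeh Fadia and Tiefenbacher, John P.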
publishing date
2019
type
Contribution to journal
publication status
published
keywords
Arsenic, Data-mining, GIS-based mapping, Human health, Iran, LVQ
in
Environmental Monitoring and Assessment
volume
191
issue
12
article number
777
publisher
Springer
external identifiers
  • pmid:31781968
  • scopus:85075754389
ISSN
0167-6369
DOI
10.1007/s10661-019-7979-x
language
English
LU publication?
yes
id
66fd7e9c-e10e-4724-89c2-25fb0c3b00b4
date added to LUP
2019-12-17 08:26:13
date last changed
2021-01-19 02:38:50
@article{66fd7e9c-e10e-4724-89c2-25fb0c3b00b4,
  abstract     = {Arsenic (As) is one of the most dangerous elements, as more than 100 million people are exposed to risk globally. The permissible threshold of As for drinking water is 10 μg/L according to both the WHO’s drinking water guidelines and the Iranian national standard. However, several studies have indicated that As concentrations exceed this threshold value in several regions of Iran. This research evaluates an As-susceptible region, the Tajan River watershed, using the following data-mining models: multivariate adaptive regression splines (MARS), functional data analysis (FDA), support vector machine (SVM), generalized linear model (GLM), multivariate discriminant analysis (MDA), and gradient boosting machine (GBM). This study considers 12 factors for elevated As concentrations: land use, drainage density, profile curvature, plan curvature, slope length, slope degree, topographic wetness index, erosion, village density, distance from villages, precipitation, and lithology. The susceptibility mapping was conducted using training (70%) and validation (30%) subsets. The results of As contamination in sediment showed that classifications into 4 levels of concentration are very similar for the GLM and FDA models. The GBM calculated the areas of highest arsenic contamination risk by MARS and SVM with percentages of 30.0% and 28.7%, respectively. FDA, GLM, MARS, and MDA models calculated the areas of lowest risk to be 3.3%, 23.0%, 72.0%, 25.2%, and 26.1%, respectively. The ROC curve results reveal that the MARS, SVM, and MDA models had the highest accuracies, with area under the ROC curve values of 84.6%, 78.9%, and 79.5%, respectively. Land use, lithology, erosion, and elevation were the most important predictors of contamination potential, with values of 0.6, 0.59, 0.57, and 0.56, respectively. Finally, these data-mining methods can be used as appropriate, inexpensive, and feasible options to identify As-susceptible areas and can guide managers to reduce contamination in sediment of the environment and the food chain.},
  author       = {Mirchooli, Fahimeh and Motevalli, Alireza and Pourghasemi, Hamid Reza and Mohammadi, Maziar and Bhattacharya, Prosun and Maghsood, Fatemeh Fadia and Tiefenbacher, John P.},
  issn         = {0167-6369},
  language     = {eng},
  number       = {12},
  publisher    = {Springer},
  journal      = {Environmental Monitoring and Assessment},
  title        = {How do data-mining models consider arsenic contamination in sediments and variables importance?},
  url          = {http://dx.doi.org/10.1007/s10661-019-7979-x},
  doi          = {10.1007/s10661-019-7979-x},
  volume       = {191},
  year         = {2019},
}