Lund University Publications

Interpretable machine learning for predicting the fate and transport of pentachlorophenol in groundwater

Rad, Mehran LU; Abtahi, Azra LU; Berndtsson, Ronny LU; McKnight, Ursula S and Aminifar, Amir LU (2024) In Environmental Pollution 345.
Abstract

Pentachlorophenol (PCP) is a commonly found recalcitrant and toxic groundwater contaminant that resists degradation, bioaccumulates, and has a potential for long-range environmental transport. Taking proper action against the pollutant while accounting for life-cycle consequences requires a better understanding of its behavior in the subsurface. We recognize the huge potential for enhancing decision-making at contaminated groundwater sites with the arrival of machine learning (ML) techniques in environmental applications. We used ML to enhance the understanding of the dynamics of PCP transport properties in the subsurface, and to determine key hydrochemical and hydrogeological drivers affecting its transport and fate. We demonstrate how this complementary knowledge, provided by data-driven methods, may enable more targeted planning of monitoring and remediation at two highly contaminated Swedish groundwater sites, where the method was validated. We evaluated 6 interpretable ML methods, 3 linear regressors and 3 non-linear (i.e., tree-based) regressors, to predict PCP concentration in the groundwater. The modeling results indicate that simple linear ML models were useful for predicting observations in datasets without missing values, while tree-based regressors were more suitable for datasets containing missing values. Because missing values are common in datasets collected during contaminated site investigations, this could be of significant importance for contaminated site planners and managers, ultimately reducing site investigation and monitoring costs. Furthermore, we interpreted the proposed models using the SHAP (SHapley Additive exPlanations) approach to decipher the importance of different drivers in the prediction and simulation of critical hydrogeochemical variables. Among these, the sum of chlorophenols is of highest significance in the analyses. Setting that variable aside from the model, tetrachlorophenols, dissolved organic carbon, and conductivity were found to be of highest importance. Accordingly, ML methods could potentially be used to improve the understanding of groundwater contamination transport dynamics, filling gaps in knowledge that remain when using more sophisticated deterministic modeling approaches.
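
The SHAP interpretation mentioned in the abstract builds on Shapley values, which split a prediction among the input features. The following is a minimal sketch of that idea only, not the authors' pipeline: it computes exact Shapley values against a single baseline point, whereas the SHAP library typically averages over a background dataset. The function `toy_model`, its coefficients, the feature values, and the baseline are all hypothetical and are not taken from the paper or fitted to any data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) per feature."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition take their observed value;
        # all other features stay at the baseline value.
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(point)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_model(p):
    # Hypothetical response surface (assumption, not a fitted model):
    # one additive term plus one interaction term.
    return 2.0 * p["tetrachlorophenols"] + 0.5 * p["doc"] * p["conductivity"]


# Hypothetical sample and baseline, chosen only to make the arithmetic easy.
x = {"tetrachlorophenols": 3.0, "doc": 4.0, "conductivity": 2.0}
baseline = {"tetrachlorophenols": 1.0, "doc": 2.0, "conductivity": 1.0}

phi = shapley_values(toy_model, x, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:.2f}")
```

The contributions sum to the difference between the prediction at `x` and the prediction at `baseline`, which is the property that makes such attributions usable for ranking drivers. Exact enumeration grows exponentially with the number of features, so it is practical only for small illustrations like this one.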

author
Rad, Mehran; Abtahi, Azra; Berndtsson, Ronny; McKnight, Ursula S and Aminifar, Amir
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
Contaminated sites, Explainable artificial intelligence, SHAP value, Sustainable remediation, Tree-based regression
in
Environmental Pollution
volume
345
article number
123449
publisher
Elsevier
external identifiers
  • scopus:85185169504
  • pmid:38278404
ISSN
0269-7491
DOI
10.1016/j.envpol.2024.123449
language
English
LU publication?
yes
id
dbf505c9-5e6a-4522-8f0a-d101ef59148e
date added to LUP
2024-02-01 08:00:17
date last changed
2024-04-23 16:01:51
@article{dbf505c9-5e6a-4522-8f0a-d101ef59148e,
  abstract     = {{Pentachlorophenol (PCP) is a commonly found recalcitrant and toxic groundwater contaminant that resists degradation, bioaccumulates, and has a potential for long-range environmental transport. Taking proper action against the pollutant while accounting for life-cycle consequences requires a better understanding of its behavior in the subsurface. We recognize the huge potential for enhancing decision-making at contaminated groundwater sites with the arrival of machine learning (ML) techniques in environmental applications. We used ML to enhance the understanding of the dynamics of PCP transport properties in the subsurface, and to determine key hydrochemical and hydrogeological drivers affecting its transport and fate. We demonstrate how this complementary knowledge, provided by data-driven methods, may enable more targeted planning of monitoring and remediation at two highly contaminated Swedish groundwater sites, where the method was validated. We evaluated 6 interpretable ML methods, 3 linear regressors and 3 non-linear (i.e., tree-based) regressors, to predict PCP concentration in the groundwater. The modeling results indicate that simple linear ML models were useful for predicting observations in datasets without missing values, while tree-based regressors were more suitable for datasets containing missing values. Because missing values are common in datasets collected during contaminated site investigations, this could be of significant importance for contaminated site planners and managers, ultimately reducing site investigation and monitoring costs. Furthermore, we interpreted the proposed models using the SHAP (SHapley Additive exPlanations) approach to decipher the importance of different drivers in the prediction and simulation of critical hydrogeochemical variables. Among these, the sum of chlorophenols is of highest significance in the analyses. Setting that variable aside from the model, tetrachlorophenols, dissolved organic carbon, and conductivity were found to be of highest importance. Accordingly, ML methods could potentially be used to improve the understanding of groundwater contamination transport dynamics, filling gaps in knowledge that remain when using more sophisticated deterministic modeling approaches.}},
  author       = {{Rad, Mehran and Abtahi, Azra and Berndtsson, Ronny and McKnight, Ursula S and Aminifar, Amir}},
  issn         = {{0269-7491}},
  keywords     = {{Contaminated sites; Explainable artificial intelligence; SHAP value; Sustainable remediation; Tree-based regression}},
  language     = {{eng}},
  month        = {{03}},
  publisher    = {{Elsevier}},
  journal      = {{Environmental Pollution}},
  title        = {{Interpretable machine learning for predicting the fate and transport of pentachlorophenol in groundwater}},
  url          = {{http://dx.doi.org/10.1016/j.envpol.2024.123449}},
  doi          = {{10.1016/j.envpol.2024.123449}},
  volume       = {{345}},
  year         = {{2024}},
}