Lund University Publications

Prediction of chlorophyll a concentration and analyzing its relationship with environmental factors in lake Vänern using random forest algorithm

Budakoti, Sachin LU and Pal, Mahendra LU (2026) In Limnology 27(2). p.235-250
Abstract

Chlorophyll a concentration is a crucial indicator of marine primary productivity, and its accurate prediction is vital for early warning systems in marine ecosystems. The present work examines the relationship between chlorophyll a concentration and environmental factors in Lake Vänern and predicts chlorophyll a concentration with a Random Forest model and a Generalized Linear Model over the period 2003 to 2023. The Random Forest model takes optical parameters (diffuse attenuation coefficient at 490 nm, Normalized Fluorescence Line Height) and meteorological parameters (precipitation, surface air temperature, wind speed, etc.) of Lake Vänern as inputs, and uses feature importance ranking and partial dependence plots to identify the dominant drivers of chlorophyll a concentration. The feature importance ranking indicates a close relationship between the response variable (chlorophyll a concentration) and the high-importance features in the Random Forest model. The Random Forest model achieved high prediction accuracy, with a coefficient of determination (R²) of 0.98 and a root mean square error (RMSE) of 0.005 mg m⁻³. Both the Random Forest model and the Generalized Linear Model indicate that particulate inorganic carbon and water temperature are the dominant drivers limiting chlorophyll a concentration in Lake Vänern. This study offers a valuable approach for accurately predicting chlorophyll a concentration, aiding the control and prevention of harmful lake blooms and thereby reducing their adverse impacts on the marine ecosystem and the surrounding environment.
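The accuracy metrics and the idea of ranking features by importance can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. The data are synthetic, `toy_model` is a hypothetical stand-in for a fitted model, and permutation importance is used here as one common, model-agnostic way to rank features; the abstract does not say which importance measure the study used.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the response variable."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def permutation_importance(predict, rows, y_true, n_features, seed=0):
    """Drop in R^2 after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = r2_score(y_true, [predict(r) for r in rows])
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by a shuffled value.
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - r2_score(y_true, [predict(r) for r in shuffled]))
    return drops


# Synthetic data (assumption, not from the paper): the response depends
# on feature 0 only; feature 1 is irrelevant noise.
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(50)]
y = [2.0 * r[0] + data_rng.gauss(0, 0.1) for r in rows]


def toy_model(row):
    """Hypothetical stand-in for a fitted model; it ignores feature 1."""
    return 2.0 * row[0]


predictions = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, y, n_features=2)
print(f"R2={r2_score(y, predictions):.3f}  RMSE={rmse(y, predictions):.3f}")
print("permutation importances:", importances)
```

In this toy setup, shuffling the irrelevant feature leaves the predictions unchanged, so its importance is zero, while shuffling the informative feature sharply lowers R². A high-importance feature in this sense is one the model relies on heavily to predict the response.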
type: Contribution to journal
publication status: published
keywords: Chlorophyll a, Feature important ranking, Generalized linear model, Lake blooms, Random forest model
in: Limnology
volume: 27
issue: 2
pages: 235 - 250
publisher: Springer
external identifiers: scopus:105022833135
ISSN: 1439-8621
DOI: 10.1007/s10201-025-00822-8
language: English
LU publication?: yes
id: 9f59bcdc-7f13-4e49-8386-9cb853ad7a8c
date added to LUP: 2026-02-05 13:42:15
date last changed: 2026-06-10 09:10:59
@article{9f59bcdc-7f13-4e49-8386-9cb853ad7a8c,
  abstract     = {{Chlorophyll a concentration is a crucial indicator of marine primary productivity, and its accurate prediction is vital for early warning systems in marine ecosystems. The present work examines the relationship between chlorophyll a concentration and environmental factors in Lake Vänern and predicts chlorophyll a concentration with a Random Forest model and a Generalized Linear Model over the period 2003 to 2023. The Random Forest model takes optical parameters (diffuse attenuation coefficient at 490 nm, Normalized Fluorescence Line Height) and meteorological parameters (precipitation, surface air temperature, wind speed, etc.) of Lake Vänern as inputs, and uses feature importance ranking and partial dependence plots to identify the dominant drivers of chlorophyll a concentration. The feature importance ranking indicates a close relationship between the response variable (chlorophyll a concentration) and the high-importance features in the Random Forest model. The Random Forest model achieved high prediction accuracy, with a coefficient of determination (R$^{2}$) of 0.98 and a root mean square error (RMSE) of 0.005 mg m$^{-3}$. Both the Random Forest model and the Generalized Linear Model indicate that particulate inorganic carbon and water temperature are the dominant drivers limiting chlorophyll a concentration in Lake Vänern. This study offers a valuable approach for accurately predicting chlorophyll a concentration, aiding the control and prevention of harmful lake blooms and thereby reducing their adverse impacts on the marine ecosystem and the surrounding environment.}},
  author       = {{Budakoti, Sachin and Pal, Mahendra}},
  issn         = {{1439-8621}},
  keywords     = {{Chlorophyll a; Feature important ranking; Generalized linear model; Lake blooms; Random forest model}},
  language     = {{eng}},
  number       = {{2}},
  pages        = {{235--250}},
  publisher    = {{Springer}},
  journal      = {{Limnology}},
  title        = {{Prediction of chlorophyll a concentration and analyzing its relationship with environmental factors in lake Vänern using random forest algorithm}},
  url          = {{http://dx.doi.org/10.1007/s10201-025-00822-8}},
  doi          = {{10.1007/s10201-025-00822-8}},
  volume       = {{27}},
  year         = {{2026}},
}