Lund University Publications

Global fields of daily accumulation-mode particle number concentrations using in situ observations, reanalysis data, and machine learning

Ovaska, Aino; Rauth, Elio; Holmberg, Daniel; Artaxo, Paulo; Backman, John; Bergmans, Benjamin; Collins, Don; Franco, Marco Aurelio; Gani, Shahzad; Harrison, Roy M., et al. (2025). In Aerosol Research 3.
Abstract
Accurate global estimates of accumulation-mode particle number concentrations (N100) are essential for understanding aerosol–cloud interactions and their climate effects and for improving Earth system models. However, traditional methods relying on sparse in situ measurements lack comprehensive coverage, and indirect satellite retrievals have limited sensitivity in the relevant size range. To overcome these challenges, we apply machine learning (ML) techniques – multiple linear regression (MLR) and eXtreme Gradient Boosting (XGB) – to generate daily global N100 fields using in situ measurements as target variables and reanalysis data from the Copernicus Atmosphere Monitoring Service (CAMS) and ERA5 as predictor variables. Our cross-validation showed that ML models captured N100 concentrations well in environments well-represented in the training set, with over 70 % of daily estimates being within a factor of 1.5 of observations. However, performance declines in underrepresented regions and conditions, such as in clean and remote environments, including marine, tropical, and polar regions, underscoring the need for more diverse observations. The most important predictors for N100 in the ML models were aerosol-phase sulfate and gas-phase ammonia concentrations, followed by carbon monoxide and sulfur dioxide. Although black carbon and organic matter showed the highest feature importance values, their opposing signs in the MLR model coefficients suggest that their effects largely offset each other’s contributions to the N100 estimate. By directly linking estimates to in situ measurements, our ML approach provides valuable insights into the global distribution of N100 and serves as a complementary tool for evaluating Earth system model outputs and advancing the understanding of aerosol processes and their role in the climate system.
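The "within a factor of 1.5" score quoted in the abstract can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `fraction_within_factor`, the choice to skip non-positive values, and the sample numbers are all assumptions made for illustration. It counts an estimate as a hit when the ratio of estimate to observation lies between 1/1.5 and 1.5.

```python
def fraction_within_factor(estimates, observations, factor=1.5):
    """Share of (estimate, observation) pairs whose ratio lies in [1/factor, factor].

    Hypothetical helper: pairs with a non-positive value are skipped, which is
    an assumption of this sketch, not something stated in the abstract.
    """
    if len(estimates) != len(observations):
        raise ValueError("estimates and observations must have the same length")
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    pairs = [(e, o) for e, o in zip(estimates, observations) if e > 0 and o > 0]
    if not pairs:
        raise ValueError("no valid pairs to compare")
    hits = sum(1 for e, o in pairs if 1.0 / factor <= e / o <= factor)
    return hits / len(pairs)


# Made-up daily concentrations (cm^-3), not data from the paper.
observed = [100.0, 200.0, 400.0, 800.0]
estimated = [120.0, 90.0, 610.0, 700.0]

share = fraction_within_factor(estimated, observed)
print(f"{share:.0%} of estimates are within a factor of 1.5")
```

With these made-up numbers, the ratios are 1.2, 0.45, 1.525, and 0.875, so two of the four estimates count as hits.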
type: Contribution to journal
publication status: published
in: Aerosol Research
volume: 3
DOI: 10.5194/ar-3-589-2025
language: English
LU publication?: yes
id: 0ff406f1-df38-4206-ac83-91fb59f662fb
date added to LUP: 2025-12-04 17:29:45
date last changed: 2025-12-05 08:27:12
@article{0ff406f1-df38-4206-ac83-91fb59f662fb,
  abstract     = {{Accurate global estimates of accumulation-mode particle number concentrations (N100) are essential for understanding aerosol–cloud interactions and their climate effects and for improving Earth system models. However, traditional methods relying on sparse in situ measurements lack comprehensive coverage, and indirect satellite retrievals have limited sensitivity in the relevant size range. To overcome these challenges, we apply machine learning (ML) techniques – multiple linear regression (MLR) and eXtreme Gradient Boosting (XGB) – to generate daily global N100 fields using in situ measurements as target variables and reanalysis data from the Copernicus Atmosphere Monitoring Service (CAMS) and ERA5 as predictor variables. Our cross-validation showed that ML models captured N100 concentrations well in environments well-represented in the training set, with over 70 % of daily estimates being within a factor of 1.5 of observations. However, performance declines in underrepresented regions and conditions, such as in clean and remote environments, including marine, tropical, and polar regions, underscoring the need for more diverse observations. The most important predictors for N100 in the ML models were aerosol-phase sulfate and gas-phase ammonia concentrations, followed by carbon monoxide and sulfur dioxide. Although black carbon and organic matter showed the highest feature importance values, their opposing signs in the MLR model coefficients suggest that their effects largely offset each other’s contributions to the N100 estimate. By directly linking estimates to in situ measurements, our ML approach provides valuable insights into the global distribution of N100 and serves as a complementary tool for evaluating Earth system model outputs and advancing the understanding of aerosol processes and their role in the climate system.}},
  author       = {{Ovaska, Aino and Rauth, Elio and Holmberg, Daniel and Artaxo, Paulo and Backman, John and Bergmans, Benjamin and Collins, Don and Franco, Marco Aurelio and Gani, Shahzad and Harrison, Roy M. and Hooda, Rakesh and Hussein, Tareq and Hyvärinen, Antti-Pekka and Jaars, Kerneels and Kristensson, Adam and Kulmala, Markku and Laakso, L. and Laaksonen, Ari and Mihalopoulos, Nikolaos and O'Dowd, Colin and Ondracek, Jakub and Petäjä, Tuukka and Plauskaite, Kristina and Pöhlker, Mira and Qi, Ximeng and Tunved, Peter and Vakkari, Ville and Wiedensohler, A. and Puolamäki, Kai and Nieminen, Tuomo and Kerminen, Veli-Matti and Sinclair, Victoria A. and Paasonen, Pauli}},
  language     = {{eng}},
  month        = {{11}},
  journal      = {{Aerosol Research}},
  title        = {{Global fields of daily accumulation-mode particle number concentrations using in situ observations, reanalysis data, and machine learning}},
  url          = {{http://dx.doi.org/10.5194/ar-3-589-2025}},
  doi          = {{10.5194/ar-3-589-2025}},
  volume       = {{3}},
  year         = {{2025}},
}