Global fields of daily accumulation-mode particle number concentrations using in situ observations, reanalysis data, and machine learning
(2025) In Aerosol Research 3.
- Abstract
- Accurate global estimates of accumulation-mode particle number concentrations (N100) are essential for understanding aerosol–cloud interactions and their climate effects and for improving Earth system models. However, traditional methods relying on sparse in situ measurements lack comprehensive coverage, and indirect satellite retrievals have limited sensitivity in the relevant size range. To overcome these challenges, we apply machine learning (ML) techniques – multiple linear regression (MLR) and eXtreme Gradient Boosting (XGB) – to generate daily global N100 fields using in situ measurements as target variables and reanalysis data from the Copernicus Atmosphere Monitoring Service (CAMS) and ERA5 as predictor variables. Our cross-validation showed that ML models captured N100 concentrations well in environments well-represented in the training set, with over 70 % of daily estimates being within a factor of 1.5 of observations. However, performance declines in underrepresented regions and conditions, such as in clean and remote environments, including marine, tropical, and polar regions, underscoring the need for more diverse observations. The most important predictors for N100 in the ML models were aerosol-phase sulfate and gas-phase ammonia concentrations, followed by carbon monoxide and sulfur dioxide. Although black carbon and organic matter showed the highest feature importance values, their opposing signs in the MLR model coefficients suggest that their effects largely offset each other’s contributions to the N100 estimate. By directly linking estimates to in situ measurements, our ML approach provides valuable insights into the global distribution of N100 and serves as a complementary tool for evaluating Earth system model outputs and advancing the understanding of aerosol processes and their role in the climate system.
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/0ff406f1-df38-4206-ac83-91fb59f662fb
- author
- organization
- Combustion Physics
- Lund Laser Centre, LLC
- LTH Profile Area: Photon Science and Technology
- LU Profile Area: Light and Materials
- LU Profile Area: Nature-based future solutions
- LTH Profile Area: The Energy Transition
- LTH Profile Area: Aerosols
- Dept of Physical Geography and Ecosystem Science
- Metalund
- MERGE: ModElling the Regional and Global Earth system
- publishing date
- 2025-11-28
- type
- Contribution to journal
- publication status
- published
- subject
- in
- Aerosol Research
- volume
- 3
- DOI
- 10.5194/ar-3-589-2025
- language
- English
- LU publication?
- yes
- id
- 0ff406f1-df38-4206-ac83-91fb59f662fb
- date added to LUP
- 2025-12-04 17:29:45
- date last changed
- 2025-12-05 08:27:12
@article{0ff406f1-df38-4206-ac83-91fb59f662fb,
abstract = {{Accurate global estimates of accumulation-mode particle number concentrations (N100) are essential for understanding aerosol–cloud interactions and their climate effects and for improving Earth system models. However, traditional methods relying on sparse in situ measurements lack comprehensive coverage, and indirect satellite retrievals have limited sensitivity in the relevant size range. To overcome these challenges, we apply machine learning (ML) techniques – multiple linear regression (MLR) and eXtreme Gradient Boosting (XGB) – to generate daily global N100 fields using in situ measurements as target variables and reanalysis data from the Copernicus Atmosphere Monitoring Service (CAMS) and ERA5 as predictor variables. Our cross-validation showed that ML models captured N100 concentrations well in environments well-represented in the training set, with over 70 % of daily estimates being within a factor of 1.5 of observations. However, performance declines in underrepresented regions and conditions, such as in clean and remote environments, including marine, tropical, and polar regions, underscoring the need for more diverse observations. The most important predictors for N100 in the ML models were aerosol-phase sulfate and gas-phase ammonia concentrations, followed by carbon monoxide and sulfur dioxide. Although black carbon and organic matter showed the highest feature importance values, their opposing signs in the MLR model coefficients suggest that their effects largely offset each other’s contributions to the N100 estimate. By directly linking estimates to in situ measurements, our ML approach provides valuable insights into the global distribution of N100 and serves as a complementary tool for evaluating Earth system model outputs and advancing the understanding of aerosol processes and their role in the climate system.}},
author = {{Ovaska, Aino and Rauth, Elio and Holmberg, Daniel and Artaxo, Paulo and Backman, John and Bergmans, Benjamin and Collins, Don and Franco, Marco Aurelio and Gani, Shahzad and Harrison, Roy M. and Hooda, Rakesh and Hussein, Tareq and Hyvärinen, Antti-Pekka and Jaars, Kerneels and Kristensson, Adam and Kulmala, Markku and Laakso, L. and Laaksonen, Ari and Mihalopoulos, Nikolaos and O'Dowd, Colin and Ondracek, Jakub and Petäjä, Tuukka and Plauskaite, Kristina and Pöhlker, Mira and Qi, Ximeng and Tunved, Peter and Vakkari, Ville and Wiedensohler, A. and Puolamäki, Kai and Nieminen, Tuomo and Kerminen, Veli-Matti and Sinclair, Victoria A. and Paasonen, Pauli}},
language = {{eng}},
month = {{11}},
journal = {{Aerosol Research}},
title = {{Global fields of daily accumulation-mode particle number concentrations using in situ observations, reanalysis data, and machine learning}},
url = {{http://dx.doi.org/10.5194/ar-3-589-2025}},
doi = {{10.5194/ar-3-589-2025}},
volume = {{3}},
year = {{2025}},
}
