Predicting Monthly Community-Level Domestic Radon Concentrations in the Greater Boston Area with an Ensemble Learning Model
(2021) In Environmental Science & Technology 55(10). p.7157-7166
- Abstract
Inhaling radon and its progeny is associated with adverse health outcomes. However, previous studies of the health effects of residential exposure to radon in the United States were commonly based on a county-level temporally invariant radon model that was developed using measurements collected in the mid- to late 1980s. We developed a machine learning model to predict monthly radon concentrations for each ZIP Code Tabulation Area (ZCTA) in the Greater Boston area based on 363,783 short-term measurements by Spruce Environmental Technologies, Inc., during the period 2005-2018. A two-stage ensemble-based model was developed to predict radon concentrations for all ZCTAs and months. Stage one included 12 base statistical models that independently predicted ZCTA-level radon concentrations based on geological, architectural, socioeconomic, and meteorological factors for each ZCTA. Stage two aggregated the predictions of these 12 base models using an ensemble learning method. The results of a 10-fold cross-validation showed that the stage-two model has a good prediction accuracy with a weighted R² of 0.63 and root mean square error of 22.6 Bq/m³. The community-level time-varying predictions from our model have good predictive precision and accuracy and can be used in future prospective epidemiological studies in the Greater Boston area.
- author
- Li, Longxiang ; Blomberg, Annelise J LU ; Stern, Rebecca A ; Kang, Choong-Min ; Papatheodorou, Stefania ; Wei, Yaguang ; Liu, Man ; Peralta, Adjani A ; Vieira, Carolina L Z and Koutrakis, Petros
- publishing date
- 2021
- type
- Contribution to journal
- publication status
- published
- keywords
- Air Pollutants, Radioactive/analysis, Air Pollution, Indoor/analysis, Boston, Housing, Machine Learning, Models, Statistical, Radon/analysis, United States
- in
- Environmental Science & Technology
- volume
- 55
- issue
- 10
- pages
- 7157 - 7166
- publisher
- The American Chemical Society (ACS)
- external identifiers
- pmid:33939421
- scopus:85106504351
- ISSN
- 1520-5851
- DOI
- 10.1021/acs.est.0c08792
- language
- English
- LU publication?
- no
- id
- 9d4d0778-b4b8-4095-8afa-5c13711ba960
- date added to LUP
- 2021-09-09 11:31:21
- date last changed
- 2024-09-22 01:03:30
@article{9d4d0778-b4b8-4095-8afa-5c13711ba960,
  abstract  = {{Inhaling radon and its progeny is associated with adverse health outcomes. However, previous studies of the health effects of residential exposure to radon in the United States were commonly based on a county-level temporally invariant radon model that was developed using measurements collected in the mid- to late 1980s. We developed a machine learning model to predict monthly radon concentrations for each ZIP Code Tabulation Area (ZCTA) in the Greater Boston area based on 363,783 short-term measurements by Spruce Environmental Technologies, Inc., during the period 2005-2018. A two-stage ensemble-based model was developed to predict radon concentrations for all ZCTAs and months. Stage one included 12 base statistical models that independently predicted ZCTA-level radon concentrations based on geological, architectural, socioeconomic, and meteorological factors for each ZCTA. Stage two aggregated the predictions of these 12 base models using an ensemble learning method. The results of a 10-fold cross-validation showed that the stage-two model has a good prediction accuracy with a weighted $R^2$ of 0.63 and root mean square error of 22.6 Bq/m$^3$. The community-level time-varying predictions from our model have good predictive precision and accuracy and can be used in future prospective epidemiological studies in the Greater Boston area.}},
  author    = {Li, Longxiang and Blomberg, Annelise J and Stern, Rebecca A and Kang, Choong-Min and Papatheodorou, Stefania and Wei, Yaguang and Liu, Man and Peralta, Adjani A and Vieira, Carolina L Z and Koutrakis, Petros},
  issn      = {{1520-5851}},
  journal   = {{Environmental Science \& Technology}},
  keywords  = {{Air Pollutants, Radioactive/analysis; Air Pollution, Indoor/analysis; Boston; Housing; Machine Learning; Models, Statistical; Radon/analysis; United States}},
  language  = {{eng}},
  number    = {{10}},
  pages     = {{7157--7166}},
  publisher = {{The American Chemical Society (ACS)}},
  title     = {{Predicting Monthly Community-Level Domestic Radon Concentrations in the Greater Boston Area with an Ensemble Learning Model}},
  url       = {{http://dx.doi.org/10.1021/acs.est.0c08792}},
  doi       = {{10.1021/acs.est.0c08792}},
  volume    = {{55}},
  year      = {{2021}},
}