Machine learning-enhanced gas sensor technology identifies ovarian and endometrial cancer of all stages through plasma volatile organic compound patterns
(2025) In EBioMedicine 122.
- Abstract
Background: Ovarian cancer presents with non-specific symptoms that make early diagnosis challenging and the prognosis poor. Ovarian and endometrial cancers exhibit similar genomic mutations and biomarker profiles. Endogenous volatile organic compounds (VOCs) are products of metabolic activity. In cancer, metabolites increase due to tumour necrosis, leading to cancer-specific VOC patterns. The aim of this study was to evaluate VOC analyses in plasma as diagnostic tests for early diagnosis of ovarian and endometrial cancer. Methods: Preoperative plasma from 133 women with ovarian cancer (stages I–IV or borderline ovarian tumours) and 41 women with endometrial cancer (stages I–IV) was compared with plasma from 115 healthy controls using highly sensitive gas sensors. Data analyses were performed using feature extraction from 32 gas sensors per sample. From each signal, 85 features were extracted (including statistical, time-domain, and frequency-domain features), and training and test datasets were formed. The features underwent an iterative redundancy removal process for stepwise optimization of models. Model robustness was assessed using a train/test split scheme with unique datasets, leading to a model optimized for diagnostic performance. By implementing sequential boosting-based binary classification models, it was possible not only to determine the presence or absence of cancer, but also to distinguish between ovarian and endometrial cancer and to determine the stage of the cancer. Findings: The VOC analysis, powered by a 5-fold cross-validated ensemble classifier, achieved exceptional diagnostic performance. It correctly identified all 133 patients with ovarian cancer and borderline ovarian tumours, all 41 cases of endometrial cancer, and all 115 healthy controls. For staging, the model accurately classified 172 out of 174 (98.9%) cancer cases as stage I vs. II–IV.
On validation data, the classifier yielded an accuracy of 96.63% (95% CI: 96.56%–96.70%), sensitivity of 96.42% (95% CI: 96.29%–96.54%), and specificity of 96.88% (95% CI: 96.76%–97.01%). These metrics were robustly replicated on the independent test set, with an accuracy of 97.13% (95% CI: 96.80%–97.45%), sensitivity of 96.92% (95% CI: 96.49%–97.35%), and specificity of 97.37% (95% CI: 96.97%–97.77%). Complementing this, four additional classifiers (each with accuracy exceeding 90%) were developed and integrated into a cascaded algorithm to enable multi-class discrimination (ovarian cancer and endometrial cancer vs. healthy controls), cancer typing (ovarian vs. endometrial), and staging (stage I vs. later stages). Interpretation: Analysis of VOCs emitted into the gas phase from plasma demonstrates high sensitivity and specificity for diagnosing ovarian cancer (including borderline ovarian tumours) and endometrial cancer, compared with healthy controls. VOC analyses accurately differentiated between early and advanced stages of both cancer types. External validation remains to be performed. Funding: The Strategic Innovation Programs Swelife and MedTech4Health, a joint venture of Vinnova, Formas and the Energy Agency (grant No. 2022-03464 and grant No. 2023-03874) and within the Convergence Accelerator Program (Track L – Real-World Chemical Sensing Applications), funded by the Swedish Research Council, Vetenskapsrådet (grant No. 2023-07219), and Sweden's Innovation Agency, Vinnova (grant No. 2023-04186), in collaboration with the US National Science Foundation (NSF).
The computations, data handling, and machine learning model training and testing were conducted in the MATLAB environment and enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022–06725. This work received funding from the European Union's Horizon Europe Research and Innovation Programme under Grant Agreement No 101214318 (DISARM). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the Health and Digital Executive Agency (HaDEA). Neither the European Union nor HaDEA can be held responsible for them.
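The cascaded use of sequential binary classifiers described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (the abstract states that the work was done in MATLAB with boosting-based models); the function `cascade_predict`, the ordering of the three decisions, and the toy threshold classifiers are all hypothetical stand-ins chosen only to show the control flow.

```python
from typing import Callable, Sequence

# A feature vector and a binary decision function over it.
Features = Sequence[float]
BinaryClassifier = Callable[[Features], bool]


def cascade_predict(
    features: Features,
    is_cancer: BinaryClassifier,
    is_ovarian: BinaryClassifier,
    is_stage_one: BinaryClassifier,
) -> str:
    """Apply three binary decisions in sequence (assumed ordering)."""
    # Step 1: cancer vs. healthy control. Later steps run only for cancer.
    if not is_cancer(features):
        return "healthy control"
    # Step 2: cancer typing, ovarian vs. endometrial.
    cancer_type = "ovarian" if is_ovarian(features) else "endometrial"
    # Step 3: staging, stage I vs. stages II-IV.
    stage = "stage I" if is_stage_one(features) else "stage II-IV"
    return f"{cancer_type} cancer, {stage}"


# Hypothetical toy classifiers: each thresholds one made-up feature.
# Real models would be trained on the extracted sensor features.
def toy_is_cancer(f: Features) -> bool:
    return f[0] > 0.5


def toy_is_ovarian(f: Features) -> bool:
    return f[1] > 0.5


def toy_is_stage_one(f: Features) -> bool:
    return f[2] > 0.5


if __name__ == "__main__":
    sample = [0.8, 0.9, 0.1]  # made-up feature values
    print(cascade_predict(sample, toy_is_cancer, toy_is_ovarian, toy_is_stage_one))
```

Because each step is a separate binary model, an error at an early step propagates to the later ones, which is one reason per-step accuracy matters in a cascade of this kind.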
- author
- Eriksson, Jens ; Puglisi, Donatella ; Herbst, Filip LU ; Dobilas, Arturas LU ; Shtepliuk, Ivan ; Joneborg, Ulrika ; Falconer, Henrik ; Rådestad, Angelique Flöter and Borgfeldt, Christer LU
- publishing date
- 2025-12
- type
- Contribution to journal
- publication status
- published
- keywords
- Electronic nose, Endometrial cancer, Metabolomics, Ovarian cancer, Plasma analysis, Volatile organic compounds
- in
- EBioMedicine
- volume
- 122
- article number
- 106027
- publisher
- Elsevier
- external identifiers
- pmid:41237666
- scopus:105021332806
- ISSN
- 2352-3964
- DOI
- 10.1016/j.ebiom.2025.106027
- language
- English
- LU publication?
- yes
- id
- 471b21ca-2c2a-4565-b06c-39c8881d0c01
- date added to LUP
- 2025-12-08 14:31:00
- date last changed
- 2025-12-09 03:00:07
@article{471b21ca-2c2a-4565-b06c-39c8881d0c01,
author = {{Eriksson, Jens and Puglisi, Donatella and Herbst, Filip and Dobilas, Arturas and Shtepliuk, Ivan and Joneborg, Ulrika and Falconer, Henrik and Rådestad, Angelique Flöter and Borgfeldt, Christer}},
issn = {{2352-3964}},
keywords = {{Electronic nose; Endometrial cancer; Metabolomics; Ovarian cancer; Plasma analysis; Volatile organic compounds}},
language = {{eng}},
publisher = {{Elsevier}},
journal = {{EBioMedicine}},
title = {{Machine learning-enhanced gas sensor technology identifies ovarian and endometrial cancer of all stages through plasma volatile organic compound patterns}},
url = {{http://dx.doi.org/10.1016/j.ebiom.2025.106027}},
doi = {{10.1016/j.ebiom.2025.106027}},
volume = {{122}},
year = {{2025}},
}