Uncertainty quantification metrics for deep regression
(2024) In Pattern Recognition Letters 186, p. 91-97
- abstract
When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty allows downstream modules to reason about the safety of its actions. In this work, we address metrics for uncertainty quantification. Specifically, we focus on regression tasks, and investigate Area Under Sparsification Error (AUSE), Calibration Error (CE), Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). Using multiple datasets, we look into how those metrics behave under four typical types of uncertainty, their stability regarding the size of the test set, and reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman's Rank Correlation for evaluating uncertainties and recommend replacing it with AUSE.
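As a rough illustration of two of the metrics named in the abstract, the sketch below computes a Gaussian Negative Log-Likelihood and an Area Under the Sparsification Error curve for per-sample mean/variance regression predictions. This is not code from the paper; the function names, the use of absolute error as the sparsification criterion, and the lack of curve normalization are illustrative assumptions.

import numpy as np

def gaussian_nll(y_true, mu, sigma):
    # Mean negative log-likelihood under per-sample Gaussians N(mu, sigma^2).
    var = np.square(sigma)
    return np.mean(0.5 * np.log(2.0 * np.pi * var)
                   + np.square(y_true - mu) / (2.0 * var))

def ause(y_true, mu, sigma):
    # Sparsification: remove samples in order of decreasing predicted
    # uncertainty and track the mean absolute error of the remainder.
    # The oracle curve removes samples by their true error instead.
    err = np.abs(y_true - mu)
    n = len(err)
    by_sigma = err[np.argsort(-sigma)]
    by_error = err[np.argsort(-err)]
    curve_sigma = np.array([by_sigma[k:].mean() for k in range(n)])
    curve_oracle = np.array([by_error[k:].mean() for k in range(n)])
    # AUSE: average gap between the predicted and oracle sparsification curves.
    return np.mean(curve_sigma - curve_oracle)

# Tiny usage example with synthetic predictions.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
mu = y + rng.normal(scale=0.3, size=200)
sigma = np.full(200, 0.3)
print(gaussian_nll(y, mu, sigma), ause(y, mu, sigma))

A lower AUSE means the predicted uncertainties rank errors almost as well as the true errors do, which is the behavior the paper recommends measuring instead of Spearman's Rank Correlation.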
- author
- Kristoffersson Lind, Simon; Xiong, Ziliang; Forssén, Per Erik and Krüger, Volker
- organization
- publishing date
- 2024-10
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Evaluation, Metrics, Regression, Uncertainty
- in
- Pattern Recognition Letters
- volume
- 186
- pages
- 91-97 (7 pages)
- publisher
- Elsevier
- external identifiers
- scopus:85204805492
- ISSN
- 0167-8655
- DOI
- 10.1016/j.patrec.2024.09.011
- language
- English
- LU publication?
- yes
- id
- de4539c2-46be-4f9f-b41b-19d580b8a41d
- date added to LUP
- 2024-11-15 11:34:18
- date last changed
- 2024-11-15 11:34:40
@article{de4539c2-46be-4f9f-b41b-19d580b8a41d,
  abstract  = {{When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty allows downstream modules to reason about the safety of its actions. In this work, we address metrics for uncertainty quantification. Specifically, we focus on regression tasks, and investigate Area Under Sparsification Error (AUSE), Calibration Error (CE), Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). Using multiple datasets, we look into how those metrics behave under four typical types of uncertainty, their stability regarding the size of the test set, and reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman's Rank Correlation for evaluating uncertainties and recommend replacing it with AUSE.}},
  author    = {{Kristoffersson Lind, Simon and Xiong, Ziliang and Forssén, Per Erik and Krüger, Volker}},
  issn      = {{0167-8655}},
  keywords  = {{Evaluation; Metrics; Regression; Uncertainty}},
  language  = {{eng}},
  pages     = {{91--97}},
  publisher = {{Elsevier}},
  series    = {{Pattern Recognition Letters}},
  title     = {{Uncertainty quantification metrics for deep regression}},
  url       = {{http://dx.doi.org/10.1016/j.patrec.2024.09.011}},
  doi       = {{10.1016/j.patrec.2024.09.011}},
  volume    = {{186}},
  year      = {{2024}},
}