Towards Trustworthy Machine Learning in High-Stakes Decision Making
(2026)
- Abstract
- The reliable integration of Deep Neural Networks (DNNs) into safety-critical settings requires guarantees that extend beyond empirical accuracy. Despite strong prediction performance, DNNs remain vulnerable to adversarial perturbations, information leakage, and disparities across different populations. Addressing these limitations demands scalable methods for formally analyzing network behavior under uncertainty.
This thesis develops a principled perspective on the verification and characterization of trustworthiness properties in DNNs. Robustness, privacy leakage, and fairness are formulated as optimization problems over network outputs subject to variations in inputs, models, or datasets.
A primary contribution is the development of methods for robustness verification based on refinement of relaxation-based approximations. In particular, a layer-wise refinement strategy is introduced that tightens convex relaxations of nonlinear activation constraints. The approach progressively tightens relaxation-based approximations, yielding improvements of sound bounds while avoiding full combinatorial enumeration of activation patterns. In addition, the thesis establishes a perspective on how model properties influence verifiability and introduces verification-friendly network transformations that improve bound tightness without degrading prediction performance.
Beyond single network analysis, the thesis proposes verification frameworks for comparing model variants over shared input regions. This formulation enables verification of local implication and behavioral preservation under model transformations, with joint optimization providing strictly tighter guarantees than independent analyses.
The thesis further analyzes cross-dimensional relations between robustness, privacy, and fairness. In personalized learning settings, it demonstrates that robustness-induced representations can encode identity-specific patterns, enabling patient membership inference without access to training samples. It also introduces a locally-persistent counterfactual bias formulation that captures fairness disparities within perturbation regions and can be incorporated as a training regularizer.
In summary, this thesis establishes the formal ground for analyzing and improving trustworthiness properties of DNNs. The results contribute to optimization-based verification, model transformation analysis, and the study of interactions between robustness, privacy, and fairness, contributing to more reliable machine learning.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/63b87b43-6ab6-459b-9f90-d73758dd1ad4
- author
- Baninajjar, Anahita
LU
- supervisor
- opponent
- Assoc. Prof. Shoukry, Yasser, University of California, USA.
- organization
- publishing date
- 2026
- type
- Thesis
- publication status
- published
- subject
- publisher
- Electrical and Information Technology, Lund University
- defense location
- Lecture Hall E:1406, building E, Ole Römers väg 3, Faculty of Engineering LTH, Lund University, Lund. The dissertation will be live streamed, but part of the premises is to be excluded from the live stream.
- defense date
- 2026-06-11 09:15:00
- ISBN
- 978-91-8104-992-3
- 978-91-8104-993-0
- language
- English
- LU publication?
- yes
- id
- 63b87b43-6ab6-459b-9f90-d73758dd1ad4
- date added to LUP
- 2026-05-12 14:38:06
- date last changed
- 2026-05-20 08:47:41
@phdthesis{63b87b43-6ab6-459b-9f90-d73758dd1ad4,
abstract = {{The reliable integration of Deep Neural Networks (DNNs) into safety-critical settings requires guarantees that extend beyond empirical accuracy. Despite strong prediction performance, DNNs remain vulnerable to adversarial perturbations, information leakage, and disparities across different populations. Addressing these limitations demands scalable methods for formally analyzing network behavior under uncertainty.

This thesis develops a principled perspective on the verification and characterization of trustworthiness properties in DNNs. Robustness, privacy leakage, and fairness are formulated as optimization problems over network outputs subject to variations in inputs, models, or datasets.

A primary contribution is the development of methods for robustness verification based on refinement of relaxation-based approximations. In particular, a layer-wise refinement strategy is introduced that tightens convex relaxations of nonlinear activation constraints. The approach progressively tightens relaxation-based approximations, yielding improvements of sound bounds while avoiding full combinatorial enumeration of activation patterns. In addition, the thesis establishes a perspective on how model properties influence verifiability and introduces verification-friendly network transformations that improve bound tightness without degrading prediction performance.

Beyond single network analysis, the thesis proposes verification frameworks for comparing model variants over shared input regions. This formulation enables verification of local implication and behavioral preservation under model transformations, with joint optimization providing strictly tighter guarantees than independent analyses.

The thesis further analyzes cross-dimensional relations between robustness, privacy, and fairness. In personalized learning settings, it demonstrates that robustness-induced representations can encode identity-specific patterns, enabling patient membership inference without access to training samples. It also introduces a locally-persistent counterfactual bias formulation that captures fairness disparities within perturbation regions and can be incorporated as a training regularizer.

In summary, this thesis establishes the formal ground for analyzing and improving trustworthiness properties of DNNs. The results contribute to optimization-based verification, model transformation analysis, and the study of interactions between robustness, privacy, and fairness, contributing to more reliable machine learning.}},
author = {{Baninajjar, Anahita}},
isbn = {{978-91-8104-992-3}},
language = {{eng}},
publisher = {{Electrical and Information Technology, Lund University}},
school = {{Lund University}},
title = {{Towards Trustworthy Machine Learning in High-Stakes Decision Making}},
url = {{https://lup.lub.lu.se/search/files/249941660/Towards_Trustworthy_Machine_Learning_in_High-Stakes_Decision_Making.pdf}},
year = {{2026}},
}