Lund University Publications

Towards Trustworthy Machine Learning in High-Stakes Decision Making

Baninajjar, Anahita (2026)
Abstract
The reliable integration of Deep Neural Networks (DNNs) into safety-critical settings requires guarantees that extend beyond empirical accuracy. Despite strong prediction performance, DNNs remain vulnerable to adversarial perturbations, information leakage, and disparities across different populations. Addressing these limitations demands scalable methods for formally analyzing network behavior under uncertainty.

This thesis develops a principled perspective on the verification and characterization of trustworthiness properties in DNNs. Robustness, privacy leakage, and fairness are formulated as optimization problems over network outputs subject to variations in inputs, models, or datasets.

A primary contribution is the development of methods for robustness verification based on refinement of relaxation-based approximations. In particular, a layer-wise refinement strategy is introduced that tightens convex relaxations of nonlinear activation constraints. The approach progressively tightens relaxation-based approximations, yielding improvements of sound bounds while avoiding full combinatorial enumeration of activation patterns. In addition, the thesis establishes a perspective on how model properties influence verifiability and introduces verification-friendly network transformations that improve bound tightness without degrading prediction performance.

Beyond single network analysis, the thesis proposes verification frameworks for comparing model variants over shared input regions. This formulation enables verification of local implication and behavioral preservation under model transformations, with joint optimization providing strictly tighter guarantees than independent analyses.

The thesis further analyzes cross-dimensional relations between robustness, privacy, and fairness. In personalized learning settings, it demonstrates that robustness-induced representations can encode identity-specific patterns, enabling patient membership inference without access to training samples. It also introduces a locally-persistent counterfactual bias formulation that captures fairness disparities within perturbation regions and can be incorporated as a training regularizer.

In summary, this thesis establishes the formal ground for analyzing and improving trustworthiness properties of DNNs. The results contribute to optimization-based verification, model transformation analysis, and the study of interactions between robustness, privacy, and fairness, contributing to more reliable machine learning.
author
  • Baninajjar, Anahita
supervisor
opponent
  • Assoc. Prof. Shoukry, Yasser, University of California, USA.
organization
publishing date
type
Thesis
publication status
published
subject
publisher
Electrical and Information Technology, Lund University
defense location
Lecture Hall E:1406, building E, Ole Römers väg 3, Faculty of Engineering LTH, Lund University, Lund. The dissertation will be live streamed, but part of the premises is to be excluded from the live stream.
defense date
2026-06-11 09:15:00
ISBN
978-91-8104-992-3
978-91-8104-993-0
language
English
LU publication?
yes
id
63b87b43-6ab6-459b-9f90-d73758dd1ad4
date added to LUP
2026-05-12 14:38:06
date last changed
2026-05-20 08:47:41
@phdthesis{63b87b43-6ab6-459b-9f90-d73758dd1ad4,
  abstract     = {{The reliable integration of Deep Neural Networks (DNNs) into safety-critical settings requires guarantees that extend beyond empirical accuracy. Despite strong prediction performance, DNNs remain vulnerable to adversarial perturbations, information leakage, and disparities across different populations. Addressing these limitations demands scalable methods for formally analyzing network behavior under uncertainty.

This thesis develops a principled perspective on the verification and characterization of trustworthiness properties in DNNs. Robustness, privacy leakage, and fairness are formulated as optimization problems over network outputs subject to variations in inputs, models, or datasets.

A primary contribution is the development of methods for robustness verification based on refinement of relaxation-based approximations. In particular, a layer-wise refinement strategy is introduced that tightens convex relaxations of nonlinear activation constraints. The approach progressively tightens relaxation-based approximations, yielding improvements of sound bounds while avoiding full combinatorial enumeration of activation patterns. In addition, the thesis establishes a perspective on how model properties influence verifiability and introduces verification-friendly network transformations that improve bound tightness without degrading prediction performance.

Beyond single network analysis, the thesis proposes verification frameworks for comparing model variants over shared input regions. This formulation enables verification of local implication and behavioral preservation under model transformations, with joint optimization providing strictly tighter guarantees than independent analyses.

The thesis further analyzes cross-dimensional relations between robustness, privacy, and fairness. In personalized learning settings, it demonstrates that robustness-induced representations can encode identity-specific patterns, enabling patient membership inference without access to training samples. It also introduces a locally-persistent counterfactual bias formulation that captures fairness disparities within perturbation regions and can be incorporated as a training regularizer.

In summary, this thesis establishes the formal ground for analyzing and improving trustworthiness properties of DNNs. The results contribute to optimization-based verification, model transformation analysis, and the study of interactions between robustness, privacy, and fairness, contributing to more reliable machine learning.}},
  author       = {{Baninajjar, Anahita}},
  isbn         = {{978-91-8104-992-3}},
  language     = {{eng}},
  publisher    = {{Electrical and Information Technology, Lund University}},
  school       = {{Lund University}},
  title        = {{Towards Trustworthy Machine Learning in High-Stakes Decision Making}},
  url          = {{https://lup.lub.lu.se/search/files/249941660/Towards_Trustworthy_Machine_Learning_in_High-Stakes_Decision_Making.pdf}},
  year         = {{2026}},
}