LUP Student Papers

LUND UNIVERSITY LIBRARIES

Implementation of Vision Transformers for the Matching of Fingerprints in Biometric Verification

Svensson, Kalle and Cewers, Julius (2025) In Master's Theses in Mathematical Sciences FMAM05 20251
Mathematics (Faculty of Engineering)
Abstract
Biometric fingerprint recognition is a cornerstone of modern authentication systems, yet conventional convolutional neural network approaches often struggle with challenging cases subject to noise and partial overlap. This thesis explores the potential of transformer-based architectures to address these challenges. We design, implement, and evaluate five models on a large-scale dataset of over five million annotated fingerprint pairs: ViT-Small, a lightweight Vision Transformer trained from scratch; ViT-Large, a high-capacity variant with boosting and weighting implemented in training; ViT-Large-Without, with the same structure as ViT-Large but without boosting or weighting; ViT-Large-Combined, which combines the approaches of the two preceding variants; and Swin Transformer, which employs hierarchical, shifted-window attention for efficient multi-scale feature extraction. To simulate real-world variability, we incorporate noise reflecting wet, dry, and environmental conditions into our training. Our experiments demonstrate that ViT-Large-Without in particular performs well, achieving high performance within a relatively short development time. Beyond accuracy gains, we analyze inference latency and GPU utilization, revealing that transformer models can be deployed with practical resource requirements. These findings demonstrate that a high-capacity Vision Transformer trained without complex weighting schemes strikes an optimal balance between robustness, accuracy, and computational efficiency. ViT-Large-Without emerges as the most promising candidate for real-world biometric systems, challenging conventional assumptions that elaborate training strategies are necessary for state-of-the-art fingerprint matching.
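The thesis itself details the exact architectures and training pipeline; as an illustrative sketch only, the Python/PyTorch code below shows one common way a Siamese-style Vision Transformer verifier for fingerprint pairs could be structured. Every name (PatchEmbed, ViTEncoder, FingerprintVerifier) and every hyperparameter here is a hypothetical assumption, not the authors' implementation.

# Hypothetical sketch: shared ViT encoder + pairwise match head.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Splits a grayscale fingerprint image into non-overlapping patches
    and projects each patch to an embedding vector."""
    def __init__(self, img_size=224, patch_size=16, embed_dim=384):
        super().__init__()
        self.proj = nn.Conv2d(1, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.num_patches = (img_size // patch_size) ** 2

    def forward(self, x):                      # x: (B, 1, H, W)
        x = self.proj(x)                       # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)    # (B, N, D)

class ViTEncoder(nn.Module):
    """Minimal ViT encoder: patch embedding + class token + transformer blocks."""
    def __init__(self, embed_dim=384, depth=6, num_heads=6):
        super().__init__()
        self.patch_embed = PatchEmbed(embed_dim=embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, self.patch_embed.num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.patch_embed(x)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.blocks(x)
        return self.norm(x[:, 0])              # class-token embedding

class FingerprintVerifier(nn.Module):
    """Encodes both fingerprints with a shared ViT and predicts match/non-match."""
    def __init__(self, embed_dim=384):
        super().__init__()
        self.encoder = ViTEncoder(embed_dim=embed_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim), nn.GELU(),
            nn.Linear(embed_dim, 1))           # logit: same finger or not

    def forward(self, probe, reference):
        z1, z2 = self.encoder(probe), self.encoder(reference)
        return self.head(torch.cat([z1, z2], dim=-1)).squeeze(-1)

if __name__ == "__main__":
    model = FingerprintVerifier()
    probe = torch.randn(2, 1, 224, 224)        # dummy grayscale fingerprint pair
    reference = torch.randn(2, 1, 224, 224)
    print(model(probe, reference).shape)       # torch.Size([2])

A model of this shape would typically be trained with a binary cross-entropy loss on genuine/impostor pair labels; noise simulating wet, dry, or otherwise degraded captures could be applied as data augmentation before the forward pass. How the thesis actually handles boosting, weighting, and noise injection is described in the full text.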
Please use this url to cite or link to this publication:
author
Svensson, Kalle and Cewers, Julius
supervisor
organization
alternative title
Implementering av Vision Transformers för Matchning av Fingeravtryck vid Biometrisk Verifiering
course
FMAM05 20251
year
2025
type
H2 - Master's Degree (Two Years)
subject
keywords
Vision Transformers, Fingerprint Recognition, Biometric Verification, Swin Transformer, Deep Learning
publication/series
Master's Theses in Mathematical Sciences
report number
LUTFMA-3576-2025
ISSN
1404-6342
other publication id
2025:E25
language
English
additional info
Upload approved by the education administrator at Matematikcentrum (Centre for Mathematical Sciences). (Hanne Nordkvist)
id
9201642
date added to LUP
2025-09-15 11:10:08
date last changed
2025-09-15 11:10:08
@misc{9201642,
  abstract     = {{Biometric fingerprint recognition is a cornerstone of modern authentication systems, yet conventional convolutional neural network approaches often struggle with challenging cases subject to noise and partial overlap. This thesis explores the potential of transformer-based architectures to address these challenges. We design, implement, and evaluate five models on a large-scale dataset of over five million annotated fingerprint pairs: ViT-Small, a lightweight Vision Transformer trained from scratch; ViT-Large, a high-capacity variant with boosting and weighting implemented in training; ViT-Large-Without, with the same structure as ViT-Large but without boosting or weighting; ViT-Large-Combined, which combines the approaches of the two preceding variants; and Swin Transformer, which employs hierarchical, shifted-window attention for efficient multi-scale feature extraction. To simulate real-world variability, we incorporate noise reflecting wet, dry, and environmental conditions into our training. Our experiments demonstrate that ViT-Large-Without in particular performs well, achieving high performance within a relatively short development time. Beyond accuracy gains, we analyze inference latency and GPU utilization, revealing that transformer models can be deployed with practical resource requirements. These findings demonstrate that a high-capacity Vision Transformer trained without complex weighting schemes strikes an optimal balance between robustness, accuracy, and computational efficiency. ViT-Large-Without emerges as the most promising candidate for real-world biometric systems, challenging conventional assumptions that elaborate training strategies are necessary for state-of-the-art fingerprint matching.}},
  author       = {{Svensson, Kalle and Cewers, Julius}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Implementation of Vision Transformers for the Matching of Fingerprints in Biometric Verification}},
  year         = {{2025}},
}