Implementation of Vision Transformers for the Matching of Fingerprints in Biometric Verification
(2025) In Master's Theses in Mathematical Sciences. FMAM05 20251. Mathematics (Faculty of Engineering)
- Abstract
- Biometric fingerprint recognition is a cornerstone of modern authentication systems, yet conventional convolutional neural network approaches often struggle with challenging cases subject to noise and partial overlap. This thesis explores the potential of transformer-based architectures to address these challenges. We design, implement, and evaluate five models on a large-scale dataset of over five million annotated fingerprint pairs: ViT-Small, a lightweight Vision Transformer trained from scratch; ViT-Large, a high-capacity variant with boosting and weighting applied during training; ViT-Large-Without, with the same structure as ViT-Large but without boosting or weighting; ViT-Large-Combined, which combines the approaches of the previous two; and Swin Transformer, which employs hierarchical, shifted-window attention for efficient multi-scale feature extraction. To simulate real-world variability, we incorporate noise reflecting wet, dry, and environmental conditions into our training. Our experiments show that ViT-Large-Without in particular performs well, achieving high accuracy despite a relatively short development time. Beyond accuracy gains, we analyze inference latency and GPU utilization, revealing that transformer models can be deployed with practical resource requirements. These findings demonstrate that a high-capacity Vision Transformer trained without complex weighting schemes strikes an optimal balance between robustness, accuracy, and computational efficiency. ViT-Large-Without emerges as the most promising candidate for real-world biometric systems, challenging conventional assumptions that elaborate training strategies are necessary for state-of-the-art fingerprint matching.
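The two ingredients the abstract mentions can be sketched in a few lines: splitting a fingerprint image into the patch tokens a Vision Transformer consumes, and perturbing images with wet/dry/environmental noise before training. This is a minimal illustrative sketch only; the patch size, noise modes, and noise parameters below are assumptions for illustration and are not taken from the thesis's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_patches(img, patch=16):
    """Split a square grayscale image into flattened ViT-style patch tokens."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0
    p = img.reshape(h // patch, patch, w // patch, patch)
    # Reorder so each (patch x patch) block becomes one row of the token sequence.
    return p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def add_condition_noise(img, mode="wet"):
    """Toy stand-in for wet/dry/environmental augmentation (hypothetical parameters)."""
    if mode == "wet":      # smeared ridges: crude local averaging as a blur
        out = (img + np.roll(img, 1, axis=0) + np.roll(img, 1, axis=1)) / 3.0
    elif mode == "dry":    # broken ridges: random pixel dropout
        out = img * (rng.random(img.shape) > 0.1)
    else:                  # environmental: additive Gaussian noise
        out = img + rng.normal(0.0, 0.05, img.shape)
    return np.clip(out, 0.0, 1.0)

# A 224x224 fingerprint image with 16x16 patches yields 14*14 = 196 tokens of length 256.
img = rng.random((224, 224))
tokens = to_patches(add_condition_noise(img, "wet"))
print(tokens.shape)  # (196, 256)
```

In a Siamese-style verification setup, both fingerprints of a pair would be tokenized this way, passed through the transformer, and their embeddings compared; the augmentation is applied only at training time.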
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9201642
- author
- Svensson, Kalle LU and Cewers, Julius LU
- supervisor
- organization
- alternative title
- Implementering av Vision Transformers för Matchning av Fingeravtryck vid Biometrisk Verifiering
- course
- FMAM05 20251
- year
- 2025
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Vision Transformers, Fingerprint Recognition, Biometric Verification, Swin Transformer, Deep Learning
- publication/series
- Master's Theses in Mathematical Sciences
- report number
- LUTFMA-3576-2025
- ISSN
- 1404-6342
- other publication id
- 2025:E25
- language
- English
- additional info
- Upload approved by the education administrator at the Centre for Mathematical Sciences. (Hanne Nordkvist)
- id
- 9201642
- date added to LUP
- 2025-09-15 11:10:08
- date last changed
- 2025-09-15 11:10:08
@misc{9201642, abstract = {{Biometric fingerprint recognition is a cornerstone of modern authentication systems, yet conventional convolutional neural network approaches often struggle with challenging cases subject to noise and partial overlap. This thesis explores the potential of transformer-based architectures to address these challenges. We design, implement, and evaluate five models on a large-scale dataset of over five million annotated fingerprint pairs: ViT-Small, a lightweight Vision Transformer trained from scratch; ViT-Large, a high-capacity variant with boosting and weighting applied during training; ViT-Large-Without, with the same structure as ViT-Large but without boosting or weighting; ViT-Large-Combined, which combines the approaches of the previous two; and Swin Transformer, which employs hierarchical, shifted-window attention for efficient multi-scale feature extraction. To simulate real-world variability, we incorporate noise reflecting wet, dry, and environmental conditions into our training. Our experiments show that ViT-Large-Without in particular performs well, achieving high accuracy despite a relatively short development time. Beyond accuracy gains, we analyze inference latency and GPU utilization, revealing that transformer models can be deployed with practical resource requirements. These findings demonstrate that a high-capacity Vision Transformer trained without complex weighting schemes strikes an optimal balance between robustness, accuracy, and computational efficiency. ViT-Large-Without emerges as the most promising candidate for real-world biometric systems, challenging conventional assumptions that elaborate training strategies are necessary for state-of-the-art fingerprint matching.}}, author = {{Svensson, Kalle and Cewers, Julius}}, issn = {{1404-6342}}, language = {{eng}}, note = {{Student Paper}}, series = {{Master's Theses in Mathematical Sciences}}, title = {{Implementation of Vision Transformers for the Matching of Fingerprints in Biometric Verification}}, year = {{2025}}, }