Implementation of Vision Transformers for the Matching of Fingerprints in Biometric Verification
(2025) In Master's Theses in Mathematical Sciences. FMAM05 20251. Mathematics (Faculty of Engineering)
- Abstract
- Biometric fingerprint recognition is a cornerstone of modern authentication systems, yet conventional convolutional neural network approaches often struggle with challenging cases subject to noise and partial overlap. This thesis explores the potential of transformer-based architectures to address these challenges. We design, implement, and evaluate five models on a large-scale dataset of over five million annotated fingerprint pairs: ViT-Small, a lightweight Vision Transformer trained from scratch; ViT-Large, a high-capacity variant with boosting and weighting applied during training; ViT-Large-Without, with the same structure as ViT-Large but without boosting or weighting; ViT-Large-Combined, which combines the approaches of the previous two; and Swin Transformer, which employs hierarchical, shifted-window attention for efficient multi-scale feature extraction. To simulate real-world variability, we incorporate noise reflecting wet, dry, and environmental conditions into our training. Our experiments show that ViT-Large-Without in particular performs well, achieving high accuracy despite a relatively short development time. Beyond accuracy gains, we analyze inference latency and GPU utilization, revealing that transformer models can be deployed with practical resource requirements. These findings demonstrate that a high-capacity Vision Transformer trained without complex weighting schemes strikes an optimal balance between robustness, accuracy, and computational efficiency. ViT-Large-Without emerges as the most promising candidate for real-world biometric systems, challenging conventional assumptions that elaborate training strategies are necessary for state-of-the-art fingerprint matching.
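The two ingredients the abstract mentions can be sketched in a few lines: splitting a fingerprint image into the patch tokens a Vision Transformer consumes, and perturbing images with wet/dry/environmental noise before training. This is a minimal illustrative sketch only; the patch size, noise modes, and noise parameters below are assumptions for illustration and are not taken from the thesis's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_patches(img, patch=16):
    """Split a square grayscale image into flattened ViT-style patch tokens."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0
    p = img.reshape(h // patch, patch, w // patch, patch)
    # Reorder so each (patch x patch) block becomes one row of the token sequence.
    return p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def add_condition_noise(img, mode="wet"):
    """Toy stand-in for wet/dry/environmental augmentation (hypothetical parameters)."""
    if mode == "wet":      # smeared ridges: crude local averaging as a blur
        out = (img + np.roll(img, 1, axis=0) + np.roll(img, 1, axis=1)) / 3.0
    elif mode == "dry":    # broken ridges: random pixel dropout
        out = img * (rng.random(img.shape) > 0.1)
    else:                  # environmental: additive Gaussian noise
        out = img + rng.normal(0.0, 0.05, img.shape)
    return np.clip(out, 0.0, 1.0)

# A 224x224 fingerprint image with 16x16 patches yields 14*14 = 196 tokens of length 256.
img = rng.random((224, 224))
tokens = to_patches(add_condition_noise(img, "wet"))
print(tokens.shape)  # (196, 256)
```

In a Siamese-style verification setup, both fingerprints of a pair would be tokenized this way, passed through the transformer, and their embeddings compared; the augmentation is applied only at training time.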
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9201642
- author
- Svensson, Kalle LU and Cewers, Julius LU
- supervisor
- organization
- alternative title
- Implementering av Vision Transformers för Matchning av Fingeravtryck vid Biometrisk Verifiering
- course
- FMAM05 20251
- year
- 2025
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Vision Transformers, Fingerprint Recognition, Biometric Verification, Swin Transformer, Deep Learning
- publication/series
- Master's Theses in Mathematical Sciences
- report number
- LUTFMA-3576-2025
- ISSN
- 1404-6342
- other publication id
- 2025:E25
- language
- English
- additional info
- Upload approved by the education administrator at the Centre for Mathematical Sciences. (Hanne Nordkvist)
- id
- 9201642
- date added to LUP
- 2025-09-15 11:10:08
- date last changed
- 2025-09-15 11:10:08
@misc{9201642, abstract = {{Biometric fingerprint recognition is a cornerstone of modern authentication systems, yet conventional convolutional neural network approaches often struggle with challenging cases subject to noise and partial overlap. This thesis explores the potential of transformer-based architectures to address these challenges. We design, implement, and evaluate five models on a large-scale dataset of over five million annotated fingerprint pairs: ViT-Small, a lightweight Vision Transformer trained from scratch; ViT-Large, a high-capacity variant with boosting and weighting applied during training; ViT-Large-Without, with the same structure as ViT-Large but without boosting or weighting; ViT-Large-Combined, which combines the approaches of the previous two; and Swin Transformer, which employs hierarchical, shifted-window attention for efficient multi-scale feature extraction. To simulate real-world variability, we incorporate noise reflecting wet, dry, and environmental conditions into our training. Our experiments show that ViT-Large-Without in particular performs well, achieving high accuracy despite a relatively short development time. Beyond accuracy gains, we analyze inference latency and GPU utilization, revealing that transformer models can be deployed with practical resource requirements. These findings demonstrate that a high-capacity Vision Transformer trained without complex weighting schemes strikes an optimal balance between robustness, accuracy, and computational efficiency. ViT-Large-Without emerges as the most promising candidate for real-world biometric systems, challenging conventional assumptions that elaborate training strategies are necessary for state-of-the-art fingerprint matching.}}, author = {{Svensson, Kalle and Cewers, Julius}}, issn = {{1404-6342}}, language = {{eng}}, note = {{Student Paper}}, series = {{Master's Theses in Mathematical Sciences}}, title = {{Implementation of Vision Transformers for the Matching of Fingerprints in Biometric Verification}}, year = {{2025}}, }