Lund University Publications

Zero-Shot Pupil Segmentation with SAM 2 : A Case Study of Over 14 Million Images

Maquiling, Virmarie; Byrne, Sean Anthony; Niehorster, Diederick C.; Carminati, Marco and Kasneci, Enkelejda (2025) In Proceedings of the ACM on Computer Graphics and Interactive Techniques 8(2). p.1-16
Abstract
We explore the transformative potential of SAM 2, a vision foundation model, in advancing gaze estimation. SAM 2 addresses key challenges in gaze estimation by significantly reducing annotation time, simplifying deployment, and enhancing segmentation accuracy. Utilizing its zero-shot capabilities with minimal user input (a single click per video), we tested SAM 2 on over 14 million eye images from a diverse range of datasets, including the EDS challenge datasets and Labelled Pupils in the Wild. This is the first application of SAM 2 to the gaze estimation domain. Remarkably, SAM 2 matches the performance of domain-specific models in pupil segmentation, achieving competitive mIOU scores of up to 93% without fine-tuning. We argue that SAM 2 achieves the sought-after standard of domain generalization, with consistent mIOU scores (89.71%-93.74%) across diverse datasets, from virtual reality to "gaze-in-the-wild" scenarios. We provide our code and segmentation masks for these datasets to promote further research.
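The abstract reports results as mIOU (mean intersection over union). As a rough illustration of that metric only, the sketch below scores binary pupil masks in plain Python. It is not the authors' evaluation code: the function names `iou` and `mean_iou`, the nested-list mask representation, the per-image averaging, and the handling of two empty masks are all assumptions made for this example, since this record does not describe how the paper computes its scores.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (1 = pupil pixel).
# This representation is an assumption made for illustration.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average of the per-image IoU over a set of mask pairs."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Toy 3x3 masks: the prediction covers one pixel more than the ground truth.
predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
ground_truth = [[0, 1, 1],
                [0, 1, 0],
                [0, 0, 0]]

print(iou(predicted, ground_truth))  # 3 shared pixels / 4 in the union = 0.75
print(mean_iou([predicted, ground_truth], [ground_truth, ground_truth]))  # (0.75 + 1.0) / 2 = 0.875
```

Averaging per image is only one possible convention; averaging over classes (pupil and background) or pooling pixels across a whole dataset would give different numbers for the same masks.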
author
Maquiling, Virmarie; Byrne, Sean Anthony; Niehorster, Diederick C.; Carminati, Marco and Kasneci, Enkelejda
publishing date
2025
type
Contribution to journal
publication status
published
in
Proceedings of the ACM on Computer Graphics and Interactive Techniques
volume
8
issue
2
article number
23
pages
1 - 16
publisher
Association for Computing Machinery (ACM)
ISSN
2577-6193
DOI
10.1145/3729409
language
English
LU publication?
yes
id
8ff9fd0f-38de-4a82-bd02-500b02414d4a
date added to LUP
2025-05-31 19:16:35
date last changed
2025-06-13 14:41:14
@article{8ff9fd0f-38de-4a82-bd02-500b02414d4a,
  abstract     = {{We explore the transformative potential of SAM 2, a vision foundation model, in advancing gaze estimation. SAM 2 addresses key challenges in gaze estimation by significantly reducing annotation time, simplifying deployment, and enhancing segmentation accuracy. Utilizing its zero-shot capabilities with minimal user input—a single click per video—we tested SAM 2 on over 14 million eye images from a diverse range of datasets, including the EDS challenge datasets and Labelled Pupils in the Wild. This is the first application of SAM 2 to the gaze estimation domain. Remarkably, SAM 2 matches the performance of domain-specific models in pupil segmentation, achieving competitive mIOU scores of up to 93% without fine-tuning. We argue that SAM 2 achieves the sought-after standard of domain generalization, with consistent mIOU scores (89.71%-93.74%) across diverse datasets, from virtual reality to "gaze-in-the-wild" scenarios. We provide our code and segmentation masks for these datasets to promote further research.}},
  author       = {{Maquiling, Virmarie and Byrne, Sean Anthony and Niehorster, Diederick C. and Carminati, Marco and Kasneci, Enkelejda}},
  issn         = {{2577-6193}},
  language     = {{eng}},
  month        = {{06}},
  number       = {{2}},
  pages        = {{1--16}},
  publisher    = {{Association for Computing Machinery (ACM)}},
  series       = {{Proceedings of the ACM on Computer Graphics and Interactive Techniques}},
  title        = {{Zero-Shot Pupil Segmentation with SAM 2 : A Case Study of Over 14 Million Images}},
  url          = {{http://dx.doi.org/10.1145/3729409}},
  doi          = {{10.1145/3729409}},
  volume       = {{8}},
  year         = {{2025}},
}