
Lund University Publications


Exploring promptable foundation models for high-resolution video eye tracking in the lab

Niehorster, Diederick C.; Maquiling, Virmarie; Byrne, Sean; Kasneci, Enkelejda and Nyström, Marcus (2025) p. 1–8
Abstract
We explore whether SAM2, a vision foundation model, can be used for accurate localization of eye image features that are used in lab-based eye tracking: corneal reflections (CRs), the pupil, and the iris. We prompted SAM2 via a typical hand annotation process that consisted of clicking on the pupil, CR, iris and sclera for only one image per participant. SAM2 was found to support better spatial precision in the resulting gaze signals for the pupil (> 44% lower RMS-S2S), but not the CR and iris, than traditional image-processing methods or two state-of-the-art deep-learning tools. Providing more frames with prompts to initialize SAM2 did not improve performance. We conclude that SAM2’s powerful zero-shot segmentation capabilities provide an interesting new avenue to explore in high-resolution lab-based eye tracking. We provide our adaptation of SAM2’s codebase that allows segmenting videos of arbitrary duration and prepending arbitrary prompting frames.
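The abstract reports spatial precision as RMS-S2S, the root mean square of sample-to-sample differences in the gaze signal (lower is better). As a hedged illustration only (the function name and exact formulation below are ours, not taken from the paper), a minimal sketch of the metric for a 2D gaze signal:

```python
import math

def rms_s2s(xs, ys):
    """RMS of Euclidean sample-to-sample distances in a gaze signal.

    A common spatial-precision metric for eye trackers: lower values mean
    consecutive samples of a fixating eye jitter less.
    """
    if len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("need two equal-length series with at least two samples")
    # Squared distance between each pair of consecutive samples.
    sq = [(x1 - x0) ** 2 + (y1 - y0) ** 2
          for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]
    return math.sqrt(sum(sq) / len(sq))

# A signal that steps 1 unit in x between consecutive samples has RMS-S2S 1.0.
print(rms_s2s([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0]))  # → 1.0
```

The paper's ">44% lower RMS-S2S" for the pupil would then mean SAM2's pupil-center signal yields less than 0.56 times the sample-to-sample jitter of the compared methods, under this standard definition of the metric.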
author
Niehorster, Diederick C.; Maquiling, Virmarie; Byrne, Sean; Kasneci, Enkelejda and Nyström, Marcus
publishing date
2025-05
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
keywords
Eye tracking, Feature localization, Gaze estimation, Foundation models, Methods, Pupil, Corneal reflection, Iris, Sclera
host publication
ETETRA '25: Proceedings of the 2025 Symposium on Eye Tracking Research and Applications
editor
Sugano, Yusuke; Khamis, Mohamed; Chetouani, Aladine; Sidenmark, Ludwig and Bruno, Alessandro
article number
8
pages
1 - 8
publisher
Association for Computing Machinery (ACM)
ISBN
9798400714870
DOI
10.1145/3715669.3723118
language
English
LU publication?
yes
id
f5ddb07a-4032-497f-89fd-d7b13b75e79d
date added to LUP
2025-05-31 19:18:20
date last changed
2025-06-12 15:56:49
@inproceedings{f5ddb07a-4032-497f-89fd-d7b13b75e79d,
  abstract     = {{We explore whether SAM2, a vision foundation model, can be used for accurate localization of eye image features that are used in lab-based eye tracking: corneal reflections (CRs), the pupil, and the iris. We prompted SAM2 via a typical hand annotation process that consisted of clicking on the pupil, CR, iris and sclera for only one image per participant. SAM2 was found to support better spatial precision in the resulting gaze signals for the pupil (> 44% lower RMS-S2S), but not the CR and iris, than traditional image-processing methods or two state-of-the-art deep-learning tools. Providing more frames with prompts to initialize SAM2 did not improve performance. We conclude that SAM2’s powerful zero-shot segmentation capabilities provide an interesting new avenue to explore in high-resolution lab-based eye tracking. We provide our adaptation of SAM2’s codebase that allows segmenting videos of arbitrary duration and prepending arbitrary prompting frames.}},
  author       = {{Niehorster, Diederick C. and Maquiling, Virmarie and Byrne, Sean and Kasneci, Enkelejda and Nyström, Marcus}},
  booktitle    = {{ETETRA '25: Proceedings of the 2025 Symposium on Eye Tracking Research and Applications}},
  editor       = {{Sugano, Yusuke and Khamis, Mohamed and Chetouani, Aladine and Sidenmark, Ludwig and Bruno, Alessandro}},
  isbn         = {{9798400714870}},
  keywords     = {{Eye tracking; Feature localization; Gaze estimation; Foundation models; Methods; Pupil; Corneal reflection; Iris; Sclera}},
  language     = {{eng}},
  month        = {{05}},
  pages        = {{1--8}},
  publisher    = {{Association for Computing Machinery (ACM)}},
  title        = {{Exploring promptable foundation models for high-resolution video eye tracking in the lab}},
  url          = {{http://dx.doi.org/10.1145/3715669.3723118}},
  doi          = {{10.1145/3715669.3723118}},
  year         = {{2025}},
}