Exploring promptable foundation models for high-resolution video eye tracking in the lab
(2025) p. 1-8
- Abstract
- We explore whether SAM2, a vision foundation model, can be used for accurate localization of eye image features that are used in lab-based eye tracking: corneal reflections (CRs), the pupil, and the iris. We prompted SAM2 via a typical hand annotation process that consisted of clicking on the pupil, CR, iris and sclera for only one image per participant. SAM2 was found to support better spatial precision in the resulting gaze signals for the pupil (> 44% lower RMS-S2S), but not the CR and iris, than traditional image-processing methods or two state-of-the-art deep-learning tools. Providing more frames with prompts to initialize SAM2 did not improve performance. We conclude that SAM2’s powerful zero-shot segmentation capabilities provide an interesting new avenue to explore in high-resolution lab-based eye tracking. We provide our adaptation of SAM2’s codebase that allows segmenting videos of arbitrary duration and prepending arbitrary prompting frames.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/f5ddb07a-4032-497f-89fd-d7b13b75e79d
- author
- Niehorster, Diederick C. (LU); Maquiling, Virmarie; Byrne, Sean; Kasneci, Enkelejda and Nyström, Marcus (LU)
- organization
- publishing date
- 2025-05-25
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- Eye tracking, Feature localization, Gaze estimation, Foundation models, Methods, Pupil, Corneal reflection, Iris, Sclera
- host publication
- ETETRA '25 : Proceedings of the 2025 Symposium on Eye Tracking Research and Applications
- editor
- Sugano, Yusuke ; Khamis, Mohamed ; Chetouani, Aladine ; Sidenmark, Ludwig and Bruno, Alessandro
- article number
- 8
- pages
- 1 - 8
- publisher
- Association for Computing Machinery (ACM)
- ISBN
- 9798400714870
- DOI
- 10.1145/3715669.3723118
- language
- English
- LU publication?
- yes
- id
- f5ddb07a-4032-497f-89fd-d7b13b75e79d
- date added to LUP
- 2025-05-31 19:18:20
- date last changed
- 2025-06-12 15:56:49
@inproceedings{f5ddb07a-4032-497f-89fd-d7b13b75e79d,
  abstract = {{We explore whether SAM2, a vision foundation model, can be used for accurate localization of eye image features that are used in lab-based eye tracking: corneal reflections (CRs), the pupil, and the iris. We prompted SAM2 via a typical hand annotation process that consisted of clicking on the pupil, CR, iris and sclera for only one image per participant. SAM2 was found to support better spatial precision in the resulting gaze signals for the pupil (> 44% lower RMS-S2S), but not the CR and iris, than traditional image-processing methods or two state-of-the-art deep-learning tools. Providing more frames with prompts to initialize SAM2 did not improve performance. We conclude that SAM2’s powerful zero-shot segmentation capabilities provide an interesting new avenue to explore in high-resolution lab-based eye tracking. We provide our adaptation of SAM2’s codebase that allows segmenting videos of arbitrary duration and prepending arbitrary prompting frames.}},
  author = {{Niehorster, Diederick C. and Maquiling, Virmarie and Byrne, Sean and Kasneci, Enkelejda and Nyström, Marcus}},
  booktitle = {{ETETRA '25 : Proceedings of the 2025 Symposium on Eye Tracking Research and Applications}},
  editor = {{Sugano, Yusuke and Khamis, Mohamed and Chetouani, Aladine and Sidenmark, Ludwig and Bruno, Alessandro}},
  isbn = {{9798400714870}},
  keywords = {{Eye tracking; Feature localization; Gaze estimation; Foundation models; Methods; Pupil; Corneal reflection; Iris; Sclera}},
  language = {{eng}},
  month = {{05}},
  pages = {{1--8}},
  publisher = {{Association for Computing Machinery (ACM)}},
  title = {{Exploring promptable foundation models for high-resolution video eye tracking in the lab}},
  url = {{http://dx.doi.org/10.1145/3715669.3723118}},
  doi = {{10.1145/3715669.3723118}},
  year = {{2025}},
}