Exploring promptable foundation models for high-resolution video eye tracking in the lab
(2025) p. 1-8
- Abstract
- We explore whether SAM2, a vision foundation model, can be used for accurate localization of eye image features that are used in lab-based eye tracking: corneal reflections (CRs), the pupil, and the iris. We prompted SAM2 via a typical hand annotation process that consisted of clicking on the pupil, CR, iris and sclera for only one image per participant. SAM2 was found to support better spatial precision in the resulting gaze signals for the pupil (> 44% lower RMS-S2S), but not the CR and iris, than traditional image-processing methods or two state-of-the-art deep-learning tools. Providing more frames with prompts to initialize SAM2 did not improve performance. We conclude that SAM2’s powerful zero-shot segmentation capabilities provide an interesting new avenue to explore in high-resolution lab-based eye tracking. We provide our adaptation of SAM2’s codebase that allows segmenting videos of arbitrary duration and prepending arbitrary prompting frames.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/f5ddb07a-4032-497f-89fd-d7b13b75e79d
- author
- Niehorster, Diederick C. (LU); Maquiling, Virmarie; Byrne, Sean; Kasneci, Enkelejda and Nyström, Marcus (LU)
- organization
- publishing date
- 2025-05-25
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- Eye tracking, Feature localization, Gaze estimation, Foundation models, Methods, Pupil, Corneal reflection, Iris, Sclera
- host publication
- ETETRA '25 : Proceedings of the 2025 Symposium on Eye Tracking Research and Applications
- editor
- Sugano, Yusuke ; Khamis, Mohamed ; Chetouani, Aladine ; Sidenmark, Ludwig and Bruno, Alessandro
- article number
- 8
- pages
- 1 - 8
- publisher
- Association for Computing Machinery (ACM)
- ISBN
- 9798400714870
- DOI
- 10.1145/3715669.3723118
- language
- English
- LU publication?
- yes
- id
- f5ddb07a-4032-497f-89fd-d7b13b75e79d
- date added to LUP
- 2025-05-31 19:18:20
- date last changed
- 2025-06-12 15:56:49
@inproceedings{f5ddb07a-4032-497f-89fd-d7b13b75e79d,
  abstract = {{We explore whether SAM2, a vision foundation model, can be used for accurate localization of eye image features that are used in lab-based eye tracking: corneal reflections (CRs), the pupil, and the iris. We prompted SAM2 via a typical hand annotation process that consisted of clicking on the pupil, CR, iris and sclera for only one image per participant. SAM2 was found to support better spatial precision in the resulting gaze signals for the pupil (> 44% lower RMS-S2S), but not the CR and iris, than traditional image-processing methods or two state-of-the-art deep-learning tools. Providing more frames with prompts to initialize SAM2 did not improve performance. We conclude that SAM2’s powerful zero-shot segmentation capabilities provide an interesting new avenue to explore in high-resolution lab-based eye tracking. We provide our adaptation of SAM2’s codebase that allows segmenting videos of arbitrary duration and prepending arbitrary prompting frames.}},
  author = {{Niehorster, Diederick C. and Maquiling, Virmarie and Byrne, Sean and Kasneci, Enkelejda and Nyström, Marcus}},
  booktitle = {{ETETRA '25 : Proceedings of the 2025 Symposium on Eye Tracking Research and Applications}},
  editor = {{Sugano, Yusuke and Khamis, Mohamed and Chetouani, Aladine and Sidenmark, Ludwig and Bruno, Alessandro}},
  isbn = {{9798400714870}},
  keywords = {{Eye tracking; Feature localization; Gaze estimation; Foundation models; Methods; Pupil; Corneal reflection; Iris; Sclera}},
  language = {{eng}},
  month = {{05}},
  pages = {{1--8}},
  publisher = {{Association for Computing Machinery (ACM)}},
  title = {{Exploring promptable foundation models for high-resolution video eye tracking in the lab}},
  url = {{http://dx.doi.org/10.1145/3715669.3723118}},
  doi = {{10.1145/3715669.3723118}},
  year = {{2025}},
}