Exploring the Guessing-Game Experimental Paradigm : Inferences From Closed- Versus Open-Ended Semantic Space

Kuleshova, Svetlana; Ćwiek, Aleksandra; Hartmann, Stefan; Pleyer, Michael; Sibierska, Marta; Placiński, Marek; Blomberg, Johan; Żywiczyński, Przemysław; Wacewicz, Sławomir


Kuleshova, Svetlana; Ćwiek, Aleksandra; Hartmann, Stefan; Pleyer, Michael; Sibierska, Marta; Placiński, Marek; Blomberg, Johan (LU); Żywiczyński, Przemysław and Wacewicz, Sławomir (2026). In Cognitive Science 50(3).

Abstract: How we measure success in signal comprehension experiments fundamentally shapes our conclusions. Two recent studies have demonstrated that humans can guess the meanings of novel vocalizations and ape gestures above chance when selecting from limited alternatives. We replicated both experiments using open-ended responses instead of multiple choice. For the vocalization data, where participants provided single-word or short-phrase responses, we systematically compared three evaluation methods applied to the same responses: exact matching, graded similarity ratings, and computational semantic similarity. For the gesture data, we applied graded similarity ratings. Each evaluation method revealed a different semantic landscape. Participants’ success was very low when measured by exact matching, moderate by similarity ratings, and substantially greater by computational measures, which capture broader thematic connections. Despite these differences, a consistent pattern emerged across both datasets and all evaluation methods: success was determined primarily by properties of the signals (their semantic category and degree of transparency) rather than individual participant abilities. Participants often reliably distinguished broad categories (actions vs. objects, animals vs. artifacts) but rarely identified specific concepts, and these distinct patterns only became visible through a combination of evaluation methods. In sum, our results partly align with the original studies yet also diverge in ways conducive to different conclusions about naïve humans’ ability to understand novel vocalizations or ape gestures. We show that closed- versus open-ended response formats, and different evaluation scales, function as complementary research tools rather than competing approaches. Each reveals different aspects of how humans navigate semantic space when interpreting novel signals. Experimental and evaluation designs are, therefore, not a technical detail but a theoretical choice about which semantic relationships we seek to expose.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/27348df7-880a-49bf-89ea-c9f38d9eea78

author

Kuleshova, Svetlana; Ćwiek, Aleksandra; Hartmann, Stefan; Pleyer, Michael; Sibierska, Marta; Placiński, Marek; Blomberg, Johan (LU); Żywiczyński, Przemysław and Wacewicz, Sławomir

publishing date

2026-03

type

Contribution to journal

publication status

published

subject

Comparative Language Studies and Linguistics

keywords

Bayesian hierarchical modeling, Conceptual replication, Ecological validity, Experimental semiotics, Semantic space, Understanding

in

Cognitive Science

volume

50

issue

3

article number

e70199

publisher

Wiley-Blackwell

external identifiers

scopus:105033984609
pmid:41870092

ISSN

0364-0213

DOI

10.1111/cogs.70199

language

English

LU publication?

yes

id

27348df7-880a-49bf-89ea-c9f38d9eea78

date added to LUP

2026-05-21 15:58:52

date last changed

2026-06-04 16:56:00

@article{27348df7-880a-49bf-89ea-c9f38d9eea78,
  abstract     = {{How we measure success in signal comprehension experiments fundamentally shapes our conclusions. Two recent studies have demonstrated that humans can guess the meanings of novel vocalizations and ape gestures above chance when selecting from limited alternatives. We replicated both experiments using open-ended responses instead of multiple choice. For the vocalization data, where participants provided single-word or short-phrase responses, we systematically compared three evaluation methods applied to the same responses: exact matching, graded similarity ratings, and computational semantic similarity. For the gesture data, we applied graded similarity ratings. Each evaluation method revealed a different semantic landscape. Participants’ success was very low when measured by exact matching, moderate by similarity ratings, and substantially greater by computational measures, which capture broader thematic connections. Despite these differences, a consistent pattern emerged across both datasets and all evaluation methods: success was determined primarily by properties of the signals (their semantic category and degree of transparency) rather than individual participant abilities. Participants often reliably distinguished broad categories (actions vs. objects, animals vs. artifacts) but rarely identified specific concepts, and these distinct patterns only became visible through a combination of evaluation methods. In sum, our results partly align with the original studies yet also diverge in ways conducive to different conclusions about naïve humans’ ability to understand novel vocalizations or ape gestures. We show that closed- versus open-ended response formats, and different evaluation scales, function as complementary research tools rather than competing approaches. Each reveals different aspects of how humans navigate semantic space when interpreting novel signals. Experimental and evaluation designs are, therefore, not a technical detail but a theoretical choice about which semantic relationships we seek to expose.}},
  author       = {{Kuleshova, Svetlana and Ćwiek, Aleksandra and Hartmann, Stefan and Pleyer, Michael and Sibierska, Marta and Placiński, Marek and Blomberg, Johan and Żywiczyński, Przemysław and Wacewicz, Sławomir}},
  issn         = {{0364-0213}},
  keywords     = {{Bayesian hierarchical modeling; Conceptual replication; Ecological validity; Experimental semiotics; Semantic space; Understanding}},
  language     = {{eng}},
  number       = {{3}},
  publisher    = {{Wiley-Blackwell}},
  journal      = {{Cognitive Science}},
  title        = {{Exploring the Guessing-Game Experimental Paradigm : Inferences From Closed- Versus Open-Ended Semantic Space}},
  url          = {{http://dx.doi.org/10.1111/cogs.70199}},
  doi          = {{10.1111/cogs.70199}},
  volume       = {{50}},
  year         = {{2026}},
}
