Recall and perceived naturalness of asynchronous speech and gesture

Nirme, Jens

Nirme, Jens (LU) (2016) 7th Conference of the International Society for Gesture Studies, p. 257

Abstract: Part of the justification for an integrated view of speech and gestures is their temporal coordination. Gestures generally coincide with or precede, but rarely follow, their lexical affiliates (McNeill, 1992). How synchrony affects listeners remains less explored, despite its potential relevance for video communication and virtual conversational agents. ERP studies suggest that temporal alignment affects how words and gestures are integrated (Obermeier & Gunter, 2015; Habets et al., 2011). Explicit perception of asynchrony is less sensitive, and shifts longer than 1 s can be tolerated (Kirchhof, 2014). However, gestures that are preceded by their lexical affiliates deviate from the pattern expected from regular exposure to speech, which might implicitly affect listeners. We investigated whether the asymmetry of timing observed in production is reflected in differential effects of gestures shifted in either direction on how natural listeners perceive the speaker's behavior to be (Exp1) and on their processing and subsequent recall of words (Exp2). Using motion capture to animate virtual speakers giving explanations allowed us to shift specific gesture strokes within longer segments while preserving synchronized lip movements. For 16 short segments we produced videos in 3 conditions defined by the timing of a target gesture stroke relative to a target word: either overlapping (SYNC), or shifted 500 ms earlier (G-BEFORE) or later (G-AFTER). We classified the verbal content overlapping with shifted strokes into the (unequally frequent) categories "congruent", "incongruent" or "filled/unfilled pauses". In Exp1, 32 participants saw a composition of 4 videos from each of the 3 conditions above plus a variation of SYNC with distorted pitch during a few non-target words (AUDIO). After each video, participants rated their impression of whether it was based on a capture of natural behavior or was artificially generated (by an undefined algorithm). We transformed each participant's responses to the range between 0 (most artificial) and 1 (most natural). Results revealed no significant differences between conditions. However, comparing the ratings between the categories of overlap revealed that strokes shifted to "filled/unfilled pauses" were rated as more artificial. In Exp2, 79 participants saw all 16 videos in one of four conditions: SYNC, G-BEFORE and G-AFTER were contrasted with a condition in which the target gestures were seamlessly extinguished. Following each video and a distraction task, participants attempted to repeat what they had heard in the video. Results revealed impaired recall of target words with extinguished or delayed gestures. In summary, asynchronous gestures were not perceived as less natural provided they overlapped with words of any kind. Synchronous and preceding, but not following, gestures facilitated recall, as expected if the processing of speech and gestures (involved in this particular task) is tuned to temporal patterns common in natural speech.
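The abstract states that each participant's naturalness ratings were transformed to the range between 0 (most artificial) and 1 (most natural), but it does not say how. One common way to do this is per-participant min-max rescaling; the sketch below assumes that method, and the function name, participant IDs, and rating values are hypothetical illustrations, not data from the study.

```python
from typing import Dict, List


def rescale_per_participant(ratings: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Min-max rescale each participant's ratings to [0, 1].

    Assumption: the abstract does not specify the transformation used;
    min-max scaling within each participant is one plausible reading.
    """
    rescaled: Dict[str, List[float]] = {}
    for participant, values in ratings.items():
        lowest, highest = min(values), max(values)
        if highest == lowest:
            # A participant who gave identical ratings throughout carries no
            # relative information; the midpoint is an arbitrary choice.
            rescaled[participant] = [0.5 for _ in values]
        else:
            span = highest - lowest
            rescaled[participant] = [(v - lowest) / span for v in values]
    return rescaled


# Hypothetical raw ratings on an arbitrary scale (not taken from the study).
example = {"p01": [2, 5, 7], "p02": [1, 1, 4], "p03": [3, 3, 3]}
result = rescale_per_participant(example)
```

Rescaling within each participant removes differences in how individuals use the rating scale, so that ratings can be compared across conditions on a common 0 to 1 range.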
Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/d436d32c-5c80-4384-8123-b50c130f0f1d

author

Nirme, Jens (LU)

publishing date

2016-07-18

type

Contribution to conference

publication status

published

subject

Media and Communication Studies

keywords

Co-speech gestures, multimodal integration, timing, animation, memory, comprehension

pages

1 page

conference name

7th Conference of the International Society for Gesture Studies

conference location

Paris, France

conference dates

2016-07-18 - 2016-07-22

language

English

LU publication?

yes

id

d436d32c-5c80-4384-8123-b50c130f0f1d

alternative location

https://isgs7.sciencesconf.org/conference/isgs7/ISGS_BoA_16_07_19.pdf#page=257

date added to LUP

2016-12-09 14:10:31

date last changed

2025-04-04 14:17:51

@misc{d436d32c-5c80-4384-8123-b50c130f0f1d,
  abstract     = {{Part of the justification for an integrated view of speech and gestures is their temporal coordination. Gestures generally coincide with or precede, but rarely follow, their lexical affiliates (McNeill, 1992). How synchrony affects listeners remains less explored, despite its potential relevance for video communication and virtual conversational agents. ERP studies suggest that temporal alignment affects how words and gestures are integrated (Obermeier \& Gunter, 2015; Habets et al., 2011). Explicit perception of asynchrony is less sensitive, and shifts longer than 1 s can be tolerated (Kirchhof, 2014). However, gestures that are preceded by their lexical affiliates deviate from the pattern expected from regular exposure to speech, which might implicitly affect listeners. We investigated whether the asymmetry of timing observed in production is reflected in differential effects of gestures shifted in either direction on how natural listeners perceive the speaker's behavior to be (Exp1) and on their processing and subsequent recall of words (Exp2). Using motion capture to animate virtual speakers giving explanations allowed us to shift specific gesture strokes within longer segments while preserving synchronized lip movements. For 16 short segments we produced videos in 3 conditions defined by the timing of a target gesture stroke relative to a target word: either overlapping (SYNC), or shifted 500 ms earlier (G-BEFORE) or later (G-AFTER). We classified the verbal content overlapping with shifted strokes into the (unequally frequent) categories "congruent", "incongruent" or "filled/unfilled pauses". In Exp1, 32 participants saw a composition of 4 videos from each of the 3 conditions above plus a variation of SYNC with distorted pitch during a few non-target words (AUDIO). After each video, participants rated their impression of whether it was based on a capture of natural behavior or was artificially generated (by an undefined algorithm). We transformed each participant's responses to the range between 0 (most artificial) and 1 (most natural). Results revealed no significant differences between conditions. However, comparing the ratings between the categories of overlap revealed that strokes shifted to "filled/unfilled pauses" were rated as more artificial. In Exp2, 79 participants saw all 16 videos in one of four conditions: SYNC, G-BEFORE and G-AFTER were contrasted with a condition in which the target gestures were seamlessly extinguished. Following each video and a distraction task, participants attempted to repeat what they had heard in the video. Results revealed impaired recall of target words with extinguished or delayed gestures. In summary, asynchronous gestures were not perceived as less natural provided they overlapped with words of any kind. Synchronous and preceding, but not following, gestures facilitated recall, as expected if the processing of speech and gestures (involved in this particular task) is tuned to temporal patterns common in natural speech.}},
  author       = {{Nirme, Jens}},
  keywords     = {{Co-speech gestures; multimodal integration; timing; animation; memory; comprehension}},
  language     = {{eng}},
  month        = {{07}},
  pages        = {{257--257}},
  title        = {{Recall and perceived naturalness of asynchronous speech and gesture}},
  url          = {{https://isgs7.sciencesconf.org/conference/isgs7/ISGS_BoA_16_07_19.pdf#page=257}},
  year         = {{2016}},
}

Lund University Publications
