Advanced

Understanding virtual speakers

Nirme, Jens LU (2020) In Lund University Cognitive Studies 177.
Abstract
This thesis addresses how verbal comprehension is affected by seeing the speaker and in particular when the speaker is an animated virtual speaker. Two people visually co-present – one talking and the other listening, trying to comprehend what is said – is a central and critical scenario whether one is interested in human cognition, communication or learning.
Papers I & II are focused on the effect on comprehension of seeing a virtual speaker displaying visual speech cues (lip and head movements accompanying speech). The results presented indicate a positive effect in the presence of background babble noise but no effect in its absence. The results presented in paper II also indicate that the effect of seeing the virtual speaker is... (More)
This thesis addresses how verbal comprehension is affected by seeing the speaker and in particular when the speaker is an animated virtual speaker. Two people visually co-present – one talking and the other listening, trying to comprehend what is said – is a central and critical scenario whether one is interested in human cognition, communication or learning.
Papers I & II are focused on the effect on comprehension of seeing a virtual speaker displaying visual speech cues (lip and head movements accompanying speech). The results presented indicate a positive effect in the presence of background babble noise but no effect in its absence. The results presented in paper II also indicate that the effect of seeing the virtual speaker is at least as effective as seeing a real speaker, that the exploitation of visual speech cues by a virtual speaker may require some adaptation but is not affected by subjective perception of the virtual speakers’ social traits.
Papers III & IV focus on the effect of the temporal coordination of speech and gesture on memory encoding of speech, and the feasibility of a novel methodology to address this question. The objective of the methodology is the precise manipulating of individual gestures within naturalistic speech and gesture sequences recorded by motion capture and reproduced by virtual speakers. Results in paper III indicate that such temporal manipulations can be realized without subjective perception of the animation as unnatural as long as the shifted (manipulated) gestural movements temporally overlap with some speech (not pause or hesitation). Results of paper IV were that words accompanied by associated gestures in their original synchrony or gestures arriving earlier were more likely to be recalled. This mirrors the temporal coordination patterns that are common in natural speech-gesture production.
Paper V explores how factual topics are comprehended and approached metacognitively when presented in different media, including a video of an animated virtual speaker with synthesized speech. They study made use of an interface where differences in information transience and navigation options are minimized between the media. Results indicate improved comprehension and a somewhat stronger tendency to repeat material when also seeing, compared to only listening to, the virtual speaker. Instances of navigation behaviours were, however, overall scarce and only tentative conclusions could be drawn regarding differences in metacognitive approaches between media.
Paper VI presents a virtual replication of a choice blindness experimental paradigm. The results show that the level of detail of the presentation of a virtual environment and a speaker may affect self-reported presence as well the level of trust exhibited towards the speaker.
The relevance of these findings is discussed with regards to how comprehension is affected by visible speakers in general and virtual speakers specifically, as well as possible consequences for the design and implementation of virtual speakers in educational applications and as research instruments. (Less)
Please use this url to cite or link to this publication:
author
supervisor
opponent
  • professor Catherine Pelachaud, Sorbonne Université
organization
publishing date
type
Thesis
publication status
published
subject
keywords
Verbal Comprehension, Multimodality, Audiovisual integration, Gesture, Educational Technology
in
Lund University Cognitive Studies
volume
177
pages
180 pages
publisher
Lund University (Media-Tryck)
defense location
LUX C121
defense date
2020-02-21 10:15:00
ISSN
1101-8453
ISBN
978-91-88899-84-3
978-91-88899-84-2
language
English
LU publication?
yes
id
bd79067e-bce2-46f4-84a2-6ee0807e8925
date added to LUP
2020-01-23 13:25:12
date last changed
2020-01-24 16:01:53
@phdthesis{bd79067e-bce2-46f4-84a2-6ee0807e8925,
  abstract     = {This thesis addresses how verbal comprehension is affected by seeing the speaker and in particular when the speaker is an animated virtual speaker. Two people visually co-present – one talking and the other listening, trying to comprehend what is said – is a central and critical scenario whether one is interested in human cognition, communication or learning.<br>
Papers I &amp; II are focused on the effect on comprehension of seeing a virtual speaker displaying visual speech cues (lip and head movements accompanying speech). The results presented indicate a positive effect in the presence of background babble noise but no effect in its absence. The results presented in paper II also indicate that the effect of seeing the virtual speaker is at least as effective as seeing a real speaker, that the exploitation of visual speech cues by a virtual speaker may require some adaptation but is not affected by subjective perception of the virtual speakers’ social traits.<br>
Papers III &amp; IV focus on the effect of the temporal coordination of speech and gesture on memory encoding of speech, and the feasibility of a novel methodology to address this question. The objective of the methodology is the precise manipulating of individual gestures within naturalistic speech and gesture sequences recorded by motion capture and reproduced by virtual speakers. Results in paper III indicate that such temporal manipulations can be realized without subjective perception of the animation as unnatural as long as the shifted (manipulated) gestural movements temporally overlap with some speech (not pause or hesitation). Results of paper IV were that words accompanied by associated gestures in their original synchrony or gestures arriving earlier were more likely to be recalled. This mirrors the temporal coordination patterns that are common in natural speech-gesture production.<br>
Paper V explores how factual topics are comprehended and approached metacognitively when presented in different media, including a video of an animated virtual speaker with synthesized speech. They study made use of an interface where differences in information transience and navigation options are minimized between the media. Results indicate improved comprehension and a somewhat stronger tendency to repeat material when also seeing, compared to only listening to, the virtual speaker. Instances of navigation behaviours were, however, overall scarce and only tentative conclusions could be drawn regarding differences in metacognitive approaches between media.<br>
Paper VI presents a virtual replication of a choice blindness experimental paradigm. The results show that the level of detail of the presentation of a virtual environment and a speaker may affect self-reported presence as well the level of trust exhibited towards the speaker. <br>
The relevance of these findings is discussed with regards to how comprehension is affected by visible speakers in general and virtual speakers specifically, as well as possible consequences for the design and implementation of virtual speakers in educational applications and as research instruments.},
  author       = {Nirme, Jens},
  isbn         = {978-91-88899-84-3},
  issn         = {1101-8453},
  language     = {eng},
  month        = {01},
  publisher    = {Lund University (Media-Tryck)},
  school       = {Lund University},
  series       = {Lund University Cognitive Studies },
  title        = {Understanding virtual speakers},
  url          = {https://lup.lub.lu.se/search/ws/files/75448803/Jens_Nirme_HELA.pdf},
  volume       = {177},
  year         = {2020},
}