Understanding virtual speakers

Nirme, Jens

Nirme, Jens (2020) In Lund University Cognitive Studies 177.

Abstract: This thesis addresses how verbal comprehension is affected by seeing the speaker, in particular when the speaker is an animated virtual speaker. Two people who are visually co-present, one talking and the other listening and trying to comprehend what is said, constitute a central and critical scenario whether one is interested in human cognition, communication or learning.
Papers I & II focus on the effect on comprehension of seeing a virtual speaker displaying visual speech cues (lip and head movements accompanying speech). The results indicate a positive effect in the presence of background babble noise but no effect in its absence. The results presented in Paper II also indicate that seeing the virtual speaker is at least as effective as seeing a real speaker, and that exploiting the visual speech cues of a virtual speaker may require some adaptation but is not affected by the subjective perception of the virtual speakers’ social traits.
Papers III & IV focus on the effect of the temporal coordination of speech and gesture on the memory encoding of speech, and on the feasibility of a novel methodology for addressing this question. The methodology aims at precisely manipulating individual gestures within naturalistic speech and gesture sequences that are recorded by motion capture and reproduced by virtual speakers. Results in Paper III indicate that such temporal manipulations can be made without the animation being perceived as unnatural, as long as the shifted (manipulated) gestural movements still temporally overlap with some speech rather than with a pause or hesitation. Results of Paper IV showed that words were more likely to be recalled when their associated gestures were presented in their original synchrony or shifted to arrive earlier. This mirrors the temporal coordination patterns that are common in natural speech and gesture production.
Paper V explores how factual topics are comprehended and approached metacognitively when presented in different media, including a video of an animated virtual speaker with synthesized speech. The study made use of an interface that minimizes differences in information transience and navigation options between the media. Results indicate improved comprehension and a somewhat stronger tendency to repeat material when also seeing, compared to only listening to, the virtual speaker. Navigation behaviours were, however, scarce overall, so only tentative conclusions could be drawn regarding differences in metacognitive approaches between media.
Paper VI presents a virtual replication of a choice blindness experimental paradigm. The results indicate that the level of detail in the presentation of a virtual environment and a speaker may affect self-reported presence as well as the level of trust exhibited towards the speaker.
The relevance of these findings is discussed with regard to how comprehension is affected by visible speakers in general and virtual speakers specifically, and with regard to possible consequences for the design and implementation of virtual speakers in educational applications and as research instruments.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/bd79067e-bce2-46f4-84a2-6ee0807e8925

author

Nirme, Jens

opponent

Professor Catherine Pelachaud, Sorbonne Université

publishing date

2020-01-24

type

Thesis

publication status

published

keywords

Verbal Comprehension, Multimodality, Audiovisual integration, Gesture, Educational Technology

in

Lund University Cognitive Studies

volume

177

pages

180 pages

publisher

Lund University (Media-Tryck)

defense location

LUX C121

defense date

2020-02-21 10:15:00

ISSN

1101-8453

ISBN

978-91-88899-84-2

language

English

LU publication?

yes

id

bd79067e-bce2-46f4-84a2-6ee0807e8925

date added to LUP

2020-01-23 13:25:12

date last changed

2025-04-04 14:38:59

@phdthesis{bd79067e-bce2-46f4-84a2-6ee0807e8925,
  abstract     = {{This thesis addresses how verbal comprehension is affected by seeing the speaker, in particular when the speaker is an animated virtual speaker. Two people who are visually co-present, one talking and the other listening and trying to comprehend what is said, constitute a central and critical scenario whether one is interested in human cognition, communication or learning.
Papers I \& II focus on the effect on comprehension of seeing a virtual speaker displaying visual speech cues (lip and head movements accompanying speech). The results indicate a positive effect in the presence of background babble noise but no effect in its absence. The results presented in Paper II also indicate that seeing the virtual speaker is at least as effective as seeing a real speaker, and that exploiting the visual speech cues of a virtual speaker may require some adaptation but is not affected by the subjective perception of the virtual speakers’ social traits.
Papers III \& IV focus on the effect of the temporal coordination of speech and gesture on the memory encoding of speech, and on the feasibility of a novel methodology for addressing this question. The methodology aims at precisely manipulating individual gestures within naturalistic speech and gesture sequences that are recorded by motion capture and reproduced by virtual speakers. Results in Paper III indicate that such temporal manipulations can be made without the animation being perceived as unnatural, as long as the shifted (manipulated) gestural movements still temporally overlap with some speech rather than with a pause or hesitation. Results of Paper IV showed that words were more likely to be recalled when their associated gestures were presented in their original synchrony or shifted to arrive earlier. This mirrors the temporal coordination patterns that are common in natural speech and gesture production.
Paper V explores how factual topics are comprehended and approached metacognitively when presented in different media, including a video of an animated virtual speaker with synthesized speech. The study made use of an interface that minimizes differences in information transience and navigation options between the media. Results indicate improved comprehension and a somewhat stronger tendency to repeat material when also seeing, compared to only listening to, the virtual speaker. Navigation behaviours were, however, scarce overall, so only tentative conclusions could be drawn regarding differences in metacognitive approaches between media.
Paper VI presents a virtual replication of a choice blindness experimental paradigm. The results indicate that the level of detail in the presentation of a virtual environment and a speaker may affect self-reported presence as well as the level of trust exhibited towards the speaker.
The relevance of these findings is discussed with regard to how comprehension is affected by visible speakers in general and virtual speakers specifically, and with regard to possible consequences for the design and implementation of virtual speakers in educational applications and as research instruments.}},
  author       = {{Nirme, Jens}},
  isbn         = {{978-91-88899-84-2}},
  issn         = {{1101-8453}},
  keywords     = {{Verbal Comprehension; Multimodality; Audiovisual integration; Gesture; Educational Technology}},
  language     = {{eng}},
  month        = {{01}},
  publisher    = {{Lund University (Media-Tryck)}},
  school       = {{Lund University}},
  series       = {{Lund University Cognitive Studies}},
  title        = {{Understanding virtual speakers}},
  volume       = {{177}},
  year         = {{2020}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES
