Understanding virtual speakers
(2020) In Lund University Cognitive Studies 177.
- Abstract
- This thesis addresses how verbal comprehension is affected by seeing the speaker, in particular when the speaker is an animated virtual speaker. Two visually co-present people – one talking, the other listening and trying to comprehend what is said – constitute a central and critical scenario whether one is interested in human cognition, communication or learning.
Papers I & II are focused on the effect on comprehension of seeing a virtual speaker displaying visual speech cues (lip and head movements accompanying speech). The results presented indicate a positive effect in the presence of background babble noise but no effect in its absence. The results presented in paper II also indicate that the effect of seeing the virtual speaker is... (More) - This thesis addresses how verbal comprehension is affected by seeing the speaker and in particular when the speaker is an animated virtual speaker. Two people visually co-present – one talking and the other listening, trying to comprehend what is said – is a central and critical scenario whether one is interested in human cognition, communication or learning.
Papers I & II are focused on the effect on comprehension of seeing a virtual speaker displaying visual speech cues (lip and head movements accompanying speech). The results presented indicate a positive effect in the presence of background babble noise but no effect in its absence. The results presented in paper II also indicate that the effect of seeing the virtual speaker is at least as effective as seeing a real speaker, that the exploitation of visual speech cues by a virtual speaker may require some adaptation but is not affected by subjective perception of the virtual speakers’ social traits.
Papers III & IV focus on the effect of the temporal coordination of speech and gesture on memory encoding of speech, and the feasibility of a novel methodology to address this question. The objective of the methodology is the precise manipulating of individual gestures within naturalistic speech and gesture sequences recorded by motion capture and reproduced by virtual speakers. Results in paper III indicate that such temporal manipulations can be realized without subjective perception of the animation as unnatural as long as the shifted (manipulated) gestural movements temporally overlap with some speech (not pause or hesitation). Results of paper IV were that words accompanied by associated gestures in their original synchrony or gestures arriving earlier were more likely to be recalled. This mirrors the temporal coordination patterns that are common in natural speech-gesture production.
Paper V explores how factual topics are comprehended and approached metacognitively when presented in different media, including a video of an animated virtual speaker with synthesized speech. The study made use of an interface that minimized differences in information transience and navigation options between the media. Results indicate improved comprehension and a somewhat stronger tendency to repeat material when seeing, rather than only listening to, the virtual speaker. Instances of navigation behaviour were, however, scarce overall, and only tentative conclusions could be drawn regarding differences in metacognitive approaches between media.
Paper VI presents a virtual replication of a choice blindness experimental paradigm. The results show that the level of detail with which a virtual environment and a speaker are presented may affect self-reported presence as well as the level of trust exhibited towards the speaker.
The relevance of these findings is discussed with regard to how comprehension is affected by visible speakers in general and virtual speakers specifically, as well as possible consequences for the design and implementation of virtual speakers in educational applications and as research instruments.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/bd79067e-bce2-46f4-84a2-6ee0807e8925
- author
- Nirme, Jens LU
- supervisor
- Agneta Gulz LU
- Magnus Haake LU
- Marianne Gullberg LU
- opponent
- Professor Catherine Pelachaud, Sorbonne Université
- organization
- publishing date
- 2020-01-24
- type
- Thesis
- publication status
- published
- subject
- keywords
- Verbal Comprehension, Multimodality, Audiovisual integration, Gesture, Educational Technology
- in
- Lund University Cognitive Studies
- volume
- 177
- pages
- 180 pages
- publisher
- Lund University (Media-Tryck)
- defense location
- LUX C121
- defense date
- 2020-02-21 10:15:00
- ISSN
- 1101-8453
- ISBN
- 978-91-88899-84-2
- language
- English
- LU publication?
- yes
- id
- bd79067e-bce2-46f4-84a2-6ee0807e8925
- date added to LUP
- 2020-01-23 13:25:12
- date last changed
- 2022-10-13 19:13:41
@phdthesis{bd79067e-bce2-46f4-84a2-6ee0807e8925,
  author    = {{Nirme, Jens}},
  title     = {{Understanding virtual speakers}},
  series    = {{Lund University Cognitive Studies}},
  volume    = {{177}},
  publisher = {{Lund University (Media-Tryck)}},
  school    = {{Lund University}},
  isbn      = {{978-91-88899-84-2}},
  issn      = {{1101-8453}},
  keywords  = {{Verbal Comprehension; Multimodality; Audiovisual integration; Gesture; Educational Technology}},
  language  = {{eng}},
  month     = {{01}},
  year      = {{2020}},
}