Lund University Publications

LUND UNIVERSITY LIBRARIES

A practical guide to calculating vocal tract length and scale-invariant formant patterns

Anikin, Andrey; Barreda, Santiago and Reby, David (2023) In Behavior Research Methods
Abstract
Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
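The single-tube regression idea mentioned in the abstract can be sketched in a few lines. The example below is a minimal illustration in Python, not the code of soundgen's estimateVTL, whose defaults and details may differ. It assumes a uniform tube closed at one end, so that the n-th formant is F_n = (2n - 1) * c / (4 * L), fits the formant spacing by least squares through the origin, and converts it to an apparent vocal tract length. The function names, the speed of sound (35400 cm/s) and the use of None for missing formants are all assumptions made for this illustration.

```python
import math

# Assumed speed of sound in warm, humid air, in cm/s (illustrative value).
SPEED_OF_SOUND_CM_S = 35400.0


def formant_spacing(formants_hz):
    """Least-squares slope through the origin of F_n on (2n - 1) / 2.

    Under a uniform single-tube model, F_n = dF * (2n - 1) / 2, so the
    slope is the formant spacing dF in Hz. Uncertain formants can be
    passed as None and are skipped (treated as missing).
    """
    xs, ys = [], []
    for n, f in enumerate(formants_hz, start=1):
        if f is None:
            continue
        xs.append((2 * n - 1) / 2)
        ys.append(f)
    if not xs:
        raise ValueError("at least one measured formant is required")
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def apparent_vtl_cm(formants_hz, speed_of_sound=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length in cm: L = c / (2 * dF)."""
    return speed_of_sound / (2 * formant_spacing(formants_hz))


def schwa_deviations(formants_hz):
    """Relative deviation of each formant from equal spacing.

    Returns F_n / expected_n - 1, where expected_n is the frequency
    predicted by the fitted single-tube model; None stays None.
    """
    d_f = formant_spacing(formants_hz)
    out = []
    for n, f in enumerate(formants_hz, start=1):
        expected = d_f * (2 * n - 1) / 2
        out.append(None if f is None else f / expected - 1)
    return out


def geometric_mean(values):
    """Geometric mean of positive values, a simple measure of overall scale."""
    logs = [math.log(v) for v in values if v is not None]
    return math.exp(sum(logs) / len(logs))


# A perfectly schwa-like pattern: formants at 500, 1500, 2500, 3500 Hz.
schwa = [500.0, 1500.0, 2500.0, 3500.0]
spacing = formant_spacing(schwa)
vtl = apparent_vtl_cm(schwa)
deviations = schwa_deviations(schwa)
```

For the equally spaced pattern above, the fitted spacing is 1000 Hz, which under the assumed speed of sound corresponds to an apparent length of 17.7 cm, and every deviation is zero. A real vowel would give non-zero deviations, which is what makes them usable as a scale-invariant description of the formant pattern.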
type
Contribution to journal
publication status
published
keywords
Formants, Speaker normalization, Vocal tract length normalization, Vowel, Body size
in
Behavior Research Methods
pages
17 pages
publisher
Springer
external identifiers
  • scopus:85180849618
  • pmid:38158551
ISSN
1554-3528
DOI
10.3758/s13428-023-02288-x
language
English
LU publication?
yes
id
b1ecf2af-e1c3-47ae-bb32-6970bde8421a
date added to LUP
2023-12-30 07:01:15
date last changed
2024-03-31 03:00:05
@article{b1ecf2af-e1c3-47ae-bb32-6970bde8421a,
  abstract     = {{Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.}},
  author       = {{Anikin, Andrey and Barreda, Santiago and Reby, David}},
  issn         = {{1554-3528}},
  keywords     = {{Formants; Speaker normalization; Vocal tract length normalization; Vowel; Body size}},
  language     = {{eng}},
  publisher    = {{Springer}},
  series       = {{Behavior Research Methods}},
  title        = {{A practical guide to calculating vocal tract length and scale-invariant formant patterns}},
  url          = {{http://dx.doi.org/10.3758/s13428-023-02288-x}},
  doi          = {{10.3758/s13428-023-02288-x}},
  year         = {{2023}},
}