Acoustic estimation of voice roughness
(2025) In Attention, Perception & Psychophysics
- Abstract
- Roughness is a perceptual characteristic of sound that was first applied to musical consonance and dissonance, but it is increasingly recognized as a central aspect of voice quality in human and animal communication. It may be particularly important for asserting social dominance or attracting attention in urgent signals such as screams. To ensure that the results of roughness research are valid and consistent across studies, we need a standard methodology for measuring it. I review the literature on roughness estimation, from classic psychoacoustics to more recent approaches, and present two collections of 602 human vocal samples whose roughness was rated by 162 listeners in perceptual experiments. Two algorithms for estimating roughness acoustically from modulation spectra are then presented and optimized to match the human ratings. One uses a bank of gammatone or Butterworth filters to obtain an auditory spectrogram, and a faster algorithm begins with a conventional spectrogram obtained with the short-time Fourier transform; both explain ~50% of variance in average human ratings per stimulus. The range of modulation frequencies most relevant to roughness perception is [50, 200] Hz; this range can be selected with simple cutoff points or with a lognormal weighting function. Modulation and roughness spectrograms are proposed as visual aids for studying the dynamics of roughness in longer recordings. The described algorithms are implemented in the function modulationSpectrum() from the open-source R library soundgen. The audio recordings and their ratings are freely available from https://osf.io/gvcpx/ and can be used for benchmarking other algorithms.
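As a rough illustration of the faster, spectrogram-based idea described in the abstract, the sketch below computes a short-time Fourier spectrogram, takes a second Fourier transform along time in each frequency channel to obtain a modulation spectrum, and reports the share of modulation energy that falls inside [50, 200] Hz using simple cutoff points. This is not the soundgen implementation and does not call modulationSpectrum(); it is written in Python with NumPy only, and the function name `roughness_proxy`, its parameters, the window and step sizes, and the test signals are all assumptions made for illustration.

```python
import numpy as np

def roughness_proxy(signal, sr, win_len=0.010, step=0.002, mod_range=(50.0, 200.0)):
    """Hypothetical helper: share of modulation energy inside mod_range (Hz).

    Returns a value in [0, 1]. Window and step sizes are illustrative
    assumptions, not values taken from the paper.
    """
    n_win = int(round(win_len * sr))   # analysis window length in samples
    hop = int(round(step * sr))        # step between successive frames
    window = np.hanning(n_win)
    starts = np.arange(0, len(signal) - n_win + 1, hop)
    frames = np.stack([signal[s:s + n_win] * window for s in starts])

    # Conventional magnitude spectrogram: rows are frames, columns are frequency bins.
    spec = np.abs(np.fft.rfft(frames, axis=1))

    # Remove each channel's mean so the constant (0 Hz) term does not dominate.
    env = spec - spec.mean(axis=0, keepdims=True)

    # Second transform along time gives the modulation spectrum of each channel.
    mod_power = np.abs(np.fft.rfft(env, axis=0)) ** 2
    frame_rate = sr / hop              # sampling rate of the envelope, in Hz
    mod_freqs = np.fft.rfftfreq(len(starts), d=1.0 / frame_rate)

    total = mod_power.sum()
    if total == 0:
        return 0.0
    in_band = (mod_freqs >= mod_range[0]) & (mod_freqs <= mod_range[1])
    return float(mod_power[in_band].sum() / total)

# Synthetic test signals (assumed for illustration): a 1 kHz tone with
# amplitude modulation at 70 Hz (inside the band) or 5 Hz (slow tremolo).
sr = 16000
t = np.arange(sr) / sr
carrier = np.sin(2 * np.pi * 1000 * t)
rough_sig = (1 + 0.8 * np.sin(2 * np.pi * 70 * t)) * carrier
smooth_sig = (1 + 0.8 * np.sin(2 * np.pi * 5 * t)) * carrier

r_rough = roughness_proxy(rough_sig, sr)
r_smooth = roughness_proxy(smooth_sig, sr)
print(r_rough, r_smooth)
```

The 2 ms step gives an envelope sampling rate of 500 Hz, so modulation frequencies up to 250 Hz can be resolved, which covers the [50, 200] Hz band. A lognormal weighting function, the alternative mentioned in the abstract, would replace the boolean `in_band` mask with smooth weights; its parameters are not given here, so it is left out.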
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/22ecc638-e811-4a67-bd5c-484b886a6e12
- author
- Anikin, Andrey
LU
- organization
- publishing date
- 2025
- type
- Contribution to journal
- publication status
- published
- subject
- in
- Attention, Perception & Psychophysics
- pages
- 17 pages
- publisher
- Springer
- external identifiers
- pmid:40295423
- ISSN
- 1943-3921
- DOI
- 10.3758/s13414-025-03060-3
- language
- English
- LU publication?
- yes
- id
- 22ecc638-e811-4a67-bd5c-484b886a6e12
- date added to LUP
- 2025-04-29 07:37:50
- date last changed
- 2025-05-09 03:00:02
@article{22ecc638-e811-4a67-bd5c-484b886a6e12,
  abstract  = {{Roughness is a perceptual characteristic of sound that was first applied to musical consonance and dissonance, but it is increasingly recognized as a central aspect of voice quality in human and animal communication. It may be particularly important for asserting social dominance or attracting attention in urgent signals such as screams. To ensure that the results of roughness research are valid and consistent across studies, we need a standard methodology for measuring it. I review the literature on roughness estimation, from classic psychoacoustics to more recent approaches, and present two collections of 602 human vocal samples whose roughness was rated by 162 listeners in perceptual experiments. Two algorithms for estimating roughness acoustically from modulation spectra are then presented and optimized to match the human ratings. One uses a bank of gammatone or Butterworth filters to obtain an auditory spectrogram, and a faster algorithm begins with a conventional spectrogram obtained with the short-time Fourier transform; both explain $\sim$50\% of variance in average human ratings per stimulus. The range of modulation frequencies most relevant to roughness perception is [50, 200] Hz; this range can be selected with simple cutoff points or with a lognormal weighting function. Modulation and roughness spectrograms are proposed as visual aids for studying the dynamics of roughness in longer recordings. The described algorithms are implemented in the function modulationSpectrum() from the open-source R library soundgen. The audio recordings and their ratings are freely available from https://osf.io/gvcpx/ and can be used for benchmarking other algorithms.}},
  author    = {{Anikin, Andrey}},
  issn      = {{1943-3921}},
  journal   = {{Attention, Perception \& Psychophysics}},
  language  = {{eng}},
  publisher = {{Springer}},
  title     = {{Acoustic estimation of voice roughness}},
  url       = {{http://dx.doi.org/10.3758/s13414-025-03060-3}},
  doi       = {{10.3758/s13414-025-03060-3}},
  year      = {{2025}},
}