Lund University Publications

Acoustic estimation of voice roughness

Anikin, Andrey (2025). In Attention, Perception & Psychophysics
Abstract
Roughness is a perceptual characteristic of sound that was first applied to musical consonance and dissonance, but it is increasingly recognized as a central aspect of voice quality in human and animal communication. It may be particularly important for asserting social dominance or attracting attention in urgent signals such as screams. To ensure that the results of roughness research are valid and consistent across studies, we need a standard methodology for measuring it. I review the literature on roughness estimation, from classic psychoacoustics to more recent approaches, and present two collections of 602 human vocal samples whose roughness was rated by 162 listeners in perceptual experiments. Two algorithms for estimating roughness acoustically from modulation spectra are then presented and optimized to match the human ratings. The first uses a bank of gammatone or Butterworth filters to obtain an auditory spectrogram; the second, faster algorithm begins with a conventional spectrogram obtained with the short-time Fourier transform (STFT). Both explain about 50% of the variance in average human ratings per stimulus. The range of modulation frequencies most relevant to roughness perception is [50, 200] Hz; this range can be selected with simple cutoff points or with a lognormal weighting function. Modulation and roughness spectrograms are proposed as visual aids for studying the dynamics of roughness in longer recordings. The described algorithms are implemented in the function modulationSpectrum() from the open-source R library soundgen. The audio recordings and their ratings are freely available from https://osf.io/gvcpx/ and can be used for benchmarking other algorithms.
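The two ways of selecting the roughness range mentioned in the abstract (hard cutoffs versus a lognormal weighting function) can be illustrated with a minimal Python sketch. It is not the soundgen implementation: the function names, the 100 Hz mode, the sigma of 0.5, and the toy modulation spectrum are all assumptions made for illustration, and roughness is assumed here to be the weighted share of modulation energy expressed in percent.

```python
import math


def cutoff_weight(freq_hz, low=50.0, high=200.0):
    """Hard band selection: weight 1 inside [low, high] Hz, 0 outside."""
    return 1.0 if low <= freq_hz <= high else 0.0


def lognormal_weight(freq_hz, mode_hz=100.0, sigma=0.5):
    """Lognormal density rescaled so that it equals 1 at its mode.

    mode_hz and sigma are illustrative values, not fitted parameters.
    """
    if freq_hz <= 0:
        return 0.0
    # The mode of a lognormal density is exp(mu - sigma^2), so solve for mu.
    mu = math.log(mode_hz) + sigma ** 2
    # Ratio density(freq_hz) / density(mode_hz), written in closed form.
    exponent = ((math.log(freq_hz) - mu) ** 2 - sigma ** 4) / (2 * sigma ** 2)
    return (mode_hz / freq_hz) * math.exp(-exponent)


def roughness(mod_freqs_hz, mod_energy, weight):
    """Weighted share of total modulation energy, in percent."""
    total = sum(mod_energy)
    if total == 0:
        return 0.0
    weighted = sum(weight(f) * e for f, e in zip(mod_freqs_hz, mod_energy))
    return 100.0 * weighted / total


# Toy modulation spectrum (hypothetical values): energy per modulation frequency.
toy_freqs_hz = [10, 25, 50, 100, 200, 400]
toy_energy = [4, 2, 1, 2, 1, 0]

rough_cutoff = roughness(toy_freqs_hz, toy_energy, cutoff_weight)
rough_lognormal = roughness(toy_freqs_hz, toy_energy, lognormal_weight)
```

With hard cutoffs, every modulation frequency inside the band counts equally and everything outside is ignored. The lognormal weighting instead tapers smoothly on both sides of its mode, so energy just outside 50 to 200 Hz still contributes a little.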
type
Contribution to journal
publication status
published
in
Attention, Perception & Psychophysics
pages
17 pages
publisher
Springer
external identifiers
  • pmid:40295423
ISSN
1943-3921
DOI
10.3758/s13414-025-03060-3
language
English
LU publication?
yes
id
22ecc638-e811-4a67-bd5c-484b886a6e12
date added to LUP
2025-04-29 07:37:50
date last changed
2025-05-09 03:00:02
@article{22ecc638-e811-4a67-bd5c-484b886a6e12,
  abstract     = {{Roughness is a perceptual characteristic of sound that was first applied to musical consonance and dissonance, but it is increasingly recognized as a central aspect of voice quality in human and animal communication. It may be particularly important for asserting social dominance or attracting attention in urgent signals such as screams. To ensure that the results of roughness research are valid and consistent across studies, we need standard methodology for measuring it. I review the literature on roughness estimation, from classic psychoacoustics to more recent approaches, and present two collections of 602 human vocal samples whose roughness was rated by 162 listeners in perceptual experiments. Two algorithms for estimating roughness acoustically from modulation spectra are then presented and optimized to match the human ratings. One uses a bank of gammatone or Butterworth filters to obtain an auditory spectrogram, and a faster algorithm begins with a conventional spectrogram obtained with Short-Time Fourier transform; both explain ~ 50% of variance in average human ratings per stimulus. The range of modulation frequencies most relevant to roughness perception is [50, 200] Hz; this range can be selected with simple cutoff points or with a lognormal weighting function. Modulation and roughness spectrograms are proposed as visual aids for studying the dynamics of roughness in longer recordings. The described algorithms are implemented in the function modulationSpectrum() from the open-source R library soundgen. The audio recordings and their ratings are freely available from https://osf.io/gvcpx/ and can be used for benchmarking other algorithms.}},
  author       = {{Anikin, Andrey}},
  issn         = {{1943-3921}},
  language     = {{eng}},
  publisher    = {{Springer}},
  journal      = {{Attention, Perception \& Psychophysics}},
  title        = {{Acoustic estimation of voice roughness}},
  url          = {{http://dx.doi.org/10.3758/s13414-025-03060-3}},
  doi          = {{10.3758/s13414-025-03060-3}},
  year         = {{2025}},
}