Acoustic estimation of voice roughness

Anikin, Andrey

(2025) In Attention, Perception & Psychophysics

Abstract: Roughness is a perceptual characteristic of sound that was first applied to musical consonance and dissonance, but it is increasingly recognized as a central aspect of voice quality in human and animal communication. It may be particularly important for asserting social dominance or attracting attention in urgent signals such as screams. To ensure that the results of roughness research are valid and consistent across studies, we need standard methodology for measuring it. I review the literature on roughness estimation, from classic psychoacoustics to more recent approaches, and present two collections of 602 human vocal samples whose roughness was rated by 162 listeners in perceptual experiments. Two algorithms for estimating roughness acoustically from modulation spectra are then presented and optimized to match the human ratings. One uses a bank of gammatone or Butterworth filters to obtain an auditory spectrogram, and a faster algorithm begins with a conventional spectrogram obtained with short-time Fourier transform; both explain ~50% of variance in average human ratings per stimulus. The range of modulation frequencies most relevant to roughness perception is [50, 200] Hz; this range can be selected with simple cutoff points or with a lognormal weighting function. Modulation and roughness spectrograms are proposed as visual aids for studying the dynamics of roughness in longer recordings. The described algorithms are implemented in the function modulationSpectrum() from the open-source R library soundgen. The audio recordings and their ratings are freely available from https://osf.io/gvcpx/ and can be used for benchmarking other algorithms.
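
The spectrogram-based approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not the soundgen implementation: the function names (am_tone, stft_magnitude, roughness), the window and step sizes, the use of a 1D Fourier transform along the time axis of each frequency band, and the definition of roughness as the share of modulation energy inside the cutoff range are all assumptions made for illustration. Only the [50, 200] Hz range with simple cutoff points comes from the abstract.

```python
import numpy as np


def am_tone(sr, dur, carrier_hz, mod_hz, depth=1.0):
    """Synthetic test signal: a sine carrier with sinusoidal amplitude modulation."""
    t = np.arange(int(sr * dur)) / sr
    envelope = 1.0 + depth * np.cos(2 * np.pi * mod_hz * t)
    return envelope * np.sin(2 * np.pi * carrier_hz * t)


def stft_magnitude(signal, win_len=64, hop=16):
    """Magnitude spectrogram, shape (frequency bins, frames), Hann window."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + win_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def roughness(signal, sr, rough_range=(50.0, 200.0), win_len=64, hop=16):
    """Share of modulation energy within rough_range (hypothetical definition).

    The window must be short enough to follow fast amplitude changes, and the
    frame rate (sr / hop) must exceed twice the upper cutoff.
    """
    frame_rate = sr / hop
    if frame_rate / 2 <= rough_range[1]:
        raise ValueError("hop is too long to resolve the upper modulation cutoff")
    spec = stft_magnitude(signal, win_len, hop)
    # Fourier transform of each band's envelope over time: modulation spectrum
    mod_power = np.abs(np.fft.rfft(spec, axis=1)) ** 2
    mod_freqs = np.fft.rfftfreq(spec.shape[1], d=1.0 / frame_rate)
    in_band = (mod_freqs >= rough_range[0]) & (mod_freqs <= rough_range[1])
    total = mod_power.sum()
    return float(mod_power[:, in_band].sum() / total) if total > 0 else 0.0
```

With these assumed settings, a tone modulated at 70 Hz, for example roughness(am_tone(16000, 1.0, 1000.0, 70.0), 16000), should score well above the same tone modulated at 5 Hz, because only the former places modulation energy inside the cutoff range. A lognormal weighting function could replace the boolean in_band mask with smooth weights.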

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/22ecc638-e811-4a67-bd5c-484b886a6e12

author

Anikin, Andrey (LU)

publishing date

2025

type

Contribution to journal

publication status

published

in

Attention, Perception & Psychophysics

pages

17 pages

publisher

Springer

external identifiers

pmid:40295423
scopus:105003769864

ISSN

1943-3921

DOI

10.3758/s13414-025-03060-3

language

English

LU publication?

yes

id

22ecc638-e811-4a67-bd5c-484b886a6e12

date added to LUP

2025-04-29 07:37:50

date last changed

2025-07-03 04:03:06

@article{22ecc638-e811-4a67-bd5c-484b886a6e12,
  abstract     = {{Roughness is a perceptual characteristic of sound that was first applied to musical consonance and dissonance, but it is increasingly recognized as a central aspect of voice quality in human and animal communication. It may be particularly important for asserting social dominance or attracting attention in urgent signals such as screams. To ensure that the results of roughness research are valid and consistent across studies, we need standard methodology for measuring it. I review the literature on roughness estimation, from classic psychoacoustics to more recent approaches, and present two collections of 602 human vocal samples whose roughness was rated by 162 listeners in perceptual experiments. Two algorithms for estimating roughness acoustically from modulation spectra are then presented and optimized to match the human ratings. One uses a bank of gammatone or Butterworth filters to obtain an auditory spectrogram, and a faster algorithm begins with a conventional spectrogram obtained with short-time Fourier transform; both explain ~50% of variance in average human ratings per stimulus. The range of modulation frequencies most relevant to roughness perception is [50, 200] Hz; this range can be selected with simple cutoff points or with a lognormal weighting function. Modulation and roughness spectrograms are proposed as visual aids for studying the dynamics of roughness in longer recordings. The described algorithms are implemented in the function modulationSpectrum() from the open-source R library soundgen. The audio recordings and their ratings are freely available from https://osf.io/gvcpx/ and can be used for benchmarking other algorithms.}},
  author       = {{Anikin, Andrey}},
  issn         = {{1943-3921}},
  language     = {{eng}},
  publisher    = {{Springer}},
  journal      = {{Attention, Perception \& Psychophysics}},
  title        = {{Acoustic estimation of voice roughness}},
  url          = {{http://dx.doi.org/10.3758/s13414-025-03060-3}},
  doi          = {{10.3758/s13414-025-03060-3}},
  year         = {{2025}},
}
