Lund University Publications

What Else is New Than the Hamming Window? Robust MFCCs for Speaker Recognition via Multitapering

Kinnunen, Tomi ; Saeidi, Rahim ; Sandberg, Johan LU and Sandsten, Maria LU (2010) Interspeech 2010 p.2734-2737
Abstract
Usually the mel-frequency cepstral coefficients (MFCCs) are derived via Hamming windowed DFT spectrum. In this paper, we advocate to use a so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide a robust spectrum estimate but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for the conventional Hamming method and 8.13 % (clean data) and 14.63 % (-10 dB SNR) using multitapers. Multitapering is a simple and robust alternative to the Hamming window method.
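As a rough illustration of the idea described in the abstract (several window functions applied to the same frame, with the resulting spectra averaged in the frequency domain), here is a minimal sketch. The abstract does not say which taper family or weighting is used, so the sine tapers, the uniform weights, and the names `sine_tapers`, `multitaper_spectrum`, `hamming_spectrum`, `num_tapers` and `nfft` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def sine_tapers(frame_len, num_tapers):
    """Return an orthonormal set of sine tapers, shape (num_tapers, frame_len).

    Assumption: sine tapers are used here only because they have a simple
    closed form; the abstract does not name a specific taper family.
    """
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, num_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))


def multitaper_spectrum(frame, num_tapers=6, nfft=512):
    """Average the power spectra obtained with each taper (uniform weights assumed)."""
    frame = np.asarray(frame, dtype=float)
    tapers = sine_tapers(len(frame), num_tapers)
    # One windowed copy of the frame per taper, one DFT per row.
    spectra = np.abs(np.fft.rfft(tapers * frame, n=nfft, axis=1)) ** 2
    return spectra.mean(axis=0)


def hamming_spectrum(frame, nfft=512):
    """Conventional single-window estimate, for comparison."""
    frame = np.asarray(frame, dtype=float)
    return np.abs(np.fft.rfft(np.hamming(len(frame)) * frame, n=nfft)) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(240)  # hypothetical 30 ms frame at 8 kHz
    for name, spec in [("hamming", hamming_spectrum(frame)),
                       ("multitaper", multitaper_spectrum(frame))]:
        # Relative spread across frequency; white noise has a flat true spectrum.
        print(name, spec.std() / spec.mean())
```

In an MFCC front end, either spectrum estimate would be passed on to the usual mel filterbank, logarithm and DCT stages. Only the spectrum estimation step differs between the two methods.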
author
Kinnunen, Tomi ; Saeidi, Rahim ; Sandberg, Johan and Sandsten, Maria
publishing date
2010
type
Chapter in Book/Report/Conference proceeding
publication status
published
keywords
speaker verification, multiple window method
host publication
Interspeech 2010
pages
2734 - 2737
conference name
Interspeech 2010
conference location
Makuhari, Japan
external identifiers
  • scopus:79959826333
language
English
LU publication?
yes
id
ab9b6427-8f65-4cd1-8918-91a68f028072 (old id 1718661)
alternative location
http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/MultiTaper_Interspeech2010.pdf
date added to LUP
2016-04-04 14:38:17
date last changed
2022-02-21 17:05:50
@inproceedings{ab9b6427-8f65-4cd1-8918-91a68f028072,
  abstract     = {{Usually the mel-frequency cepstral coefficients (MFCCs) are derived via Hamming windowed DFT spectrum. In this paper, we advocate to use a so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide a robust spectrum estimate but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for the conventional Hamming method and 8.13 % (clean data) and 14.63 % (-10 dB SNR) using multitapers. Multitapering is a simple and robust alternative to the Hamming window method.}},
  author       = {{Kinnunen, Tomi and Saeidi, Rahim and Sandberg, Johan and Sandsten, Maria}},
  booktitle    = {{Interspeech 2010}},
  keywords     = {{speaker verification; multiple window method}},
  language     = {{eng}},
  pages        = {{2734--2737}},
  title        = {{What Else is New Than the Hamming Window? Robust MFCCs for Speaker Recognition via Multitapering}},
  url          = {{http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/MultiTaper_Interspeech2010.pdf}},
  year         = {{2010}},
}