
What Else is New Than the Hamming Window? Robust MFCCs for Speaker Recognition via Multitapering

Kinnunen, Tomi; Saeidi, Rahim; Sandberg, Johan and Sandsten, Maria (2010). In Interspeech 2010, pp. 2734-2737.
Abstract
Mel-frequency cepstral coefficients (MFCCs) are usually derived from the Hamming-windowed DFT spectrum. In this paper, we advocate using a so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide a robust spectrum estimate but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for the conventional Hamming method, and 8.13 % (clean data) and 14.63 % (-10 dB SNR) using multitapers. Multitapering is a simple and robust alternative to the Hamming window method.
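The abstract contrasts two ways of estimating the power spectrum that feeds the MFCC pipeline. The NumPy sketch below computes both and passes each through the same mel filterbank, log and DCT stages. It is a minimal illustration under stated assumptions, not the authors' implementation. It uses sine tapers (Riedel and Sidorenko) with uniform weights because they have a simple closed form; the taper families and weights evaluated in the paper may differ. All function names and settings (8 kHz sampling, 240-sample frame, 512-point FFT, 6 tapers, 20 mel filters, 12 cepstra) are hypothetical choices made for illustration.

```python
import numpy as np

def sine_tapers(frame_len, num_tapers):
    """Orthonormal sine tapers, shape (num_tapers, frame_len). Assumed taper family."""
    n = np.arange(1, frame_len + 1)              # sample index n = 1..N
    k = np.arange(1, num_tapers + 1)[:, None]    # taper index k = 1..K
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))

def hamming_spectrum(frame, nfft):
    """Conventional estimate: squared magnitude of the Hamming-windowed DFT."""
    return np.abs(np.fft.rfft(np.hamming(len(frame)) * frame, nfft)) ** 2

def multitaper_spectrum(frame, nfft, num_tapers=6, weights=None):
    """Weighted average of the K tapered periodograms (eigenspectra)."""
    tapers = sine_tapers(len(frame), num_tapers)
    if weights is None:
        # Assumption: uniform weights; other multitaper variants weight tapers differently.
        weights = np.full(num_tapers, 1.0 / num_tapers)
    eigenspectra = np.abs(np.fft.rfft(tapers * frame, nfft, axis=1)) ** 2  # (K, nfft//2 + 1)
    return weights @ eigenspectra                                         # frequency-domain averaging

def mel_filterbank(num_filters, nfft, sample_rate):
    """Triangular filters equally spaced on the mel scale, shape (num_filters, nfft//2 + 1)."""
    mel_max = 2595.0 * np.log10(1.0 + (sample_rate / 2.0) / 700.0)
    mel_points = np.linspace(0.0, mel_max, num_filters + 2)
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)
    bins = np.floor((nfft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((num_filters, nfft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for b in range(left, center):
            fbank[m - 1, b] = (b - left) / (center - left)    # rising slope
        for b in range(center, right):
            fbank[m - 1, b] = (right - b) / (right - center)  # falling slope
    return fbank

def mfcc_from_spectrum(power, fbank, num_ceps=12):
    """Mel filterbank energies -> log -> unnormalized DCT-II, keeping num_ceps coefficients."""
    log_energies = np.log(np.maximum(fbank @ power, 1e-12))  # floor avoids log(0)
    m = fbank.shape[0]
    c = np.arange(num_ceps)[:, None]
    dct = np.cos(np.pi * c * (np.arange(m) + 0.5) / m)
    return dct @ log_energies

# Usage on one synthetic 30 ms frame at 8 kHz (illustrative values only).
sr, nfft = 8000, 512
frame = np.sin(2 * np.pi * 1000 * np.arange(240) / sr)
fbank = mel_filterbank(20, nfft, sr)
c_ham = mfcc_from_spectrum(hamming_spectrum(frame, nfft), fbank)
c_mt = mfcc_from_spectrum(multitaper_spectrum(frame, nfft), fbank)
```

Only the spectrum estimator changes between the two paths; the filterbank, logarithm and DCT stay the same, which is why multitapering can be swapped in for the Hamming window. Averaging several tapered periodograms lowers the variance of the estimate (for white noise, roughly by a factor of K with orthonormal tapers) at the cost of some frequency resolution.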
type: Chapter in Book/Report/Conference proceeding
publication status: published
keywords: speaker verification, multiple window method
in: Interspeech 2010
pages: 2734-2737
conference name: Interspeech 2010
external identifiers: Scopus:79959826333
language: English
LU publication? yes
id: ab9b6427-8f65-4cd1-8918-91a68f028072 (old id 1718661)
alternative location: http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/MultiTaper_Interspeech2010.pdf
date added to LUP: 2010-11-19 07:51:34
date last changed: 2016-10-13 05:03:25
@misc{ab9b6427-8f65-4cd1-8918-91a68f028072,
  abstract     = {Mel-frequency cepstral coefficients (MFCCs) are usually derived from the Hamming-windowed DFT spectrum. In this paper, we advocate using a so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide a robust spectrum estimate but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for the conventional Hamming method, and 8.13 % (clean data) and 14.63 % (-10 dB SNR) using multitapers. Multitapering is a simple and robust alternative to the Hamming window method.},
  author       = {Kinnunen, Tomi and Saeidi, Rahim and Sandberg, Johan and Sandsten, Maria},
  keyword      = {speaker verification,multiple window method},
  language     = {eng},
  pages        = {2734--2737},
  series       = {Interspeech 2010},
  title        = {What Else is New Than the Hamming Window? Robust MFCCs for Speaker Recognition via Multitapering},
  year         = {2010},
}