What Else is New Than the Hamming Window? Robust MFCCs for Speaker Recognition via Multitapering
(2010) Interspeech 2010 p.2734-2737
- Abstract
- Usually the mel-frequency cepstral coefficients (MFCCs) are derived via Hamming windowed DFT spectrum. In this paper, we advocate to use a so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide a robust spectrum estimate but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for the conventional Hamming method and 8.13 % (clean data) and 14.63 % (-10 dB SNR) using multitapers. Multitapering is a simple and robust alternative to the Hamming window method.
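The abstract describes a multitaper estimate as several window functions applied to the same frame, with the resulting spectra averaged in the frequency domain. The sketch below illustrates that idea only and is not the authors' implementation. It assumes sine tapers, uniform weights 1/k, and a naive DFT so that it runs on the standard library alone. The function names (`sine_tapers`, `power_spectrum`, `multitaper_spectrum`) and the default `k=6` are hypothetical choices made for this example.

```python
import cmath
import math


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n (assumed taper family)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [
        [scale * math.sin(math.pi * j * (t + 1) / (n + 1)) for t in range(n)]
        for j in range(1, k + 1)
    ]


def power_spectrum(frame):
    """Squared-magnitude DFT for bins 0..n//2, using a naive O(n^2) DFT."""
    n = len(frame)
    spectrum = []
    for f in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(frame)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def multitaper_spectrum(frame, k=6):
    """Average the k tapered power spectra with uniform weights 1/k."""
    n = len(frame)
    total = [0.0] * (n // 2 + 1)
    for taper in sine_tapers(n, k):
        tapered = [w * x for w, x in zip(taper, frame)]
        for i, p in enumerate(power_spectrum(tapered)):
            total[i] += p / k
    return total
```

In an MFCC front end, an estimate of this kind would take the place of the single Hamming-windowed power spectrum before the mel filterbank is applied. The choice of taper family and weights is a design decision, and this sketch does not try to reproduce the configuration evaluated in the paper.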
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/1718661
- author
- Kinnunen, Tomi ; Saeidi, Rahim ; Sandberg, Johan LU and Sandsten, Maria LU
- organization
- publishing date
- 2010
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- speaker verification, multiple window method
- host publication
- Interspeech 2010
- pages
- 2734 - 2737
- conference name
- Interspeech 2010
- conference location
- Makuhari, Japan
- conference dates
- 2010-09-26 - 2010-09-30
- external identifiers
- scopus:79959826333
- language
- English
- LU publication?
- yes
- id
- ab9b6427-8f65-4cd1-8918-91a68f028072 (old id 1718661)
- alternative location
- http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/MultiTaper_Interspeech2010.pdf
- date added to LUP
- 2016-04-04 14:38:17
- date last changed
- 2022-02-21 17:05:50
@inproceedings{ab9b6427-8f65-4cd1-8918-91a68f028072,
  abstract     = {{Usually the mel-frequency cepstral coefficients (MFCCs) are derived via Hamming windowed DFT spectrum. In this paper, we advocate to use a so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide a robust spectrum estimate but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for the conventional Hamming method and 8.13 % (clean data) and 14.63 % (-10 dB SNR) using multitapers. Multitapering is a simple and robust alternative to the Hamming window method.}},
  author       = {{Kinnunen, Tomi and Saeidi, Rahim and Sandberg, Johan and Sandsten, Maria}},
  booktitle    = {{Interspeech 2010}},
  keywords     = {{speaker verification; multiple window method}},
  language     = {{eng}},
  pages        = {{2734--2737}},
  title        = {{What Else is New Than the Hamming Window? Robust MFCCs for Speaker Recognition via Multitapering}},
  url          = {{http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/MultiTaper_Interspeech2010.pdf}},
  year         = {{2010}},
}