
Keyword Transformer: A Self-Attention Model for Keyword Spotting

Berg, Axel; O'Connor, Mark and Cruz, Miguel Tairum (2021) In Proc. Interspeech 2021, p. 4249-4253
Abstract
The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.
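The abstract describes a ViT-style design: the MFCC spectrogram of an utterance is split into tokens along the time axis, linearly projected, prepended with a learnable class token, and passed through a stack of Transformer encoder layers, with classification done from the class token. The following is a minimal sketch of that idea in PyTorch; the hyperparameters (40 MFCC features over 98 frames for one second of audio, embedding dimension 192, 12 layers, 3 attention heads, loosely resembling the largest KWT variant) are illustrative assumptions, not the authors' released implementation.

# Minimal sketch of a KWT-style keyword classifier.
# Hyperparameters are illustrative, not the authors' exact configuration.
import torch
import torch.nn as nn

class KeywordTransformer(nn.Module):
    def __init__(self, num_frames=98, num_mfcc=40, dim=192,
                 depth=12, heads=3, mlp_dim=768, num_classes=35):
        super().__init__()
        # Each MFCC time frame becomes one token via a linear projection.
        self.to_tokens = nn.Linear(num_mfcc, dim)
        # Learnable class token and positional embeddings, as in ViT.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_emb = nn.Parameter(torch.zeros(1, num_frames + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=mlp_dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # x: (batch, num_frames, num_mfcc) MFCC spectrogram
        tokens = self.to_tokens(x)
        cls = self.cls_token.expand(tokens.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_emb
        tokens = self.encoder(tokens)
        # Classify from the class token only.
        return self.head(tokens[:, 0])

model = KeywordTransformer()
logits = model(torch.randn(2, 98, 40))  # two dummy 1-second utterances
print(logits.shape)  # torch.Size([2, 35])

The sketch only shows the forward pass; the accuracies reported in the abstract come from training on Google Speech Commands with the recipe described in the paper.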
author
Berg, Axel; O'Connor, Mark and Cruz, Miguel Tairum
organization
publishing date
2021-08
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
keywords
keyword spotting, machine learning, transformer, speech recognition
host publication
Proc. Interspeech 2021
series title
Interspeech
pages
5 pages
publisher
ISCA
conference name
Interspeech 2021
conference location
Brno, Czech Republic
conference dates
2021-08-30 - 2021-09-03
external identifiers
  • scopus:85118515087
DOI
10.21437/Interspeech.2021-1286
project
Deep Learning for Simultaneous Localization and Mapping
language
English
LU publication?
yes
id
e40ec581-1249-4263-9b7a-2113427f6f05
alternative location
https://arxiv.org/abs/2104.00769
date added to LUP
2021-05-04 15:31:17
date last changed
2022-04-27 04:26:44
@inproceedings{e40ec581-1249-4263-9b7a-2113427f6f05,
  abstract     = {{The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.}},
  author       = {{Berg, Axel and O'Connor, Mark and Cruz, Miguel Tairum}},
  booktitle    = {{Proc. Interspeech 2021}},
  keywords     = {{keyword spotting; machine learning; transformer; speech recognition}},
  language     = {{eng}},
  month        = {{08}},
  pages        = {{4249--4253}},
  publisher    = {{ISCA}},
  series       = {{Interspeech}},
  title        = {{Keyword Transformer: A Self-Attention Model for Keyword Spotting}},
  url          = {{http://dx.doi.org/10.21437/Interspeech.2021-1286}},
  doi          = {{10.21437/Interspeech.2021-1286}},
  year         = {{2021}},
}