Keyword Transformer: A Self-Attention Model for Keyword Spotting
(2021) In Proc. Interspeech 2021, Interspeech series, pp. 4249-4253
- abstract
- The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.
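As a rough illustration of the fully self-attentional approach described in the abstract, the sketch below shows a KWT-style classifier that projects MFCC frames into token embeddings, prepends a class token, and applies a Transformer encoder. The class name, layer sizes, and hyperparameters are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Minimal sketch of a KWT-style, fully self-attentional keyword-spotting model.
# All dimensions below are assumptions for illustration only.
import torch
import torch.nn as nn


class KeywordTransformerSketch(nn.Module):
    def __init__(self, n_mfcc=40, n_frames=98, dim=192, depth=12,
                 n_heads=3, mlp_dim=768, n_classes=35):
        super().__init__()
        # Each MFCC time frame is linearly projected into a token embedding.
        self.frame_proj = nn.Linear(n_mfcc, dim)
        # Learnable class token and positional embeddings (ViT-style).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_emb = nn.Parameter(torch.zeros(1, n_frames + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, dim_feedforward=mlp_dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, mfcc):
        # mfcc: (batch, n_frames, n_mfcc)
        x = self.frame_proj(mfcc)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_emb
        x = self.encoder(x)
        # Classify from the class-token representation.
        return self.head(x[:, 0])


# Example: a batch of 8 one-second clips, 98 frames x 40 MFCCs each.
model = KeywordTransformerSketch()
logits = model(torch.randn(8, 98, 40))
print(logits.shape)  # torch.Size([8, 35])
```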
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/e40ec581-1249-4263-9b7a-2113427f6f05
- author
- Berg, Axel
LU
; O'Connor, Mark and Cruz, Miguel Tairum
- organization
- publishing date
- 2021-08-30
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- keywords
- keyword spotting, machine learning, transformer, speech recognition
- host publication
- Proc. Interspeech 2021
- series title
- Interspeech
- pages
- 4249-4253 (5 pages)
- publisher
- ISCA
- conference name
- Interspeech 2021
- conference location
- Brno, Czech Republic
- conference dates
- 2021-08-30 - 2021-09-03
- external identifiers
- scopus:85118515087
- DOI
- 10.21437/Interspeech.2021-1286
- project
- Deep Learning for Simultaneous Localization and Mapping
- language
- English
- LU publication?
- yes
- id
- e40ec581-1249-4263-9b7a-2113427f6f05
- alternative location
- https://arxiv.org/abs/2104.00769
- date added to LUP
- 2021-05-04 15:31:17
- date last changed
- 2025-04-04 15:24:22
@inproceedings{e40ec581-1249-4263-9b7a-2113427f6f05,
  abstract  = {{The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.}},
  author    = {{Berg, Axel and O'Connor, Mark and Cruz, Miguel Tairum}},
  booktitle = {{Proc. Interspeech 2021}},
  keywords  = {{keyword spotting; machine learning; transformer; speech recognition}},
  language  = {{eng}},
  month     = {{08}},
  pages     = {{4249--4253}},
  publisher = {{ISCA}},
  series    = {{Interspeech}},
  title     = {{Keyword Transformer: A Self-Attention Model for Keyword Spotting}},
  url       = {{http://dx.doi.org/10.21437/Interspeech.2021-1286}},
  doi       = {{10.21437/Interspeech.2021-1286}},
  year      = {{2021}},
}