
Keyword Transformer: A Self-Attention Model for Keyword Spotting

Berg, Axel; O'Connor, Mark and Cruz, Miguel Tairum (2021) In Proc. Interspeech 2021, p. 4249-4253
Abstract
The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.
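The abstract describes a ViT-style design: the MFCC spectrogram of an utterance is split into tokens along the time axis, linearly projected, prepended with a learnable class token, and passed through a stack of Transformer encoder layers, with classification done from the class token. The following is a minimal sketch of that idea in PyTorch; the hyperparameters (40 MFCC features over 98 frames for one second of audio, embedding dimension 192, 12 layers, 3 attention heads, loosely resembling the largest KWT variant) are illustrative assumptions, not the authors' released implementation.

# Minimal sketch of a KWT-style keyword classifier.
# Hyperparameters are illustrative, not the authors' exact configuration.
import torch
import torch.nn as nn

class KeywordTransformer(nn.Module):
    def __init__(self, num_frames=98, num_mfcc=40, dim=192,
                 depth=12, heads=3, mlp_dim=768, num_classes=35):
        super().__init__()
        # Each MFCC time frame becomes one token via a linear projection.
        self.to_tokens = nn.Linear(num_mfcc, dim)
        # Learnable class token and positional embeddings, as in ViT.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_emb = nn.Parameter(torch.zeros(1, num_frames + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=mlp_dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # x: (batch, num_frames, num_mfcc) MFCC spectrogram
        tokens = self.to_tokens(x)
        cls = self.cls_token.expand(tokens.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_emb
        tokens = self.encoder(tokens)
        # Classify from the class token only.
        return self.head(tokens[:, 0])

model = KeywordTransformer()
logits = model(torch.randn(2, 98, 40))  # two dummy 1-second utterances
print(logits.shape)  # torch.Size([2, 35])

The sketch only shows the forward pass; the accuracies reported in the abstract come from training on Google Speech Commands with the recipe described in the paper.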
author
Berg, Axel; O'Connor, Mark and Cruz, Miguel Tairum
organization
publishing date
2021-08
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
keywords
keyword spotting, machine learning, transformer, speech recognition
host publication
Proc. Interspeech 2021
series title
Interspeech
pages
5 pages
publisher
ISCA
conference name
Interspeech 2021
conference location
Brno, Czech Republic
conference dates
2021-08-30 - 2021-09-03
external identifiers
  • scopus:85118515087
DOI
10.21437/Interspeech.2021-1286
project
Deep Learning for Simultaneous Localization and Mapping
language
English
LU publication?
yes
id
e40ec581-1249-4263-9b7a-2113427f6f05
alternative location
https://arxiv.org/abs/2104.00769
date added to LUP
2021-05-04 15:31:17
date last changed
2022-04-27 04:26:44
@inproceedings{e40ec581-1249-4263-9b7a-2113427f6f05,
  abstract     = {{The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.}},
  author       = {{Berg, Axel and O'Connor, Mark and Cruz, Miguel Tairum}},
  booktitle    = {{Proc. Interspeech 2021}},
  keywords     = {{keyword spotting; machine learning; transformer; speech recognition}},
  language     = {{eng}},
  month        = {{08}},
  pages        = {{4249--4253}},
  publisher    = {{ISCA}},
  series       = {{Interspeech}},
  title        = {{Keyword Transformer: A Self-Attention Model for Keyword Spotting}},
  url          = {{http://dx.doi.org/10.21437/Interspeech.2021-1286}},
  doi          = {{10.21437/Interspeech.2021-1286}},
  year         = {{2021}},
}