DMEL: THE DIFFERENTIABLE LOG-MEL SPECTROGRAM AS A TRAINABLE LAYER IN NEURAL NETWORKS

Martinsson, John; Sandsten, Maria

Martinsson, John ^LU and Sandsten, Maria ^LU (2024) 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings p.5005-5009

Abstract: In this paper we present the differentiable log-Mel spectrogram (DMEL) for audio classification. DMEL uses a Gaussian window, with a window length that can be jointly optimized with the neural network. DMEL is used as the input layer in different neural networks and evaluated on standard audio datasets. We show that DMEL achieves a higher average test accuracy for sub-optimal initial choices of the window length when compared to a baseline with a fixed window length. In addition, we analyse the computational cost of DMEL and compare to a standard hyperparameter search over different window lengths, showing favorable results for DMEL. Finally, an empirical evaluation on a carefully designed dataset is performed to investigate if the differentiable spectrogram actually learns the optimal window length. The design of the dataset relies on the theory of spectrogram resolution. We also empirically evaluate the convergence rate to the optimal window length.
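The abstract describes the layer only at a high level. The following is a minimal NumPy sketch of the idea under our own assumptions, not the authors' implementation. It uses a truncated Gaussian window whose standard deviation `sigma` (in samples) stands in for the window length, HTK-style triangular Mel filters, and a fixed frame size and hop. All helper names (`gaussian_window`, `log_mel_and_grad`, etc.) are hypothetical. Because the STFT is linear in the window, the derivative of the log-Mel output with respect to `sigma` follows from the chain rule. A central finite difference checks that derivative.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    bin_hz = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)  # rfft bin centres
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, bin_hz.size))
    for m in range(n_mels):
        lo, mid, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def gaussian_window(sigma, n_fft):
    """Hypothetical parameterization: centred Gaussian of std `sigma` samples,
    truncated to n_fft samples, plus its derivative with respect to sigma."""
    n = np.arange(n_fft) - (n_fft - 1) / 2.0
    w = np.exp(-0.5 * (n / sigma) ** 2)
    dw = w * n ** 2 / sigma ** 3  # d/dsigma of exp(-n^2 / (2 sigma^2))
    return w, dw


def frame(x, n_fft, hop):
    """Split a 1-D signal into overlapping frames, shape (n_frames, n_fft)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def log_mel_and_grad(x, sigma, fb, n_fft=512, hop=128, eps=1e-6):
    """Log-Mel spectrogram L (n_frames, n_mels) and its derivative dL/dsigma."""
    frames = frame(x, n_fft, hop)
    w, dw = gaussian_window(sigma, n_fft)
    X = np.fft.rfft(frames * w, axis=-1)    # STFT with the Gaussian window
    dX = np.fft.rfft(frames * dw, axis=-1)  # STFT is linear in the window
    P = np.abs(X) ** 2                      # power spectrogram
    dP = 2.0 * np.real(np.conj(X) * dX)     # d|X|^2 = 2 Re(conj(X) dX)
    M = P @ fb.T + eps                      # Mel energies (eps avoids log 0)
    dM = dP @ fb.T
    return np.log(M), dM / M                # d log(M) = dM / M


# Usage: white noise as a stand-in signal, 40 Mel bands, sigma = 64 samples
rng = np.random.default_rng(0)
sr, n_fft = 16000, 512
x = rng.standard_normal(sr)
fb = mel_filterbank(40, n_fft, sr)
sigma = 64.0
L, dL = log_mel_and_grad(x, sigma, fb, n_fft=n_fft)

# Central finite-difference check of the analytic derivative
h = 1e-3
Lp, _ = log_mel_and_grad(x, sigma + h, fb, n_fft=n_fft)
Lm, _ = log_mel_and_grad(x, sigma - h, fb, n_fft=n_fft)
fd = (Lp - Lm) / (2 * h)
print("max |finite diff - analytic|:", np.max(np.abs(fd - dL)))
```

In a deep-learning framework, `sigma` would instead be registered as a trainable parameter and automatic differentiation would supply the gradient. The explicit derivative here only makes visible that the log-Mel output is a smooth function of the window width, which is what allows the window length to be optimized jointly with the network.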

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/32c15a29-82d8-4558-9537-0ece70f446fc

author

Martinsson, John ^LU and Sandsten, Maria ^LU

publishing date

2024

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Telecommunications

keywords

adaptive transforms, audio classification, Deep learning, learnable Mel spectrogram, STFT

host publication

2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings

series title

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

pages

5 pages

publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

conference name

49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024

conference location

Seoul, Korea, Republic of

conference dates

2024-04-14 - 2024-04-19

external identifiers

scopus:85195408870

ISSN

1520-6149

ISBN

9798350344851

DOI

10.1109/ICASSP48485.2024.10446816

language

English

LU publication?

yes

id

32c15a29-82d8-4558-9537-0ece70f446fc

date added to LUP

2024-09-12 14:27:35

date last changed

2025-10-14 09:29:20

@inproceedings{32c15a29-82d8-4558-9537-0ece70f446fc,
  abstract     = {{In this paper we present the differentiable log-Mel spectrogram (DMEL) for audio classification. DMEL uses a Gaussian window, with a window length that can be jointly optimized with the neural network. DMEL is used as the input layer in different neural networks and evaluated on standard audio datasets. We show that DMEL achieves a higher average test accuracy for sub-optimal initial choices of the window length when compared to a baseline with a fixed window length. In addition, we analyse the computational cost of DMEL and compare to a standard hyperparameter search over different window lengths, showing favorable results for DMEL. Finally, an empirical evaluation on a carefully designed dataset is performed to investigate if the differentiable spectrogram actually learns the optimal window length. The design of the dataset relies on the theory of spectrogram resolution. We also empirically evaluate the convergence rate to the optimal window length.}},
  author       = {{Martinsson, John and Sandsten, Maria}},
  booktitle    = {{2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings}},
  isbn         = {{9798350344851}},
  issn         = {{1520-6149}},
  keywords     = {{adaptive transforms; audio classification; Deep learning; learnable Mel spectrogram; STFT}},
  language     = {{eng}},
  pages        = {{5005--5009}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings}},
  title        = {{DMEL: THE DIFFERENTIABLE LOG-MEL SPECTROGRAM AS A TRAINABLE LAYER IN NEURAL NETWORKS}},
  url          = {{http://dx.doi.org/10.1109/ICASSP48485.2024.10446816}},
  doi          = {{10.1109/ICASSP48485.2024.10446816}},
  year         = {{2024}},
}

Lund University Publications
