Learning an interpretable end-to-end network for real-time acoustic beamforming

Liang, Hao; Zhou, Guanxing; Tu, Xiaotong; Jakobsson, Andreas; Ding, Xinghao; Huang, Yue

Liang, Hao; Zhou, Guanxing; Tu, Xiaotong (LU); Jakobsson, Andreas (LU); Ding, Xinghao and Huang, Yue (2024) In Journal of Sound and Vibration 591.

Abstract: Recently, many forms of audio industrial applications, such as sound monitoring and source localization, have begun exploiting smart multi-modal devices equipped with a microphone array. Regrettably, model-based methods are often difficult to employ for such devices due to their high computational complexity, as well as the difficulty of appropriately selecting the user-determined parameters. As an alternative, one may use deep network-based methods, but these are often difficult to generalize, nor can they generate the desired beamforming map directly. In this paper, a computationally efficient acoustic beamforming algorithm is proposed, which may be unrolled to form a model-based deep learning network for real-time imaging, here termed the DAMAS-FISTA-Net. By exploiting the natural structure of an acoustic beamformer, the proposed network inherits the physical knowledge of the acoustic system, and thus learns the underlying physical properties of the propagation. As a result, all the network parameters may be learned end-to-end, guided by a model-based prior using back-propagation. Notably, the proposed network enables an excellent interpretability and the ability of being able to process the raw data directly. Extensive numerical experiments using both simulated and real-world data illustrate the preferable performance of the DAMAS-FISTA-Net as compared to alternative approaches.
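As an illustration of the kind of iteration the abstract says may be unrolled, the following is a minimal sketch of a textbook FISTA solver for a non-negative deconvolution problem of the DAMAS type, b = A x with x >= 0. It is not the authors' DAMAS-FISTA-Net: this record does not give the algorithm's details, and the function name `fista_nonneg`, the Gaussian point-spread matrix `A`, the grid size, and the iteration count are all assumptions made for the example. In an unrolled network, each loop iteration would presumably become one layer with learnable parameters, which this sketch does not attempt.

```python
import numpy as np

def fista_nonneg(A, b, n_iter=200):
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0 using FISTA.

    Generic textbook iteration; hypothetical helper, not taken from the paper.
    """
    # Step size 1/L, with L the Lipschitz constant of the gradient,
    # i.e. the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        # Gradient step followed by projection onto the non-negative orthant.
        x_new = np.maximum(y - grad / L, 0.0)
        # Standard FISTA momentum update.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Toy 1-D example: a hypothetical Gaussian point-spread matrix and two sources.
n = 20
idx = np.arange(n)
A = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 1.5) ** 2)
x_true = np.zeros(n)
x_true[5] = 1.0
x_true[13] = 0.5
b = A @ x_true          # blurred "beamforming map"
x_hat = fista_nonneg(A, b, n_iter=500)
rel_residual = np.linalg.norm(A @ x_hat - b) / np.linalg.norm(b)
```

Here `rel_residual` measures how well the estimate reproduces the blurred map; because the toy problem is ill-conditioned, a small residual does not by itself guarantee that `x_hat` matches `x_true` exactly.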

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/7a5a6121-3616-420b-bf3d-6a0cc6c8d189

author

Liang, Hao; Zhou, Guanxing; Tu, Xiaotong (LU); Jakobsson, Andreas (LU); Ding, Xinghao and Huang, Yue

organization

publishing date

2024-11

type

Contribution to journal

publication status

published

subject

Signal Processing

keywords

Acoustic beamforming, Acoustic imaging, Array signal processing, Interpretable network, Model-based deep learning, Source localization

in

Journal of Sound and Vibration

volume

591

article number

118620

publisher

Academic Press

external identifiers

scopus:85198609264

ISSN

0022-460X

DOI

10.1016/j.jsv.2024.118620

project

Statistical Signal Processing Group

Biomedical Modelling and Computation

language

English

LU publication?

yes

id

7a5a6121-3616-420b-bf3d-6a0cc6c8d189

date added to LUP

2024-08-26 15:47:59

date last changed

2026-02-11 07:30:16

@article{7a5a6121-3616-420b-bf3d-6a0cc6c8d189,
  abstract     = {{Recently, many forms of audio industrial applications, such as sound monitoring and source localization, have begun exploiting smart multi-modal devices equipped with a microphone array. Regrettably, model-based methods are often difficult to employ for such devices due to their high computational complexity, as well as the difficulty of appropriately selecting the user-determined parameters. As an alternative, one may use deep network-based methods, but these are often difficult to generalize, nor can they generate the desired beamforming map directly. In this paper, a computationally efficient acoustic beamforming algorithm is proposed, which may be unrolled to form a model-based deep learning network for real-time imaging, here termed the DAMAS-FISTA-Net. By exploiting the natural structure of an acoustic beamformer, the proposed network inherits the physical knowledge of the acoustic system, and thus learns the underlying physical properties of the propagation. As a result, all the network parameters may be learned end-to-end, guided by a model-based prior using back-propagation. Notably, the proposed network enables an excellent interpretability and the ability of being able to process the raw data directly. Extensive numerical experiments using both simulated and real-world data illustrate the preferable performance of the DAMAS-FISTA-Net as compared to alternative approaches.}},
  author       = {{Liang, Hao and Zhou, Guanxing and Tu, Xiaotong and Jakobsson, Andreas and Ding, Xinghao and Huang, Yue}},
  issn         = {{0022-460X}},
  keywords     = {{Acoustic beamforming; Acoustic imaging; Array signal processing; Interpretable network; Model-based deep learning; Source localization}},
  language     = {{eng}},
  publisher    = {{Academic Press}},
  journal      = {{Journal of Sound and Vibration}},
  title        = {{Learning an interpretable end-to-end network for real-time acoustic beamforming}},
  url          = {{http://dx.doi.org/10.1016/j.jsv.2024.118620}},
  doi          = {{10.1016/j.jsv.2024.118620}},
  volume       = {{591}},
  year         = {{2024}},
}
