
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Implementing Audio Localization using a Network of Body Worn Cameras

Åhlund, Jim LU and Weidow, Eric LU (2025) EITM01 20251
Department of Electrical and Information Technology
Abstract
Audio localization often utilizes arrays of microphones to estimate the origin of sounds, where the recorded audio is usually processed on one central machine.

In this thesis, the goal is instead to perform audio localization with dual microphones on distributed body worn cameras, where transmission of recorded audio is not permitted. Therefore, two separate algorithms are created: One performs direction estimations and represents this as metadata, and the other algorithm uses metadata for a sound to estimate the audio location.

The direction estimation is implemented with time difference of arrival (TDOA), using cross-correlation (CC) to determine time differences between the microphone channels. The TDOA is combined with position, compass and timestamp data to create metadata. The location estimation implements a numerical approach where a grid of probabilities is used to determine the most probable audio location. The concept of virtual cameras is also introduced, where TDOA is performed on the timestamps of sounds registered by pairs of cameras to provide additional direction estimations.
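The core of the direction estimation, TDOA via plain cross-correlation, can be illustrated with a minimal Python sketch. This is not the thesis implementation; the function name, sample rate and synthetic signals are assumptions for illustration only:

```python
import numpy as np

def estimate_tdoa(left, right, fs):
    """Estimate the time difference of arrival (seconds) between two
    microphone channels using plain cross-correlation (CC)."""
    # Full cross-correlation; the peak index gives the lag (in samples)
    # at which the two channels are best aligned.
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    # With this convention, a negative lag means the sound reached the
    # left microphone first.
    return lag / fs

# Synthetic check: a pulse arriving 5 samples later on the right channel.
fs = 48_000
left = np.zeros(100)
left[20] = 1.0
right = np.zeros(100)
right[25] = 1.0
tdoa = estimate_tdoa(left, right, fs)  # negative: left heard it first
```

In practice the thesis compares several CC variants (e.g. GCC-PHAT, listed in the keywords), which weight the cross-spectrum before locating the peak; the plain CC above is only the simplest case.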

Several configurations are investigated and compared. The results show that the direction estimation has high accuracy directly in front of and behind the camera, but lower accuracy at angles of incidence close to ±90°, resulting in an average error of 8–9° irrespective of which CC method is used. The accuracy of the entire system varies significantly with the configuration used. The most accurate configuration in this report uses cross-correlation with virtual cameras, achieving an accuracy between 2.3 m and 19.2 m across the different test setups when using data from six cameras. For this configuration, the average error with six cameras is 44% of the error with only two cameras.

In conclusion, the system shows potential for accurate sound source localization but also high volatility, as the accuracy can vary significantly depending on the configuration and camera setup.
Popular Abstract
Body worn cameras (BWCs) are used by security and police officers worldwide. Determining the location of gunshots, screams, and other loud sounds can be crucial in emergency situations. Using microphones in existing BWCs to automatically localize these sounds in real-time can enable faster and more accurate responses. This report implements and tests this solution to investigate the possibility of performing this localization in real-world scenarios.

To illustrate the technicalities of this implementation, consider a scenario where two police officers hear a gunshot and want to locate its origin. A person's brain can estimate sound direction based on the difference in a sound's arrival times between the left and right ears. For example, a sound coming from the left arrives at the left ear before the right one, and a sound from straight ahead arrives at both ears at the same time. Knowing a single direction is insufficient to determine a location if the distance is unknown. However, if both officers share their positions when they heard the gunshot, the directions they were facing, and the directions they estimated the sound came from, an intersection point can be determined. Any errors in their reported locations, facing directions or direction estimations would make this intersection point differ from the real sound origin.
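The two-officer intersection described above amounts to intersecting two rays. A minimal sketch, assuming a flat local coordinate frame in metres and compass bearings (0° = north, clockwise) — the function name and geometry helper are illustrative, not from the thesis:

```python
import math

def intersect_bearings(p1, b1, p2, b2):
    """Intersect two rays given start points (x, y) in metres and
    compass bearings in degrees (0° = north, clockwise).
    Returns the intersection point, or None for parallel bearings."""
    # Compass bearing 0° points along +y (north), 90° along +x (east).
    d1 = (math.sin(math.radians(b1)), math.cos(math.radians(b1)))
    d2 = (math.sin(math.radians(b2)), math.cos(math.radians(b2)))
    # Solve p1 + t*d1 = p2 + s*d2 as a 2x2 linear system for t.
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    if abs(det) < 1e-12:
        return None  # parallel rays never meet
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t = (rx * (-d2[1]) - (-d2[0]) * ry) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

# Two officers 10 m apart, bearings 45° and 315°: rays meet at (5, 5).
src = intersect_bearings((0.0, 0.0), 45.0, (10.0, 0.0), 315.0)
```

With noisy bearings, the intersection point moves away from the true origin, which motivates the probabilistic grid approach used in the thesis.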

With more than two officers, any errors instead result in multiple intersections. To address this, the officers might acknowledge that their direction estimations were not perfect and that directions to the left or right of their estimated direction are less probable but still possible. Compiling these estimates instead yields an area of probabilities, where the most probable point can be selected as the estimated audio location.
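The idea of compiling imperfect direction estimates into an area of probabilities can be sketched on a grid, in the spirit of the thesis's numerical approach. The Gaussian bearing-error model, grid resolution and function name below are assumptions for illustration, not the thesis's actual parameters:

```python
import numpy as np

def locate_on_grid(observations, grid_x, grid_y, sigma_deg=10.0):
    """Fuse per-camera direction estimates into a probability grid and
    return the most probable source location.
    observations: list of ((cam_x, cam_y), estimated_bearing_deg)."""
    xs, ys = np.meshgrid(grid_x, grid_y)
    log_p = np.zeros_like(xs, dtype=float)
    for (cx, cy), bearing_deg in observations:
        # Bearing from the camera to every grid cell (0° = north, clockwise).
        cell_bearing = np.degrees(np.arctan2(xs - cx, ys - cy))
        # Smallest signed angular difference to the estimated bearing.
        diff = (cell_bearing - bearing_deg + 180.0) % 360.0 - 180.0
        # Gaussian bearing-error model: cells near the estimated
        # direction stay probable, cells far off are penalized.
        log_p += -0.5 * (diff / sigma_deg) ** 2
    iy, ix = np.unravel_index(np.argmax(log_p), log_p.shape)
    return grid_x[ix], grid_y[iy]

# Two cameras 10 m apart whose bearings cross near (5, 5).
grid = np.linspace(0.0, 10.0, 101)
est = locate_on_grid([((0.0, 0.0), 45.0), ((10.0, 0.0), -45.0)], grid, grid)
```

Adding more observations simply accumulates more log-probability terms, which is why extra cameras (and the thesis's virtual cameras) can tighten the estimate.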

The implementation in this report works in the same way. Dual microphones on each BWC can be used to determine differences in arrival times, as with a person's ears. Knowing the BWCs' positions and orientations via a Global Positioning System receiver and a compass sensor, it is then possible to estimate the sound's location. Importantly, the direction estimations occur within each BWC, instead of sending all audio recordings to one computer and performing both the direction and location estimations there. This ensures that the implementation does not require potentially sensitive audio recordings to be available to anyone other than the customer.

To evaluate the implementation, three things were investigated: First, how accurately does each BWC estimate the direction of sounds? Second, assuming several BWCs register the same sound, how accurately can they estimate the location of the sound? And finally, does the accuracy of this estimation increase if more BWCs register the same sound? These evaluations provide valuable insight into the effectiveness of the system and its potential for application in real-world scenarios.
Please use this url to cite or link to this publication:
author: Åhlund, Jim LU and Weidow, Eric LU
course: EITM01 20251
year: 2025
type: H2 - Master's Degree (Two Years)
keywords: Sound Source Localization (SSL), Direction Estimation, Location Estimation, Body Worn Cameras (BWC), Dual Microphone Arrays, Distributed Microphone Arrays, Time Difference of Arrival (TDOA), Cross-Correlation (CC), Generalized Cross-Correlation (GCC), Phase Transform (PHAT)
report number: LU/LTH-EIT 2025-1083
language: English
id: 9203989
date added to LUP: 2025-07-02 14:07:15
date last changed: 2025-07-02 14:07:15
@misc{9203989,
  abstract     = {{Audio localization often utilizes arrays of microphones to estimate the origin of sounds, where the recorded audio is usually processed on one central machine.

In this thesis, the goal is instead to perform audio localization with dual microphones on distributed body worn cameras, where transmission of recorded audio is not permitted. Therefore, two separate algorithms are created: One performs direction estimations and represents this as metadata, and the other algorithm uses metadata for a sound to estimate the audio location.

The direction estimation is implemented with time difference of arrival (TDOA), using cross-correlation (CC) to determine time differences between microphone channels. The TDOA is combined with position, compass and timestamp data to create metadata. The location estimation implements a numerical approach where a grid of probabilities is used to determine the most probable audio location. The concept of virtual cameras is also introduced, where TDOA is done between the timestamps of sounds registered by pairs of cameras to provide additional direction estimations.

Several configurations are investigated and compared. The results show that the direction estimation has a high accuracy right in front and behind the camera, but lower accuracy at angles of incidence close to ±90°, resulting in an average error of 8–9° irrespective of which CC-method is used. The accuracy of the entire system varies significantly with the configuration used. The configuration with highest accuracy in this report uses cross-correlation with virtual cameras, resulting in an accuracy between 2.3 m and 19.2 m in the different test setups, when using data from six cameras. For this configuration, using six cameras results in the average error being 44% of the error when using only two cameras.

In conclusion, the system shows potential of accurate sound source localization but also high volatility as the accuracy can vary significantly depending on the configuration and camera setup.}},
  author       = {{Åhlund, Jim and Weidow, Eric}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Implementing Audio Localization using a Network of Body Worn Cameras}},
  year         = {{2025}},
}