Implementing Audio Localization using a Network of Body Worn Cameras
(2025) EITM01 20251, Department of Electrical and Information Technology
- Abstract
- Audio localization often utilizes arrays of microphones to estimate the origin of sounds, where the recorded audio is usually processed on one central machine.
In this thesis, the goal is instead to perform audio localization with dual microphones on distributed body worn cameras, where transmission of recorded audio is not permitted. Therefore, two separate algorithms are created: one performs direction estimation and represents the result as metadata, and the other uses the metadata for a sound to estimate the audio location.
The direction estimation is implemented with time difference of arrival (TDOA), using cross-correlation (CC) to determine time differences between microphone channels. The TDOA is combined with position, compass and timestamp data to create metadata. The location estimation implements a numerical approach where a grid of probabilities is used to determine the most probable audio location. The concept of virtual cameras is also introduced, where TDOA is done between the timestamps of sounds registered by pairs of cameras to provide additional direction estimations.
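As a minimal sketch of the CC-based TDOA step described above, the following uses plain cross-correlation on a two-channel recording; the function name, sampling rate and microphone spacing are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def tdoa_angle(left, right, fs, mic_distance, c=343.0):
    """Estimate the angle of incidence from a dual-microphone recording.

    Plain cross-correlation (CC) between the channels gives the lag, in
    samples, at which they align best; the lag maps to a time difference
    of arrival (TDOA) and then to an angle via the far-field relation
    sin(theta) = c * tdoa / mic_distance. Hypothetical helper.
    """
    left = left - np.mean(left)
    right = right - np.mean(right)
    cc = np.correlate(left, right, mode="full")
    lag = np.argmax(cc) - (len(right) - 1)  # >0: `left` lags, sound hit `right` first
    tdoa = lag / fs
    # Clamp to the physically possible range before taking arcsin.
    s = np.clip(c * tdoa / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(s))  # positive angle: towards the `right` mic
```

With generalized cross-correlation such as GCC-PHAT (listed among the keywords below), the correlation would instead be computed in the frequency domain with the magnitude spectrum normalized away, which sharpens the peak in reverberant conditions.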
Several configurations are investigated and compared. The results show that the direction estimation has high accuracy directly in front of and behind the camera, but lower accuracy at angles of incidence close to ±90°, resulting in an average error of 8–9° irrespective of which CC method is used. The accuracy of the entire system varies significantly with the configuration used. The configuration with the highest accuracy in this report uses cross-correlation with virtual cameras, resulting in an accuracy between 2.3 m and 19.2 m across the different test setups when using data from six cameras. For this configuration, using six cameras reduces the average error to 44% of the error when using only two cameras.
In conclusion, the system shows potential for accurate sound source localization but also high volatility, as the accuracy can vary significantly depending on the configuration and camera setup.
- Popular Abstract
- Body worn cameras (BWCs) are used by security and police officers worldwide. Determining the location of gunshots, screams, and other loud sounds can be crucial in emergency situations. Using microphones in existing BWCs to automatically localize these sounds in real time can enable faster and more accurate responses. This report implements and tests such a solution to investigate the possibility of performing this localization in real-world scenarios.
To illustrate the technicalities of this implementation, consider a scenario where two police officers hear a gunshot and want to locate its origin. A person's brain can estimate sound direction based on the difference in a sound's arrival times between the left and right ears. For example, a sound coming from the left would arrive at the left ear before the right one, and a sound from in front would arrive at both ears at the same time. Just knowing one direction is insufficient to determine a location if the distance is unknown. However, if both officers share their positions when they heard a gunshot, what directions they were facing, and what directions they estimated that the sound came from, an intersection point can be determined. However, any errors in their reported locations, directions or direction estimations would result in this intersection point being different from the real sound origin.
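The two-officer case above amounts to intersecting two bearing rays. A minimal sketch in a local flat (east, north) frame — the helper and its conventions are hypothetical, not code from the thesis:

```python
import math

def intersect_bearings(p1, brg1, p2, brg2):
    """Intersect two bearing rays in a local flat (east, north) frame.

    p1, p2: observer positions as (x, y) in metres; brg1, brg2: compass
    bearings in degrees (0 = north, 90 = east, clockwise). Returns the
    intersection point, or None if the bearings are (nearly) parallel.
    """
    # Direction vectors for bearings measured clockwise from north.
    d1 = (math.sin(math.radians(brg1)), math.cos(math.radians(brg1)))
    d2 = (math.sin(math.radians(brg2)), math.cos(math.radians(brg2)))
    # Solve p1 + t*d1 = p2 + s*d2 for t with Cramer's rule.
    denom = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    if abs(denom) < 1e-12:
        return None
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t = (rx * (-d2[1]) - (-d2[0]) * ry) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```

For example, one observer at the origin looking north-east (45°) and another 10 m to the east looking north-west (315°) intersect at the point (5, 5). Exactly as the paragraph notes, any error in the positions or bearings moves this point away from the true origin.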
With more than two officers, any errors instead result in multiple intersections. To address this, the officers might acknowledge that their direction estimations were not perfect and that directions to the left or right of their estimated direction are less probable but still possible. Compiling these estimates instead yields an area of probabilities, where the most probable point can be selected as the estimated audio location.
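The "area of probabilities" idea can be sketched as a grid search: every grid cell is scored by how well its bearing from each observer matches that observer's estimate, under a Gaussian angular-error model, and the best-scoring cell wins. The grid extent, resolution and error spread below are assumptions (σ = 8° loosely mirrors the 8–9° average direction error reported in the abstract); this is not the thesis's actual algorithm:

```python
import numpy as np

def grid_localize(observations, xlim, ylim, step=1.0, sigma_deg=8.0):
    """Fuse bearing estimates on a probability grid.

    Each observation is (x, y, bearing_deg): an observer position and its
    estimated compass bearing (0 = north, clockwise) towards the sound.
    Log-probabilities of the angular errors are summed per cell and the
    maximum is returned as the estimated sound location.
    """
    xs = np.arange(xlim[0], xlim[1], step)
    ys = np.arange(ylim[0], ylim[1], step)
    gx, gy = np.meshgrid(xs, ys)
    logp = np.zeros_like(gx, dtype=float)
    for ox, oy, brg in observations:
        # Bearing from the observer to every grid cell (0 = north, clockwise).
        cell_brg = np.degrees(np.arctan2(gx - ox, gy - oy))
        err = (cell_brg - brg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        logp += -0.5 * (err / sigma_deg) ** 2           # Gaussian log-likelihood
    iy, ix = np.unravel_index(np.argmax(logp), logp.shape)
    return xs[ix], ys[iy]
```

With only two observers this reduces to the ray intersection; with more, the summed log-likelihoods naturally down-weight cells that only some of the bearings agree on.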
The implementation in this report works in the same way. Dual microphones on each BWC can be used to determine differences in arrival times, as with a person's ears. Knowing the BWCs' positions and orientations via a Global Positioning System receiver and a compass sensor, it is then possible to estimate the sound's location. Importantly, the direction estimation occurs within each BWC, instead of sending all audio recordings to one computer and performing both the direction and location estimation there. This ensures that the implementation does not require potentially sensitive audio recordings to be available to anyone other than the customer.
To evaluate the implementation, three things were investigated: First, how accurately does each BWC estimate the direction of sounds? Second, assuming several BWCs register the same sound, how accurately can they estimate the location of the sound? And finally, does the accuracy of this estimate increase if more BWCs registered the same sound? These evaluations provide valuable insight into the effectiveness of the system and its potential for application in real-world scenarios.
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9203989
- author
- Åhlund, Jim LU and Weidow, Eric LU
- supervisor
-
- Ove Edfors LU
- organization
- course
- EITM01 20251
- year
- 2025
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Sound Source Localization (SSL), Direction Estimation, Location Estimation, Body Worn Cameras (BWC), Dual Microphone Arrays, Distributed Microphone Arrays, Time Difference of Arrival (TDOA), Cross-Correlation (CC), Generalized Cross-Correlation (GCC), Phase Transform (PHAT)
- report number
- LU/LTH-EIT 2025-1083
- language
- English
- id
- 9203989
- date added to LUP
- 2025-07-02 14:07:15
- date last changed
- 2025-07-02 14:07:15
@misc{9203989, abstract = {{Audio localization often utilizes arrays of microphones to estimate the origin of sounds, where the recorded audio is usually processed on one central machine. In this thesis, the goal is instead to perform audio localization with dual microphones on distributed body worn cameras, where transmission of recorded audio is not permitted. Therefore, two separate algorithms are created: One performs direction estimations and represents this as metadata, and the other algorithm uses metadata for a sound to estimate the audio location. The direction estimation is implemented with time difference of arrival (TDOA), using cross-correlation (CC) to determine time differences between microphone channels. The TDOA is combined with position, compass and timestamp data to create metadata. The location estimation implements a numerical approach where a grid of probabilities is used to determine the most probable audio location. The concept of virtual cameras is also introduced, where TDOA is done between the timestamps of sounds registered by pairs of cameras to provide additional direction estimations. Several configurations are investigated and compared. The results show that the direction estimation has a high accuracy right in front and behind the camera, but lower accuracy at angles of incidence close to ±90°, resulting in an average error of 8–9° irrespective of which CC-method is used. The accuracy of the entire system varies significantly with the configuration used. The configuration with highest accuracy in this report uses cross-correlation with virtual cameras, resulting in an accuracy between 2.3 m and 19.2 m in the different test setups, when using data from six cameras. For this configuration, using six cameras results in the average error being 44% of the error when using only two cameras.
In conclusion, the system shows potential of accurate sound source localization but also high volatility as the accuracy can vary significantly depending on the configuration and camera setup.}}, author = {{Åhlund, Jim and Weidow, Eric}}, language = {{eng}}, note = {{Student Paper}}, title = {{Implementing Audio Localization using a Network of Body Worn Cameras}}, year = {{2025}}, }