LUP Student Papers

LUND UNIVERSITY LIBRARIES

Gunshot Detection from Audio Streams in Portable Devices

Bokelund, Linnea LU and Grane, Ellen LU (2022) EITM01 20221
Department of Electrical and Information Technology
Abstract
Machine learning and artificial neural networks can be used to classify or detect specific sound events in audio signals. Gunshot detection is one use case for such networks and can be used to help law enforcement by alerting officers or triggering camera recordings.

However, artificial neural networks with a high performance usually require large amounts of computational power, meaning that they do not work on smaller portable devices. This thesis shows that a small convolutional neural network (CNN) can be used for real-time gunshot detection on a portable camera without requiring too much memory, battery consumption, or CPU power.

We implemented a CNN with four layers and 100k trainable parameters to detect gunshots. We could reach an average precision of 0.98 and an F1 score of 0.95. We benchmarked the runtime performance of this architecture on the Axis Body Worn Cameras (BWCs). For real-time gunshot detection, our system uses 11.9 MB RAM and requires 4.9 MB persistent memory; it decreases the battery time by only 8.4% and uses approximately 11.5% of the CPU. With our configuration, the real-time detection has a latency of 3.6 seconds on the BWC.

The results of our Master’s thesis show that audio-based gunshot detection on portable devices is indeed viable. We hope it will encourage research on simpler features for audio classification.
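As a reminder of how the reported F1 score relates to precision and recall, here is a minimal sketch. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the thesis. The average precision quoted in the abstract is a different metric (it summarizes the whole precision-recall curve) and is not computed here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the thesis).
p, r, f1 = precision_recall_f1(tp=95, fp=5, fn=5)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it drops quickly if either missed gunshots (low recall) or false alarms (low precision) become frequent.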
Popular Abstract
When a threatening situation suddenly arises, it may be difficult to quickly get an opportunity to alert the authorities. During such circumstances, it would be beneficial if a nearby electronic device could be set to automatically detect the threat, and trigger appropriate actions to get help.

In this master's thesis, we focus on detecting gunshot sounds on the Axis Body Worn Camera (BWC), which is used by police officers and guards. It is not hard to imagine that when shots are fired, the police officer may not be able to start a recording manually. If the camera could identify gunshots, it could trigger a recording automatically and thus ensure that evidence is captured. The BWC has a prebuffer functionality that allows the last 90 seconds before a recording is started to be included in the resulting video, meaning that the events leading up to the gunshot will also be included.
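The prebuffer idea can be illustrated with a fixed-length ring buffer. This is only a sketch under assumptions: the class `PreBuffer`, the one-chunk-per-second granularity, and the integer stand-ins for media chunks are all hypothetical and say nothing about how the BWC firmware actually implements its prebuffer.

```python
from collections import deque


class PreBuffer:
    """Keep only the most recent `seconds` one-second chunks (hypothetical design)."""

    def __init__(self, seconds: int = 90):
        # A deque with maxlen silently drops the oldest chunk when full.
        self._chunks = deque(maxlen=seconds)

    def push(self, chunk):
        self._chunks.append(chunk)

    def flush(self):
        """Return the buffered chunks, oldest first, and empty the buffer."""
        chunks = list(self._chunks)
        self._chunks.clear()
        return chunks


buf = PreBuffer(seconds=90)
for t in range(200):  # integers stand in for one-second media chunks
    buf.push(t)

# A detection at t = 199 would prepend these chunks to the new recording.
saved = buf.flush()
print(saved[0], saved[-1], len(saved))  # 110 199 90
```

The memory cost of such a buffer is bounded by its length, which is why it fits a device with limited RAM.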

Gunshot detection has been done before, but the detection is often performed on a powerful device such as a server. Our target device is a small, portable camera with limited battery life, CPU, and memory. Therefore, our main challenge was to find a solution that uses as little energy and memory as possible while still identifying gunshots with high accuracy.

Within machine learning, image recognition has been researched and developed widely over the last decades. Recognition of particular sounds in audio recordings has not been explored to the same extent. However, by transforming the audio signals into spectrograms that can be viewed as image representations of the signals, the advances in image recognition can be applied to sound identification as well. A common technique used for image recognition is convolutional neural networks (CNNs), and this is what we use for our gunshot detection.
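To make the audio-to-image step concrete, the following is a minimal sketch of a log-magnitude spectrogram. The function name, frame length, hop size, and test tone are assumptions made for illustration; the thesis's actual feature extraction is not described in this record. A naive DFT from the standard library is used so the example is self-contained; a real implementation would normally use an FFT routine instead.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each a list of log-magnitudes per frequency bin."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # real input: keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive O(N^2) DFT, written out for clarity rather than speed.
            x_k = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(math.log(abs(x_k) + 1e-10))  # small offset avoids log(0)
        frames.append(row)
    return frames


# Synthetic signal: a pure tone placed exactly on bin 8 of a 64-point DFT.
sample_rate = 8000
tone_hz = 8 * sample_rate / 64  # 1000 Hz
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

Each row of `spec` is one time step and each column one frequency bin, so the result can be treated as a single-channel image and passed to a CNN.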

A very important aspect of machine learning is the data. Regardless of the chosen technique, the model will need a lot of data to train on. In our case, we needed data in the form of sound recordings containing gunshots. We also needed other sounds, for the model to learn what a gunshot does not sound like. We used a public dataset called FSD50K containing tens of thousands of clips with sounds of people, traffic, animals, and much more. However, this dataset did not have enough gunshot sounds. Therefore, we recorded our own dataset in cooperation with a local pistol club.

By implementing a small CNN that we trained on the FSD50K dataset and our gunshot sounds, we were able to create a prototype that works on the BWC. With a smaller CNN, the computations needed to analyze one audio clip are reduced, and thereby also the energy consumption. We found that a small CNN is sufficient to detect gunshot sounds in audio signals, and thus our conclusion is that it is indeed viable to implement machine-learning-based gunshot detection on a small device such as the BWC.
author
Bokelund, Linnea LU and Grane, Ellen LU
supervisor
organization
course
EITM01 20221
year
2022
type
H2 - Master's Degree (Two Years)
subject
keywords
machine learning, gunshot detection, convolutional neural networks, sound event detection, portable devices
report number
LU/LTH-EIT 2022-873
language
English
id
9090317
date added to LUP
2022-06-21 10:40:45
date last changed
2022-06-21 10:40:45
@misc{9090317,
  abstract     = {{Machine learning and artificial neural networks can be used to classify or detect specific sound events in audio signals. Gunshot detection is one use case for such networks and can be used to help law enforcement by alerting officers or triggering camera recordings.

However, artificial neural networks with a high performance usually require large amounts of computational power, meaning that they do not work on smaller portable devices. This thesis shows that a small convolutional neural network (CNN) can be used for real-time gunshot detection on a portable camera without requiring too much memory, battery consumption, or CPU power.

We implemented a CNN with four layers and 100k trainable parameters to detect gunshots. We could reach an average precision of 0.98 and an F1 score of 0.95. We benchmarked the runtime performance of this architecture on the Axis Body Worn Cameras (BWCs). For real-time gunshot detection, our system uses 11.9 MB RAM and requires 4.9 MB persistent memory; it decreases the battery time by only 8.4% and uses approximately 11.5% of the CPU. With our configuration, the real-time detection has a latency of 3.6 seconds on the BWC.

The results of our Master’s thesis show that audio-based gunshot detection on portable devices is indeed viable. We hope it will encourage the research on simpler features for audio classification.}},
  author       = {{Bokelund, Linnea and Grane, Ellen}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Gunshot Detection from Audio Streams in Portable Devices}},
  year         = {{2022}},
}