Classification of Acoustic Scenes Using Convolutional Neural Networks

Nordin Persson, Colin

Nordin Persson, Colin (2017) In Master's Theses in Mathematical Sciences FMS820 20171
Mathematical Statistics

Abstract: Minut is a startup company that builds a camera-free home monitor called Point. This thesis
investigates whether Point can use machine learning techniques to classify acoustic scenes, in particular
to detect whether a party is ongoing in the home where Point is located. Machine learning is a mathematical
field that uses data to learn models from which one can, if successful, make good predictions about the
future. Interest in this field, and in particular in a type of model called artificial neural networks, has
become massive over the last few years, mainly because of recent access to powerful hardware and large
amounts of data, which has made these models exceptional at certain tasks. Artificial neural networks are
huge mathematical functions with millions of tunable parameters, which makes them very flexible. By showing
a network lots of data and specifying the desired output, the network's learning algorithm is able to learn
the mapping between input and output. In this thesis, convolutional neural networks are used to classify
acoustic scenes by showing the network a time-frequency representation of audio together with the correct
label. One of the networks built, which we call SlimNet, is very small, yet it is able to distinguish
parties from other acoustic scenes with 98% accuracy. It is also found that the data representation of an
acoustic scene does not have to be very large for a neural network to classify it correctly, which is
desirable since Point has hardware limitations.
Popular Abstract: Using Deep Learning to Detect Parties
Deep learning is a subfield of artificial intelligence (AI) that has, over the
last few years, emerged as something that will eventually automate
everything from cars to computer programming. Another field in
which deep learning algorithms show promising results is audio recognition
of various kinds; one example is detecting ongoing parties.
The words “deep learning” refer to deep artificial neural networks, which
are a kind of large and complex mathematical function inspired by how
the human brain is structured. These deep artificial neural networks are used
to find, or learn, complicated patterns in data.
The Malmö-based startup company Minut makes a smart home sensor called
Point that is able to measure and detect events in home environments that the
home owner might want to know about.
Some owners of the Point device rent out their homes and use the device
to make sure that the rental agreement is not violated. One
common part of such an agreement is that no parties are allowed in the rented
home. This is why Minut sees a need for a party detection algorithm on
their device.
The party detection algorithm needs a couple of seconds of recorded audio
to tell whether it was recorded in a party environment or not. The recorded
audio snippet is first divided into a number of frames of equal length; then the
sound wave frequencies present in each frame are calculated.
This gives a spectrogram of the audio clip: an image that
shows how the frequency content of the audio snippet varies over time. The
deep neural network then looks at this image and decides whether it comes
from a party or not, based on the thousands of spectrograms it has been
trained on.
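
The framing and frequency steps described above can be sketched as follows. This is a minimal
illustration, not the thesis's actual preprocessing: the function name `spectrogram`, the frame
length of 256 samples, the non-overlapping frames, the Hann window, and the 8 kHz test tone are
all assumptions made for the example.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=None):
    """Magnitude spectrogram: one column of frequency magnitudes per frame."""
    hop = hop or frame_len  # assumption: frames do not overlap by default
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Divide the signal into frames of equal length (trailing samples are dropped).
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # Assumption: taper each frame with a Hann window to reduce spectral leakage.
    window = np.hanning(frame_len)
    # Frequencies present in each frame, via the real-input FFT.
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.abs(spectrum).T  # shape: (frame_len // 2 + 1, n_frames)

# Hypothetical input: two seconds of a 1000 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone, frame_len=256)
peak_bin = spec.mean(axis=1).argmax()      # strongest frequency row on average
peak_hz = peak_bin * sample_rate / 256     # convert bin index to Hz
```

Each column of `spec` describes one frame and each row one frequency band, so the array can be
treated as the image that the network looks at. For the test tone, the strongest row corresponds
to 1000 Hz.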

Training neural networks like this means showing them thousands of
spectrograms and at the same time telling them the correct answer, i.e., whether they
came from parties or not. In this way, the network learns to distinguish the
“Party” spectrograms from the “No party” spectrograms.
The deep learning algorithm is able to give the correct answer 98% of the
time; this is, however, based on just a very limited number of test recordings. Time
will tell if party-craving people will actually have to start worrying about the AI
police.

Please use this URL to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/8924630

author: Nordin Persson, Colin
supervisor: Andreas Jakobsson (LU)
organization: Mathematical Statistics
course: FMS820 20171
year: 2017
type: H2 - Master's Degree (Two Years)
subject: Mathematics and Statistics
publication/series: Master's Theses in Mathematical Sciences
report number: LUTFMS-3333-2017
ISSN: 1404-6342
other publication id: 2017:E58
language: English
id: 8924630
date added to LUP: 2017-09-04 14:07:03
date last changed: 2024-09-26 16:41:17

@misc{8924630,
  abstract     = {{Minut is a startup company that builds a camera-free home monitor called Point. This thesis
investigates whether Point can use machine learning techniques to classify acoustic scenes, in particular
to detect whether a party is ongoing in the home where Point is located. Machine learning is a mathematical
field that uses data to learn models from which one can, if successful, make good predictions about the
future. Interest in this field, and in particular in a type of model called artificial neural networks, has
become massive over the last few years, mainly because of recent access to powerful hardware and large
amounts of data, which has made these models exceptional at certain tasks. Artificial neural networks are
huge mathematical functions with millions of tunable parameters, which makes them very flexible. By showing
a network lots of data and specifying the desired output, the network's learning algorithm is able to learn
the mapping between input and output. In this thesis, convolutional neural networks are used to classify
acoustic scenes by showing the network a time-frequency representation of audio together with the correct
label. One of the networks built, which we call SlimNet, is very small, yet it is able to distinguish
parties from other acoustic scenes with 98% accuracy. It is also found that the data representation of an
acoustic scene does not have to be very large for a neural network to classify it correctly, which is
desirable since Point has hardware limitations.}},
  author       = {{Nordin Persson, Colin}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Classification of Acoustic Scenes Using Convolutional Neural Networks}},
  year         = {{2017}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES