Image classification for improving an IR-signal classification in a semi-supervised learning pipeline

Ivanov, Arseni

Ivanov, Arseni (LU) (2022) In Master's Theses in Mathematical Sciences FMAM05 20221
Mathematics (Faculty of Engineering)

Abstract: A part of a semi-supervised learning pipeline is built to improve the classification
of humans and animals in a home security detector. An AI teacher model is created that
automatically labels data, which is in turn used to improve a lightweight AI
student model. The AI teacher is trained on existing labelled household
camera image data.
Approaches with leveraged convolutional neural nets and existing object detection
frameworks, YOLO and DETR, are tested and benchmarked. An accuracy of
83%, with 48% false positives on new data for human detection, is achieved using a
leveraged YOLOv4 model. Other accuracy/false-positive trade-offs are also possible.
As a secondary aim, parameters are extracted from the found objects in the
images using computer vision models. A distance estimation with an average error
of 1.15 meters and a direction estimation over 3 possible directions with 59% accuracy
are developed.
Popular Abstract: Training a teacher AI to automatically annotate training data for a
student AI that is used inside a camera to detect humans and pets
We explore different machine learning models focused on finding objects in
low-resolution image sequences. The purpose is to remove the need for an engineer
to spend time looking at images and marking object locations, such as placing the
label "cat" on top of cats, in thousands of images containing small cats. An example
of a person being marked this way can be seen in Figure 1 below. Another reason is
to find the situations where the existing student algorithm does not work as expected,
describe those situations using labels, and use them to improve the model. We benchmark
popular models on our data, and find a sufficiently good match for one object type
in our situation. We also extend existing models by taking what they already know
and training them further on a part of our own data.
This results in a model that matches the capabilities of the benchmarked ones, but
that also leaves room for further improvement if more data is acquired.
We also manage to find the object's movement direction, speed, and distance from
the camera with sufficient accuracy using traditional, non-AI mathematical techniques.
Our goal is to find and automatically label all the images with animals, as they
are the cases that triggered the camera falsely. However, we find that the animals
in the images are of too low resolution to be consistently found in our data. This
makes us reverse the problem, and instead try to find all humans in the data. If
we can find all humans, we can remove the human data and end up with only the
interesting animal data. Although this will not automatically label the data, it will
reduce the amount of data that the engineer has to go through. We manage to
develop a model that has 84% detection accuracy for human images on our data.
Such a high accuracy is, however, reached by lowering the threshold for what counts as
a human. If the human has their back to the camera and the model is only 15%
sure that it is a human, we still need to accept that detection. This leads to
many false positives: the model finds human-resembling patterns in images without
any humans, and because low probabilities are accepted, those images are predicted
as containing humans. Upon removing this 84% of humans, we also lose 48% of the
interesting false positive animal data. We are then left with the remaining 16% of the
human data and 52% of the interesting scenarios where the current camera algorithm
did not work as expected. Because of this big loss of interesting data, it is up to the
company to decide whether they want to apply the model to automate the labelling. The
object distance, speed, and direction estimations are, however, found with sufficient
accuracy. This creates an alternative use for the project, where the model could be used
to provide a good guess at the labels, turning the engineers' work into verification
instead of annotation.
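
The threshold trade-off described in the popular abstract can be illustrated with a minimal Python sketch. It is not the thesis's code: the `Image` record, its field names, the `filter_stats` helper, and the toy scores are all hypothetical, and it assumes each image carries a single "human" confidence score from the teacher model. Lowering the threshold removes more human images but also discards more of the non-human (animal) images; the reported figures correspond to removing 84% of humans while losing 48% of the animal data, leaving 1 - 0.84 = 16% and 1 - 0.48 = 52% respectively.

```python
from dataclasses import dataclass


@dataclass
class Image:
    # Hypothetical record: ground-truth flag and the teacher's "human" confidence in [0, 1].
    has_human: bool
    human_score: float


def filter_stats(images, threshold):
    """Return (fraction of human images removed, fraction of non-human images lost)
    when every image with human_score >= threshold is discarded.
    Assumes both groups are non-empty."""
    humans = [im for im in images if im.has_human]
    others = [im for im in images if not im.has_human]
    removed_humans = sum(im.human_score >= threshold for im in humans)
    lost_others = sum(im.human_score >= threshold for im in others)
    return removed_humans / len(humans), lost_others / len(others)


# Toy data, invented purely for illustration.
images = [
    Image(True, 0.9), Image(True, 0.7), Image(True, 0.2), Image(True, 0.1),
    Image(False, 0.05), Image(False, 0.3), Image(False, 0.1), Image(False, 0.6),
]

low_threshold = filter_stats(images, 0.15)   # permissive: accepts weak detections
high_threshold = filter_stats(images, 0.5)   # strict: keeps more animal data
print(low_threshold, high_threshold)
```

On this toy data the permissive threshold removes more humans at the cost of losing more non-human images, which mirrors the kind of accuracy/false-positive trade-off the abstract mentions.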

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9083637

author

Ivanov, Arseni (LU)

supervisor

organization

Mathematics (Faculty of Engineering)

course

FMAM05 20221

year

2022

type

H2 - Master's Degree (Two Years)

subject

Mathematics and Statistics

keywords

object detection, YOLO, DETR, mAP, alarm, security, machine learning, AI, undistortion, Verisure, CNN

publication/series

Master's Theses in Mathematical Sciences

report number

LUTFMA-3470-2022

ISSN

1404-6342

other publication id

2022:E21

language

English

id

9083637

date added to LUP

2022-06-17 17:04:57

date last changed

2022-06-17 17:04:57

@misc{9083637,
  abstract     = {{A part of a semi-supervised learning pipeline is built to improve the classification
of humans and animals in a home security detector. An AI teacher model is created that
automatically labels data, which is in turn used to improve a lightweight AI
student model. The AI teacher is trained on existing labelled household
camera image data.
Approaches with leveraged convolutional neural nets and existing object detection
frameworks, YOLO and DETR, are tested and benchmarked. An accuracy of
83%, with 48% false positives on new data for human detection, is achieved using a
leveraged YOLOv4 model. Other accuracy/false-positive trade-offs are also possible.
As a secondary aim, parameters are extracted from the found objects in the
images using computer vision models. A distance estimation with an average error
of 1.15 meters and a direction estimation over 3 possible directions with 59% accuracy
are developed.}},
  author       = {{Ivanov, Arseni}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Image classification for improving an IR-signal classification in a semi-supervised learning pipeline}},
  year         = {{2022}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES