Enhancing person re-identification: leveraging DensePose for improving occlusion handling and generalization

Elwin, Björn; Fredriksson, Anton


Elwin, Björn and Fredriksson, Anton (2023) In Master’s Theses in Mathematical Sciences FMAM05 20231
Mathematics (Faculty of Engineering)

Abstract: In this master’s thesis we propose a DensePose-based person re-identification
(re-ID) machine learning algorithm that builds upon previous research on this
topic. DensePose, a deep neural network that performs human body part segmentation on images, forms the foundation of our approach. We investigate
whether DensePose can enhance the performance of re-ID algorithms when combined with several different loss functions. Furthermore,
we examine whether the segmentation is beneficial when dealing with occluded
data samples. Our model uses DensePose as regularization by exploiting the densely semantically aligned body part images (DSAP-images) that the
segmentation network provides. We adopt terminology from previous work
and use two deep convolutional neural network streams: a main full image
stream (MF-stream), which processes the original images of the dataset, and a
densely semantically aligned guiding stream (DSAG-stream), which processes
the DSAP-images. The DSAG-stream serves as a regularizing stream
that helps train the MF-stream to learn relevant local features in the
full images. At inference, the DSAG-stream is discarded, allowing the
MF-stream to be evaluated independently on the test data. All model training
and testing is conducted on the Market-1501 dataset, and our best-performing
model (which uses a linear combination of triplet loss, ID loss and center
loss) obtains a CMC-Rank 1 score of 91.4 % and a mAP score of 78.1 %.
Our DensePose-based model increases re-ID performance in
comparison to similar non-DensePose-based models. It does, however, perform
worse on occluded samples, but it demonstrates significant potential in terms of
generalization when applied to unfamiliar data.
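
The two reported metrics can be illustrated with a minimal sketch. It assumes that each image is already represented by a feature vector and that gallery images are ranked by Euclidean distance to the query. The function names are hypothetical, and the sketch omits the same-camera filtering of the standard Market-1501 protocol, so it is not the evaluation code used in the thesis.

```python
from math import dist


def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices sorted by ascending Euclidean distance to the query."""
    return sorted(range(len(gallery_feats)),
                  key=lambda j: dist(query_feat, gallery_feats[j]))


def rank1_and_map(query_feats, query_ids, gallery_feats, gallery_ids):
    """Compute CMC Rank-1 accuracy and mean average precision (mAP)."""
    rank1_hits = 0
    ap_sum = 0.0
    for q_feat, q_id in zip(query_feats, query_ids):
        order = rank_gallery(q_feat, gallery_feats)
        matches = [gallery_ids[j] == q_id for j in order]
        # Rank-1: is the closest gallery image of the correct identity?
        rank1_hits += int(matches[0])
        # Average precision: mean of precision@k over every rank k
        # at which a correct match appears.
        hits = 0
        precisions = []
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / k)
        ap_sum += sum(precisions) / len(precisions) if precisions else 0.0
    n = len(query_ids)
    return rank1_hits / n, ap_sum / n
```

Rank-1 only looks at the top-ranked gallery image, whereas mAP rewards placing all images of the correct identity early in the ranking, which is why the two scores can differ considerably.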
Popular Abstract: Unveiling the Secrets of AI: How DensePose, a body segmenting neural network, could revolutionize person re-identification. Discover the
power of a DensePose-guided convolutional neural network designed
to enhance accuracy and adaptability in person re-identification, unlocking new frontiers in identity recognition generalization. Learn
how the model is improved through a regularizing DensePose module and a clever loss design. The best-performing model in this thesis
achieves an impressive score of 91.4% correct predictions when identifying identities in the Market-1501 dataset.

Person re-identification, or "re-ID," is a popular task in image analysis, wherein
each person is assigned an identity. The goal of this task is to determine whether
a person in a new image belongs to an identity that has been previously encountered or not. It is common to solve these types of problems using different
machine learning algorithms, primarily convolutional neural networks (CNN).
A CNN is a type of deep neural network used in image analysis and pattern
recognition. Convolutional neural networks usually contain many, often millions,
of trainable parameters (or weights). These weights help process the pixels of
the input images to produce compact vector representations of the samples. To
optimize a CNN, one uses some sort of loss function to quantify an error. In
classification and re-identification, this error is usually based on a comparison
between the network’s predictions and the correct labels of the inputs. The
primary tasks of the loss functions are to group objects of the same class while
separating them from samples of other classes. The model weights are updated
to minimize the error provided by the loss function. By iteratively performing
this process of loss calculation and weight updates, the model gets trained to
perform its task. There exist many different loss functions with the purpose of
grouping and separating samples in different ways.
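
The cycle of loss calculation and weight updates described above can be sketched on a toy problem. The example below is an assumption-laden stand-in, not the thesis model: it replaces the CNN with a single linear layer, uses softmax cross-entropy as the loss, and applies plain gradient descent. The names `train` and `predict`, the learning rate, and the step count are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(weights, x):
    """One score per class: dot product of each weight row with the input."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def predict(weights, x):
    """Return the index of the class with the highest score."""
    s = scores(weights, x)
    return s.index(max(s))


def train(samples, labels, num_classes, lr=0.5, steps=200):
    """Repeat: compute the loss, compute its gradient, update the weights."""
    dim = len(samples[0])
    n = len(samples)
    weights = [[0.0] * dim for _ in range(num_classes)]
    losses = []
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(num_classes)]
        loss = 0.0
        for x, y in zip(samples, labels):
            probs = softmax(scores(weights, x))
            loss -= math.log(probs[y])  # cross-entropy against the true label
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    grad[c][d] += err * x[d]
        losses.append(loss / n)
        # Gradient descent step: move the weights against the mean gradient.
        for c in range(num_classes):
            for d in range(dim):
                weights[c][d] -= lr * grad[c][d] / n
    return weights, losses
```

On linearly separable toy data the recorded loss shrinks from one iteration to the next, which is the behaviour the paragraph describes for a real network, only at a much smaller scale.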
The objective of this thesis is to investigate how the performance
of re-identification can be improved through the use of DensePose. We
examine how re-identification of occluded samples is affected by
DensePose and how well DensePose-based re-identification models can generalize to diverse data in comparison to non-DensePose-based re-identification
models. Furthermore, we investigate several different loss functions and how
they can be combined to produce an even better-trained model.
Our first suggestion is to utilize another CNN-based algorithm called DensePose. DensePose is a neural network created by Facebook AI to extract crops
of body parts from people in images. These crops can be used as extra input
information to better re-identify a person. One of the main benefits of using
these crops is that background noise is removed, and the body parts are aligned
between samples. This lets the network focus on the relevant features in the
samples. We utilize these crops by having two CNN streams (parallel networks,
working with separate inputs) within our model. One stream processes the
original sample images, while the other stream processes the body part crops.
The output from the second stream is then used as extra information in the
loss calculations to guide the training of the first stream by regularizing the
optimization of the network weights.
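
The division of labour between the two streams can be sketched structurally. This is a minimal illustration under stated assumptions: the class name `TwoStreamModel`, the element-wise addition used to fuse the two outputs, and the `guide_weight` parameter are hypothetical, since this record does not specify how the second stream's output enters the loss. Each stream is represented by an arbitrary function that maps an input to a feature vector.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Stream = Callable[[Sequence[float]], Vector]
LossFn = Callable[[Vector], float]


class TwoStreamModel:
    """Toy container for a main stream and a guiding stream."""

    def __init__(self, main_stream: Stream, guiding_stream: Stream,
                 guide_weight: float = 1.0) -> None:
        self.main_stream = main_stream        # sees the original images
        self.guiding_stream = guiding_stream  # sees the body part crops
        self.guide_weight = guide_weight      # hypothetical balancing factor

    def training_loss(self, full_image: Sequence[float],
                      body_part_image: Sequence[float],
                      loss_fn: LossFn) -> float:
        """Training only: both streams contribute to the loss."""
        main_feat = self.main_stream(full_image)
        guide_feat = self.guiding_stream(body_part_image)
        # Assumed fusion: element-wise sum of the two feature vectors.
        fused = [a + b for a, b in zip(main_feat, guide_feat)]
        return loss_fn(main_feat) + self.guide_weight * loss_fn(fused)

    def embed(self, full_image: Sequence[float]) -> Vector:
        """Inference: only the main stream is used; no body part crops needed."""
        return self.main_stream(full_image)
```

Because `embed` never calls the guiding stream, the body part segmentation is only required while training, which keeps inference as cheap as for a single-stream model.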
Our second suggestion is to combine different loss functions to achieve distinct
vector representations for different identities. Some loss functions facilitate the
separation of samples from different classes and clustering of samples within
the same class, while others focus on predicting the identity of a given sample.
Combining these functions may help to group intra-class samples and separate
inter-class samples more efficiently.
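
A linear combination of the three losses named in the abstract (triplet loss, ID loss and center loss) can be sketched as follows. The formulas are the commonly used definitions of these losses; the margin and the three weights are hypothetical placeholder values, not the settings used in the thesis.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Push the anchor closer to a same-identity sample than to a
    different-identity sample by at least `margin`."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def id_loss(logits, label):
    """Cross-entropy of the softmax over identity scores (classification loss)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center):
    """Half the squared distance between a feature and its identity's center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def combined_loss(anchor, positive, negative, logits, label, center,
                  w_triplet=1.0, w_id=1.0, w_center=0.0005):
    """Weighted sum of the three losses; the weights are placeholder values."""
    return (w_triplet * triplet_loss(anchor, positive, negative)
            + w_id * id_loss(logits, label)
            + w_center * center_loss(anchor, center))
```

The triplet and center terms act on distances between feature vectors (separating and clustering), while the ID term acts on the predicted identity, so the three terms constrain the embedding in complementary ways.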
In this thesis, we reach several interesting conclusions. DensePose seems to improve the re-identification process, and on average, the results show an increased
rate of correct predictions by a couple of percentage points. However, DensePose does not contribute to an improved algorithm when handling occluded
data samples; rather, the opposite is observed. This might be due to the fact
that DensePose regularizes the re-identification network to expect non-occluded
data samples as its input (DensePose proves weak when segmenting occluded
humans). On the other hand, DensePose removes a lot of irrelevant background
information from each sample. This yields a model that focuses more on the
relevant features within each image, ultimately leading to better generalization
abilities for the model, allowing it to re-identify more effectively when exposed
to image samples with a more diverse range of backgrounds. It is evident from
the results in this thesis that combining loss functions that optimize in different
ways yields an overall better performance compared to only using a single loss
function.
This is the beauty of science: the combined contributions of many can yield the
best solution.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9121265

author

Elwin, Björn and Fredriksson, Anton

supervisor

Karl Åström

organization

Mathematics (Faculty of Engineering)

alternative title

Improving re-identification using DensePose

course

FMAM05 20231

year

2023

type

H2 - Master's Degree (Two Years)

subject

Mathematics and Statistics

publication/series

Master’s Theses in Mathematical Sciences

report number

LUTFMA-3501-2023

ISSN

1404-6342

other publication id

2023:E20

language

English

id

9121265

date added to LUP

2023-06-27 09:52:39

date last changed

2023-06-27 09:52:39

@misc{9121265,
  abstract     = {{In this master’s thesis we propose a DensePose-based person re-identification
(re-ID) machine learning algorithm building upon previous research on this
topic. DensePose, a deep neural network that performs human body part segmentation on images, forms the foundation of our approach. We investigate
whether utilization of DensePose can enhance performance on re-ID algorithms with the utilization of several different loss functions. Furthermore,
we examine if the segmentation can be of benefit when dealing with occluded
data samples. Our model uses DensePose as regularization through exploitation of the densely semantically aligned body part images (DSAP-images) the
segmentation network provides. We adapt terminology from previous work
and use two deep convolutional neural network streams, a main full image
stream (MF-stream) which processes original images of the dataset, and a
densely semantically aligned guiding stream (DSAG-stream) which processes
the DSAP-images. The DSAG-stream is utilized as a regularizing stream
which helps training the MF-stream in learning relevant local features in the
full images. In the inference, the DSAG-stream is discarded, allowing the
MF-stream to independently evaluate on the test data. All model training
and testing is conducted on the Market-1501 dataset and our best performing
model (which uses a linear combination of triplet loss, ID loss and center
loss) obtains a CMC-Rank 1 score of 91.4 % and a mAP score of 78.1 %.
Our DensePose-based model is able to increase performance on re-ID in
comparison to similar non-DensePose-based models. It does however perform
worse on occluded samples but demonstrates significant potential in terms of
generalization abilities when applied to unfamiliar data.}},
  author       = {{Elwin, Björn and Fredriksson, Anton}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{Enhancing person re-identification: leveraging DensePose for improving occlusion handling and generalization}},
  year         = {{2023}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
