
Active and Physics-Based Human Pose Reconstruction

Gärtner, Erik (2023) In Dissertation
Abstract
Perceiving humans is an important and complex problem within computer
vision. Its significance is derived from its numerous applications, such
as human-robot interaction, virtual reality, markerless motion capture,
and human tracking for autonomous driving. The difficulty lies in the
variability in human appearance, physique, and plausible body poses. In
real-world scenes, this is further exacerbated by difficult lighting
conditions, partial occlusions, and the depth ambiguity stemming from
the loss of information during the 3d to 2d projection. Despite these
challenges, significant progress has been made in recent years,
primarily due to the expressive power of deep neural networks trained on
large datasets. However, creating large-scale datasets with 3d
annotations is expensive, and capturing the vast diversity of the real
world is demanding. Traditionally, 3d ground truth is captured using
motion capture laboratories that require large investments. Furthermore,
many laboratories cannot easily accommodate athletic and dynamic
motions. This thesis studies three approaches to improving visual
perception, with emphasis on human pose estimation, that can complement
improvements to the underlying predictor or training data.

The first two papers present active human pose estimation, where a
reinforcement learning agent is tasked with selecting informative
viewpoints to reconstruct subjects efficiently. The papers discard the
common assumption that the input is given and instead allow the agent to
move to observe subjects from desirable viewpoints, e.g., those which
avoid occlusions and for which the underlying pose estimator has a low
prediction error.
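
As a rough, self-contained illustration of this loop (not the thesis implementation), the Python sketch below picks camera viewpoints one at a time, fuses the resulting observations, and scores each move by the reduction in reconstruction error. The observe, fuse and mpjpe functions are toy stand-ins for a learned pose estimator, multi-view triangulation, and the standard mean per-joint error, and the greedy choice here cheats by consulting the ground truth; in the papers, the viewpoint is instead chosen by a policy trained with reinforcement learning, with the error reduction acting as its reward.

import numpy as np

rng = np.random.default_rng(0)
true_pose = rng.normal(size=(17, 3))             # toy ground-truth 3D joints
azimuths = np.deg2rad(np.arange(0, 360, 45))     # 8 candidate camera viewpoints

def observe(pose, azim):
    # Toy per-view 3D estimate; some azimuths are noisier (stand-in for occlusion).
    noise = 0.05 + 0.3 * abs(np.sin(azim))
    return pose + rng.normal(scale=noise, size=pose.shape)

def fuse(estimates):
    # Average the per-view estimates (stand-in for multi-view triangulation).
    return np.mean(estimates, axis=0)

def mpjpe(pred, gt):
    # Mean per-joint position error.
    return float(np.linalg.norm(pred - gt, axis=1).mean())

estimates, visited, error = [], [], None
for step in range(4):                            # the agent moves the camera 4 times
    best = None
    for azim in azimuths:                        # "policy": pick the most useful view
        if any(np.isclose(azim, v) for v in visited):
            continue
        candidate = estimates + [observe(true_pose, azim)]
        err = mpjpe(fuse(candidate), true_pose)
        if best is None or err < best[1]:
            best = (azim, err, candidate)
    azim, err, estimates = best
    reward = 0.0 if error is None else error - err   # reward = error reduction
    visited.append(azim)
    error = err
    print(f"step {step}: view {np.degrees(azim):5.1f} deg, "
          f"MPJPE {error:.3f}, reward {reward:+.3f}")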

The third paper introduces the task of embodied visual active learning,
which goes further and assumes that the perceptual model is not
pre-trained. Instead, the agent is tasked with exploring its environment
and requesting annotations to refine its visual model. Learning to
explore novel scenarios and efficiently request annotation for new data
is a step towards life-long learning, where models can evolve beyond
what they learned during the initial training phase. We study the
problem for segmentation, though the idea is applicable to other
perception tasks.
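
A minimal sketch of this idea, under the assumption that annotation requests are spent where the current model is most uncertain: the nearest-centroid "segmentation" model and the margin-based uncertainty below are illustrative placeholders rather than the method from the paper, and exploration is reduced to a fixed stream of toy views.

import numpy as np

rng = np.random.default_rng(1)

# Toy perception model: classify 8-dimensional "view features" into 2 classes
# with a nearest-centroid model that is refined from requested annotations.
centroids = rng.normal(size=(2, 8))              # initial (untrained) model
counts = np.ones(2)

def predict_with_uncertainty(x):
    d = np.linalg.norm(centroids - x, axis=1)
    pred = int(np.argmin(d))
    margin = abs(d[0] - d[1])                    # small margin = uncertain
    return pred, margin

annotation_budget, used = 10, 0
for step in range(100):                          # the agent's exploration rollout
    label = step % 2                             # ground-truth class of this view
    x = rng.normal(size=8) + 3.0 * label         # toy features for the view
    pred, margin = predict_with_uncertainty(x)
    if margin < 1.0 and used < annotation_budget:
        used += 1                                # request an annotation...
        counts[label] += 1                       # ...and refine the model with it
        centroids[label] += (x - centroids[label]) / counts[label]
print(f"annotations requested: {used}/{annotation_budget}")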

Lastly, the final two papers propose improving human pose estimation by
integrating physical constraints. These regularize the reconstructed
motions to be physically plausible and serve as a complement to current
kinematic approaches. Whether a motion has been observed in the training
data or not, the predictions should obey the laws of physics. Through
integration with a physical simulator, we demonstrate that we can reduce
reconstruction artifacts and enforce, e.g., contact constraints.
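
The sketch below conveys the flavour of such regularization on a toy one-dimensional height trajectory rather than a full simulated character: a data term keeps the result close to the kinematic estimate, an acceleration penalty favours dynamically plausible motion, and a ground term discourages penetration. The papers instead put an actual physics simulator in the loop, but the trade-off between staying faithful to the kinematic reconstruction and satisfying physical constraints is the same.

import numpy as np

rng = np.random.default_rng(2)
T = 60
t = np.linspace(0.0, 2.0, T)
kinematic = np.abs(np.sin(2 * np.pi * t)) + rng.normal(scale=0.05, size=T)
kinematic[13:18] -= 0.3                          # artifact: frames dip below the ground plane

refined = kinematic.copy()
lr, w_data, w_acc, w_ground = 0.01, 1.0, 2.0, 20.0
for _ in range(1000):                            # simple gradient-descent refinement
    grad = 2.0 * w_data * (refined - kinematic)  # data term: stay near the kinematics
    acc = refined[:-2] - 2.0 * refined[1:-1] + refined[2:]
    grad[:-2] += 2.0 * w_acc * acc               # dynamics term: small accelerations
    grad[1:-1] -= 4.0 * w_acc * acc
    grad[2:] += 2.0 * w_acc * acc
    grad += 2.0 * w_ground * np.minimum(refined, 0.0)   # contact: no ground penetration
    refined -= lr * grad
print(f"max ground penetration: before {max(-kinematic.min(), 0):.3f}, "
      f"after {max(-refined.min(), 0):.3f}")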
author: Gärtner, Erik
supervisor:
opponent: Docent Khan, Fahad, Linköping University, Sweden.
organization:
publishing date: 2023
type: Thesis
publication status: published
subject:
keywords: computer vision, human pose estimation, reinforcement learning, physics-based human pose estimation, active learning
in: Dissertation
issue: 70
publisher: Department of Computer Science, Lund University
defense location: Lecture Hall MH:Hörmander, Centre for Mathematical Sciences, Sölvegatan 18, Faculty of Engineering LTH, Lund University, Lund. The dissertation will be live streamed, but part of the premises is to be excluded from the live stream.
defense date: 2023-01-13 10:15:00
ISSN: 1404-1219
ISBN: 978-91-8039-471-0, 978-91-8039-472-7
project: Deep Learning for Understanding Humans
language: English
LU publication?: yes
id: 439c532f-bc74-4863-9057-74892d32d674
date added to LUP: 2022-12-06 14:52:03
date last changed: 2023-01-02 12:14:38
@phdthesis{439c532f-bc74-4863-9057-74892d32d674,
  author       = {{Gärtner, Erik}},
  isbn         = {{978-91-8039-471-0}},
  issn         = {{1404-1219}},
  keywords     = {{computer vision; human pose estimation; reinforcement learning; physics-based human pose estimation; active learning}},
  language     = {{eng}},
  number       = {{70}},
  publisher    = {{Department of Computer Science, Lund University}},
  school       = {{Lund University}},
  series       = {{Dissertation}},
  title        = {{Active and Physics-Based Human Pose Reconstruction}},
  url          = {{https://lup.lub.lu.se/search/files/130735662/Gartner_phd_thesis.pdf}},
  year         = {{2023}},
}