Deep Reinforcement Learning for Active Human Pose Estimation

Gärtner, Erik; Pirinen, Aleksis; Sminchisescu, Cristian

Gärtner, Erik; Pirinen, Aleksis and Sminchisescu, Cristian (2020). 34th AAAI Conference on Artificial Intelligence, AAAI 2020. In Proceedings of the AAAI Conference on Artificial Intelligence 34(07). p.10835-10844

Abstract: Most 3d human pose estimation methods assume that input – be it images of a scene collected from one or several viewpoints, or from a video – is given. Consequently, they focus on estimates leveraging prior knowledge and measurement by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with freedom to move and explore the scene spatially – in ‘time-freeze’ mode – and/or temporally, by selecting informative viewpoints that improve its estimation accuracy. Towards this end, we introduce Pose-DRL, a fully trainable deep reinforcement learning-based active pose estimation architecture which learns to select appropriate views, in space and time, to feed an underlying monocular pose estimator. We evaluate our model using single- and multi-target estimators with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos.
In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines.
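The abstract describes an observer that picks camera views one at a time, fuses the per-view monocular estimates, and learns when to stop. The sketch below illustrates only that episode structure for a single 'time-freeze' frame, under stated assumptions; it is not Pose-DRL and not the authors' code. Everything in it is hypothetical: per-camera Gaussian noise stands in for the monocular pose estimator, fusion is a plain per-joint average, the reward is the drop in mean per-joint position error (MPJPE) after each new view, and a fixed greedy rule (lowest-noise unvisited camera, stop at a view budget) stands in for the learned policy. The names `mpjpe`, `fuse`, `make_greedy_policy` and `run_episode` are illustrative only.

```python
import random
from statistics import fmean


def mpjpe(pred, gt):
    """Mean per-joint position error between two lists of (x, y, z) joints."""
    return fmean(
        sum((p - g) ** 2 for p, g in zip(pj, gj)) ** 0.5
        for pj, gj in zip(pred, gt)
    )


def fuse(estimates):
    """Hypothetical fusion rule: average each joint coordinate over all views seen so far."""
    n = len(estimates)
    return [
        tuple(sum(est[j][k] for est in estimates) / n for k in range(3))
        for j in range(len(estimates[0]))
    ]


def make_greedy_policy(view_noise, budget):
    """Hypothetical stand-in for a learned policy: lowest-noise unvisited camera, STOP at budget."""
    order = sorted(range(len(view_noise)), key=view_noise.__getitem__)

    def policy(visited):
        if len(visited) >= budget:
            return None  # STOP action
        for i in order:
            if i not in visited:
                return i
        return None  # no unvisited camera left
    return policy


def run_episode(gt, view_noise, policy, max_views, rng):
    """One 'time-freeze' episode: select views until the policy stops or max_views is hit.

    view_noise[i] is a hypothetical noise level standing in for the error of a
    monocular pose estimator run on camera i.
    Returns (visited cameras, final fused MPJPE or None, per-step rewards).
    """
    visited, estimates, rewards = [], [], []
    prev_err = None
    while len(visited) < max_views:
        action = policy(visited)
        if action is None:
            break
        visited.append(action)
        sigma = view_noise[action]
        # Simulated per-view 3D estimate: ground truth plus Gaussian noise.
        estimates.append(
            [tuple(c + rng.gauss(0.0, sigma) for c in joint) for joint in gt]
        )
        err = mpjpe(fuse(estimates), gt)
        # Hypothetical reward: drop in fused error; 0.0 for the first view.
        rewards.append(0.0 if prev_err is None else prev_err - err)
        prev_err = err
    return visited, prev_err, rewards


rng = random.Random(0)
gt = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
noise = [0.5, 0.1, 0.3, 0.2]
visited, err, rewards = run_episode(gt, noise, make_greedy_policy(noise, budget=3), 4, rng)
# visited is [1, 3, 2]: cameras in order of increasing noise, then STOP at the budget.
print(visited, round(err, 3), [round(r, 3) for r in rewards])
```

In a reinforcement learning setting, the greedy rule would be replaced by a trained policy that outputs either a camera index or the stop action (`None` here), and the per-step rewards returned by `run_episode` are the kind of signal such a policy could be trained on.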

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/58668314-e07a-4641-8480-8c4ad50c8450

author

Gärtner, Erik (LU); Pirinen, Aleksis (LU) and Sminchisescu, Cristian (LU)

publishing date

2020-04-03

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Computer graphics and computer vision

host publication

AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

series title

Proceedings of the AAAI Conference on Artificial Intelligence

volume

34

issue

07

pages

10835 - 10844

publisher

The Association for the Advancement of Artificial Intelligence

conference name

34th AAAI Conference on Artificial Intelligence, AAAI 2020

conference location

New York, United States

conference dates

2020-02-07 - 2020-02-12

external identifiers

scopus:85095322803

ISSN

2159-5399

DOI

10.1609/aaai.v34i07.6714

project

Deep Learning for Understanding Humans

language

English

LU publication?

yes

id

58668314-e07a-4641-8480-8c4ad50c8450

date added to LUP

2021-04-08 10:52:40

date last changed

2025-10-14 20:05:12

@inproceedings{58668314-e07a-4641-8480-8c4ad50c8450,
  abstract     = {{Most 3d human pose estimation methods assume that input – be it images of a scene collected from one or several viewpoints, or from a video – is given. Consequently, they focus on estimates leveraging prior knowledge and measurement by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with freedom to move and explore the scene spatially – in ‘time-freeze’ mode – and/or temporally, by selecting informative viewpoints that improve its estimation accuracy. Towards this end, we introduce Pose-DRL, a fully trainable deep reinforcement learning-based active pose estimation architecture which learns to select appropriate views, in space and time, to feed an underlying monocular pose estimator. We evaluate our model using single- and multi-target estimators with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos. In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines.}},
  author       = {{Gärtner, Erik and Pirinen, Aleksis and Sminchisescu, Cristian}},
  booktitle    = {{AAAI 2020 - 34th AAAI Conference on Artificial Intelligence}},
  issn         = {{2159-5399}},
  language     = {{eng}},
  month        = {{04}},
  number       = {{07}},
  pages        = {{10835--10844}},
  publisher    = {{The Association for the Advancement of Artificial Intelligence}},
  series       = {{Proceedings of the AAAI Conference on Artificial Intelligence}},
  title        = {{Deep Reinforcement Learning for Active Human Pose Estimation}},
  url          = {{http://dx.doi.org/10.1609/aaai.v34i07.6714}},
  doi          = {{10.1609/aaai.v34i07.6714}},
  volume       = {{34}},
  year         = {{2020}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES