Domes to drones : Self-supervised active triangulation for 3d human pose reconstruction

Pirinen, Aleksis; Gärtner, Erik; Sminchisescu, Cristian

Pirinen, Aleksis (LU); Gärtner, Erik (LU) and Sminchisescu, Cristian (LU) (2019) 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019. In Advances in Neural Information Processing Systems 32.

Abstract: Existing state-of-the-art estimation systems can detect 2d poses of multiple people in images quite reliably. In contrast, 3d pose estimation from a single image is ill-posed due to occlusion and depth ambiguities. Assuming access to multiple cameras, or given an active system able to position itself to observe the scene from multiple viewpoints, reconstructing 3d pose from 2d measurements becomes well-posed within the framework of standard multi-view geometry. Less clear is what is an informative set of viewpoints for accurate 3d reconstruction, particularly in complex scenes, where people are occluded by others or by scene objects. In order to address the view selection problem in a principled way, we here introduce ACTOR, an active triangulation agent for 3d human pose reconstruction. Our fully trainable agent consists of a 2d pose estimation network (any of which would work) and a deep reinforcement learning-based policy for camera viewpoint selection. The policy predicts observation viewpoints, the number of which varies adaptively depending on scene content, and the associated images are fed to an underlying pose estimator. Importantly, training the policy requires no annotations - given a 2d pose estimator, ACTOR is trained in a self-supervised manner. In extensive evaluations on complex multi-people scenes filmed in a Panoptic dome, under multiple viewpoints, we compare our active triangulation agent to strong multi-view baselines, and show that ACTOR produces significantly more accurate 3d pose reconstructions. We also provide a proof-of-concept experiment indicating the potential of connecting our view selection policy to a physical drone observer.
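The abstract's remark that multi-view reconstruction is well-posed under standard multi-view geometry can be illustrated with a minimal sketch of linear (DLT) triangulation of a single joint. This is a generic textbook technique and not the ACTOR implementation; the camera matrices, the intrinsics `K`, the helper names and the test point are all hypothetical values chosen for illustration.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3d point from two or more views.

    projections: list of 3x4 camera projection matrices (assumed known).
    points_2d:   list of (x, y) image observations, one per camera.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view adds two linear constraints on the homogeneous point X.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3d point X with projection matrix P to image coordinates."""
    x = np.asarray(P, dtype=float) @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two hypothetical cameras: one at the origin, one translated along the x axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

joint = np.array([0.2, -0.1, 4.0])  # hypothetical 3d joint position
estimate = triangulate_point(
    [P1, P2], [project(P1, joint), project(P2, joint)]
)
```

With noise-free observations the estimate matches the original point up to numerical precision. With real 2d pose detections the accuracy depends on which viewpoints are used, which is the view selection problem the abstract describes.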

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/e4f85a7c-7116-4f1b-b4a9-87ae4a617fb4

author

Pirinen, Aleksis (LU); Gärtner, Erik (LU) and Sminchisescu, Cristian (LU)

organization

publishing date

2019-01-01

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Computer graphics and computer vision

host publication

Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

series title

Advances in Neural Information Processing Systems

volume

32

publisher

Curran Associates, Inc

conference name

33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019

conference location

Vancouver, Canada

conference dates

2019-12-08 - 2019-12-14

external identifiers

scopus:85090173697

ISSN

1049-5258

ISBN

9781713807933

language

English

LU publication?

yes

id

e4f85a7c-7116-4f1b-b4a9-87ae4a617fb4

alternative location

https://papers.nips.cc/paper/2019/file/c3e4035af2a1cde9f21e1ae1951ac80b-Paper.pdf

date added to LUP

2020-09-28 11:12:40

date last changed

2025-10-14 11:07:58

@inproceedings{e4f85a7c-7116-4f1b-b4a9-87ae4a617fb4,
  abstract     = {{Existing state-of-the-art estimation systems can detect 2d poses of multiple people in images quite reliably. In contrast, 3d pose estimation from a single image is ill-posed due to occlusion and depth ambiguities. Assuming access to multiple cameras, or given an active system able to position itself to observe the scene from multiple viewpoints, reconstructing 3d pose from 2d measurements becomes well-posed within the framework of standard multi-view geometry. Less clear is what is an informative set of viewpoints for accurate 3d reconstruction, particularly in complex scenes, where people are occluded by others or by scene objects. In order to address the view selection problem in a principled way, we here introduce ACTOR, an active triangulation agent for 3d human pose reconstruction. Our fully trainable agent consists of a 2d pose estimation network (any of which would work) and a deep reinforcement learning-based policy for camera viewpoint selection. The policy predicts observation viewpoints, the number of which varies adaptively depending on scene content, and the associated images are fed to an underlying pose estimator. Importantly, training the policy requires no annotations - given a 2d pose estimator, ACTOR is trained in a self-supervised manner. In extensive evaluations on complex multi-people scenes filmed in a Panoptic dome, under multiple viewpoints, we compare our active triangulation agent to strong multi-view baselines, and show that ACTOR produces significantly more accurate 3d pose reconstructions. We also provide a proof-of-concept experiment indicating the potential of connecting our view selection policy to a physical drone observer.}},
  author       = {{Pirinen, Aleksis and Gärtner, Erik and Sminchisescu, Cristian}},
  booktitle    = {{Advances in Neural Information Processing Systems 32 (NeurIPS 2019)}},
  isbn         = {{9781713807933}},
  issn         = {{1049-5258}},
  language     = {{eng}},
  month        = {{01}},
  publisher    = {{Curran Associates, Inc}},
  series       = {{Advances in Neural Information Processing Systems}},
  title        = {{Domes to drones : Self-supervised active triangulation for 3d human pose reconstruction}},
  url          = {{https://papers.nips.cc/paper/2019/file/c3e4035af2a1cde9f21e1ae1951ac80b-Paper.pdf}},
  volume       = {{32}},
  year         = {{2019}},
}
