Iterated Second-Order Label Sensitive Pooling for 3D Human Pose Estimation
(2014) 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014 p.1661-1668
- Abstract
- Recently, the emergence of Kinect systems has demonstrated the benefits of predicting an intermediate body part labeling for 3D human pose estimation, in conjunction with RGB-D imagery. The availability of depth information plays a critical role, so an important question is whether a similar representation can be developed with sufficient robustness in order to estimate 3D pose from RGB images. This paper provides evidence for a positive answer, by leveraging (a) 2D human body part labeling in images, (b) second-order label-sensitive pooling over dynamically computed regions resulting from a hierarchical decomposition of the body, and (c) iterative structured-output modeling to contextualize the process based on 3D pose estimates. For robustness and generalization, we take advantage of a recent large-scale 3D human motion capture dataset, Human3.6M[18] that also has human body part labeling annotations available with images. We provide extensive experimental studies where alternative intermediate representations are compared and report a substantial 33% error reduction over competitive discriminative baselines that regress 3D human pose against global HOG features.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/8227403
- author
- Ionescu, Catalin; Carreira, Joao and Sminchisescu, Cristian LU
- organization
- publishing date
- 2014
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- host publication
- 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- pages
- 1661 - 1668
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- conference name
- 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
- conference location
- Columbus, OH, United States
- conference dates
- 2014-06-23 - 2014-06-28
- external identifiers
- wos:000361555601090
- scopus:84911420074
- ISSN
- 1063-6919
- DOI
- 10.1109/CVPR.2014.215
- language
- English
- LU publication?
- yes
- id
- 49f77724-9704-4715-8aba-5bc13ac83206 (old id 8227403)
- date added to LUP
- 2016-04-01 13:09:56
- date last changed
- 2025-04-04 15:10:51
@inproceedings{49f77724-9704-4715-8aba-5bc13ac83206,
  abstract     = {{Recently, the emergence of Kinect systems has demonstrated the benefits of predicting an intermediate body part labeling for 3D human pose estimation, in conjunction with RGB-D imagery. The availability of depth information plays a critical role, so an important question is whether a similar representation can be developed with sufficient robustness in order to estimate 3D pose from RGB images. This paper provides evidence for a positive answer, by leveraging (a) 2D human body part labeling in images, (b) second-order label-sensitive pooling over dynamically computed regions resulting from a hierarchical decomposition of the body, and (c) iterative structured-output modeling to contextualize the process based on 3D pose estimates. For robustness and generalization, we take advantage of a recent large-scale 3D human motion capture dataset, Human3.6M[18] that also has human body part labeling annotations available with images. We provide extensive experimental studies where alternative intermediate representations are compared and report a substantial 33% error reduction over competitive discriminative baselines that regress 3D human pose against global HOG features.}},
  author       = {{Ionescu, Catalin and Carreira, Joao and Sminchisescu, Cristian}},
  booktitle    = {{2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}},
  issn         = {{1063-6919}},
  language     = {{eng}},
  pages        = {{1661--1668}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  title        = {{Iterated Second-Order Label Sensitive Pooling for 3D Human Pose Estimation}},
  url          = {{http://dx.doi.org/10.1109/CVPR.2014.215}},
  doi          = {{10.1109/CVPR.2014.215}},
  year         = {{2014}},
}