Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video

Gärtner, Erik; Andriluka, Mykhaylo; Xu, Hongyi; Sminchisescu, Cristian

Gärtner, Erik; Andriluka, Mykhaylo; Xu, Hongyi and Sminchisescu, Cristian (2022) 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022

Abstract: We focus on the task of estimating a physically plausible articulated human motion from monocular video. Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts, while state-of-the-art physics-based approaches have either been shown to work only in controlled laboratory conditions or consider simplified body-ground contact limited to feet. This paper explores how these shortcomings can be addressed by directly incorporating a fully-featured physics engine into the pose estimation process. Given an uncontrolled, real-world scene as input, our approach estimates the ground-plane location and the dimensions of the physical body model. It then recovers the physical motion by performing trajectory optimization. The advantage of our formulation is that it readily generalizes to a variety of scenes that might have diverse ground properties and supports any form of self-contact and contact between the articulated body and scene geometry. We show that our approach achieves competitive results with respect to existing physics-based methods on the Human3.6M benchmark [13], while being directly applicable without re-training to more complex dynamic motions from the AIST benchmark [36] and to uncontrolled internet videos.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/f64363d1-5923-47d7-a032-bf55daab9391

author

Gärtner, Erik (LU); Andriluka, Mykhaylo; Xu, Hongyi and Sminchisescu, Cristian (LU)

organization

Mathematics (Faculty of Engineering)

publishing date

2022

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Robotics and automation

host publication

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

conference name

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022

conference location

New Orleans, United States

conference dates

2022-06-19 - 2022-06-24

external identifiers

scopus:85132477539

ISBN

978-1-6654-6947-0

978-1-6654-6946-3

DOI

10.1109/CVPR52688.2022.01276

language

English

LU publication?

yes

id

f64363d1-5923-47d7-a032-bf55daab9391

date added to LUP

2022-05-06 10:48:26

date last changed

2026-01-11 21:56:01

@inproceedings{f64363d1-5923-47d7-a032-bf55daab9391,
  abstract     = {{We focus on the task of estimating a physically plausible articulated human motion from monocular video. Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts, while state-of-the-art physics-based approaches have either been shown to work only in controlled laboratory conditions or consider simplified body-ground contact limited to feet. This paper explores how these shortcomings can be addressed by directly incorporating a fully-featured physics engine into the pose estimation process. Given an uncontrolled, real-world scene as input, our approach estimates the ground-plane location and the dimensions of the physical body model. It then recovers the physical motion by performing trajectory optimization. The advantage of our formulation is that it readily generalizes to a variety of scenes that might have diverse ground properties and supports any form of self-contact and contact between the articulated body and scene geometry. We show that our approach achieves competitive results with respect to existing physics-based methods on the Human3.6M benchmark [13], while being directly applicable without re-training to more complex dynamic motions from the AIST benchmark [36] and to uncontrolled internet videos.}},
  author       = {{Gärtner, Erik and Andriluka, Mykhaylo and Xu, Hongyi and Sminchisescu, Cristian}},
  booktitle    = {{Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}},
  isbn         = {{978-1-6654-6947-0}},
  language     = {{eng}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  title        = {{Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video}},
  url          = {{http://dx.doi.org/10.1109/CVPR52688.2022.01276}},
  doi          = {{10.1109/CVPR52688.2022.01276}},
  year         = {{2022}},
}

