3D Human Pose and Shape Estimation Based on Parametric Model and Deep Learning

Li, Zhongguo

3D Human Pose and Shape Estimation Based on Parametric Model and Deep Learning

Mark

Li, Zhongguo ^LU (2021)

Abstract: 3D human body reconstruction from monocular images has wide applications in our life, such as movie, animation, Virtual/Augmented Reality, medical research and so on. Due to the high freedom of human body in real scene and the ambiguity of inferring 3D objects from 2D images, it is a challenging task to accurately recover 3D human body models from images. In this thesis, we explore the methods for estimating 3D human body models from images based on parametric model and deep learning.

In the first part, the coarse 3D human body models are estimated automatically from multi-view images based on a parametric human body model called SMPL model. Two routes are exploited for estimating the pose and shape parameters of the SMPL model to... (More); 3D human body reconstruction from monocular images has wide applications in our life, such as movie, animation, Virtual/Augmented Reality, medical research and so on. Due to the high freedom of human body in real scene and the ambiguity of inferring 3D objects from 2D images, it is a challenging task to accurately recover 3D human body models from images. In this thesis, we explore the methods for estimating 3D human body models from images based on parametric model and deep learning.

In the first part, the coarse 3D human body models are estimated automatically from multi-view images based on a parametric human body model called SMPL model. Two routes are exploited for estimating the pose and shape parameters of the SMPL model to obtain the 3D models: (1) Optimization based methods; and (2) Deep learning based methods. For the optimization based methods, we propose the novel energy functions based on some prior information including the 2D joint points and silhouettes. Through minimizing the energy functions, the SMPL model is fitted to the prior information, and then, the coarse 3D human body is obtained. In addition to the traditional optimization based methods, a deep learning based method is also proposed in the following work to regress the pose and shape parameters of the SMPL model. A novel architecture is proposed to put the optimization into a training loop of convolutional neural network (CNN) to form a self-supervision structure based on the multi-view images. The proposed methods are evaluated on both synthetic and real datasets to demonstrate that they can obtain better estimation of the pose and shape of 3D human body than previous approaches.

In the second part, the problem is shifted to the detailed 3D human body reconstruction from multi-view images. Instead of using the SMPL model, implicit function is utilized to represent 3D models because implicit representation can generate continuous surface and has better flexibility for arbitrary topology. Firstly, a multi-scale features based method is proposed to learn the implicit representation for 3D models through multi-stage hourglass networks from multi-view images. Furthermore, a coarse-to-fine method is proposed to refine the 3D models from multi-view images through learning the voxel super-resolution. In this method, the coarse 3D models are estimated firstly by the learned implicit function based on multi-scale features from multi-view images. Afterwards, by voxelizing the coarse 3D models to low resolution voxel grids, voxel super-resolution is learned through a multi-stage 3D CNN for feature extraction from low resolution voxel grids and fully connected neural network for predicting the implicit function. Voxel super-resolution is able to remove the false reconstruction and preserve the surface details. The proposed methods are evaluated on both real and synthetic datasets in which our method can estimate 3D model with higher accuracy and better surface quality than some previous methods. (Less)

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/1764d01f-7f65-4b43-bc75-022270200da4

author

Li, Zhongguo ^LU

supervisor

Anders Heyden ^LU
Magnus Oskarsson ^LU

opponent

Prof. Heikkilä, Janne, University of Oulu, Finland

organization

publishing date

2021

type

Thesis

publication status

published

subject

Computer graphics and computer vision

keywords

3D Human body, Implicit function, Deep Learning, SMPL model, Parametric model, Pose and shape estimation

pages

179 pages

publisher

Lund University / Centre for Mathematical Sciences /LTH

defense location

Lecture hall MH:Hörmander, Centre of Mathematical Sciences, Sölvegatan 18, Faculty of Engineering LTH, Lund University, Lund. Zoom: https://lu-se.zoom.us/j/64744221051

defense date

2021-05-21 10:00:00

ISBN

978-91-7895-787-3

978-91-7895-788-0

language

English

LU publication?

yes

id

1764d01f-7f65-4b43-bc75-022270200da4

date added to LUP

2021-04-26 11:32:02

date last changed

2025-04-04 15:29:43

@phdthesis{1764d01f-7f65-4b43-bc75-022270200da4,
  abstract     = {{3D human body reconstruction from monocular images has wide applications in our life, such as movie, animation, Virtual/Augmented Reality, medical research and so on. Due to the high freedom of human body in real scene and the ambiguity of inferring 3D objects from 2D images, it is a challenging task to accurately recover 3D human body models from images. In this thesis, we explore the methods for estimating 3D human body models from images based on parametric model and deep learning.<br/><br/>In the first part, the coarse 3D human body models are estimated automatically from multi-view images based on a parametric human body model called SMPL model. Two routes are exploited for estimating the pose and shape parameters of the SMPL model to obtain the 3D models: (1) Optimization based methods; and (2) Deep learning based methods. For the optimization based methods, we propose the novel energy functions based on some prior information including the 2D joint points and silhouettes. Through minimizing the energy functions, the SMPL model is fitted to the prior information, and then, the coarse 3D human body is obtained. In addition to the traditional optimization based methods, a deep learning based method is also proposed in the following work to regress the pose and shape parameters of the SMPL model. A novel architecture is proposed to put the optimization into a training loop of convolutional neural network (CNN) to form a self-supervision structure based on the multi-view images. The proposed methods are evaluated on both synthetic and real datasets to demonstrate that they can obtain better estimation of the pose and shape of 3D human body than previous approaches.<br/><br/>In the second part, the problem is shifted to the detailed 3D human body reconstruction from multi-view images. Instead of using the SMPL model, implicit function is utilized to represent 3D models because implicit representation can generate continuous surface and has better flexibility for arbitrary topology. Firstly, a multi-scale features based method is proposed to learn the implicit representation for 3D models through multi-stage hourglass networks from multi-view images. Furthermore, a coarse-to-fine method is proposed to refine the 3D models from multi-view images through learning the voxel super-resolution. In this method, the coarse 3D models are estimated firstly by the learned implicit function based on multi-scale features from multi-view images. Afterwards, by voxelizing the coarse 3D models to low resolution voxel grids, voxel super-resolution is learned through a multi-stage 3D CNN for feature extraction from low resolution voxel grids and fully connected neural network for predicting the implicit function. Voxel super-resolution is able to remove the false reconstruction and preserve the surface details. The proposed methods are evaluated on both real and synthetic datasets in which our method can estimate 3D model with higher accuracy and better surface quality than some previous methods.}},
  author       = {{Li, Zhongguo}},
  isbn         = {{978-91-7895-787-3}},
  keywords     = {{3D Human body; Implicit function; Deep Learning; SMPL model; Parametric model; Pose and shape estimation}},
  language     = {{eng}},
  publisher    = {{Lund University / Centre for Mathematical Sciences /LTH}},
  school       = {{Lund University}},
  title        = {{3D Human Pose and Shape Estimation Based on Parametric Model and Deep Learning}},
  url          = {{https://lup.lub.lu.se/search/files/97117275/Li_thesis_printed.pdf}},
  year         = {{2021}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES

3D Human Pose and Shape Estimation Based on Parametric Model and Deep Learning