Lund University Publications

3D Human Pose and Shape Estimation Through Collaborative Learning and Multi-View Model-Fitting

Li, Zhongguo; Oskarsson, Magnus and Heyden, Anders (2021). 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). In IEEE Winter Conference on Applications of Computer Vision (WACV), p. 1887-1896
Abstract
3D human pose and shape estimation plays a vital role in many computer vision applications. Many deep-learning-based methods attempt to solve the problem using only single-view RGB images to train the network. However, since some public datasets are captured with multi-view camera systems, we propose a novel method that tackles the problem by putting optimization-based multi-view model-fitting into a regression-based learning loop over multi-view images. First, a convolutional neural network (CNN) regresses the pose and shape of a parametric human body model (SMPL) from multi-view images. Then, using the regressed pose and shape as initialization, we propose an improved multi-view optimization method based on the SMPLify method (MV-SMPLify) to fit the SMPL model to the multi-view images simultaneously. Subsequently, the optimized parameters are used to supervise the training of the CNN model. The whole process forms a self-supervising framework that combines the advantages of the CNN approach and the optimization-based approach through a collaborative process. In addition, the multi-view images provide more comprehensive supervision for the training. Experiments on public datasets qualitatively and quantitatively demonstrate that our method outperforms previous approaches in a number of ways.
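The regress, fit, supervise loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes a linear regressor in place of the CNN, a plain parameter vector in place of SMPL, and linear "cameras" in place of perspective projection. All names (`fit_multi_view`, `collaborative_training`) are hypothetical.

```python
import numpy as np


def fit_multi_view(theta_init, cams, keypoints, steps=20):
    """Refine theta by gradient descent on the mean reprojection error over all views.

    Toy stand-in for multi-view model fitting: each "camera" is a linear map
    from body parameters to flattened 2D keypoints (an assumption of this sketch).
    """
    # Hessian of the mean squared reprojection error; 1 / (largest eigenvalue)
    # is a safe step size for this convex quadratic.
    hessian = 2.0 * sum(c.T @ c for c in cams) / len(cams)
    step = 1.0 / np.linalg.eigvalsh(hessian).max()
    theta = theta_init.copy()
    for _ in range(steps):
        grad = sum(2.0 * c.T @ (c @ theta - kp) for c, kp in zip(cams, keypoints))
        theta -= step * grad / len(cams)
    return theta


def collaborative_training(n_samples=64, n_params=4, n_feat=6, n_views=3,
                           epochs=20, lr=0.05, seed=0):
    """Train a linear regressor using only parameters produced by the fitting step."""
    rng = np.random.default_rng(seed)
    # Ground truth is used only to simulate observations and to evaluate.
    theta_true = rng.normal(size=(n_samples, n_params))
    # Hypothetical image features: a fixed orthonormal embedding of the parameters.
    embed, _ = np.linalg.qr(rng.normal(size=(n_feat, n_params)))
    feats = theta_true @ embed.T
    cams = [rng.normal(size=(10, n_params)) for _ in range(n_views)]
    W = np.zeros((n_params, n_feat))  # linear "regressor" standing in for the CNN

    def error():
        return float(np.mean(np.sum((feats @ W.T - theta_true) ** 2, axis=1)))

    before = error()
    for _ in range(epochs):
        for x, t in zip(feats, theta_true):
            keypoints = [c @ t for c in cams]                # observed 2D keypoints per view
            theta_reg = W @ x                                # 1) regress parameters
            theta_fit = fit_multi_view(theta_reg, cams, keypoints)  # 2) fit, starting from the regression
            W -= lr * np.outer(theta_reg - theta_fit, x)     # 3) fitted parameters supervise the regressor
    return before, error()
```

Calling `collaborative_training()` returns the regressor's mean squared parameter error before and after training. In this toy setting the error drops even though the regressor never sees the ground-truth parameters directly, only the output of the fitting step.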
author
Li, Zhongguo; Oskarsson, Magnus and Heyden, Anders
publishing date
2021
type
Chapter in Book/Report/Conference proceeding
publication status
published
host publication
WACV - IEEE Winter Conference on Applications of Computer Vision
series title
IEEE Winter Conference on Applications of Computer Vision (WACV)
pages
10 pages
publisher
IEEE Computer Society
conference name
2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
conference location
Waikoloa, United States
conference dates
2021-01-03 - 2021-01-08
external identifiers
  • scopus:85116152775
ISSN
2642-9381
ISBN
978-1-6654-4640-2
978-1-6654-0477-8
DOI
10.1109/WACV48630.2021.00193
language
English
LU publication?
yes
id
b3bd685a-6af5-4bf7-8d68-a8e211dddb8d
date added to LUP
2021-04-26 04:07:45
date last changed
2022-12-01 07:52:06
@inproceedings{b3bd685a-6af5-4bf7-8d68-a8e211dddb8d,
  abstract     = {{3D human pose and shape estimation plays a vital role in many computer vision applications. There are many deep learning based methods attempting to solve the problem only relying on single-view RGB images for training the network. However, since some public datasets are captured from multi-view cameras system, we propose a novel method to tackle the problem by putting optimization-based multi-view model-fitting into a regression-based learning loop from multi-view images. Firstly, a convolutional neural network (CNN) regresses the pose and shape of a parametric human body model (SMPL) from multi-view images. Then, utilizing the regressed pose and shape as initialization, we propose an improved multi-view optimization method based on the SMPLify method (MV-SMPLify) to fit the SMPL model to the multi-view images simultaneously. Subsequently, the optimized parameters can be adopted to supervise the training of the CNN model. This whole process forms a self-supervising framework which can combine the advantages of the CNN approach and the optimization-based approach through a collaborative process. In addition, the multi-view images can provide more comprehensive supervision for the training. Experiments on public datasets qualitatively and quantitatively demonstrate that our method outperforms previous approaches in a number of ways.}},
  author       = {{Li, Zhongguo and Oskarsson, Magnus and Heyden, Anders}},
  booktitle    = {{WACV - IEEE Winter Conference on Applications of Computer Vision}},
  isbn         = {{978-1-6654-4640-2}},
  issn         = {{2642-9381}},
  language     = {{eng}},
  pages        = {{1887--1896}},
  publisher    = {{IEEE Computer Society}},
  series       = {{IEEE Winter Conference on Applications of Computer Vision (WACV)}},
  title        = {{3D Human Pose and Shape Estimation Through Collaborative Learning and Multi-View Model-Fitting}},
  url          = {{http://dx.doi.org/10.1109/WACV48630.2021.00193}},
  doi          = {{10.1109/WACV48630.2021.00193}},
  year         = {{2021}},
}