3D Human Pose and Shape Estimation Through Collaborative Learning and Multi-View Model-Fitting

Li, Zhongguo; Oskarsson, Magnus; Heyden, Anders

(2021) 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). In IEEE Winter Conference on Applications of Computer Vision (WACV), p. 1887-1896

Abstract: 3D human pose and shape estimation plays a vital role in many computer vision applications. Many deep-learning-based methods attempt to solve the problem using only single-view RGB images to train the network. However, since some public datasets are captured with multi-view camera systems, we propose a novel method that tackles the problem by putting optimization-based multi-view model-fitting into a regression-based learning loop over multi-view images. First, a convolutional neural network (CNN) regresses the pose and shape of a parametric human body model (SMPL) from multi-view images. Then, using the regressed pose and shape as initialization, we propose an improved multi-view optimization method based on SMPLify (MV-SMPLify) that fits the SMPL model to the multi-view images simultaneously. Subsequently, the optimized parameters are used to supervise the training of the CNN model. The whole process forms a self-supervising framework that combines the advantages of the CNN approach and the optimization-based approach through a collaborative process. In addition, the multi-view images provide more comprehensive supervision for the training. Experiments on public datasets demonstrate, qualitatively and quantitatively, that our method outperforms previous approaches in a number of ways.
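The regress, fit, supervise loop described in the abstract can be illustrated with a deliberately small sketch. This is not the authors' implementation: the linear regressor `W` stands in for the CNN, the random matrices in `cams` stand in for projecting SMPL joints into each view, and all function names (`fit_multi_view`, `train_collaboratively`, `make_toy_data`) are hypothetical. It only shows how a multi-view fit, initialized from the regressed parameters, could produce the targets that supervise the regressor.

```python
import numpy as np


def reprojection_loss(theta, cams, keypoints):
    """Sum of squared keypoint residuals over all views."""
    return sum(float(np.sum((A @ theta - y) ** 2)) for A, y in zip(cams, keypoints))


def fit_multi_view(theta_init, cams, keypoints, steps=100):
    """Refine theta by gradient descent on the loss summed over all views.

    Assumption: each view is a linear map A (a toy stand-in for a camera
    projection of body-model joints), so the loss is a least-squares problem.
    """
    theta = theta_init.astype(float).copy()
    hessian = 2.0 * sum(A.T @ A for A in cams)
    step = 1.0 / np.linalg.eigvalsh(hessian).max()  # 1 / Lipschitz constant
    for _ in range(steps):
        grad = sum(2.0 * A.T @ (A @ theta - y) for A, y in zip(cams, keypoints))
        theta -= step * grad
    return theta


def train_collaboratively(features, cams, keypoints, n_params, epochs=10, lr=0.5):
    """Alternate regression, multi-view fitting, and supervision."""
    W = np.zeros((n_params, features.shape[1]))  # toy stand-in for the CNN
    for _ in range(epochs):
        for x, kps in zip(features, keypoints):
            theta_reg = W @ x                                 # 1) regress
            theta_fit = fit_multi_view(theta_reg, cams, kps)  # 2) fit all views jointly
            # 3) supervise: normalized gradient step on 0.5 * ||W x - theta_fit||^2
            W -= lr * np.outer(theta_reg - theta_fit, x) / (x @ x)
    return W


def make_toy_data(rng, n_samples=8, n_views=4, n_feat=3, n_params=5, n_obs=8):
    """Synthetic features, shared toy cameras, and noise-free keypoints per view."""
    W_true = rng.normal(size=(n_params, n_feat))
    cams = [rng.normal(size=(n_obs, n_params)) for _ in range(n_views)]
    features = rng.normal(size=(n_samples, n_feat))
    keypoints = [[A @ (W_true @ x) for A in cams] for x in features]
    return features, cams, keypoints, W_true


rng = np.random.default_rng(0)
features, cams, keypoints, W_true = make_toy_data(rng)
W = train_collaboratively(features, cams, keypoints, n_params=W_true.shape[0])
print("regressor error:", np.linalg.norm(W - W_true))
```

In this toy setting no ground-truth parameters are ever shown to the regressor; its only training signal is the output of the multi-view fit, which mirrors the self-supervising structure the abstract describes.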

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/b3bd685a-6af5-4bf7-8d68-a8e211dddb8d

author

Li, Zhongguo; Oskarsson, Magnus; Heyden, Anders

publishing date

2021

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Computer graphics and computer vision

host publication

WACV - IEEE Winter Conference on Applications of Computer Vision

series title

IEEE Winter Conference on Applications of Computer Vision (WACV)

pages

10 pages

publisher

IEEE Computer Society

conference name

2021 IEEE Winter Conference on Applications of Computer Vision (WACV)

conference location

Waikoloa, United States

conference dates

2021-01-03 - 2021-01-08

external identifiers

scopus:85116152775

ISSN

2642-9381

ISBN

978-1-6654-0477-8

978-1-6654-4640-2

DOI

10.1109/WACV48630.2021.00193

language

English

LU publication?

yes

id

b3bd685a-6af5-4bf7-8d68-a8e211dddb8d

date added to LUP

2021-04-26 04:07:45

date last changed

2025-10-14 09:06:39

@inproceedings{b3bd685a-6af5-4bf7-8d68-a8e211dddb8d,
  abstract     = {{3D human pose and shape estimation plays a vital role in many computer vision applications. There are many deep learning based methods attempting to solve the problem only relying on single-view RGB images for training the network. However, since some public datasets are captured from multi-view cameras system, we propose a novel method to tackle the problem by putting optimization-based multi-view model-fitting into a regression-based learning loop from multi-view images. Firstly, a convolutional neural network (CNN) regresses the pose and shape of a parametric human body model (SMPL) from multi-view images. Then, utilizing the regressed pose and shape as initialization, we propose an improved multi-view optimization method based on the SMPLify method (MV-SMPLify) to fit the SMPL model to the multi-view images simultaneously. Subsequently, the optimized parameters can be adopted to supervise the training of the CNN model. This whole process forms a self-supervising framework which can combine the advantages of the CNN approach and the optimization-based approach through a collaborative process. In addition, the multi-view images can provide more comprehensive supervision for the training. Experiments on public datasets qualitatively and quantitatively demonstrate that our method outperforms previous approaches in a number of ways.}},
  author       = {{Li, Zhongguo and Oskarsson, Magnus and Heyden, Anders}},
  booktitle    = {{WACV - IEEE Winter Conference on Applications of Computer Vision}},
  isbn         = {{978-1-6654-0477-8}},
  issn         = {{2642-9381}},
  language     = {{eng}},
  pages        = {{1887--1896}},
  publisher    = {{IEEE Computer Society}},
  series       = {{IEEE Winter Conference on Applications of Computer Vision (WACV)}},
  title        = {{3D Human Pose and Shape Estimation Through Collaborative Learning and Multi-View Model-Fitting}},
  url          = {{http://dx.doi.org/10.1109/WACV48630.2021.00193}},
  doi          = {{10.1109/WACV48630.2021.00193}},
  year         = {{2021}},
}

Lund University Publications
