Detailed 3D human body reconstruction from multi-view images combining voxel super-resolution and learned implicit representation

Li, Zhongguo; Oskarsson, Magnus; Heyden, Anders


(2022) In Applied Intelligence 52(6), pp. 6739–6759

Abstract: Reconstructing detailed 3D human body models from images is an interesting but challenging task in computer vision because of the high degree of freedom of the human body. This work proposes a coarse-to-fine method that reconstructs detailed 3D human bodies from multi-view images by combining a learned implicit representation with Voxel Super-Resolution (VSR). First, coarse 3D models are estimated by learning a Pixel-aligned Implicit Function based on Multi-scale Features (MF-PIFu), which are extracted from the multi-view images by multi-stage hourglass networks. Then, taking as input the low-resolution voxel grids generated from the coarse 3D models, VSR is implemented by learning an implicit function through a multi-stage 3D convolutional neural network. Finally, VSR produces refined, detailed 3D human body models that preserve details and reduce the false reconstructions of the coarse 3D models. Thanks to the implicit representation, training is memory efficient, and the detailed 3D human body reconstructed from multi-view images is represented as a continuous decision boundary with high-resolution geometry. In addition, the coarse-to-fine method based on MF-PIFu and VSR simultaneously removes false reconstructions and preserves appearance details in the final reconstruction. In experiments on both real and synthetic datasets, our method achieves competitive quantitative and qualitative results for 3D human bodies with various poses and shapes.
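The first stage described in the abstract predicts occupancy at 3D points from image features aligned with each point's projection in every view. The following is a minimal Python sketch of such a pixel-aligned, multi-view query, not the authors' MF-PIFu implementation. Assumptions: calibrated 3x4 pinhole cameras, precomputed feature maps (one per scale, standing in for the multi-stage hourglass outputs), simple averaging of per-view features, and a single linear layer plus sigmoid in place of the learned MLP. All function names and weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]            # 3x4 pinhole projection matrix (hypothetical calibration)
FeatureMap = List[List[List[float]]]  # H x W x C feature map, indexed [row][col][channel]


def project(P: Matrix, X: Sequence[float]) -> Tuple[float, float, float]:
    """Project a 3D point with a 3x4 camera matrix; returns pixel coords (u, v) and depth z."""
    Xh = (X[0], X[1], X[2], 1.0)
    x, y, z = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return x / z, y / z, z


def bilinear(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly sample a feature vector at continuous pixel coords (u, v), clamped to the map."""
    H, W = len(fmap), len(fmap[0])
    u = min(max(u, 0.0), W - 1.0)
    v = min(max(v, 0.0), H - 1.0)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    return [
        (1 - du) * (1 - dv) * fmap[v0][u0][c] + du * (1 - dv) * fmap[v0][u1][c]
        + (1 - du) * dv * fmap[v1][u0][c] + du * dv * fmap[v1][u1][c]
        for c in range(len(fmap[0][0]))
    ]


def occupancy(X: Sequence[float], cameras: List[Matrix], features: List[List[FeatureMap]],
              image_size: Tuple[int, int], weights: List[float], bias: float) -> float:
    """Pixel-aligned occupancy in (0, 1); the reconstructed surface is the 0.5 level set."""
    img_w, img_h = image_size
    per_view = []
    for P, maps in zip(cameras, features):
        u, v, z = project(P, X)
        s, t = u / (img_w - 1), v / (img_h - 1)  # normalized image coords, shared by all scales
        feat: List[float] = []
        for fmap in maps:  # one map per scale / hourglass stage (assumption)
            H, W = len(fmap), len(fmap[0])
            feat += bilinear(fmap, s * (W - 1), t * (H - 1))
        feat.append(z)  # condition on the point's depth, as in PIFu
        per_view.append(feat)
    # Average over views so the result does not depend on view order.
    pooled = [sum(col) / len(per_view) for col in zip(*per_view)]
    logit = sum(w * f for w, f in zip(weights, pooled)) + bias  # linear stand-in for the MLP
    return 1.0 / (1.0 + math.exp(-logit))
```

Evaluating `occupancy` on a dense set of points and extracting the 0.5 level set (for example with marching cubes) would yield a coarse surface; during training, only a sampled subset of points needs to be evaluated.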
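The second stage conditions another implicit function on the low-resolution voxel grid produced by the coarse models. The Python sketch below shows only the query side of such a model. Features from several resolution levels are trilinearly sampled at a continuous point and decoded to occupancy, and the refined shape consists of the points above the 0.5 decision boundary. It illustrates the general idea under stated assumptions and is not the paper's VSR network. Specifically, 2x average pooling stands in for the learned stride-2 3D convolution stages, and a linear layer plus sigmoid stands in for the decoder. `avg_pool2`, `super_resolve`, and the weights are hypothetical.

```python
import math
from typing import List, Sequence

Grid = List[List[List[float]]]  # D x H x W grid, indexed [z][y][x]


def avg_pool2(grid: Grid) -> Grid:
    """2x average pooling; stands in (assumption) for one stride-2 stage of a 3D CNN encoder."""
    D, H, W = len(grid) // 2, len(grid[0]) // 2, len(grid[0][0]) // 2
    return [[[sum(grid[2 * z + a][2 * y + b][2 * x + c]
                  for a in (0, 1) for b in (0, 1) for c in (0, 1)) / 8.0
              for x in range(W)] for y in range(H)] for z in range(D)]


def trilinear(grid: Grid, p: Sequence[float]) -> float:
    """Trilinearly interpolate the grid at p = (x, y, z) given in normalized [0, 1] coordinates."""
    sizes = (len(grid[0][0]), len(grid[0]), len(grid))  # W, H, D
    lo, hi, frac = [], [], []
    for coord, n in zip(p, sizes):
        g = min(max(coord, 0.0), 1.0) * (n - 1)
        i = int(math.floor(g))
        lo.append(i)
        hi.append(min(i + 1, n - 1))
        frac.append(g - i)
    value = 0.0
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((frac[0] if dx else 1 - frac[0])
                     * (frac[1] if dy else 1 - frac[1])
                     * (frac[2] if dz else 1 - frac[2]))
                x = hi[0] if dx else lo[0]
                y = hi[1] if dy else lo[1]
                z = hi[2] if dz else lo[2]
                value += w * grid[z][y][x]
    return value


def vsr_occupancy(p: Sequence[float], pyramid: List[Grid], weights: List[float], bias: float) -> float:
    """Occupancy at a continuous point from values sampled at every pyramid level."""
    feats = [trilinear(level, p) for level in pyramid]
    logit = sum(w * f for w, f in zip(weights, feats)) + bias  # linear stand-in for the decoder MLP
    return 1.0 / (1.0 + math.exp(-logit))


def super_resolve(low_res: Grid, out_res: int, weights: List[float],
                  bias: float) -> List[List[List[bool]]]:
    """Query the implicit function on a finer lattice and keep points inside the 0.5 boundary."""
    pyramid = [low_res, avg_pool2(low_res)]
    step = 1.0 / (out_res - 1)
    return [[[vsr_occupancy((x * step, y * step, z * step), pyramid, weights, bias) > 0.5
              for x in range(out_res)] for y in range(out_res)] for z in range(out_res)]
```

Because the decoder can be queried at any continuous point, the output resolution (`out_res`) is independent of the input grid size, and training can supervise a sparse sample of points rather than a dense high-resolution grid. This is one plausible reading of the memory-efficiency claim in the abstract.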

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/1df4527b-0827-4f79-b172-63fe94362896

author

Li, Zhongguo (LU); Oskarsson, Magnus (LU); Heyden, Anders (LU)

publishing date

2022

type

Contribution to journal

publication status

published

subject

Computer graphics and computer vision

keywords

Detailed 3D human body, Implicit representation, Multi-scale features, Multi-view images, Voxel super-resolution

in

Applied Intelligence

volume

52

issue

6

pages

6739 - 6759

publisher

Springer

external identifiers

scopus:85114865019

ISSN

0924-669X

DOI

10.1007/s10489-021-02783-8

language

English

LU publication?

yes

id

1df4527b-0827-4f79-b172-63fe94362896

date added to LUP

2021-10-12 14:13:18

date last changed

2025-10-14 13:14:37

@article{1df4527b-0827-4f79-b172-63fe94362896,
  abstract     = {{Reconstructing detailed 3D human body models from images is an interesting but challenging task in computer vision because of the high degree of freedom of the human body. This work proposes a coarse-to-fine method that reconstructs detailed 3D human bodies from multi-view images by combining a learned implicit representation with Voxel Super-Resolution (VSR). First, coarse 3D models are estimated by learning a Pixel-aligned Implicit Function based on Multi-scale Features (MF-PIFu), which are extracted from the multi-view images by multi-stage hourglass networks. Then, taking as input the low-resolution voxel grids generated from the coarse 3D models, VSR is implemented by learning an implicit function through a multi-stage 3D convolutional neural network. Finally, VSR produces refined, detailed 3D human body models that preserve details and reduce the false reconstructions of the coarse 3D models. Thanks to the implicit representation, training is memory efficient, and the detailed 3D human body reconstructed from multi-view images is represented as a continuous decision boundary with high-resolution geometry. In addition, the coarse-to-fine method based on MF-PIFu and VSR simultaneously removes false reconstructions and preserves appearance details in the final reconstruction. In experiments on both real and synthetic datasets, our method achieves competitive quantitative and qualitative results for 3D human bodies with various poses and shapes.}},
  author       = {{Li, Zhongguo and Oskarsson, Magnus and Heyden, Anders}},
  issn         = {{0924-669X}},
  keywords     = {{Detailed 3D human body; Implicit representation; Multi-scale features; Multi-view images; Voxel super-resolution}},
  language     = {{eng}},
  number       = {{6}},
  pages        = {{6739--6759}},
  publisher    = {{Springer}},
  journal      = {{Applied Intelligence}},
  title        = {{Detailed 3D human body reconstruction from multi-view images combining voxel super-resolution and learned implicit representation}},
  url          = {{http://dx.doi.org/10.1007/s10489-021-02783-8}},
  doi          = {{10.1007/s10489-021-02783-8}},
  volume       = {{52}},
  year         = {{2022}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES
