Three-dimensional reconstruction of human interactions
(2020) 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 7212-7221
- Abstract
Understanding 3d human interactions is fundamental for fine-grained scene analysis and behavioural modeling. However, most of the existing models focus on analyzing a single person in isolation, and those who process several people focus largely on resolving multi-person data association, rather than inferring interactions. This may lead to incorrect, lifeless 3d estimates, that miss the subtle human contact aspects–the essence of the event–and are of little use for detailed behavioral understanding. This paper addresses such issues and makes several contributions: (1) we introduce models for interaction signature estimation (ISP) encompassing contact detection, segmentation, and 3d contact signature prediction; (2) we show how such components can be leveraged in order to produce augmented losses that ensure contact consistency during 3d reconstruction; (3) we construct several large datasets for learning and evaluating 3d contact prediction and reconstruction methods; specifically, we introduce CHI3D, a lab-based accurate 3d motion capture dataset with 631 sequences containing 2,525 contact events, 728,664 ground truth 3d poses, as well as FlickrCI3D, a dataset of 11,216 images, with 14,081 processed pairs of people, and 81,233 facet-level surface correspondences within 138,213 selected contact regions. Finally, (4) we present models and baselines to illustrate how contact estimation supports meaningful 3d reconstruction where essential interactions are captured. Models and data are made available for research purposes at http://vision.imar.ro/ci3d.
- author
- Fieraru, Mihai; Zanfir, Mihai; Oneata, Elisabeta; Popa, Alin Ionut; Olaru, Vlad; Sminchisescu, Cristian (LU)
- publishing date
- 2020
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- host publication
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- series title
- Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
- pages
- 10 pages
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- conference name
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020
- conference location
- Virtual, Online, United States
- conference dates
- 2020-06-14 - 2020-06-19
- external identifiers
- scopus:85094601522
- ISSN
- 1063-6919
- ISBN
- 978-1-7281-7168-5
- DOI
- 10.1109/CVPR42600.2020.00724
- language
- English
- LU publication?
- yes
- id
- 4256fa9d-acf2-41da-9622-571faedda33a
- date added to LUP
- 2020-11-23 08:46:48
- date last changed
- 2022-05-12 07:56:59
@inproceedings{4256fa9d-acf2-41da-9622-571faedda33a,
  abstract     = {{Understanding 3d human interactions is fundamental for fine-grained scene analysis and behavioural modeling. However, most of the existing models focus on analyzing a single person in isolation, and those who process several people focus largely on resolving multi-person data association, rather than inferring interactions. This may lead to incorrect, lifeless 3d estimates, that miss the subtle human contact aspects–the essence of the event–and are of little use for detailed behavioral understanding. This paper addresses such issues and makes several contributions: (1) we introduce models for interaction signature estimation (ISP) encompassing contact detection, segmentation, and 3d contact signature prediction; (2) we show how such components can be leveraged in order to produce augmented losses that ensure contact consistency during 3d reconstruction; (3) we construct several large datasets for learning and evaluating 3d contact prediction and reconstruction methods; specifically, we introduce CHI3D, a lab-based accurate 3d motion capture dataset with 631 sequences containing 2,525 contact events, 728,664 ground truth 3d poses, as well as FlickrCI3D, a dataset of 11,216 images, with 14,081 processed pairs of people, and 81,233 facet-level surface correspondences within 138,213 selected contact regions. Finally, (4) we present models and baselines to illustrate how contact estimation supports meaningful 3d reconstruction where essential interactions are captured. Models and data are made available for research purposes at http://vision.imar.ro/ci3d.}},
  author       = {{Fieraru, Mihai and Zanfir, Mihai and Oneata, Elisabeta and Popa, Alin Ionut and Olaru, Vlad and Sminchisescu, Cristian}},
  booktitle    = {{2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}},
  isbn         = {{978-1-7281-7168-5}},
  issn         = {{1063-6919}},
  language     = {{eng}},
  pages        = {{7212--7221}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition}},
  title        = {{Three-dimensional reconstruction of human interactions}},
  url          = {{http://dx.doi.org/10.1109/CVPR42600.2020.00724}},
  doi          = {{10.1109/CVPR42600.2020.00724}},
  year         = {{2020}},
}