Seg2Pose: Pose Estimations from Instance Segmentation Masks in One or Multiple Views for Traffic Applications

Ahrnbom, Martin; Persson, Ivar; Nilsson, Mikael

(2022) 17th International Conference on Computer Vision Theory and Applications, VISAPP 2022, vol. 5, p. 777-784

Abstract: We present a system, denoted Seg2Pose, that converts pixel coordinate tracks, represented by instance segmentation masks across multiple video frames, into world coordinate pose tracks for road users seen by static surveillance cameras. The road users are bound to a ground surface, which is represented by a number of 3D points and does not have to be perfectly flat. The system works with one or more views by using a late fusion scheme. An approximate position, denoted the normal position, is computed from the camera calibration, per-class default heights and the ground surface model. The position is then refined by a novel Convolutional Neural Network, denoted Seg2PoseNet, which takes instance segmentations and cropping positioning as its input. We evaluate the system quantitatively both on synthetic data from CARLA Simulator and on a real recording from a trinocular camera. On both datasets, the system outperforms the baseline method of using only the normal positions, which is roughly equivalent to a typical 2D to 3D conversion system.
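To make the idea of an approximate position more concrete, the following is a minimal sketch of a generic 2D to 3D conversion followed by a simple late fusion step. It is not the authors' implementation. It assumes a pinhole camera, a flat ground at z = 0 (the abstract explicitly allows a non-flat surface), a viewing ray through the mask centre intersected with a plane at half the default height, and fusion by plain averaging. All function names, the `DEFAULT_HEIGHTS` table and its values are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]

# Hypothetical per-class default heights in metres (illustrative values only).
DEFAULT_HEIGHTS: Dict[str, float] = {"car": 1.5, "pedestrian": 1.7}


def pixel_ray(u: float, v: float, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Viewing-ray direction through pixel (u, v) in camera coordinates."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def rotate(R: Mat3, d: Vec3) -> Vec3:
    """Apply a 3x3 rotation matrix (camera to world) to a direction."""
    return tuple(sum(R[i][j] * d[j] for j in range(3)) for i in range(3))


def approximate_position(
    u: float,
    v: float,
    road_user_class: str,
    intrinsics: Tuple[float, float, float, float],
    R_cam_to_world: Mat3,
    cam_center: Vec3,
) -> Vec3:
    """Intersect the ray through the mask centre with a horizontal plane at
    half the class default height, then drop the point to the ground (z = 0).
    Assumption: flat ground; a real ground model would replace this step."""
    plane_z = DEFAULT_HEIGHTS[road_user_class] / 2.0
    d = rotate(R_cam_to_world, pixel_ray(u, v, *intrinsics))
    if abs(d[2]) < 1e-12:
        raise ValueError("viewing ray is parallel to the ground plane")
    t = (plane_z - cam_center[2]) / d[2]
    if t <= 0:
        raise ValueError("intersection lies behind the camera")
    return (cam_center[0] + t * d[0], cam_center[1] + t * d[1], 0.0)


def late_fusion(per_view_positions: Sequence[Vec3]) -> Vec3:
    """Fuse per-view estimates by averaging (one simple late fusion choice)."""
    n = len(per_view_positions)
    if n == 0:
        raise ValueError("need at least one view")
    return tuple(sum(p[i] for p in per_view_positions) / n for i in range(3))


# Usage: a camera 10 m above the origin, looking straight down.
R_down = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
position = approximate_position(
    600.0, 500.0, "car", (1000.0, 1000.0, 500.0, 500.0), R_down, (0.0, 0.0, 10.0)
)
fused = late_fusion([(1.0, 2.0, 0.0), (3.0, 4.0, 0.0)])
```

In the system described above, an estimate of this kind serves only as a starting point that Seg2PoseNet then refines. The averaging step stands in for whatever late fusion scheme the paper actually uses.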

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/758b9f34-9b6a-49a3-b0c9-b78118b1b442

author

Ahrnbom, Martin; Persson, Ivar and Nilsson, Mikael

publishing date

2022-02-16

type

Chapter in Book/Report/Conference proceeding

publication status

published

keywords

Pose Estimation, Instance Segmentation, Convolutional Neural Network, Traffic Safety, Road Users, Tracking, Stereo Camera, Trinocular Camera Array, Traffic Surveillance

host publication

Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications: VISAPP

volume

5

pages

8 pages

publisher

SciTePress

conference name

17th International Conference on Computer Vision Theory and Applications, VISAPP 2022

conference location

Virtual, Online Streaming

conference dates

2022-02-06 - 2022-02-08

external identifiers

scopus:85143899640

ISBN

978-989-758-555-5

DOI

10.5220/0010777700003124

language

Swedish

LU publication?

yes

id

758b9f34-9b6a-49a3-b0c9-b78118b1b442

date added to LUP

2022-02-16 13:13:32

date last changed

2025-11-03 09:55:08

@inproceedings{758b9f34-9b6a-49a3-b0c9-b78118b1b442,
  abstract     = {{We present a system, denoted Seg2Pose, that converts pixel coordinate tracks, represented by instance segmentation masks across multiple video frames, into world coordinate pose tracks for road users seen by static surveillance cameras. The road users are bound to a ground surface, which is represented by a number of 3D points and does not have to be perfectly flat. The system works with one or more views by using a late fusion scheme. An approximate position, denoted the normal position, is computed from the camera calibration, per-class default heights and the ground surface model. The position is then refined by a novel Convolutional Neural Network, denoted Seg2PoseNet, which takes instance segmentations and cropping positioning as its input. We evaluate the system quantitatively both on synthetic data from CARLA Simulator and on a real recording from a trinocular camera. On both datasets, the system outperforms the baseline method of using only the normal positions, which is roughly equivalent to a typical 2D to 3D conversion system.}},
  author       = {{Ahrnbom, Martin and Persson, Ivar and Nilsson, Mikael}},
  booktitle    = {{Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications: VISAPP}},
  isbn         = {{978-989-758-555-5}},
  keywords     = {{Pose Estimation; Instance Segmentation; Convolutional Neural Network; Traffic Safety; Road Users; Tracking; Stereo Camera; Trinocular Camera Array; Traffic Surveillance}},
  language     = {{swe}},
  month        = {{02}},
  pages        = {{777--784}},
  publisher    = {{SciTePress}},
  title        = {{Seg2Pose: Pose Estimations from Instance Segmentation Masks in One or Multiple Views for Traffic Applications}},
  url          = {{http://dx.doi.org/10.5220/0010777700003124}},
  doi          = {{10.5220/0010777700003124}},
  volume       = {{5}},
  year         = {{2022}},
}
