Relative Pose Estimation through Affine Corrections of Monocular Depth Priors

Yu, Yifan; Liu, Shaohui; Pautrat, Rémi; Pollefeys, Marc; Larsson, Viktor

(2025) p.16706-16716

Abstract: Monocular depth estimation (MDE) models have undergone significant advancements over recent years. Many MDE models aim to predict affine-invariant relative depth from monocular images, while recent developments in large-scale training and vision foundation models enable reasonable estimation of metric (absolute) depth. However, effectively leveraging these predictions for geometric vision tasks, in particular relative pose estimation, remains relatively underexplored. While depths provide rich constraints for cross-view image alignment, the intrinsic noise and ambiguity of monocular depth priors present practical challenges to improving upon classic keypoint-based solutions. In this paper, we develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities, covering both calibrated and uncalibrated conditions. We further propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints. We find that the affine correction modeling is beneficial not only to the relative depth priors but also, surprisingly, to the "metric" ones. Results across multiple datasets demonstrate large improvements of our approach over classic keypoint-based baselines and PnP-based solutions, under both calibrated and uncalibrated setups. We also show that our method improves consistently with different feature matchers and MDE models, and can further benefit from very recent advances in both modules. Code is available at https://github.com/MarkYu98/madpose.
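The "affine (scale and shift) ambiguity" mentioned in the abstract can be illustrated with a small sketch. It assumes a predicted depth d relates to the true depth D by D ≈ a·d + b and recovers a and b by ordinary least squares from a few reference depths. This is not one of the paper's solvers, which estimate the relative pose from image correspondences; the function names `fit_affine_correction` and `apply_affine_correction` and the toy values are hypothetical and chosen only for illustration.

```python
def fit_affine_correction(pred, ref):
    """Least-squares scale a and shift b such that a * pred[i] + b ~ ref[i].

    Illustrative only: assumes paired predicted and reference depths are
    available, which is not the setting of the paper's pose solvers.
    """
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("need at least two paired depth samples")
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        raise ValueError("predicted depths are constant; scale is unobservable")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    a = cov / var_p            # scale
    b = mean_r - a * mean_p    # shift
    return a, b


def apply_affine_correction(pred, a, b):
    """Map predicted depths through the affine correction a * d + b."""
    return [a * p + b for p in pred]


# Toy data (hypothetical): the reference depths are an exact affine
# transform of the predictions, with scale 2.0 and shift 0.3.
pred = [0.2, 0.5, 0.9, 1.4]
ref = [2.0 * p + 0.3 for p in pred]

a, b = fit_affine_correction(pred, ref)
corrected = apply_affine_correction(pred, a, b)
```

With noise-free toy data the fit recovers the scale and shift up to floating-point error. Each image would carry its own independent pair (a, b), which is why the abstract speaks of independent affine ambiguities.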

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/32f0f5c0-4d66-4d4a-bd27-4e90ea514d5b

author

Yu, Yifan; Liu, Shaohui; Pautrat, Rémi; Pollefeys, Marc and Larsson, Viktor (LU)

publishing date

2025

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Computer graphics and computer vision

host publication

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

pages

11 pages

publisher

IEEE

ISBN

979-8-3315-4364-8

DOI

10.1109/CVPR52734.2025.01557

language

English

LU publication?

yes

id

32f0f5c0-4d66-4d4a-bd27-4e90ea514d5b

date added to LUP

2026-04-02 13:58:08

date last changed

2026-04-13 12:27:36

@inproceedings{32f0f5c0-4d66-4d4a-bd27-4e90ea514d5b,
  abstract     = {{Monocular depth estimation (MDE) models have undergone significant advancements over recent years. Many MDE models aim to predict affine-invariant relative depth from monocular images, while recent developments in large-scale training and vision foundation models enable reasonable estimation of metric (absolute) depth. However, effectively leveraging these predictions for geometric vision tasks, in particular relative pose estimation, remains relatively under explored. While depths provide rich constraints for cross-view image alignment, the intrinsic noise and ambiguity from the monocular depth priors present practical challenges to improving upon classic keypoint-based solutions. In this paper, we develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities, covering both calibrated and uncalibrated conditions. We further propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints. We find that the affine correction modeling is beneficial to not only the relative depth priors but also, surprisingly, the "metric" ones. Results across multiple datasets demonstrate large improvements of our approach over classic keypoint-based baselines and PnP-based solutions, under both calibrated and uncalibrated setups. We also show that our method improves consistently with different feature matchers and MDE models, and can further benefit from very recent advances on both modules. Code is available at https://github.com/MarkYu98/madpose.}},
  author       = {{Yu, Yifan and Liu, Shaohui and Pautrat, Rémi and Pollefeys, Marc and Larsson, Viktor}},
  booktitle    = {{2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}},
  isbn         = {{979-8-3315-4364-8}},
  language     = {{eng}},
  pages        = {{16706--16716}},
  publisher    = {{IEEE}},
  title        = {{Relative Pose Estimation through Affine Corrections of Monocular Depth Priors}},
  url          = {{http://dx.doi.org/10.1109/CVPR52734.2025.01557}},
  doi          = {{10.1109/CVPR52734.2025.01557}},
  year         = {{2025}},
}
