Pixel-Perfect Structure-From-Motion With Featuremetric Refinement

Sarlin, Paul Edouard; Lindenberger, Philipp; Larsson, Viktor; Pollefeys, Marc

Sarlin, Paul Edouard; Lindenberger, Philipp; Larsson, Viktor (LU) and Pollefeys, Marc (2025). In IEEE Transactions on Pattern Analysis and Machine Intelligence 47(5), p. 3298-3309

Abstract: Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this article, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/3336ceae-3da4-4656-a4cc-221aa5f62130

author

Sarlin, Paul Edouard; Lindenberger, Philipp; Larsson, Viktor (LU) and Pollefeys, Marc

publishing date

2025

type

Contribution to journal

publication status

published

subject

Computer graphics and computer vision

keywords

Bundle adjustment, feature matching, featuremetric optimization, structure-from-motion, visual localization

in

IEEE Transactions on Pattern Analysis and Machine Intelligence

volume

47

issue

5

pages

12 pages

publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

external identifiers

scopus:105002984488
pmid:37021895

ISSN

0162-8828

DOI

10.1109/TPAMI.2023.3237269

language

English

LU publication?

yes

id

3336ceae-3da4-4656-a4cc-221aa5f62130

date added to LUP

2025-08-29 14:20:14

date last changed

2026-01-31 04:40:36

@article{3336ceae-3da4-4656-a4cc-221aa5f62130,
  abstract     = {{Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this article, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.}},
  author       = {{Sarlin, Paul Edouard and Lindenberger, Philipp and Larsson, Viktor and Pollefeys, Marc}},
  issn         = {{0162-8828}},
  keywords     = {{Bundle adjustment; feature matching; featuremetric optimization; structure-from-motion; visual localization}},
  language     = {{eng}},
  number       = {{5}},
  pages        = {{3298--3309}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  journal      = {{IEEE Transactions on Pattern Analysis and Machine Intelligence}},
  title        = {{Pixel-Perfect Structure-From-Motion With Featuremetric Refinement}},
  url          = {{http://dx.doi.org/10.1109/TPAMI.2023.3237269}},
  doi          = {{10.1109/TPAMI.2023.3237269}},
  volume       = {{47}},
  year         = {{2025}},
}
