Pixel-Perfect Structure-from-Motion with Featuremetric Refinement

Lindenberger, Philipp; Sarlin, Paul-Edouard; Larsson, Viktor; Pollefeys, Marc

Lindenberger, Philipp; Sarlin, Paul-Edouard; Larsson, Viktor and Pollefeys, Marc (2021) 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021, p. 5967-5977

Abstract: Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale. Our code is publicly available at github.com/cvg/pixel-perfect-sfm as an add-on to the popular SfM software COLMAP.
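The keypoint adjustment described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (see the repository named above for that); it only shows the general idea of moving a keypoint so that the dense feature sampled at its location matches a reference feature. Everything here is an assumption made for illustration: the function names (`bilinear_sample`, `refine_keypoint`), the two-view setting, the plain squared error without a robust loss, and the finite-difference Gauss-Newton solver. In the paper the dense features come from a neural network; here any `(H, W, C)` array stands in for them.

```python
import numpy as np


def bilinear_sample(feat, xy):
    """Sample a dense feature map feat (H, W, C) at a subpixel point xy = (x, y)."""
    h, w, _ = feat.shape
    # Clamp so that the 2x2 interpolation neighbourhood stays inside the map.
    x = float(np.clip(xy[0], 0, w - 1.001))
    y = float(np.clip(xy[1], 0, h - 1.001))
    x0, y0 = int(x), int(y)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feat[y0, x0] + ax * feat[y0, x0 + 1]
    bot = (1 - ax) * feat[y0 + 1, x0] + ax * feat[y0 + 1, x0 + 1]
    return (1 - ay) * top + ay * bot


def refine_keypoint(feat, xy_init, reference, iters=20, eps=0.5):
    """Move a keypoint to minimise ||feat(xy) - reference||^2 (toy featuremetric error).

    Hypothetical helper: Gauss-Newton steps with a central finite-difference
    Jacobian; no robust loss and no multi-view coupling.
    """
    xy = np.asarray(xy_init, dtype=float)
    offsets = (np.array([eps, 0.0]), np.array([0.0, eps]))
    for _ in range(iters):
        residual = bilinear_sample(feat, xy) - reference
        # Jacobian of the sampled feature w.r.t. (x, y), shape (C, 2).
        jac = np.stack(
            [(bilinear_sample(feat, xy + d) - bilinear_sample(feat, xy - d)) / (2 * eps)
             for d in offsets],
            axis=1,
        )
        # Least-squares solution of jac @ step = -residual.
        step = np.linalg.lstsq(jac, -residual, rcond=None)[0]
        xy = xy + step
        if np.linalg.norm(step) < 1e-6:
            break
    return xy
```

A typical call would pass the feature map of one image, a detected keypoint in that image as `xy_init`, and as `reference` the feature sampled at the matched keypoint in another image. On real network features the cost is non-convex, so a sketch like this can only be expected to help when the initial detection is already close to the true location.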

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/6d72bdd4-ed87-46e8-971d-f1d262c00320

author

Lindenberger, Philipp; Sarlin, Paul-Edouard; Larsson, Viktor and Pollefeys, Marc

publishing date

2021

type

Chapter in Book/Report/Conference proceeding

publication status

published

host publication

2021 IEEE/CVF International Conference on Computer Vision (ICCV)

pages

11 pages

publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

conference name

18th IEEE/CVF International Conference on Computer Vision, ICCV 2021

conference location

Virtual, Online, Canada

conference dates

2021-10-11 - 2021-10-17

external identifiers

scopus:85121666056

DOI

10.1109/ICCV48922.2021.00593

language

English

LU publication?

no

id

6d72bdd4-ed87-46e8-971d-f1d262c00320

date added to LUP

2022-09-06 13:22:17

date last changed

2025-10-14 11:45:53

@inproceedings{6d72bdd4-ed87-46e8-971d-f1d262c00320,
  abstract     = {{Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale. Our code is publicly available at github.com/cvg/pixel-perfect-sfm as an add-on to the popular SfM software COLMAP.}},
  author       = {{Lindenberger, Philipp and Sarlin, Paul-Edouard and Larsson, Viktor and Pollefeys, Marc}},
  booktitle    = {{2021 IEEE/CVF International Conference on Computer Vision (ICCV)}},
  language     = {{eng}},
  pages        = {{5967--5977}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  title        = {{Pixel-Perfect Structure-from-Motion with Featuremetric Refinement}},
  url          = {{http://dx.doi.org/10.1109/ICCV48922.2021.00593}},
  doi          = {{10.1109/ICCV48922.2021.00593}},
  year         = {{2021}},
}
