Relative depth estimation from dense matches using Graph Neural Networks

Salnikov, Vladimir

Salnikov, Vladimir (LU) (2025) NUMK11 20251
Mathematics (Faculty of Sciences)
Centre for Mathematical Sciences

Abstract: This thesis explores the application of Graph Neural Networks (GNNs) for relative depth estimation from dense matches between images. The main goals were to investigate the feasibility of using dense matches alone for this task, assess the performance improvement offered by GNNs over classical methods, and identify the most suitable features. The methodology involved using matches from the Dense Kernelized Feature Matching (DKM) algorithm applied to the ScanNet-1500 and MegaDepth-1500 datasets. A multi-scale Graph Attention Network (MultiScaleGAT) model was developed and evaluated against Uniform-One and K-Nearest Neighbour baselines using Mean Squared Error (MSE) and Success Rate (SR) metrics. Experimental results demonstrate that dense correspondences provide sufficient information for relative depth estimation, and the MultiScaleGAT model significantly outperforms baseline methods on both indoor and outdoor datasets. A feature ablation study showed that the KNN ratio feature is crucial for cross-domain accuracy. The findings confirm the considerable advantage of using a learnable graph-based architecture for this task.
Popular Abstract: Computer vision is a field of artificial intelligence that teaches computers to interpret and understand visual information from images and videos. In particular, 3D computer vision aims to understand the three-dimensional structure of the world from these 2D images. Think about how our brains, receiving slightly different images from each eye, can perceive depth. 3D computer vision tries to replicate this ability computationally.
Relative depth estimation is a fundamental task in 3D computer vision. Instead of measuring the exact distance from the camera to every point in a scene, we capture two images from different viewpoints and calculate how much closer or farther each object appears in the second image relative to the first.
My thesis explores how we can use a powerful type of artificial intelligence called Graph Neural Networks (GNNs) to improve this relative depth estimation process. To understand GNNs, let's first think about graphs. Imagine a social network where people are "nodes" and their friendships are "edges" connecting them. A graph is simply a collection of nodes and edges showing how they relate to each other.
Graph Neural Networks are designed specifically to work with this kind of connected data. Unlike typical AI methods that primarily look at individual data points, GNNs consider the relationships and connections between points. They do this by passing information, or "messages," along the edges of the graph, allowing each node to learn from its neighbors and from the broader structure of the network.
My research shows that using GNNs results in much better estimates of relative depth than simpler methods.
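
As a rough illustration of the message-passing idea described above, the following Python sketch runs two rounds of neighbour averaging on a tiny graph. It is not the MultiScaleGAT model from the thesis: the three-node chain, the node names, the starting values, and the plain averaging rule (with no learned weights or attention) are all assumptions chosen for illustration.

```python
# Toy message passing on a small undirected graph (illustrative only;
# not the MultiScaleGAT model from the thesis).

def message_passing_step(features, edges):
    """Replace each node's value with the mean of itself and its neighbours."""
    # Build an adjacency list from the undirected edge list.
    neighbours = {node: [] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = {}
    for node, value in features.items():
        # "Messages" are the current values of the neighbouring nodes,
        # combined with the node's own value.
        messages = [features[n] for n in neighbours[node]] + [value]
        updated[node] = sum(messages) / len(messages)
    return updated


# Hypothetical example: three nodes in a chain, A - B - C.
features = {"A": 0.0, "B": 3.0, "C": 6.0}
edges = [("A", "B"), ("B", "C")]

one_round = message_passing_step(features, edges)
two_rounds = message_passing_step(one_round, edges)
```

After one round, node A has only heard from B. After the second round, A's value also reflects C, because B already absorbed C's value in the first round. This is the sense in which repeated message passing lets each node learn from the broader structure of the graph and not only from its direct neighbours.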

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9189402

author

Salnikov, Vladimir (LU)

supervisor

Viktor Larsson (LU)

organization

Mathematics (Faculty of Sciences), Centre for Mathematical Sciences

course

NUMK11 20251

year

2025

type

M2 - Bachelor Degree

subject

Mathematics and Statistics

keywords

Relative depth estimation, Relative Scale estimation, Graph Neural Networks, Computer Vision

report number

LUNFNA-4061-2025

ISSN

1654-6229

other publication id

2025:K5

language

English

id

9189402

date added to LUP

2025-06-12 14:59:53

date last changed

2025-06-12 14:59:53

@misc{9189402,
  abstract     = {{This thesis explores the application of Graph Neural Networks (GNNs) for relative depth estimation from dense matches between images. The main goals were to investigate the feasibility of using dense matches alone for this task, assess the performance improvement offered by GNNs over classical methods, and identify the most suitable features. The methodology involved using matches from the Dense Kernelized Feature Matching (DKM) algorithm applied to the ScanNet-1500 and MegaDepth-1500 datasets. A multi-scale Graph Attention Network (MultiScaleGAT) model was developed and evaluated against Uniform-One and K-Nearest Neighbour baselines using Mean Squared Error (MSE) and Success Rate (SR) metrics. Experimental results demonstrate that dense correspondences provide sufficient information for relative depth estimation, and the MultiScaleGAT model significantly outperforms baseline methods on both indoor and outdoor datasets. A feature ablation study showed that the KNN ratio feature is crucial for cross-domain accuracy. The findings confirm the considerable advantage of using a learnable graph-based architecture for this task.}},
  author       = {{Salnikov, Vladimir}},
  issn         = {{1654-6229}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Relative depth estimation from dense matches using Graph Neural Networks}},
  year         = {{2025}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
