LUP Student Papers

LUND UNIVERSITY LIBRARIES

Structure from Motion with a Neural Network

Gong, Jiarong LU (2023) In Master's Theses in Mathematical Sciences FMAM02 20231
Mathematics (Faculty of Engineering)
Abstract
This project delves into the 3D reconstruction of both single and multiple rigid motions, examining the potential of deep learning methods, such as that proposed by Moran et al., to supplant traditional geometry-based approaches. The project is structured into two main parts.

In the first part, we focus on the 3D reconstruction of a single rigid motion, building on the work of Moran et al. In addition to using the dataset they used in their paper, we expand it with a new one named BlendedMVS and evaluate the generalization performance of the network on this enriched dataset. The network, in general, performs commendably in both single-scene optimization and multi-view learning settings. Furthermore, by exploring different architectures and enhancing the network, we manage to slightly improve the success rate of single-scene optimization.

In the second part, our attention shifts to multiple rigid motions. We initially employ YOLOV7 and Deep Sort for motion segmentation, using a portion of the Hopkins 155 dataset. Nine video files are processed in total, and a segmentation accuracy is computed for each: five reach 100%, and the lowest stands at 99.55%. In both the single-scene optimization and multi-view learning settings there are no failed 3D reconstructions, i.e., all reprojection errors fall below two pixels.
Popular Abstract
Transforming 2D Images into 3D Reality: Recovering Objects and Camera Positions

From multiple 2D images, we can recreate a vivid 3D object. Amazing, isn’t it? The real challenge lies in the accuracy of this reconstruction. We measure this accuracy using something called ‘reprojection errors’, which, simply put, tell us how closely our 3D reconstruction matches the original 2D images. The objective of this project is to reconstruct 3D objects from multiple 2D images. This task can be complex, especially when images contain both stationary and moving objects, such as buildings and vehicles. The project is therefore divided into two parts based on the object types.
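The reprojection error mentioned above can be sketched in a few lines: project the reconstructed 3D points back through the estimated camera and measure how far they land from the observed 2D points. This is an illustrative sketch, not the thesis code; all names are assumptions.

```python
import numpy as np

def reprojection_error(P, X, x):
    """Mean reprojection error (in pixels) of 3D points X under camera P.

    P: 3x4 projection matrix, X: (N, 3) reconstructed points,
    x: (N, 2) observed image points. Illustrative names only.
    """
    Xh = np.hstack([X, np.ones((len(X), 1))])  # homogeneous coordinates
    proj = (P @ Xh.T).T                        # project into the image plane
    proj = proj[:, :2] / proj[:, 2:3]          # perspective divide
    return float(np.mean(np.linalg.norm(proj - x, axis=1)))
```

A reconstruction is considered successful when this value stays below a small pixel threshold (two pixels in the abstract above).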

In the first part, we worked with images containing only stationary objects like buildings, parked cars, and trees. We used a new dataset named ‘BlendedMVS’. With the help of advanced image processing techniques, we extracted ‘point matches’ from these images. A ‘point match’ represents the same 3D point viewed from different images. After compiling these points into an input for our model, we used the network proposed by researchers Moran et al. to automatically generate the 3D objects and camera positions in space. We also added some improvements to the network to increase the reconstruction quality. The results were promising; we managed to improve the accuracy of the reconstructions, as indicated by lower reprojection errors.
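Compiling point matches into a model input can be pictured as filling a cameras-by-points grid of 2D observations, with gaps where a point is not visible. This is only a sketch of the idea; the exact input format used by Moran et al.'s network may differ, and the names here are assumptions.

```python
import numpy as np

def build_track_tensor(tracks, n_cameras, n_points):
    """Assemble point matches into an (n_cameras, n_points, 2) array.

    `tracks` is a list of (camera_idx, point_idx, u, v) observations;
    entries for unobserved (camera, point) pairs are left as NaN.
    """
    M = np.full((n_cameras, n_points, 2), np.nan)
    for cam, pt, u, v in tracks:
        M[cam, pt] = (u, v)
    return M
```

Each column of the resulting tensor is one ‘point match’: the same 3D point as seen from every camera that observes it.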

In the second part, we worked with video files, which can be seen as a series of images. This time, the data contained both stationary and moving objects. For this task, we used a dataset called ’Hopkins 155’. In each video, some 2D point matches from both the stationary scene and moving cars were already provided. Since we couldn’t reconstruct all of them at once, we segmented them and reconstructed each individually. To do this, we used tools like YOLOV7 and Deep Sort to detect and isolate different objects, such as cars and trucks. By checking which bounding box a point fell inside, we could roughly determine which object it came from. This method gave us highly accurate segmentation results, close to 100%. Using our improved architecture, we achieved reprojection errors below 1 pixel for the ‘Hopkins 155’ dataset, indicating high-quality 3D reconstructions. Figures 1 and 2 are provided here to illustrate the process.
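The bounding-box assignment step described above can be sketched as a simple point-in-rectangle test: each tracked 2D point is labeled with the detection box that contains it. A simplified illustration under assumed data layouts, not the thesis code:

```python
def assign_points_to_boxes(points, boxes):
    """Label each 2D point with the index of the first detection box
    containing it, or -1 for the stationary background.

    points: list of (x, y) coordinates;
    boxes:  list of (x1, y1, x2, y2) corners from the object detector.
    """
    labels = []
    for x, y in points:
        label = -1
        for i, (x1, y1, x2, y2) in enumerate(boxes):
            if x1 <= x <= x2 and y1 <= y <= y2:
                label = i  # point lies inside this object's box
                break
        labels.append(label)
    return labels
```

Points sharing a label are then treated as one rigid motion and reconstructed together; background points form the stationary scene.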
author: Gong, Jiarong LU
supervisor:
organization:
course: FMAM02 20231
year: 2023
type: H2 - Master's Degree (Two Years)
subject:
publication/series: Master's Theses in Mathematical Sciences
report number: LUTFMA-3507-2023
ISSN: 1404-6342
other publication id: 2023:E30
language: English
id: 9126530
date added to LUP: 2023-06-27 09:36:02
date last changed: 2023-06-27 09:36:02
@misc{9126530,
  abstract     = {{This project delves into the 3D reconstruction of both single and multiple rigid motions, examining the potential of deep learning methods, such as that proposed by Moran et al., to supplant traditional geometry-based approaches. The project is structured into two main parts.

In the first part, we focus on the 3D reconstruction of a single rigid motion, building on the work of Moran et al. In addition to using the dataset they used in their paper, we expand it with a new one named BlendedMVS and evaluate the generalization performance of the network on this enriched dataset. The network, in general, performs commendably in both single-scene optimization and multi-view learning settings. Furthermore, by exploring different architectures and enhancing the network, we manage to slightly improve the success rate of single-scene optimization.

In the second part, our attention shifts to multiple rigid motions. We initially employ YOLOV7 and Deep Sort for motion segmentation, using a portion of the Hopkins 155 dataset. In total, there are nine video files, each corresponding to a segmentation accuracy. The results reveal five instances of 100% accuracy, with the lowest accuracy standing at 99.55%. In terms of single-scene optimization and multi-view learning settings, there are no failed 3D reconstructions, indicating that all the reprojection errors fall below two pixels.}},
  author       = {{Gong, Jiarong}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Structure from Motion with a Neural Network}},
  year         = {{2023}},
}