Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations

[Abstract]
This paper introduces and assesses a cross-modal global visual localization system that can localize camera images within a color 3D map representation built using both visual and lidar sensing. We present three state-of-the-art methods for creating the color 3D map: point clouds, meshes, and neural radiance fields (NeRF). Our system constructs a database of synthetic RGB and depth image pairs from these representations, which serves as the basis for global localization. We describe an automatic approach that builds this database by synthesizing novel images of the scene and exploiting the 3D structure encoded in the different representations.
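
To make the database-building step concrete, here is a minimal sketch that renders synthetic RGB and depth pairs of a colored map from a line of sampled viewpoints using Open3D's offscreen renderer. This is an illustration, not the authors' pipeline: the map file "map.ply", the pose sampling, and the camera parameters are placeholders, and Open3D >= 0.16 is assumed for MaterialRecord and metric depth rendering.

```python
# Sketch: render synthetic RGB-D pairs of a colored 3D map from sampled poses.
# Assumptions: Open3D >= 0.16; "map.ply" and the pose sampling are placeholders.
import numpy as np
import open3d as o3d
from open3d.visualization import rendering

W, H, FOV_DEG = 640, 480, 60.0

mesh = o3d.io.read_triangle_mesh("map.ply")   # hypothetical color 3D map
renderer = rendering.OffscreenRenderer(W, H)
mat = rendering.MaterialRecord()
mat.shader = "defaultUnlit"                   # keep the map's baked-in colors
renderer.scene.add_geometry("map", mesh, mat)

database = []  # (rgb, depth, pose) entries; global descriptors would be added per entry
for eye in np.linspace([0.0, 0.0, 1.5], [10.0, 0.0, 1.5], num=20):
    center = eye + np.array([1.0, 0.0, 0.0])  # look along +x (assumed convention)
    up = np.array([0.0, 0.0, 1.0])
    renderer.setup_camera(FOV_DEG, center, eye, up)
    rgb = np.asarray(renderer.render_to_image())
    # z_in_view_space=True yields metric depth rather than a normalized z-buffer
    depth = np.asarray(renderer.render_to_depth_image(z_in_view_space=True))
    database.append((rgb, depth, (eye, center, up)))
```

Denser or task-specific viewpoint sampling (e.g., along the mapping trajectory, or facing both travel directions) would replace the straight-line sampling used here.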
Next, we present a global localization system that uses the synthetic image database to accurately estimate the 6 Degrees of Freedom (DoF) camera pose of monocular query images. Our approach combines learning-based global descriptors and feature detectors, which enable robust image retrieval and matching despite the domain gap between (real) query camera images and the synthetic database images.
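
The sketch below illustrates the final pose-estimation step under the assumption that retrieval and feature matching have already produced 2D-2D correspondences between the query image and its best-matching synthetic view: database keypoints are lifted to 3D using the synthetic depth and the known render pose, and the query pose is then solved with PnP + RANSAC via OpenCV. The function interface and the pinhole backprojection convention are illustrative assumptions, not the authors' implementation.

```python
# Sketch: 6-DoF query pose from matches against one synthetic RGB-D view.
# The learned retrieval/matching stages are abstracted away; K, depth, and
# T_world_cam stand in for quantities obtained from the map renderings.
import cv2
import numpy as np

def localize(pts_query, pts_db, depth, K, T_world_cam):
    """pts_query, pts_db: (N, 2) matched pixel coords; depth: synthetic metric
    depth of the database view; K: 3x3 intrinsics; T_world_cam: 4x4 pose the
    database view was rendered from. Returns the query camera pose or None."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts3d, pts2d = [], []
    for (uq, vq), (ud, vd) in zip(pts_query, pts_db):
        z = depth[int(vd), int(ud)]
        if z <= 0 or not np.isfinite(z):
            continue  # skip keypoints without valid synthetic depth
        # backproject the database keypoint into the render's camera frame
        p_cam = np.array([(ud - cx) * z / fx, (vd - cy) * z / fy, z, 1.0])
        pts3d.append((T_world_cam @ p_cam)[:3])  # lift to the world frame
        pts2d.append([uq, vq])
    if len(pts3d) < 4:
        return None  # PnP needs at least 4 correspondences
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d, np.float32), np.asarray(pts2d, np.float32),
        K.astype(np.float64), None, reprojectionError=4.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)            # world -> query-camera rotation
    T_cam_world = np.eye(4)
    T_cam_world[:3, :3], T_cam_world[:3, 3] = R, tvec.ravel()
    return np.linalg.inv(T_cam_world)     # query camera pose in the world
```

Because the 3D points come from the synthetic depth, any rendering artifacts propagate into the pose estimate, which is one reason RANSAC-based outlier rejection matters here.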
We assess the system's performance through extensive real-world experiments in indoor and outdoor settings, evaluating the effectiveness of each map representation, the advantages of learning-based features and descriptors, and the benefits over traditional structure-from-motion localization approaches.
Our results show that all three map representations achieve consistent localization success rates of 55% or higher across a range of environments. Images synthesized from the NeRF perform best, localizing query images with an average success rate of 72%. Furthermore, we demonstrate that our synthesized database enables global localization even when the map-creation data and the localization sequence are captured while travelling in opposite directions.
Our system runs in real time on a mobile laptop equipped with a GPU, achieving a processing rate of 1 Hz.

Authors: Lintong Zhang, Yifu Tao, Jiarong Lin, Fu Zhang, Maurice Fallon

Pre-print: https://www.arxiv.org/abs/2408.11966
PDF: https://www.arxiv.org/pdf/2408.11966
