ECCV 2024: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

In this talk, I will introduce our recent work on open-vocabulary 3D semantic understanding. We propose a novel method, Diff2Scene, which leverages frozen representations from text-to-image generative models for open-vocabulary 3D semantic segmentation and visual grounding tasks. Diff2Scene requires no labeled 3D data and effectively identifies objects, appearances, locations, and their compositions in 3D scenes.

ECCV 2024 Paper: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

https://arxiv.org/abs/2407.13642

About the Speaker

Xiaoyu Zhu is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Her research interests are computer vision, multimodal learning, and generative models.
