[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning

Video description

Website: https://yunzeman.github.io/situation3d

Abstract: Being able to carry out complicated vision language reasoning tasks in 3D space represents a significant milestone in developing household robots and human-centered embodied AI. In this work, we demonstrate that a critical and distinct challenge in 3D vision language reasoning is situational awareness, which incorporates two key components: (1) The autonomous agent grounds its self-location based on a language prompt. (2) The agent answers open-ended questions from the perspective of its calculated position. To address this challenge, we introduce SIG3D, an end-to-end Situation-Grounded model for 3D vision language reasoning. We tokenize the 3D scene into a sparse voxel representation, and propose a language-grounded situation estimator, followed by a situated question answering module. Experiments on the SQA3D and ScanQA datasets show that SIG3D outperforms state-of-the-art models in situational estimation and question answering by a large margin (e.g., an enhancement of over 30% on situation accuracy). Subsequent analysis corroborates our architectural design choices, explores the distinct functions of visual and textual tokens, and highlights the importance of situational awareness in the domain of 3D question-answering.
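
The abstract mentions tokenizing the 3D scene into a sparse voxel representation. As a rough illustration of that general idea only, and not SIG3D's actual implementation, the sketch below groups a point cloud into occupied voxels and keeps one centroid per voxel. The function name `voxelize`, the input format (a list of `(x, y, z)` tuples), and the choice of a centroid as the per-voxel summary are all assumptions made for this example.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group 3D points into occupied voxels (hypothetical helper).

    Returns a dict mapping an integer voxel index (i, j, k) to the
    centroid of the points that fall inside that voxel. Empty voxels
    are never stored, which is what makes the representation sparse.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Bucket each point by the integer index of the voxel containing it.
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))

    # Summarize each occupied voxel by the mean of its points
    # (an assumed choice; a real model would use learned features).
    tokens = {}
    for key, pts in buckets.items():
        n = len(pts)
        tokens[key] = tuple(sum(axis) / n for axis in zip(*pts))
    return tokens


if __name__ == "__main__":
    cloud = [(0.25, 0.25, 0.25), (0.125, 0.375, 0.25), (1.25, 0.0, -0.25)]
    for index, centroid in sorted(voxelize(cloud, 0.5).items()):
        print(index, centroid)
```

Each dictionary entry plays the role of one scene token here: the key gives its position on the voxel grid and the value is a stand-in for its feature vector.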
