[CVPR 2024] Question Aware Vision Transformer for Multimodal Reasoning

Video description

Title: Question Aware Vision Transformer for Multimodal Reasoning
Authors: Roy Ganz, Yair Kittenplon, Aviad Aberdam, Elad Ben Avraham, Oren Nuriel, Shai Mazor, Ron Litman
Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024

arXiv: https://arxiv.org/pdf/2402.05472
Code: https://github.com/amazon-science/QA-ViT
