Improving Open-Vocabulary Object Detection in a Vision Language Model | Nikko Yabut | NEECECON 2024

Improving Open-Vocabulary Object Detection in a Vision Language Model
Nikko Carlo Yabut
MEng AI Student
Artificial Intelligence Program
UP Diliman

Vision Language Models (VLMs) are foundation models that answer questions requiring image understanding. One important task they can perform is open-vocabulary object detection (OVOD): detecting objects beyond the classes the model was trained on. Despite training on extensive vision and language datasets, state-of-the-art (SOTA) VLMs such as LLaVA-1.5-13b, LLaVA-NeXT, and Qwen-VL-Chat achieve only 7.8, 14.4, and 18.6 mAP, respectively, on the LVIS benchmark. In this paper, we propose improving the OVOD capability of LLaVA-1.5 by training it on a new dataset generated using YOLO-World and a VLM. With a properly tuned prompt, our model, called LLaVA-OVOD, achieves 25.2 mAP on LVIS, a score comparable to SOTA specialist OVOD models such as OWL-ViT (25.6 mAP) and CondHead (25.1 mAP). Additionally, LLaVA-OVOD shows respectable gains on most of the original general-task benchmarks.
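For readers unfamiliar with the mAP figures quoted above, the following is a minimal sketch of how average precision (AP) can be computed for a single class at a single IoU threshold. It is a simplified illustration, not the LVIS evaluation protocol used in the talk: LVIS-style mAP additionally averages over categories and over several IoU thresholds. The names `iou` and `average_precision` and the box format `(x1, y1, x2, y2)` are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[float, Box]],  # (confidence score, predicted box)
    gts: List[Box],                  # ground-truth boxes of the same class
    iou_thr: float = 0.5,
) -> float:
    """Simplified AP: one class, one IoU threshold, all-point interpolation."""
    if not gts:
        return 0.0
    # Visit predictions from most to least confident.
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = [False] * len(gts)
    tp_flags: List[bool] = []
    for _, box in ranked:
        # Find the best still-unmatched ground-truth box for this prediction.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        is_tp = best_j >= 0 and best_iou >= iou_thr
        if is_tp:
            matched[best_j] = True
        tp_flags.append(is_tp)
    # Precision and recall after each ranked prediction.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Make precision non-increasing from right to left (the PR envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the enveloped precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

A benchmark score such as 25.2 mAP would then correspond to the mean of such per-class AP values (scaled to 0-100), averaged over the benchmark's categories and IoU thresholds.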

As part of the National Electrical, Electronics and Computer Engineering Conference (NEECECON 2024), this technical session is organized by the UP Electrical and Electronics Engineering Institute with the theme "National Development through Sustainable Industrialization."

NEECECON 2024 was co-located with the Advanced Science, Technology, and Innovation Convention (ASTICON) 2024, held on 18 and 19 July 2024 at the Novotel Manila Araneta City in Quezon City.

ASTICON 2024 showcased DOST-ASTI and UP EEEI's pioneering contributions to the ICT landscape while celebrating the partnerships that drive technological advancement and societal progress in the country.

For more info about the event, visit https://neececon2024.eee.upd.edu.ph.
