Grounding Dino for open set object detection

This video covers Grounding Dino, Dino's "open set" object detection brother, which can detect objects from novel categories zero-shot and can also detect objects described by referring expressions like "the lion most to the right".
This video is part of a broader series, Modern Object Detection - from YOLO to Transformers. Check out the playlist for other object detection videos, including source code reads for Grounding Dino's predecessors: DETR, Deformable DETR, DAB DETR, DN DETR and Dino.
Important links:
Original paper: https://arxiv.org/pdf/2303.05499
GLIP paper (another open set object detection model, with some explanation of the training setup and datasets): https://openaccess.thecvf.com/content...

00:00 - Intro
02:08 - Prerequisites
05:12 - Dino overview
13:05 - Bert overview
15:39 - Grounding Dino Inputs and Outputs
21:37 - Sub Sentence Level Attention Mask
24:29 - Cross Modality Feature Enhancer
30:06 - Language Guided Query Selection
35:32 - Cross Modality Decoder
39:19 - Token Output
41:00 - Training Data
48:05 - Results & Next Up
