[CVPR 2024] Harnessing Large Language Models for Training-free Video Anomaly Detection

University of Trento (Università di Trento), Department of Information Engineering and Computer Science (Dipartimento di Ingegneria e Scienza dell'Informazione), CVPR 2024

Video description

Video anomaly detection (VAD) aims to temporally locate abnormal events in a video. Existing works mostly rely on training deep models to learn the distribution of normality under video-level supervision, one-class supervision, or an unsupervised setting. Training-based methods tend to be domain-specific and are therefore costly to deploy in practice, since any domain change requires new data collection and model training.
In this paper, we radically depart from previous efforts and propose LAnguage-based VAD (LAVAD), a method tackling VAD in a novel, training-free paradigm, exploiting the capabilities of pre-trained large language models (LLMs) and existing vision-language models (VLMs). We leverage VLM-based captioning models to generate textual descriptions for each frame of any test video. With the textual scene description, we then devise a prompting mechanism to unlock the capability of LLMs in terms of temporal aggregation and anomaly score estimation, turning LLMs into an effective video anomaly detector. We further leverage modality-aligned VLMs and propose effective techniques based on cross-modal similarity for cleaning noisy captions and refining the LLM-based anomaly scores.
We evaluate LAVAD on two large datasets featuring real-world surveillance scenarios (UCF-Crime and XD-Violence), showing that it outperforms both unsupervised and one-class methods without requiring any training or data collection.
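
The two cross-modal steps mentioned above (cleaning noisy captions and refining LLM-based anomaly scores) can be illustrated with a small sketch. This is not the authors' implementation; the code linked below is the reference. Everything here is an assumption made for illustration: the function names `clean_captions` and `refine_scores`, the rule of picking the caption whose text embedding is closest to each frame's image embedding, the top-k softmax weighting, the values of `k` and `temperature`, and the toy 2-D vectors that stand in for embeddings from a modality-aligned VLM.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def clean_captions(frame_embs: Sequence[Vector],
                   caption_embs: Sequence[Vector],
                   captions: Sequence[str]) -> List[str]:
    """Assumed cleaning rule: give each frame the caption (taken from any frame
    of the video) whose text embedding is most similar to the frame's image embedding."""
    cleaned = []
    for frame in frame_embs:
        best = max(range(len(captions)), key=lambda j: cosine(frame, caption_embs[j]))
        cleaned.append(captions[best])
    return cleaned


def refine_scores(query_embs: Sequence[Vector],
                  key_embs: Sequence[Vector],
                  scores: Sequence[float],
                  k: int = 2,
                  temperature: float = 0.1) -> List[float]:
    """Assumed refinement rule: replace each score with a softmax-weighted mean
    of the scores of the k entries most similar to the query embedding."""
    refined = []
    for query in query_embs:
        sims = [cosine(query, key) for key in key_embs]
        top = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)[:k]
        weights = [math.exp(sims[j] / temperature) for j in top]
        total = sum(weights)
        refined.append(sum(w * scores[j] for w, j in zip(weights, top)) / total)
    return refined


# Toy data (made up): three frames, one caption per frame, one raw score per frame.
frame_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
caption_embs = [[1.0, 0.0], [0.5, 0.5], [0.1, 0.9]]
captions = ["a man walks down a street", "a blurry image", "two people fight"]
raw_scores = [0.1, 0.5, 0.9]  # stand-ins for scores an LLM would return

cleaned = clean_captions(frame_embs, caption_embs, captions)
# Look up the text embedding of each cleaned caption to use as refinement keys.
cleaned_embs = [caption_embs[captions.index(c)] for c in cleaned]
refined = refine_scores(frame_embs, cleaned_embs, raw_scores)
```

In this toy run the second frame's poorly matching caption is replaced by the first frame's caption, and the first two frames, which look alike, end up sharing an averaged score. In a real pipeline the LLM prompting step would sit between the two functions, producing `raw_scores` from the cleaned captions.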

Project page: https://lucazanella.github.io/lavad/

Code: https://github.com/lucazanella/lavad
