[ICASSP 2022] Text2Video: Text-driven Talking-head Video Synthesis with Phoneme-Pose Dictionary Demo

Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
Paper: https://arxiv.org/abs/2104.14631
Github: https://github.com/sibozhang/Text2Video
Project Page: https://sites.google.com/view/sibozha...

With recent advances in deep learning and computer vision, autonomous
excavators have made significant progress. Safety is the most
important aspect of any autonomous excavator system. In this paper,
we propose a vision-based excavator perception, activity analysis, and
safety monitoring system. Our perception system can detect multiple
classes of construction machines and humans in real time while estimating
the poses and actions of the excavator. We then present a novel excavator
safety monitoring and activity analysis system based on the perception
results. To evaluate the performance of our method, we collect a dataset
using the Autonomous Excavator System (AES) [1] that includes objects of
multiple classes under different lighting conditions, with human annotations. We
also evaluate our method on a benchmark construction dataset. The
experimental results show that the proposed action recognition approach
outperforms state-of-the-art approaches in top-1 accuracy by about
5.18%. Although the activity analysis and safety monitoring system is
designed for our Autonomous Excavator System (AES) in solid waste
scenes, it can also be generally applied to manned excavators and other
construction scenarios.

Keywords: Computer Vision, Deep Learning, Object Detection, Action
Recognition, Safety Monitor, Activity Analysis
