Stable Video Diffusion - model architecture, training procedure and results (paper fully explained)


Stability AI, one of the leading players in the image generation space, has released a brand new model for video generation: Stable Video Diffusion. It is their first foundation model for generating videos. It can generate videos of 14 or 25 frames at customizable frame rates between 3 and 30 frames per second. Their blog post reports that its performance surpasses the leading closed models from Runway and Pika Labs.
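To get a feel for what those numbers mean in practice, here is a minimal sketch that turns a frame count and a frame rate into a clip length. It only uses the figures quoted above (14 or 25 frames, 3 to 30 fps); the helper name `clip_duration_seconds` and the constants are made up for illustration and are not part of any released Stable Video Diffusion API.

```python
# Clip length in seconds = number of frames / frames per second.
FRAME_COUNTS = (14, 25)  # frame counts quoted in the description above
FPS_RANGE = (3, 30)      # lowest and highest frame rate quoted above


def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Return the playback length of a clip (hypothetical helper for illustration)."""
    if not FPS_RANGE[0] <= fps <= FPS_RANGE[1]:
        raise ValueError(f"fps must be between {FPS_RANGE[0]} and {FPS_RANGE[1]}")
    return num_frames / fps


if __name__ == "__main__":
    # Print the clip length at both ends of the quoted frame-rate range.
    for frames in FRAME_COUNTS:
        for fps in FPS_RANGE:
            duration = clip_duration_seconds(frames, fps)
            print(f"{frames} frames at {fps} fps -> {duration:.2f} s")
```

So the clips are short: 14 frames at 3 fps last about 4.67 seconds, while 25 frames at 30 fps last about 0.83 seconds.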

In this video, let's dive into their research paper and find out more about the three-stage training process proposed specifically for video generation models. As always, let's go through the model architecture, the training pipeline, and some practical applications of the model at scale, using the base model released along with the paper.

⌚️ ⌚️ ⌚️ TIMESTAMPS ⌚️ ⌚️ ⌚️
0:00 - Intro
1:06 - Model Architecture
2:10 - Training Stages
2:25 - Image Pretraining Stage
2:45 - Motivation for Image Pretraining
3:15 - Video Curation Stage
4:16 - Video data curation pipeline
5:18 - LVD Dataset
5:35 - Filtering Mechanisms
5:55 - Optical Flow
6:08 - Synthetic Captions
7:07 - OCR Detection
7:47 - LVD dataset summarised
8:32 - Ablation studies
9:20 - High quality fine-tuning
10:35 - Base Model
11:15 - Text-to-video example
11:22 - Image-to-video example
12:30 - Conclusion


🛠 🛠 🛠 MY SOFTWARE TOOLS 🛠 🛠 🛠
✍️ Notion - https://affiliate.notion.so/aibites-yt
✍️ Notion AI - https://affiliate.notion.so/ys9rqzv2vdd8
📹 OBS Studio for video editing - https://obsproject.com
📼 Manim for some animations - https://www.manim.community
🎵 My music - https://www.bensound.com and


📚 📚 📚 BOOKS I HAVE READ, REFER AND RECOMMEND 📚 📚 📚
📖 Deep Learning by Ian Goodfellow - https://amzn.to/3Wnyixv
📙 Pattern Recognition and Machine Learning by Christopher M. Bishop - https://amzn.to/3ZVnQQA
📗 Machine Learning: A Probabilistic Perspective by Kevin Murphy - https://amzn.to/3kAqThb
📘 Multiple View Geometry in Computer Vision by R Hartley and A Zisserman - https://amzn.to/3XKVOWi


MY KEY LINKS
YouTube: / @aibites
Twitter: / ai_bites
Patreon: / ai_bites
Github: https://github.com/ai-bites


WHO AM I?
I am a Machine Learning researcher and practitioner who has seen the grind of academia and start-ups in equal measure. I started my career as a software engineer 15 years ago. Because of my love for mathematics (coupled with a glimmer of luck), I graduated with a Master's in Computer Vision and Robotics in 2016, just as the ongoing AI revolution was getting started. Life has changed for the better ever since.

#machinelearning #deeplearning #aibites
