
Download or watch But how do AI images and videos actually work? | Guest video by @WelchLabsVideo

  • Channel: 3Blue1Brown
  • Published: 2025-07-25
  • Views: 944,997

Download But how do AI images and videos actually work? | Guest video by @WelchLabsVideo for free in 4K (2K / 1080p)

Here you can download But how do AI images and videos actually work? | Guest video by @WelchLabsVideo for free, or watch the video from YouTube in the best available quality.

Video description: But how do AI images and videos actually work? | Guest video by @WelchLabsVideo

Diffusion models, CLIP, and the math of turning text into images
Welch Labs Book: https://www.welchlabs.com/resources/i...

Sections
0:00 - Intro
3:37 - CLIP
6:25 - Shared Embedding Space
8:16 - Diffusion Models & DDPM
11:44 - Learning Vector Fields
22:00 - DDIM
25:25 - DALL-E 2
26:37 - Conditioning
30:02 - Guidance
33:39 - Negative Prompts
34:27 - Outro
35:32 - About guest videos

Special Thanks to:
Jonathan Ho - Jonathan is the author of the DDPM paper and the classifier-free guidance paper:
https://arxiv.org/pdf/2006.11239
https://arxiv.org/pdf/2207.12598

Preetum Nakkiran - Preetum has an excellent introductory diffusion tutorial:
https://arxiv.org/pdf/2406.08929

Chenyang Yuan - Many of the animations in this video were implemented using manim and Chenyang’s smalldiffusion library: https://github.com/yuanchenyang/small...

Chenyang also has a terrific tutorial and an MIT course on diffusion models:
https://www.chenyang.co/diffusion.html
https://www.practical-diffusion.org/

Other References
All of Sander Dieleman’s diffusion blog posts are fantastic: https://sander.ai/
CLIP Paper: https://arxiv.org/pdf/2103.00020
DDIM Paper: https://arxiv.org/pdf/2010.02502
Score-Based Generative Modeling: https://arxiv.org/pdf/2011.13456
Wan2.1: https://github.com/Wan-Video/Wan2.1
Stable Diffusion: https://huggingface.co/stabilityai/st...
Midjourney: https://www.midjourney.com/
Veo: https://deepmind.google/models/veo/
DALL-E 2 paper: https://cdn.openai.com/papers/dall-e-...
Code for this video: https://github.com/stephencwelch/mani...

Written by: Stephen Welch, with very helpful feedback from Grant Sanderson
Produced by: Stephen Welch, Sam Baskin, and Pranav Gundu

Technical Notes
The noise videos in the opening have been passed through a VAE (the diffusion process actually happens in a compressed “latent” space), which acts very much like a video compressor - this is why the noise videos don’t look like pure salt-and-pepper noise. A rough sketch of this pipeline follows below.
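As an illustration only, here is a minimal numpy sketch of that idea, where encode and decode are crude stand-ins for a real VAE (8x block downsampling and upsampling), not the model used in the video. Noising in the latent space and then decoding yields smooth, blocky noise rather than per-pixel salt and pepper:

```python
# Hypothetical sketch: noise added in a compressed latent space, then decoded.
# encode/decode are stand-ins for a VAE, chosen only to show the effect.
import numpy as np

rng = np.random.default_rng(0)

def encode(frame):
    # Stand-in encoder: 8x spatial downsample by block averaging.
    h, w, c = frame.shape
    return frame.reshape(h // 8, 8, w // 8, 8, c).mean(axis=(1, 3))

def decode(latent):
    # Stand-in decoder: 8x block upsample (smooth, like a video compressor).
    return latent.repeat(8, axis=0).repeat(8, axis=1)

frame = rng.random((64, 64, 3))                    # one video frame in [0, 1]
z = encode(frame)                                  # (8, 8, 3) latent
z_noisy = z + rng.normal(size=z.shape)             # diffusion noise in latent space
noisy_frame = decode(z_noisy)                      # decoded noise is blocky/smooth,
print(noisy_frame.shape)                           # not per-pixel salt and pepper
```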

6:15 CLIP: Although directly minimizing cosine similarity would push our vectors 180 degrees apart on a single batch, in practice CLIP overall needs to maximize the uniformity of concepts over the hypersphere it's operating on. For this reason, we animated these vectors as orthogonal-ish. See: https://proceedings.mlr.press/v119/wa...
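For concreteness, here is a minimal numpy sketch of a CLIP-style symmetric contrastive loss (a simplified version of the training objective, with made-up embeddings). Matched image/text pairs sit on the diagonal of the similarity matrix; the loss pulls them together while pushing all other pairs apart, which in aggregate spreads embeddings over the hypersphere rather than driving them to exactly 180 degrees:

```python
import numpy as np

def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # Normalize both sets of embeddings onto the unit hypersphere.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) cosine similarities
    n = logits.shape[0]
    diag = np.arange(n)                         # true pairs on the diagonal
    loss_img = -log_softmax(logits, axis=1)[diag, diag].mean()  # image -> text
    loss_txt = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> image
    return (loss_img + loss_txt) / 2

rng = np.random.default_rng(0)
print(clip_loss(rng.normal(size=(4, 512)), rng.normal(size=(4, 512))))
```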

Per Chenyang Yuan: at 10:15, the blurry image that results from removing random noise in DDPM is probably due to a mismatch in noise levels when calling the denoiser. When the denoiser is called on x_{t-1} during DDPM sampling, x_{t-1} is expected to carry a certain noise level (call it sigma_{t-1}). If you generate x_{t-1} from x_t without adding noise, then the noise present in x_{t-1} is always smaller than sigma_{t-1}. This causes the denoiser to remove too much noise, pulling the result toward the mean of the dataset.
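A toy numpy sketch of that mismatch, using an assumed linear noise schedule and a stand-in denoiser (the closed-form optimal denoiser for unit-variance Gaussian data, not the video's trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoiser(x, sigma, data_mean=3.0):
    # Stand-in denoiser: optimal for unit-variance Gaussian data centered
    # at data_mean (an assumption for illustration only).
    return (x + sigma**2 * data_mean) / (1.0 + sigma**2)

def sample(reinject_noise, sigmas=np.linspace(2.0, 0.02, 50)):
    x = rng.normal(scale=sigmas[0], size=1000)   # start from pure noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoiser(x, sigma)              # predicted clean sample
        if reinject_noise:
            # DDPM-style: add fresh noise so x really carries sigma_next
            # worth of noise, matching what the next denoiser call expects.
            x = x0_hat + sigma_next * rng.normal(size=x.shape)
        else:
            # No noise added: x is now "cleaner" than the denoiser expects,
            # so each call over-denoises toward the dataset mean.
            x = x0_hat
    return x

print(sample(True).std())    # spread stays well away from zero
print(sample(False).std())   # nearly zero: samples collapse to the mean
```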

The text conditioning input to Stable Diffusion is not the 512-dim pooled text embedding vector, but the output of the layer before that, with dimension 77x512: https://stackoverflow.com/a/79243065
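To see why the per-token sequence matters, here is a hypothetical numpy sketch (shapes from the note above; names are stand-ins, and the real model's learned projection matrices are omitted). Cross-attention consumes the full 77x512 token sequence as keys and values, so each image location can attend to individual words rather than a single pooled summary:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(77, 512))   # per-token text features (77 x 512)
pooled = tokens.mean(axis=0)          # a single 512-dim summary vector

q = rng.normal(size=(64, 512))                        # 64 image-patch queries
attn = softmax(q @ tokens.T / np.sqrt(512), axis=-1)  # (64, 77) attention weights
out = attn @ tokens                                   # (64, 512) per-patch conditioning
print(pooled.shape, out.shape)        # (512,) vs. word-level (64, 512)
```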

For the vectors at 31:40 - some implementations use f(x, t, cat) + alpha * (f(x, t, cat) - f(x, t)), and some use f(x, t) + alpha * (f(x, t, cat) - f(x, t)), where, in the second form, an alpha value of 1 corresponds to no guidance. I chose the second format here to keep things simpler.
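A tiny numeric sketch of the two conventions, with made-up stand-in values for the unconditional prediction f(x, t) and the conditional prediction f(x, t, cat):

```python
import numpy as np

f_uncond = np.array([0.1, -0.3])   # stand-in for f(x, t)
f_cond   = np.array([0.6,  0.2])   # stand-in for f(x, t, cat)

def guided_v1(alpha):
    # First convention: f(x, t, cat) + alpha * (f(x, t, cat) - f(x, t))
    return f_cond + alpha * (f_cond - f_uncond)

def guided_v2(alpha):
    # Second convention (used in the video):
    # f(x, t) + alpha * (f(x, t, cat) - f(x, t))
    return f_uncond + alpha * (f_cond - f_uncond)

print(guided_v2(1.0))   # == f_cond: alpha = 1 is plain conditioning, no guidance
print(guided_v2(3.0))   # alpha > 1 extrapolates past the conditional prediction
print(guided_v1(0.0))   # in the first convention, alpha = 0 is the unguided case
```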

At 30:30, the unconditional t=1 vector field looks a bit different from the one at the 17:15 mark. This is because different models were trained for different parts of the video, and the difference is likely a result of their different random initializations.


Premium Beat Music ID: EEDYZ3FP44YX8OWT
