The animated Transformer: the Transformer model explained the fun way!

In this video, I'll take you through the amazing yet mysterious Transformer model (sorry, no Autobots or Decepticons), the unsung hero behind ChatGPT and other LLMs. You'll learn end-to-end how a Transformer works; a couple of minimal code sketches of the key ideas follow the chapter list below.

Papers
Original Transformer (Attention Is All You Need): https://arxiv.org/pdf/1706.03762.pdf
GPT-3: https://arxiv.org/pdf/2005.14165.pdf
Batch normalization: https://arxiv.org/pdf/1502.03167.pdf
Layer normalization: https://arxiv.org/pdf/1607.06450.pdf
Fast Transformer Decoding (multi-query attention): https://arxiv.org/pdf/1911.02150.pdf
Flash Attention: https://arxiv.org/pdf/2205.14135.pdf

0:00 - Introduction
1:40 - Input to the model
2:00 - Tokenization in the Transformer
2:25 - Special tokens used by the Transformer
3:01 - The input processor
3:51 - Some important notation and hyperparameters
4:40 - The importance of the context window size
5:24 - Basics of the Transformer
5:54 - RNNs vs Transformers
7:41 - Two types of attention: bidirectional vs causal
9:44 - Batch normalization vs Layer normalization
10:57 - Continuing on the Transformer
11:23 - Predictions with the Transformer
11:41 - Softmax for sequences
12:33 - Inference with the Transformer
13:14 - Sampling strategies for tokens
13:45 - Continuing on the Transformer
15:05 - Computing token embeddings
15:34 - Positional embeddings
17:00 - Self-attention layer
19:05 - Self-attention computations
21:31 - Multi-head attention
23:47 - Conclusion
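
If you'd like code alongside the animation, here is a minimal NumPy sketch of single-head scaled dot-product self-attention (the topic of 17:00-21:31), including the causal mask from the bidirectional-vs-causal discussion at 7:41. All names, dimensions, and the toy data are illustrative assumptions, not taken from the video.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=True):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X: (seq_len, d_model) token embeddings (plus positional embeddings).
    Wq, Wk, Wv: (d_model, d_head) projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project into queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) similarities
    if causal:
        # Causal attention: each position may only attend to itself and
        # earlier positions; bidirectional attention skips this mask.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ V                # weighted sum of value vectors

# Toy example: 4 tokens, d_model=8, d_head=4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 4)
```

Multi-head attention (21:31) runs several such heads in parallel with different projection matrices and concatenates their outputs.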
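And a second sketch for the sampling strategies mentioned at 13:14: temperature scaling and top-k filtering are two common ways to pick the next token from the model's logits. The function below is an assumed, typical implementation for illustration, not the video's exact code.

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=None, rng=np.random.default_rng()):
    """Sample a token id from final-layer logits (illustrative sketch).

    temperature < 1 sharpens the distribution (closer to greedy decoding);
    temperature > 1 flattens it (more diverse output).
    top_k, if set, keeps only the k most likely tokens before sampling.
    """
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]              # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())             # softmax over kept tokens
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Toy vocabulary of 5 tokens.
print(sample_token([2.0, 1.0, 0.5, -1.0, -3.0], temperature=0.8, top_k=3))
```

At inference time (12:33), the sampled token is appended to the input and the model is run again to predict the next one.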
