Tokenize any input, even continuous vectors! - Residual Vector Quantization - VALL-E (Part 2)

Video description

VALL-E can generate speech for any text from just a 3-second audio sample of the target voice. We will dissect the technology behind it and how it works, and delve a bit more into a cool quantization technique called Residual Vector Quantization, which turns a continuous vector input space into discrete tokens.
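To make the idea of Residual Vector Quantization concrete, here is a minimal NumPy sketch. It is not the code from the video's notebook, nor the EnCodec implementation: the codebooks are random rather than learned, and the names `rvq_encode`, `rvq_decode`, `dim`, `codebook_size` and `num_stages` are hypothetical choices made for illustration. Each stage picks the codeword nearest to whatever the previous stages left unexplained, so one continuous vector becomes a short list of integer tokens.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Return one codeword index per stage, plus the final leftover residual."""
    residual = np.asarray(x, dtype=float)
    indices = []
    for codebook in codebooks:
        # Find the codeword closest to the part that is still unexplained.
        distances = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(distances))
        indices.append(idx)
        # Pass only the remaining error on to the next stage.
        residual = residual - codebook[idx]
    return indices, residual

def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the chosen codeword of every stage."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

# Assumed toy setup: random codebooks whose scale shrinks per stage to
# mimic a coarse-to-fine hierarchy. Real codecs learn their codebooks.
rng = np.random.default_rng(0)
dim, codebook_size, num_stages = 8, 16, 4
codebooks = [
    rng.normal(scale=0.5 ** stage, size=(codebook_size, dim))
    for stage in range(num_stages)
]

x = rng.normal(size=dim)
tokens, leftover = rvq_encode(x, codebooks)
x_hat = rvq_decode(tokens, codebooks)

print("tokens:", tokens)
print("reconstruction error:", np.linalg.norm(x - x_hat))
```

In this sketch the first token carries the coarsest approximation and each later token refines it, which is the hierarchical structure the video discusses for VALL-E's token generation.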

Part 1 here (watch mainly for the explanation of the Mel spectrogram): Using Transformers to mimic anyone's voice...

Slides and Jupyter Notebook can be found here: https://github.com/tanchongmin/Tensor...

Related papers:
SoundStream (the paper that reintroduced Residual Vector Quantization for modern neural audio codecs): https://arxiv.org/abs/2107.03312
EnCodec (high-fidelity audio compression that produces quantized codes): https://arxiv.org/pdf/2210.13438.pdf
VALL-E (the paper discussed in this video): https://valle-demo.github.io/
VALL-E X (cross-lingual VALL-E): https://arxiv.org/pdf/2303.03926.pdf
Universal Speech Model (automatic speech recognition pre-trained on 12 million hours of data, showing the scalability of pre-training data): https://sites.research.google/usm/
Tacotron 2 (text-to-speech via Mel spectrogram): https://pytorch.org/hub/nvidia_deeple...

~~~~~
0:00 Introduction
4:14 Time and Frequency Domain representations
10:28 Recap on Part 1
12:30 Encodec (Corrected Model Explanation)
31:04 Coding session with Encodec!
53:52 VALL-E
55:40 Residual Vector Quantization and Hierarchical Representation
1:21:53 VALL-E Token Generation
1:33:55 Results
1:38:19 Limitations
1:41:14 How to perform hierarchical prediction?
1:45:04 Discussion

~~~~~

AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.

Discord:   / discord
LinkedIn:   / chong-min-tan-94652288
Online AI blog: https://delvingintotech.wordpress.com/
Twitter:   / johntanchongmin
Try out my games here: https://simmer.io/@chongmin
