Vision Transformer Visualisation (An image is worth 16x16 words)


This video is a short visualisation of a Vision Transformer (ViT) from the paper "An Image is Worth 16x16 Words" by Alexey Dosovitskiy et al. It follows an image of the GOAT step by step through the ViT architecture 🔥
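The "16x16 words" in the title refers to cutting the image into 16x16-pixel patches, each of which is treated like a token. What follows is a rough NumPy sketch of those first steps, not the code behind this video. It assumes a 224x224 RGB input and ViT-Base's 768-dimensional embedding. The function names `patchify` and `embed_patches` are hypothetical, and the random weights only stand in for parameters that a trained model would learn.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns shape (num_patches, patch_size * patch_size * C), with patches
    ordered row by row (raster order).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, P, gw, P, C) -> (gh, gw, P, P, C) -> (gh * gw, P * P * C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def embed_patches(patches: np.ndarray, embed_dim: int, rng: np.random.Generator) -> np.ndarray:
    """Linear patch projection, prepended [class] token, added position embeddings.

    Hypothetical helper: the weights are random placeholders, not learned values.
    """
    n, patch_dim = patches.shape
    w_proj = rng.normal(0.0, 0.02, size=(patch_dim, embed_dim))
    b_proj = np.zeros(embed_dim)
    cls_token = rng.normal(0.0, 0.02, size=(1, embed_dim))
    pos_embed = rng.normal(0.0, 0.02, size=(n + 1, embed_dim))
    tokens = patches @ w_proj + b_proj                      # (n, embed_dim)
    tokens = np.concatenate([cls_token, tokens], axis=0)    # (n + 1, embed_dim)
    return tokens + pos_embed                               # one position vector per token

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))    # stand-in for a real RGB image
patches = patchify(image, 16)
tokens = embed_patches(patches, 768, rng)
print(patches.shape)  # (196, 768)
print(tokens.shape)   # (197, 768)
```

A 224x224 image gives a 14x14 grid, so there are 196 patch tokens plus one class token. In the paper, this sequence then passes through the Transformer encoder, and the final state of the class token is used for classification.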

Inspiration was taken from:
@DigitalSreeni @3blue1brown
