An image is worth 16x16 words: ViT | Vision Transformer explained

Video description

Mom, it's the Transformers again! They have come to ruin my CNN building blocks! 🥺 An Image is Worth 16x16 Words: paper explained. Is this the extinction of CNNs? Long live the Transformer?
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring....

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to boost our Coffee Bean production! ☕
Patreon:   / aicoffeebreak  
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

📺 Ms. Coffee Bean explains the TRANSFORMER:    • The Transformer neural network archit...  
📺 Ms. Coffee Bean on the Multimodal Transformer:    • Transformer combining Vision and Lang...  

Outline:
00:00 Pure Transformer for vision
01:17 How does it work?
03:58 The CNN Armageddon?

📄 Paper (not anonymous anymore): "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

📚 Check out this wonderful post by @JacobGildenblat : https://jacobgil.github.io/deeplearni...

-----------------------------------
🔗 Links:
YouTube:    / aicoffeebreak  
Twitter:   / aicoffeebreak  
Reddit:   / aicoffeebreak  

#AICoffeeBreak #MsCoffeeBean #ComputerVision #ICLR2021 #MachineLearning #AI #research

Video contains emojis designed by OpenMoji – the open-source emoji and icon project. License: CC BY-SA 4.0
