[UPDATED] ViViT & NaViT papers: How Sora encoded space-time patches | Shawn's ML Notes

Update on April 8th, 2023:
- Fixed missing narration on slide 25
- Added explanation for accuracy increase from upsampling (thanks to @ryuku4966!)
- Amplified audio track

Original video (archived): [ARCHIVED] ViViT & NaViT papers: How ...

--

Thank you for checking out my video notes on the ViViT & NaViT papers: how Sora encoded space-time patches! I would love to share my ML learning journey with you.

Paper information:
- Arnab, Anurag, et al. "ViViT: A Video Vision Transformer." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
- Dehghani, Mostafa, et al. "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution." Advances in Neural Information Processing Systems 36 (2024).

Please let me know in the comments if you have any questions, points of discussion, or anything you would like to see next. See you in the next video!
