Google DeepMind Released PaliGemma 2: A New Family of Open-Weight Vision Language Models

Google DeepMind recently introduced the PaliGemma 2 series, a new family of open-weight Vision-Language Models (VLMs) with parameter sizes of 3 billion (3B), 10 billion (10B), and 28 billion (28B). The models support input resolutions of 224×224, 448×448, and 896×896 pixels. The release includes nine pre-trained models, one for each combination of size and resolution, making them versatile for a variety of use cases. Fine-tuned versions of two of these models, the 3B and 10B at a resolution of 448×448 pixels, are also provided; they are trained on the DOCCI dataset, which contains image-text caption pairs. Since these models are open-weight, they can be easily adopted as a direct replacement or upgrade for the original PaliGemma, offering users more flexibility for transfer learning and fine-tuning...
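The count of nine follows from pairing each of the three sizes with each of the three resolutions. A minimal Python sketch of that grid is below. The tuple labels are illustrative only and are not official checkpoint names; the DOCCI filter simply restates the description above.

```python
from itertools import product

# Sizes and resolutions as listed in the description above.
SIZES = ("3B", "10B", "28B")
RESOLUTIONS = (224, 448, 896)

# Every size paired with every resolution: 3 x 3 = 9 pre-trained variants.
# These (size, resolution) tuples are illustrative labels, not model IDs.
variants = list(product(SIZES, RESOLUTIONS))

# Per the description, the DOCCI fine-tuned models are 3B and 10B at 448x448.
docci_variants = [
    (size, res) for size, res in variants
    if size in ("3B", "10B") and res == 448
]

for size, res in variants:
    tag = " (DOCCI fine-tune also available)" if (size, res) in docci_variants else ""
    print(f"{size} @ {res}x{res}{tag}")
```

Actual checkpoint identifiers should be taken from the Hugging Face collection linked below rather than from these labels.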

Read the full article here: https://www.marktechpost.com/2024/12/...

Paper: https://arxiv.org/abs/2412.03555

Models on Hugging Face: https://huggingface.co/collections/go...

Audio created by NotebookLM and reviewed by a real human

👉 Don’t forget to join our 55k+ ML subreddit: / machinelearningnews

⚓ Feel free to subscribe to our AI Research Newsletter read by 30k+ AI and Data Professionals: https://airesearchinsights.com/subscribe

@Google_DeepMind #artificialintelligence #computervision #ai
