Multimodal Retrieval Augmented Generation (RAG) using the Vertex AI Gemini API (GSP1231)

Overview
Gemini is a family of generative AI models developed by Google DeepMind and designed for multimodal use cases. The Gemini API gives you access to the Gemini Pro Vision and Gemini Pro models.

Retrieval augmented generation (RAG) has become a popular paradigm for giving LLMs access to external data and for grounding their responses to mitigate hallucinations. RAG systems retrieve relevant documents from a large corpus and then generate a response based on the retrieved documents. In this lab, you learn how to perform multimodal RAG, carrying out Q&A over a financial document that contains both text and images.
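To make the retrieve-then-generate flow concrete, here is a minimal sketch using the Vertex AI SDK for Python. This is an illustration under assumptions, not the lab's exact code: the model name, the answer_with_context helper, the placeholder project ID, and the idea that retrieved page images are attached alongside text chunks are all assumptions.

```python
# Minimal retrieve-then-generate sketch. Model name, helper name, and
# placeholder project ID are illustrative assumptions, not the lab's code.
import vertexai
from vertexai.generative_models import GenerativeModel, Image, Part

vertexai.init(project="your-project-id", location="us-central1")  # placeholder
model = GenerativeModel("gemini-1.0-pro-vision")  # assumed model name

def answer_with_context(question, text_chunks, image_paths):
    """Generate an answer grounded in retrieved text chunks and page images."""
    context = "\n".join(text_chunks)
    parts = [f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"]
    for path in image_paths:
        # Attach retrieved page images so the model can reason over charts and tables.
        parts.append(Part.from_image(Image.load_from_file(path)))
    return model.generate_content(parts).text
```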

Comparing text-based and multimodal RAG
Multimodal RAG offers several advantages over text-based RAG:

1. Enhanced knowledge access: Multimodal RAG can access and process both textual and visual information, providing a richer and more comprehensive knowledge base for the LLM.
2. Improved reasoning capabilities: By incorporating visual cues, multimodal RAG can make better-informed inferences across different data modalities.

This lab shows you how to use RAG with the Vertex AI Gemini API, text embeddings, and multimodal embeddings to build a document search engine.
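As a sketch of the two embedding building blocks, the snippet below generates a text embedding and a multimodal (image plus text) embedding with the Vertex AI SDK. The model IDs (text-embedding-004, multimodalembedding@001) and the sample inputs are assumptions; the lab may pin different versions.

```python
# Sketch: text and multimodal embeddings with the Vertex AI SDK.
# Model IDs and sample inputs are assumptions; the lab may differ.
from vertexai.language_models import TextEmbeddingModel
from vertexai.vision_models import Image, MultiModalEmbeddingModel

text_model = TextEmbeddingModel.from_pretrained("text-embedding-004")
mm_model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

# One embedding vector per input string.
text_vectors = [e.values for e in text_model.get_embeddings(["Q3 revenue grew 12%."])]

# Multimodal embeddings place images and text in a shared vector space,
# so either modality can serve as the query.
mm_result = mm_model.get_embeddings(
    image=Image.load_from_file("page_7_chart.png"),  # assumed local file
    contextual_text="Quarterly revenue chart",
)
image_vector = mm_result.image_embedding
```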

Objectives
In this lab, you learn how to:
- Extract and store metadata of documents containing both text and images, and generate embeddings for the documents.
- Search the metadata with text queries to find similar text or images.
- Search the metadata with image queries to find similar images.
- Using a text query as input, search for contextual answers using both text and images (a minimal search sketch follows this list).
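The search steps above reduce to scoring a query embedding against the stored embeddings. Below is a minimal cosine-similarity sketch; the metadata layout (a list of dicts with an "embedding" key) is an assumption for illustration.

```python
# Sketch: cosine-similarity search over stored chunk embeddings.
# The metadata layout (dicts with an "embedding" key) is assumed.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_matches(query_vector, metadata, k=3):
    """Return the k stored items (text or image) most similar to the query."""
    scored = [(cosine_similarity(query_vector, m["embedding"]), m) for m in metadata]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Usage: embed the query with the same model used at indexing time, then search.
# query_vec = text_model.get_embeddings(["What drove Q3 revenue growth?"])[0].values
# results = top_k_matches(query_vec, metadata)
```

The same function works for image queries: embed the query image with the multimodal model and score it against the stored image embeddings, since both live in the same vector space.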
