Making Unstructured Data Ready for RAG with Unstructured.io and Elasticsearch

Unstructured data holds immense potential, especially for Generative AI (GenAI) applications such as Retrieval-Augmented Generation (RAG). However, the wide range of unstructured file types and sources presents preprocessing challenges that traditional ETL struggles to accommodate.

In this talk you’ll learn about Unstructured.io, a solution for preprocessing data from 25 different unstructured file types, including PDFs, PowerPoint presentations, markdown files, emails, and more. We’ll talk about extracting valuable textual data and associated metadata from these varied formats into a standardized JSON structure. This processed data can then be easily integrated into Elasticsearch for advanced analysis or use in RAG applications. By the end of the talk, you will know how to organize your data preprocessing to enable better use of unstructured data for RAG and more.
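
As a rough illustration of the flow described above, the sketch below takes a few already-extracted elements in a JSON-like shape and serializes them as a request body for the Elasticsearch bulk API. The sample elements, their field names (type, element_id, text, metadata), the helper to_bulk_ndjson, and the index name "unstructured-docs" are assumptions made for this example and are not taken from the talk.

```python
import json

# Hypothetical sample of preprocessed output. The field names
# (type, element_id, text, metadata) are assumed for illustration.
elements = [
    {"type": "Title", "element_id": "a1", "text": "Quarterly Report",
     "metadata": {"filename": "report.pdf", "page_number": 1}},
    {"type": "NarrativeText", "element_id": "a2", "text": "Revenue grew in Q3.",
     "metadata": {"filename": "report.pdf", "page_number": 2}},
]


def to_bulk_ndjson(elements, index_name):
    """Serialize elements as an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for el in elements:
        # Action line: index this document, using element_id as the document ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": el["element_id"]}}))
        # Source line: the text and metadata to store and search.
        lines.append(json.dumps({"type": el["type"], "text": el["text"], "metadata": el["metadata"]}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_ndjson(elements, "unstructured-docs")
print(body)
```

The resulting string could be sent to a cluster's _bulk endpoint with the Content-Type set to application/x-ndjson. Keeping the metadata next to the text is what makes it possible to filter by source file or page at retrieval time.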

Speaker: Maria Khalusova, Developer Advocate at Unstructured.io

Timestamps:
0:00 Introduction
1:17 Unstructured data overview
3:58 Challenges with ETL
6:08 Unstructured.io overview
8:00 How Unstructured.io works with unstructured data
9:48 Unstructured.io partitioning strategies
11:45 Under the hood
16:15 Behind the scenes: Unstructured.io & Elasticsearch

Make sure to join your local Elastic User Group to stay up-to-date on upcoming meetups: https://community.elastic.co/

Questions? Check out https://discuss.elastic.co/
Connect with the Elastic community through Slack: https://ela.st/slack

#RAG #Elasticsearch #GenAI #UnstructuredData #Elastic #TechTalk
