Bo Wang, Isabelle Mohr – Jina Embeddings V2: From Raw Data to Bilingual Hybrid Search

In this talk, we explore the design, training, and application of the bilingual Jina Embeddings V2, a state-of-the-art German-English embedding model built here in Berlin. Traditional exact-match and term-based retrieval methods have inherent shortcomings, so we look at how this bilingual model can be used in a hybrid search setup. By combining vector-based search with conventional BM25 search, we draw on the strengths of both approaches and obtain markedly better search results. The talk is therefore relevant to anyone working in search. Attendees will gain insight into how embedding models are trained, how data for these models is sourced and prepared, and how our open-source German-English bilingual model can be integrated into a search pipeline to improve results. It is aimed at those interested in the latest search and retrieval technologies and offers practical knowledge on improving search systems with embeddings.
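
To make the hybrid idea concrete, here is a minimal sketch of combining a BM25 ranking with a vector ranking. It is not the speakers' implementation. The abstract does not say how the two result lists are merged, so reciprocal rank fusion (RRF) is an assumption chosen for illustration. The document vectors are hand-written toy values standing in for real bilingual embeddings, and no actual model is loaded. All function names are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; each document earns 1 / (k + position) per list."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return sorted(fused, key=fused.get, reverse=True)


texts = [
    "the cat sits on the mat",
    "die katze sitzt auf der matte",
    "stock markets fell on monday",
    "der aktienmarkt fiel am montag",
]
docs = [t.split() for t in texts]

# Toy 2-d vectors (assumption): a stand-in for bilingual embeddings, in which
# a sentence and its translation would be expected to lie close together.
doc_vectors = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.15, 0.85]]
query_tokens = "cat on mat".split()
query_vector = [1.0, 0.0]

bm25_ranking = rank(bm25_scores(query_tokens, docs))
vector_ranking = rank([cosine(query_vector, v) for v in doc_vectors])
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])

for doc_id in fused:
    print(doc_id, texts[doc_id])
```

In this toy setup the German sentence about the cat shares no terms with the English query, so BM25 alone gives it a score of zero, yet it ends up second in the fused list because the vector ranking places it near the top. With a real embedding model, the toy vectors would be replaced by the model's output for each text.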

Speakers: Bo Wang, Isabelle Mohr

More: https://2024.berlinbuzzwords.de/sessi...

###

Follow us on Social Media and join the Community!
Mastodon: https://floss.social/@berlinbuzzwords
LinkedIn: / berlin-buzzwords
Instagram: / berlinbuzzwords

Website: https://2024.berlinbuzzwords.de

Berlin Buzzwords is an event by Plain Schwarz – https://plainschwarz.com
