Sentence Transformers and Embedding Evaluation - Nils Reimers - Talking Language AI Ep#3

Sentence Transformers (https://www.sbert.net/) is one of the most popular Language AI/NLP tools. Tens of thousands of users rely on it to build systems for text classification, neural/semantic search, text clustering, and other language AI tasks.

In this conversation, Nils Reimers, the creator of Sentence-BERT, talks about:
An introduction to the package and the Large Language Models provided in it
Lessons learned from the open-source development of such a popular package
His research collaborations on how to evaluate embeddings through works like MTEB: Massive Text Embedding Benchmark and BEIR

Bio: Nils Reimers is currently the Director and Principal Scientist of Machine Learning at Cohere. Previously, he authored several well-known research papers, including Sentence-BERT, and created the popular sentence-transformers library. He also worked as a Research Scientist at HuggingFace, (co-)founded several web companies, and worked as an AI consultant in the areas of investment banking, media, and IoT.


Join the Cohere Discord:   / discord  
Discussion thread for this episode (feel free to ask questions):
  / discord  

===
Contents
Introduction (0:00)
Nils Intro (2:19)
Neural search (2:55)
Dense Bi-encoders (6:26)
Contrastive training (8:16)
Why we need embedding benchmarks (10:07)
The predictive power of benchmarks declines over time (14:28)
Benchmarking Information Retrieval with BEIR (19:58)
Massive text embeddings benchmark (29:07)
SetFit (34:05)
Multilingual search and embeddings (40:52)
Cross-lingual search benefits and drawbacks (46:27)
Lessons from developing open source software (50:18)
The benefits and challenges of maintaining a popular open source library (54:21)

===
Resources:
Bonjour. مرحبا. Guten tag. Hola. Cohere's Multilingual Text Understanding Model is Now Available: https://txt.cohere.ai/multilingual/
Sentence Transformers: https://www.sbert.net/
SBERT Paper: https://arxiv.org/abs/1908.10084
MTEB: Massive Text Embedding Benchmark: https://arxiv.org/abs/2210.07316
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models: https://openreview.net/forum?id=wCu6T...
SetFit - Efficient Few-shot Learning with Sentence Transformers: https://github.com/huggingface/setfit
