Medical Search Engine with SPLADE + Sentence Transformers in Python

In this video, we'll build a search engine for the medical field using hybrid search with NLP information retrieval models.

We use hybrid search with sentence transformers and SPLADE for medical question-answering. With hybrid search we can query using both dense and sparse vectors. The dense vectors cover semantics, while the sparse vectors cover features like exact matching and keyword search.
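To make the dense-plus-sparse idea concrete, here is a minimal sketch of how a hybrid score can be computed. It assumes a simple convex weighting with a hypothetical `alpha` parameter (1.0 = dense only, 0.0 = sparse only) and plain dot-product similarity. The function names are illustrative and not taken from the video or any library.

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight a query's dense and sparse parts (assumed convex weighting).

    dense  : list of floats
    sparse : dict mapping token id -> weight
    alpha  : 1.0 means dense only, 0.0 means sparse only
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {i: v * (1.0 - alpha) for i, v in sparse.items()}
    return scaled_dense, scaled_sparse


def dense_dot(a, b):
    """Dot product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a, b):
    """Dot product of two sparse vectors; only shared token ids contribute."""
    return sum(w * b[i] for i, w in a.items() if i in b)


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, alpha=0.5):
    """Score a document as weighted dense similarity plus sparse similarity."""
    qd, qs = hybrid_scale(q_dense, q_sparse, alpha)
    return dense_dot(qd, d_dense) + sparse_dot(qs, d_sparse)


if __name__ == "__main__":
    # Toy vectors: token id 3 is shared, token id 7 appears only in the document.
    print(hybrid_score([1.0, 0.0], {3: 2.0}, [0.5, 0.5], {3: 1.0, 7: 4.0}))
```

Only the query is scaled here, so document vectors can be stored once and the dense/sparse balance tuned per query.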

For the sparse vectors we use SPLADE. SPLADE is the first sparse embedding method to outperform BM25 across a variety of tasks. It's an incredibly powerful technique that keeps the typical advantages of sparse search while also learning term expansion, which helps minimize the vocabulary mismatch problem.
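As a rough illustration of where SPLADE's sparse weights come from, the sketch below applies the max-pooling form of the SPLADE activation, log(1 + ReLU(logit)), to a toy logits matrix. In practice the logits come from a masked-language-model head with one row per input token and one column per vocabulary term. The tiny hand-written matrix and the name `splade_pool` are assumptions for illustration only.

```python
import math


def splade_pool(logits, attention_mask):
    """Max-pool log(1 + ReLU(logit)) over unmasked tokens.

    logits         : list of rows, one per input token, each of vocab size
    attention_mask : list of 0/1 flags, one per input token
    Returns a dict mapping vocabulary id -> weight, keeping non-zero terms only.
    """
    vocab_size = len(logits[0])
    weights = [0.0] * vocab_size
    for token_logits, keep in zip(logits, attention_mask):
        if not keep:
            continue  # padding tokens contribute nothing
        for j, logit in enumerate(token_logits):
            w = math.log1p(max(logit, 0.0))  # log(1 + ReLU(logit))
            if w > weights[j]:
                weights[j] = w
    return {j: w for j, w in enumerate(weights) if w > 0.0}


if __name__ == "__main__":
    toy_logits = [
        [0.0, 1.0, -2.0],  # token 1
        [3.0, 0.5, -1.0],  # token 2
        [9.0, 9.0, 9.0],   # padding, masked out below
    ]
    print(splade_pool(toy_logits, [1, 1, 0]))
```

Any vocabulary id that ends up with a non-zero weight without appearing in the input text is a learned expansion term, which is what helps with vocabulary mismatch.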

The demo we work through here uses SPLADE and a sentence transformer model trained on MS-MARCO. These are all implemented via Hugging Face transformers.

Finally, for the search component we use the Pinecone vector database, the only vector DB at the time of writing that natively supports SPLADE vectors.
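For reference, here is a hedged sketch of packing one document into a sparse-dense record before upserting. The field names (`id`, `values`, `sparse_values` with `indices` and `values`, `metadata`) follow Pinecone's sparse-dense record layout as we understand it, so check the current Pinecone docs before relying on them. The helper `build_record` is hypothetical, and the upsert call itself is left out.

```python
def build_record(doc_id, dense, sparse_weights, metadata=None):
    """Pack dense and sparse embeddings into one record dict.

    dense          : list of floats (e.g. from a sentence transformer)
    sparse_weights : dict mapping vocabulary id -> weight (e.g. from SPLADE)
    metadata       : optional dict stored alongside the vectors
    """
    # Keep non-zero terms only, ordered by vocabulary id for readability.
    items = sorted((i, w) for i, w in sparse_weights.items() if w != 0.0)
    record = {
        "id": str(doc_id),
        "values": [float(v) for v in dense],
        "sparse_values": {
            "indices": [int(i) for i, _ in items],
            "values": [float(w) for _, w in items],
        },
    }
    if metadata:
        record["metadata"] = dict(metadata)
    return record


if __name__ == "__main__":
    # Toy values; real dense vectors have hundreds of dimensions.
    print(build_record(7, [0.1, 0.2], {2054: 1.5, 101: 0.0, 17: 0.3},
                       {"context": "example passage"}))
```

The same structure, minus `id` and `metadata`, can describe a query: a dense vector plus a sparse vector given as parallel index and value lists.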

🔗 Code notebook:
https://github.com/pinecone-io/exampl...

🎙️ AI Dev Studio:
https://aurelio.ai/

🎉 Subscribe for Article and Video Updates!
  / subscribe  
  / membership  

👾 Discord:
  / discord  

00:00 Hybrid search for medical field
00:18 Hybrid search process
02:42 Prerequisites and Installs
03:26 Pubmed QA data preprocessing step
08:25 Creating dense vectors with sentence-transformers
10:30 Creating sparse vector embeddings with SPLADE
18:12 Preparing sparse-dense format for Pinecone
21:02 Creating the Pinecone sparse-dense index
24:25 Making hybrid search queries
29:59 Final thoughts on sparse-dense with SPLADE

#artificialintelligence #nlp #naturallanguageprocessing #machinelearning #searchengine
