MultiModal Search (Text+Image) using TF MobileNet, HF SBERT in Python on Kaggle Shopee Dataset

Multimodal Search (Text+Image) using TensorFlow MobileNet and Hugging Face Sentence Transformers in Python on the Kaggle Shopee Dataset

In this video I demonstrate multimodal (image + text) search: given a query image and its text description, find the most similar image+text pairs in a multimodal database. I use the Kaggle Shopee dataset. A TensorFlow MobileNet CNN extracts the image embeddings and a Hugging Face Sentence Transformers BERT model extracts the text embeddings; together they form a joint embedding search space. Given an image and its text description, I extract the joint embedding and then use a nearest neighbours algorithm to find the top 5 most similar image+text pairs in that search space.
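The search step described above can be sketched in a few lines. This is a minimal illustration and not the code from the video: the short vectors are toy stand-ins for MobileNet and Sentence Transformers outputs, concatenating the two normalized embeddings is an assumption about how the joint embedding is formed, and the brute-force cosine search stands in for the scikit-learn nearest neighbours search listed in the references. The helper names (`joint_embedding`, `top_k_similar`) are hypothetical.

```python
import math
from heapq import nlargest


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def joint_embedding(image_vec, text_vec):
    """Assumed joint embedding: concatenate unit-normalized image and text vectors."""
    return l2_normalize(image_vec) + l2_normalize(text_vec)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(query, database, k=5):
    """Return indices of the k database items most similar to the query."""
    scored = ((cosine_similarity(query, item), idx)
              for idx, item in enumerate(database))
    return [idx for _, idx in nlargest(k, scored)]


# Toy stand-ins for (image embedding, text embedding) of four products.
products = [
    ([1.0, 0.0, 0.0], [1.0, 0.0]),
    ([0.9, 0.1, 0.0], [0.8, 0.2]),
    ([0.0, 1.0, 0.0], [0.0, 1.0]),
    ([0.0, 0.0, 1.0], [0.5, 0.5]),
]
database = [joint_embedding(img, txt) for img, txt in products]

# Query: an image and text close to the first two products.
query = joint_embedding([1.0, 0.05, 0.0], [0.9, 0.1])
neighbours = top_k_similar(query, database, k=2)
print(neighbours)
```

With these toy vectors the query lands closest to the two near-duplicate products at indices 0 and 1. Normalizing each modality before concatenation keeps either embedding from dominating the distance just because its values are larger; whether the video weights the two modalities equally is not stated here.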

If you are new here and like such content please subscribe to the channel here:
https://www.youtube.com/c/RitheshSree...

If you would like to support me financially, you can do so below. It is totally optional and voluntary.
Buy me a coffee here: https://www.buymeacoffee.com/rithesh




GitHub code: https://github.com/rsreetech/MultiMod...
Python 3.6
https://www.python.org/downloads/rele...

TensorFlow 2.0 and above
https://www.tensorflow.org/install

Hugging Face transformers
https://huggingface.co/transformers/

Sentence transformers
https://www.sbert.net/

Kaggle Shopee dataset:
https://www.kaggle.com/c/shopee-produ...


References:

MobileNet : https://arxiv.org/pdf/1704.04861.pdf

BERT: http://jalammar.github.io/illustrated...

scikit-learn nearest neighbours:
https://scikit-learn.org/stable/modul...
https://scikit-learn.org/stable/modul...
https://scikit-learn.org/stable/modul...
