Implementing semantic search with LangChain
This guide shows you how to implement semantic search using LangChain and similarity search.
You'll use OpenAI's text embeddings to measure the similarity between document properties, then use the LangChain framework to integrate Meilisearch and build an application with semantic search.
Requirements
This guide assumes a basic understanding of Python and LangChain, but it should remain accessible to LangChain beginners.
- Python (LangChain requires >= 3.8.1 and < 4.0) and the pip CLI
- A Meilisearch >= 1.6 project
- An OpenAI API key
Creating the application
Create a folder for your application with an empty setup.py file.
Before writing any code, install the necessary dependencies with pip. You need at least the LangChain, OpenAI, and Meilisearch Python packages.
Next, create a .env file to store your Meilisearch and OpenAI credentials.
Now that you have your environment variables available, open the setup.py file and add some boilerplate code that loads them.
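A minimal sketch of such boilerplate follows. The variable names MEILI_HTTP_ADDR, MEILI_API_KEY, and OPENAI_API_KEY are assumptions, as is the use of the python-dotenv package to read the .env file. Adapt both to whatever your .env actually contains.

```python
import os

# Assumed variable names; match them to the keys in your own .env file.
REQUIRED_VARS = ("MEILI_HTTP_ADDR", "MEILI_API_KEY", "OPENAI_API_KEY")


def load_credentials(env=None):
    """Return the required credentials, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def main():
    try:
        # Assumed dependency: python-dotenv copies .env entries into os.environ.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # Fall back to variables already exported in the shell.
    credentials = load_credentials()
    print("Loaded:", ", ".join(sorted(credentials)))
```

Calling main() at the bottom of setup.py fails early with a readable message if a credential is missing, rather than later inside an API call.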
Importing documents and embeddings
Now that the project is ready, import some documents into Meilisearch. First, download this small movies dataset: movies-lite.json.
Then, update the setup.py file to load the JSON and store it in Meilisearch. You will also use OpenAI's text embedding models to generate vector embeddings.
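One way this step could look is sketched below. It assumes the dataset is a JSON array of movie objects with id, title, and overview fields, and that your installed LangChain release exposes the Meilisearch vector store and OpenAIEmbeddings under the import paths shown and accepts the embedders and embedder_name arguments. The helper names are hypothetical.

```python
import json


def load_movies(path):
    """Read the dataset, assumed to be a JSON array of movie objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def to_texts_and_metadatas(movies, fields=("id", "title", "overview")):
    """Split each movie into an embeddable text and a metadata dict."""
    texts, metadatas = [], []
    for movie in movies:
        record = {field: movie.get(field) for field in fields}
        # Assumption: the embedded text is the kept fields serialized as JSON.
        texts.append(json.dumps(record, ensure_ascii=False))
        metadatas.append(record)
    return texts, metadatas


def store_movies(path, url, api_key, embedders, embedder_name="default"):
    """Embed the movies with OpenAI and store them in Meilisearch."""
    # Imported lazily so the helpers above work without these packages.
    # Import paths vary between LangChain releases; adjust to your version.
    from langchain_community.vectorstores import Meilisearch
    from langchain_openai import OpenAIEmbeddings

    texts, metadatas = to_texts_and_metadatas(load_movies(path))
    return Meilisearch.from_texts(
        texts=texts,
        embedding=OpenAIEmbeddings(),
        metadatas=metadatas,
        url=url,
        api_key=api_key,
        embedders=embedders,
        embedder_name=embedder_name,
    )
```

Keeping only a few fields limits what is sent to the embedding model. Which fields matter for similarity is a design choice, not something the dataset dictates.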
To use vector search, you need to set the embedders index setting. In this case, you are using a userProvided source, which requires you to specify the size of the vectors in a dimensions field. The default model used by OpenAIEmbeddings() is text-embedding-ada-002, which has 1,536 dimensions.
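As an illustration, the setting described above can be built as a plain dictionary. The embedder name "default" and the helper name are assumptions. The dimensions value must match the embedding model you actually use.

```python
def build_embedders(name="default", dimensions=1536):
    """Build an embedders setting for vectors computed outside Meilisearch."""
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return {name: {"source": "userProvided", "dimensions": dimensions}}


embedders = build_embedders()
print(embedders)
# {'default': {'source': 'userProvided', 'dimensions': 1536}}
```

If you switch to a model with a different vector size, change dimensions accordingly.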
Your Meilisearch instance will now contain your documents. Meilisearch runs tasks like document import asynchronously, so you might need to wait a bit for documents to be available. Consult the asynchronous operations explanation for more information on how tasks work.
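If a script needs to wait for the import before continuing, a generic polling helper is one option. This is a sketch, not part of Meilisearch or LangChain: wait_until is a hypothetical name, and the condition you pass in (for example, a check that the index reports the expected number of documents) is up to you.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll condition() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The helper returns False on timeout instead of raising, so the caller decides whether a slow import is an error.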
Performing similarity search
Your database is now populated with the data from the movies dataset. Create a new search.py file that runs a semantic search query, retrieving documents by similarity.
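A possible shape for search.py is sketched here. It assumes the same LangChain import paths as your setup script and that the vector store's similarity_search method accepts k and embedder_name. The function names are hypothetical.

```python
def build_vector_store(url, api_key, embedders):
    """Connect to the Meilisearch index that setup.py populated."""
    # Imported lazily; import paths vary between LangChain releases.
    import meilisearch
    from langchain_community.vectorstores import Meilisearch
    from langchain_openai import OpenAIEmbeddings

    client = meilisearch.Client(url=url, api_key=api_key)
    return Meilisearch(
        client=client,
        embedding=OpenAIEmbeddings(),
        embedders=embedders,
    )


def search_movies(vector_store, query, k=3, embedder_name="default"):
    """Return the text of the k documents most similar to the query."""
    results = vector_store.similarity_search(
        query=query, k=k, embedder_name=embedder_name
    )
    return [document.page_content for document in results]
```

For example, search_movies(store, "superhero fighting evil in a city at night") returns the stored text of the three closest movies, where store comes from build_vector_store.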
Run search.py. If everything is working correctly, the script outputs the documents most similar to your query.
Congrats 🎉 You managed to make a similarity search using Meilisearch as a LangChain vector store.
Going further
Using Meilisearch as a LangChain vector store allows you to load documents and search for them in different ways:
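Two such ways, scored search and search by a precomputed vector, can be sketched as follows. The sketch assumes your installed LangChain release provides similarity_search_with_score and similarity_search_by_vector on the Meilisearch vector store with the arguments shown. The wrapper names are hypothetical.

```python
def search_with_scores(vector_store, query, k=3, embedder_name="default"):
    """Return (text, score) pairs for the k closest documents."""
    pairs = vector_store.similarity_search_with_score(
        query=query, k=k, embedder_name=embedder_name
    )
    return [(document.page_content, score) for document, score in pairs]


def search_by_vector(vector_store, embeddings, query, k=3, embedder_name="default"):
    """Embed the query explicitly, then search with the resulting vector."""
    vector = embeddings.embed_query(query)
    results = vector_store.similarity_search_by_vector(
        embedding=vector, k=k, embedder_name=embedder_name
    )
    return [document.page_content for document in results]
```

Searching by vector is useful when you already hold an embedding, for instance one cached from an earlier query, and want to avoid a second call to the embedding model.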
For additional information, consult:
Finally, if you want to use Meilisearch's vector search capabilities without LangChain, or to try its hybrid search feature, refer to the dedicated tutorial.