This guide shows you how to implement semantic search using LangChain and similarity search.
In this guide, you’ll use OpenAI’s text embeddings to measure the similarity between document properties. Then, you’ll use the LangChain framework to seamlessly integrate Meilisearch and create an application with semantic search.
# .envMEILI_HTTP_ADDR="your Meilisearch host"MEILI_API_KEY="your Meilisearch API key"OPENAI_API_KEY="your OpenAI API key"
Now that you have your environment variables available, create a setup.py file with some boilerplate code:
Copy
# setup.pyimport osfrom dotenv import load_dotenv # remove if not using dotenvfrom langchain.vectorstores import Meilisearchfrom langchain.embeddings.openai import OpenAIEmbeddingsfrom langchain.document_loaders import JSONLoaderload_dotenv() # remove if not using dotenv# exit if missing env varsif "MEILI_HTTP_ADDR" not in os.environ: raise Exception("Missing MEILI_HTTP_ADDR env var")if "MEILI_API_KEY" not in os.environ: raise Exception("Missing MEILI_API_KEY env var")if "OPENAI_API_KEY" not in os.environ: raise Exception("Missing OPENAI_API_KEY env var")# Setup code will go here 👇
Then, update the setup.py file to load the JSON and store it in Meilisearch. You will also use the OpenAI text search models to generate vector embeddings.
To use vector search, we need to set the embedders index setting. In this case, you are using an userProvided source which requires to specify the size of the vectors in a dimensions field. The default model used by OpenAIEmbeddings() is text-embedding-ada-002, which has 1,536 dimensions.
Your Meilisearch instance will now contain your documents. Meilisearch runs tasks like document import asynchronously, so you might need to wait a bit for documents to be available. Consult the asynchronous operations explanation for more information on how tasks work.
Your database is now populated with the data from the movies dataset. Create a new search.py file to make a semantic search query: searching for documents using similarity search.
Copy
# search.pyimport osfrom dotenv import load_dotenvfrom langchain.vectorstores import Meilisearchfrom langchain.embeddings.openai import OpenAIEmbeddingsimport meilisearchload_dotenv()# You can use the same code as `setup.py` to check for missing env vars# Create the vector storeclient = meilisearch.Client( url=os.environ.get("MEILI_HTTP_ADDR"), api_key=os.environ.get("MEILI_API_KEY"),)embeddings = OpenAIEmbeddings()vector_store = Meilisearch(client=client, embedding=embeddings)# Make similarity searchembedder_name = "custom"query = "superhero fighting evil in a city at night"results = vector_store.similarity_search( query=query, embedder_name=embedder_name, k=3,)# Display resultsfor result in results: print(result.page_content)
Run search.py. If everything is working correctly, you should see an output like this:
Copy
{"id": 155, "title": "The Dark Knight", "overview": "Batman raises the stakes in his war on crime. With the help of Lt. Jim Gordon and District Attorney Harvey Dent, Batman sets out to dismantle the remaining criminal organizations that plague the streets. The partnership proves to be effective, but they soon find themselves prey to a reign of chaos unleashed by a rising criminal mastermind known to the terrified citizens of Gotham as the Joker."}{"id": 314, "title": "Catwoman", "overview": "Liquidated after discovering a corporate conspiracy, mild-mannered graphic artist Patience Phillips washes up on an island, where she's resurrected and endowed with the prowess of a cat -- and she's eager to use her new skills ... as a vigilante. Before you can say \"cat and mouse,\" handsome gumshoe Tom Lone is on her tail."}{"id": 268, "title": "Batman", "overview": "Batman must face his most ruthless nemesis when a deformed madman calling himself \"The Joker\" seizes control of Gotham's criminal underworld."}
Congrats 🎉 You managed to make a similarity search using Meilisearch as a LangChain vector store.
Finally, should you want to use Meilisearch’s vector search capabilities without LangChain or its hybrid search feature, refer to the dedicated tutorial.