Couchbase
Couchbase is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications. Couchbase embraces AI with coding assistance for developers and vector search for their applications.
Vector Search is a part of the Full Text Search Service (Search Service) in Couchbase.
This tutorial explains how to use Vector Search in Couchbase. You can work with either Couchbase Capella or your self-managed Couchbase Server.
Setup
To access the CouchbaseVectorStore, you first need to install the langchain-couchbase partner package:
pip install -qU langchain-couchbase
Credentials
Head over to the Couchbase website and create a new connection, making sure to save your database username and password:
import getpass
COUCHBASE_CONNECTION_STRING = getpass.getpass(
    "Enter the connection string for the Couchbase cluster: "
)
DB_USERNAME = getpass.getpass("Enter the username for the Couchbase cluster: ")
DB_PASSWORD = getpass.getpass("Enter the password for the Couchbase cluster: ")
If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# import os
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
Initialization
Before instantiating we need to create a connection.
Create Couchbase Connection Object
We first create a connection to the Couchbase cluster and then pass the cluster object to the Vector Store.
Here, we connect using the username and password from above. You can also connect to your cluster using any other supported authentication method.
For more information on connecting to the Couchbase cluster, please check the documentation.
from datetime import timedelta
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
options = ClusterOptions(auth)
cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)
# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))
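Connection strings for Capella use the TLS scheme `couchbases://`, while a self-managed cluster without TLS uses `couchbase://`. A quick check before building the `Cluster` can catch a wrong scheme early. The sketch below is a minimal illustration; the helper name `check_connection_string` is hypothetical and not part of the Couchbase SDK.

```python
from urllib.parse import urlparse

# Schemes used by Couchbase SDK connection strings.
VALID_SCHEMES = {"couchbase", "couchbases"}


def check_connection_string(conn_str: str) -> bool:
    """Return True if the string uses TLS (couchbases://), False for couchbase://.

    Raises ValueError for any other scheme or a missing host.
    Hypothetical helper for illustration; not part of the Couchbase SDK.
    """
    parsed = urlparse(conn_str)
    if parsed.scheme not in VALID_SCHEMES:
        raise ValueError(f"Unsupported scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("Connection string has no host")
    return parsed.scheme == "couchbases"
```

For example, `check_connection_string(COUCHBASE_CONNECTION_STRING)` could be called right after reading the value, so that a Capella connection string that does not use TLS is noticed before the connection attempt.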
We will now set the bucket, scope, and collection names in the Couchbase cluster that we want to use for Vector Search.
For this example, we are using the default scope and collection.
BUCKET_NAME = "langchain_bucket"
SCOPE_NAME = "_default"
COLLECTION_NAME = "_default"
SEARCH_INDEX_NAME = "langchain-test-index"
For details on how to create a Search index with support for Vector fields, please refer to the documentation.
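As a rough orientation, the part of a Search index definition that matters for vector search is the type mapping for the embedding field. The sketch below builds only that fragment as a Python dictionary; it is not a complete index definition, and the helper name `vector_index_mapping`, the field names `text` and `embedding`, and the `dot_product` similarity are assumptions for illustration. The one value that must match your setup is `dims`: the `text-embedding-3-large` model used later in this tutorial produces 3072-dimensional vectors by default.

```python
import json

# Default output size of OpenAI's text-embedding-3-large model.
EMBEDDING_DIMS = 3072


def vector_index_mapping(
    scope: str,
    collection: str,
    text_key: str = "text",
    embedding_key: str = "embedding",
    dims: int = EMBEDDING_DIMS,
    similarity: str = "dot_product",
) -> dict:
    """Build the type-mapping fragment for one collection.

    Hypothetical helper: it returns only the mapping for the text and
    vector fields, not a full Search index definition.
    """
    return {
        f"{scope}.{collection}": {
            "dynamic": True,
            "enabled": True,
            "properties": {
                embedding_key: {
                    "enabled": True,
                    "dynamic": False,
                    "fields": [
                        {
                            "name": embedding_key,
                            "type": "vector",
                            "dims": dims,
                            "similarity": similarity,
                            "index": True,
                        }
                    ],
                },
                text_key: {
                    "enabled": True,
                    "dynamic": False,
                    "fields": [
                        {
                            "name": text_key,
                            "type": "text",
                            "index": True,
                            "store": True,
                        }
                    ],
                },
            },
        }
    }


mapping = vector_index_mapping("_default", "_default")
print(json.dumps(mapping, indent=2))
```

If the index is created with a different `dims` value than the embedding model produces, searches are unlikely to return useful results, so it is worth checking this number against the model before creating the index.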
Simple Instantiation
Below, we create the vector store object with the cluster information and the search index name.
pip install -qU langchain-openai
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
from langchain_couchbase.vectorstores import CouchbaseVectorStore
vector_store = CouchbaseVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=SEARCH_INDEX_NAME,
)
Specify the Text & Embeddings Field
You can optionally specify the text and embeddings fields for the document using the text_key and embedding_key parameters.
vector_store_specific = CouchbaseVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=SEARCH_INDEX_NAME,
    text_key="text",
    embedding_key="embedding",
)
Manage vector store
Once you have created your vector store, you can interact with it by adding and deleting items.
Add items to vector store
We can add items to our vector store by using the add_documents function.
from uuid import uuid4
from langchain_core.documents import Document
document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)
document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)
document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)
document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)
document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)
document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)
document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)
document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)
document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)
document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)
documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)
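Because the IDs above come from uuid4, every run of the cell generates fresh IDs, so running it twice stores each document twice. One possible alternative is to derive the IDs from the document text with uuid5, which returns the same ID for the same input. This is a minimal sketch: the `stable_ids` helper and the namespace string are hypothetical, and it assumes the store writes each document under the ID you pass, so that a repeated ID replaces the earlier copy instead of adding a new one. Check that against your version of langchain-couchbase.

```python
from uuid import NAMESPACE_DNS, uuid5

# Hypothetical namespace for this tutorial; any fixed UUID works.
EXAMPLE_NAMESPACE = uuid5(NAMESPACE_DNS, "langchain-couchbase.example")


def stable_ids(texts):
    """Derive one deterministic UUID string per text."""
    return [str(uuid5(EXAMPLE_NAMESPACE, text)) for text in texts]


sample_texts = [
    "The weather forecast for tomorrow is cloudy.",
    "The stock market is down.",
]
ids_first_run = stable_ids(sample_texts)
ids_second_run = stable_ids(sample_texts)

# The same text always maps to the same ID.
print(ids_first_run == ids_second_run)
```

With the documents from this section, the IDs could be built as `stable_ids([d.page_content for d in documents])` and passed to `add_documents` through the `ids` argument.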
Delete items from vector store
vector_store.delete(ids=[uuids[-1]])
Query vector store
Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.
Query directly
Similarity search
Performing a simple similarity search can be done as follows:
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
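Conceptually, a similarity search embeds the query, compares that vector with the stored document vectors, and returns the k closest matches. The toy sketch below shows the idea with a brute-force scan and cosine similarity over made-up three-dimensional vectors. It is only an illustration: the Search Service works from an index with the similarity metric configured there, and the function names and vectors here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Score every document against the query and keep the k best."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings; real ones have thousands of dimensions.
toy_docs = {
    "langchain": [0.9, 0.1, 0.0],
    "weather": [0.0, 0.2, 0.9],
    "langgraph": [0.8, 0.3, 0.1],
}
hits = top_k([1.0, 0.0, 0.0], toy_docs, k=2)
for doc_id, score in hits:
    print(f"{doc_id}: {score:.3f}")
```

The `k` argument plays the same role as `k` in `similarity_search`: it caps how many of the best-scoring documents are returned.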
Similarity search with Score
You can also fetch the scores for the results by calling the similarity_search_with_score method.
results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=1)
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
Specifying Fields to Return
You can specify the fields to return from the document using the fields parameter in the searches. These fields are returned as part of the metadata object in the returned Document. You can fetch any field that is stored in the Search index. The text_key of the document is returned as part of the document's page_content.
If you do not specify any fields to be fetched, all the fields stored in the index are returned.
If you want to fetch one of the fields in the metadata, you need to specify it using dot notation (.). For example, to fetch the source field in the metadata, you need to specify metadata.source.
query = "What did I eat for breakfast today?"
results = vector_store.similarity_search(query, fields=["metadata.source"])
print(results[0])
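To make the dot notation concrete, the sketch below resolves a dotted path such as metadata.source against a plain dictionary. The dictionary layout (`text`, `embedding`, and a nested `metadata` object) is an assumption about how a stored document might look with the default keys, and `get_field` is a hypothetical helper, not part of langchain-couchbase.

```python
def get_field(document: dict, path: str):
    """Follow a dotted path through nested dictionaries."""
    current = document
    for key in path.split("."):
        current = current[key]
    return current


# Assumed shape of a stored document with the default keys.
stored = {
    "text": "I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    "embedding": [0.1, 0.2, 0.3],
    "metadata": {"source": "tweet"},
}

print(get_field(stored, "metadata.source"))
```

Each segment of the path selects one level of nesting, which is why a metadata field has to be requested as metadata.source and not just source.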