LangChain Series: A Deep Dive into Embedding Models
Introduction
In our previous posts, we’ve seen how chains and agents form the backbone of LangChain applications. We’ve also touched on a critical concept that powers features like Retrieval-Augmented Generation (RAG) and semantic search: embeddings. An embedding model is a neural network that converts text (or other data types) into a dense vector of numbers. This vector captures the “meaning” or semantic content of the text, allowing us to perform powerful similarity comparisons.
This post will focus on the retrieval side of RAG: the embedding models that make it work. We’ll explore how LangChain provides a simple, unified interface for interacting with dozens of different embedding models, making it easy to integrate them into your applications and even swap them out with minimal code changes.
The Standard Embeddings Interface
One of LangChain’s most powerful design principles is its use of standardized interfaces. For embedding models, this means that whether you’re using a model from OpenAI, Hugging Face, Cohere, or another provider, LangChain wraps it in a class that behaves in a consistent way.
This standard Embeddings class has two core methods you need to know:
embed_query(text: str) -> List[float]: This method takes a single string of text and returns a single embedding vector. It’s typically used for embedding a user’s search query right before performing a similarity search.
embed_documents(texts: List[str]) -> List[List[float]]: This method takes a list of strings and returns a list of embeddings, one for each document. It’s designed for efficiency, as it can often process multiple documents in a single batch request to the underlying model provider. This is the method you’d use when indexing a large knowledge base.
This standardization is a huge benefit. It means you can write your application’s logic once and then easily experiment with different embedding models to see which one performs best for your specific use case.
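To make the interface concrete, here is a minimal sketch of a custom class that satisfies it. The hashing “model” below is purely illustrative, a stand-in for a real provider, but the Embeddings base class from langchain_core is the real contract that every provider wrapper implements.
import hashlib
from typing import List
from langchain_core.embeddings import Embeddings

class ToyHashEmbeddings(Embeddings):
    """Illustrative only: derives a fixed-size pseudo-vector from a text hash."""

    def __init__(self, size: int = 8):
        self.size = size

    def embed_query(self, text: str) -> List[float]:
        # Deterministically map the text to `size` floats in [0, 1]
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.size]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Batch method: one vector per input document
        return [self.embed_query(text) for text in texts]

model = ToyHashEmbeddings()
print(model.embed_query("hello"))              # one 8-float vector
print(len(model.embed_documents(["a", "b"])))  # 2
Because it implements both core methods, anything downstream that expects an Embeddings object can use this toy class and a production model interchangeably.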
Initializing Embedding Models
While LangChain doesn’t have a single init_embeddings function, a conceptual init_embeddings helper is a useful way to understand the different patterns for initializing these models. In practice, you will import a specific class for each provider, such as OpenAIEmbeddings or HuggingFaceEmbeddings.
Let’s explore these patterns using that conceptual helper as a guide, and show the actual LangChain code for each.
Method 1: Using a Model String
Conceptually, you might imagine identifying a model with a single string that combines the provider and model name.
Conceptual Code:
# init_embeddings("openai:text-embedding-3-small")
This is a clean pattern that encapsulates all the necessary information.
Actual LangChain Code: In LangChain, you achieve this by importing the specific provider class and passing the model name to its constructor.
# pip install langchain-openai
from langchain_openai import OpenAIEmbeddings
# Initialize the model
model = OpenAIEmbeddings(model="text-embedding-3-small")
# Embed a single query
vector = model.embed_query("Hello, world!")
print(f"Vector dimension: {len(vector)}")
Method 2: Using Explicit Provider and Model Arguments
Another way to think about it is by explicitly separating the provider and the model.
Conceptual Code:
# init_embeddings(model="text-embedding-3-small", provider="openai")
Actual LangChain Code: This translates to choosing the correct class to import based on the provider. The core logic remains the same, which highlights the consistency of the LangChain interface. Let’s look at OpenAI and then conceptually at another provider like Cohere.
from langchain_openai import OpenAIEmbeddings
# from langchain_cohere import CohereEmbeddings # Example for another provider
# OpenAI
openai_model = OpenAIEmbeddings(model="text-embedding-3-small")
documents = ["Hello, world!", "Goodbye, world!"]
openai_vectors = openai_model.embed_documents(documents)
print(f"OpenAI embedded {len(openai_vectors)} documents.")
# Cohere (Conceptual Example)
# cohere_model = CohereEmbeddings(model="embed-english-v3.0")
# cohere_vectors = cohere_model.embed_documents(documents)
# print(f"Cohere embedded {len(cohere_vectors)} documents.")
Method 3: Passing Additional Parameters
You often need to pass extra parameters to the model, such as an API key, endpoint URL, or other configuration options.
Conceptual Code:
# init_embeddings("openai:text-embedding-3-small", api_key="sk-...")
Actual LangChain Code:
Provider-specific keyword arguments you pass to a LangChain embeddings class constructor, such as api_key, are forwarded to the underlying client for that provider. This makes it easy to configure authentication, timeouts, and other settings.
from langchain_openai import OpenAIEmbeddings
# The api_key is passed directly to the underlying OpenAI client
model = OpenAIEmbeddings(
model="text-embedding-3-small",
api_key="sk-..." # Replace with your actual key if not using .env
)
vector = model.embed_query("This is a test.")
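Other constructor options follow the same pattern. For example, OpenAI’s text-embedding-3 models can return shortened vectors, and OpenAIEmbeddings exposes this through a dimensions parameter (a sketch; check your provider’s documentation for the options it supports):
from langchain_openai import OpenAIEmbeddings

# Request shorter vectors; supported by the text-embedding-3 family
small_model = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=256)
vector = small_model.embed_query("This is a test.")
print(len(vector))  # 256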
A Practical Example
Let’s tie this all together in a complete, runnable script that demonstrates the process.
import os
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
# 1. Load environment variables from .env
load_dotenv()
# 2. Initialize the OpenAI embedding model
# LangChain will automatically use the OPENAI_API_KEY from your .env file
embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")
# 3. Define a list of documents to embed
documents = [
"The cat sat on the mat.",
"The dog chased the ball.",
"Photosynthesis is the process used by plants to convert light energy.",
]
# 4. Embed the documents
document_vectors = embeddings_model.embed_documents(documents)
print(f"Successfully embedded {len(document_vectors)} documents.")
print(f"Dimension of the first vector: {len(document_vectors[0])}")
# 5. Define a user query to embed
user_query = "What is a feline?"
# 6. Embed the user query
query_vector = embeddings_model.embed_query(user_query)
print(f"\nSuccessfully embedded the query.")
print(f"Dimension of the query vector: {len(query_vector)}")
Conclusion
LangChain’s standardized Embeddings interface is a cornerstone of its design. It abstracts away the provider-specific details, allowing you to focus on your application’s logic. This makes it incredibly easy to get started with any embedding model and to experiment with different ones as you optimize your application for performance, cost, and accuracy.