Part 3: Spring AI Embeddings and Vector Stores

RAG rests on two primitives: embeddings (turning text into vectors) and a vector store (saving those vectors and finding the nearest ones to a query). Spring AI gives you one interface for each — EmbeddingModel and VectorStore — and you choose the implementation with a dependency and config, exactly like the chat client in Part 2. This is Part 3 of the Spring AI series, and it’s the groundwork for the RAG pipeline in Part 4.

What embeddings actually are

An embedding is a list of numbers (a vector) that captures the meaning of a piece of text. Texts with similar meaning land close together in that vector space, even if they share no exact words; “reset my password” and “I can’t log in” end up near each other. That’s the whole trick behind semantic search: you compare meaning, not keywords. The number of dimensions is fixed by the embedding model (OpenAI’s text-embedding-3-small, for example, returns 1,536 numbers), no matter how long the input text is.
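Here is a minimal sketch of cosine similarity, one common way to measure how "close" two embeddings are. The class name CosineSimilarity is made up for illustration. The three-dimensional toy vectors in the usage note are invented too: real models emit hundreds or thousands of dimensions, and each vector store chooses its own distance metric.

```java
/** Minimal sketch: cosine similarity, one common closeness measure for embeddings. */
final class CosineSimilarity {

    private CosineSimilarity() {}

    /** Returns cos(theta) between a and b: 1.0 = same direction, 0.0 = unrelated (orthogonal). */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            // Vectors from different models (or model sizes) cannot be compared
            throw new IllegalArgumentException(
                "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("A zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Suppose the made-up vectors are reset = {0.9, 0.1, 0.3}, login = {0.8, 0.2, 0.35} and pizza = {0.05, 0.95, 0.1}. Then cosine(reset, login) comes out around 0.99, while cosine(reset, pizza) is around 0.19, which is the "near each other" effect described above.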

1. Embeddings

Inject EmbeddingModel and call embed:

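import java.util.List;

import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Service;

@Service
public class EmbeddingService {

    // Spring injects whichever EmbeddingModel the starter on the classpath auto-configures
    private final EmbeddingModel embeddingModel;

    public EmbeddingService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    // Returns the text's vector; its length is fixed by the configured model.
    // Depending on your Spring AI version, the return type may be float[] instead.
    public List<Double> embed(String text) {
        return embeddingModel.embed(text);
    }
}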

Use the OpenAI or Ollama embedding starter and set the model under spring.ai.openai.embedding.options.model (or the Ollama equivalent). Same pattern as chat: one interface, swap the provider in config.

Version heads-up: Spring AI’s APIs have shifted across releases. The embed(...) return type changed (List<Double> in early releases, float[] in later ones), the SearchRequest builder API below changed, and the starter artifact IDs were renamed. Match the signatures and artifact names your pinned version exposes rather than the literal ones here.
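The "one interface, swap the provider" idea can be shown without Spring at all. In the sketch below, Embedder, HashingEmbedder and EmbedderFactory are hypothetical names and are not Spring AI types. The factory keyed by a config string stands in for what a starter plus properties does for you. HashingEmbedder is a toy feature-hashing model: it captures word overlap, not meaning. It only demonstrates that caller code depends on the interface and that the output length is set by the model, not the input.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for an embedding abstraction; NOT Spring AI's EmbeddingModel. */
interface Embedder {
    double[] embed(String text);
    int dimensions();
}

/** Toy "model": hashes each lowercase word into one of `dims` buckets (feature hashing). */
final class HashingEmbedder implements Embedder {
    private final int dims;

    HashingEmbedder(int dims) {
        if (dims <= 0) throw new IllegalArgumentException("dims must be positive");
        this.dims = dims;
    }

    @Override
    public double[] embed(String text) {
        double[] v = new double[dims]; // length depends on the model, never on the input
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) continue; // split can yield a leading empty string
            v[Math.floorMod(word.hashCode(), dims)] += 1.0;
        }
        return v;
    }

    @Override
    public int dimensions() {
        return dims;
    }
}

/** Picks an implementation from a config value, loosely mimicking starter + properties. */
final class EmbedderFactory {
    private static final Map<String, Supplier<Embedder>> PROVIDERS = Map.of(
        "small", () -> new HashingEmbedder(8),
        "large", () -> new HashingEmbedder(64));

    private EmbedderFactory() {}

    static Embedder fromConfig(String provider) {
        Supplier<Embedder> supplier = PROVIDERS.get(provider);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        return supplier.get();
    }
}
```

Code that receives an Embedder never changes when the config value does. That is the property Spring AI gives you with EmbeddingModel, although the real mechanism is auto-configuration rather than a hand-written factory.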

2. Vector store

Implementations range from in-memory (great for tests) to Redis, Pgvector, Chroma, and Pinecone. Example with Pgvector (Postgres plus the pgvector extension):

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>

Choosing one:

In-memory / SimpleVectorStore — zero setup, perfect for prototypes and tests; not persistent.
Pgvector — you already run Postgres; one extension turns it into a vector DB. Great default for most teams.
Redis / Chroma / Pinecone — when you need dedicated scale, managed hosting, or very large indexes.

For development with no API key at all, pair a local embedding model (e.g. the transformers starter) with an in-memory or file-based store, and the whole setup runs offline.

3. Add and query

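import java.util.List;
import java.util.Map;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;

// vectorStore is the auto-configured VectorStore bean, injected like EmbeddingModel above
vectorStore.add(List.of(
    new Document("Your document text here.", Map.of("source", "doc1"))
));

// Embeds the query with the same model and returns the 5 nearest documents
List<Document> similar = vectorStore.similaritySearch(
    SearchRequest.query("your query").withTopK(5)
);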

You add Document objects (with optional metadata); the store uses the configured EmbeddingModel to embed them automatically. similaritySearch embeds the query the same way and returns the closest documents.
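The add-then-search flow can be sketched in plain Java to show what the store does behind those two calls. ToyVectorStore is hypothetical and is not Spring AI's SimpleVectorStore. A Function<String, double[]> plays the EmbeddingModel role, documents are scored by cosine similarity, and search is a brute-force scan over every stored vector. That is fine for a handful of documents; real stores typically add an approximate-nearest-neighbor index so they do not have to compare the query against everything.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical in-memory store illustrating add/search; NOT Spring AI's SimpleVectorStore. */
final class ToyVectorStore {

    /** A stored document: text, metadata, and the vector computed at add() time. */
    record Doc(String text, Map<String, Object> metadata, double[] vector) {}

    /** A search hit with its similarity score. */
    record Hit(Doc doc, double score) {}

    private final Function<String, double[]> embedder; // plays the EmbeddingModel role
    private final List<Doc> docs = new ArrayList<>();

    ToyVectorStore(Function<String, double[]> embedder) {
        this.embedder = embedder;
    }

    /** Embeds the text with the configured embedder, then stores it with its metadata. */
    void add(String text, Map<String, Object> metadata) {
        docs.add(new Doc(text, Map.copyOf(metadata), embedder.apply(text)));
    }

    /** Embeds the query the same way and returns the topK closest docs, best first. */
    List<Hit> similaritySearch(String query, int topK) {
        if (topK <= 0) throw new IllegalArgumentException("topK must be positive");
        double[] q = embedder.apply(query);
        return docs.stream()
            .map(d -> new Hit(d, cosine(q, d.vector())))
            .sorted(Comparator.comparingDouble(Hit::score).reversed())
            .limit(topK)
            .toList();
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

The key detail is that add() and similaritySearch() call the same embedder. If they used different models, the scores would be meaningless.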

Two things to internalize:

Metadata is leverage. Store source, tenant, date, etc. on each Document so you can later filter searches (“only this customer’s docs”) and cite where an answer came from.
topK is a dial. It’s how many neighbors to return. Too few and you miss context; too many and you flood the prompt downstream. Start around 3–5 and measure.
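Both ideas fit in a few lines: filter on metadata first, then rank what is left and keep the topK best. FilteredSearch and Chunk are hypothetical names. The sketch assumes precomputed, unit-length vectors, so a plain dot product equals cosine similarity. Spring AI expresses filters through its own filter-expression support on SearchRequest, whose syntax varies by version, so treat this only as the underlying logic.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper: metadata filter first, then rank by similarity, keep topK. */
final class FilteredSearch {

    /** A stored chunk with string metadata (source, tenant, date, ...). */
    record Chunk(String text, Map<String, String> metadata, double[] vector) {}

    private FilteredSearch() {}

    static List<Chunk> search(List<Chunk> chunks, double[] query,
                              Predicate<Map<String, String>> filter, int topK) {
        if (topK <= 0) throw new IllegalArgumentException("topK must be positive");
        return chunks.stream()
            .filter(c -> filter.test(c.metadata()))                          // e.g. only this tenant's docs
            .sorted(Comparator.comparingDouble((Chunk c) -> -dot(query, c.vector()))) // best first
            .limit(topK)                                                     // the topK dial
            .toList();
    }

    /** Dot product; equals cosine similarity only when both vectors are unit-length (assumed). */
    private static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }
}
```

Filtering before ranking matters. If you rank first and filter afterwards, a topK of 5 can come back with fewer than 5 results whenever the nearest neighbors belong to another tenant.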

Chunking: the part that decides quality

You rarely embed whole documents — you split them into chunks first, then embed each chunk. This matters more than your choice of vector store. Chunks that are too large dilute relevance and waste the model’s context window; too small and they lose the surrounding meaning. A few hundred tokens with a little overlap is a sane starting point. If your RAG answers feel vague later, fix chunking before anything else.
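A minimal sliding-window chunker shows what "chunks with a little overlap" means mechanically. It assumes whitespace-separated words as a rough stand-in for model tokens; real token counts come from the model's tokenizer and usually run higher than word counts. Chunker is a hypothetical name. Spring AI ships its own splitters (for example TokenTextSplitter), which you would normally use instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal sketch: fixed-size chunking with overlap, using words as a stand-in for tokens. */
final class Chunker {

    private Chunker() {}

    /**
     * Splits text into chunks of at most chunkSize words. Consecutive chunks share
     * `overlap` words, so a sentence cut at a boundary keeps some surrounding context.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        String[] words = text.trim().split("\\s+");
        List<String> chunks = new ArrayList<>();
        if (words.length == 1 && words[0].isEmpty()) return chunks; // empty input
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break; // this window already reached the end
        }
        return chunks;
    }
}
```

For example, chunk("a b c d e f g h i j", 4, 1) returns ["a b c d", "d e f g", "g h i j"]: each chunk repeats the last word of the one before it. In practice you would use sizes in the hundreds, per the starting point above.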

Common gotchas

Changing the embedding model after indexing. Vectors from different models aren’t comparable — switch models and you must re-embed everything.
No metadata. Without it you can’t filter by source/tenant or tell users where an answer came from.
Embedding huge blobs. Whole-document vectors retrieve poorly. Chunk first.
Assuming in-memory persists. SimpleVectorStore is gone on restart — use Pgvector/Redis for anything real.
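One cheap defense against the first gotcha is to record which model built the index and check it on every query. IndexManifest is a hypothetical helper, not a Spring AI feature. In a real system you might persist the model name next to the vectors (for example in a small metadata table) and compare it with the configured model at startup.

```java
import java.util.Objects;

/** Hypothetical guard: remember which model built the index and refuse mismatched queries. */
final class IndexManifest {

    private final String modelId;  // the model name used when the corpus was embedded
    private final int dimensions;  // vector length that model produced

    IndexManifest(String modelId, int dimensions) {
        this.modelId = Objects.requireNonNull(modelId);
        if (dimensions <= 0) throw new IllegalArgumentException("dimensions must be positive");
        this.dimensions = dimensions;
    }

    /**
     * Throws if the query vector comes from a different model than the index. A matching
     * length alone is not enough: two models can emit vectors of the same size whose
     * axes mean unrelated things, so the model name is checked first.
     */
    void checkCompatible(String queryModelId, double[] queryVector) {
        if (!modelId.equals(queryModelId)) {
            throw new IllegalStateException("Index built with " + modelId
                + " but query embedded with " + queryModelId + "; re-embed the corpus");
        }
        if (queryVector.length != dimensions) {
            throw new IllegalStateException("Expected " + dimensions
                + " dimensions, got " + queryVector.length);
        }
    }
}
```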

FAQ

What's the difference between an embedding and a vector store?

The EmbeddingModel produces vectors from text; the VectorStore saves those vectors and finds the nearest ones to a query. You need both for similarity search.

Which vector store should I pick?

In-memory for prototypes/tests, Pgvector if you already run Postgres (the easiest production default), and Redis/Chroma/Pinecone when you need dedicated scale or managed hosting.

Can I generate embeddings without a cloud API?

Yes — use a local embedding model (e.g. the transformers starter, or Ollama) with an in-memory or file-based store. No key, no cost, fully offline.

How big should my chunks be?

A few hundred tokens with slight overlap is a good starting point. It’s the biggest lever on retrieval quality, so tune it empirically for your content.

Key takeaway: Spring AI gives you EmbeddingModel (text → vectors) and VectorStore (store + similarity search) behind swappable implementations. Store metadata, tune topK, and treat chunking as your main quality dial. With these in place, you’re ready for the RAG pipeline in Part 4.
