RAG rests on two primitives: embeddings (turning text into vectors) and a vector store (saving those vectors and finding the nearest ones to a query). Spring AI gives you one interface for each — EmbeddingModel and VectorStore — and you choose the implementation with a dependency and config, exactly like the chat client in Part 2. This is Part 3 of the Spring AI series, and it’s the groundwork for the RAG pipeline in Part 4.
What embeddings actually are
An embedding is a list of numbers (a vector) that captures the meaning of a piece of text. Texts with similar meaning land close together in that vector space, even if they share no exact words (“reset my password” and “I can’t log in” end up near each other). That’s the whole trick behind semantic search: you compare meaning, not keywords. The number of dimensions depends on the embedding model; OpenAI’s text-embedding-3-small, for example, produces 1,536-dimensional vectors.
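To make “close together” concrete, here is a minimal sketch of cosine similarity, a common way to compare embeddings. It is plain Java, not Spring AI code: the class name CosineSimilarity is hypothetical, and the three-dimensional vectors are made up for illustration (real models output far more dimensions).

```java
// Plain-Java illustration of "close in vector space"; not Spring AI code.
final class CosineSimilarity {

    private CosineSimilarity() {}

    /** Cosine of the angle between a and b: near 1.0 = same direction, near 0.0 = unrelated. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up 3-D vectors standing in for real embeddings of each phrase
        float[] resetPassword = {0.9f, 0.1f, 0.0f};
        float[] cantLogIn = {0.8f, 0.2f, 0.1f};
        float[] pizzaRecipe = {0.0f, 0.1f, 0.9f};
        System.out.printf("reset vs. can't log in: %.3f%n", cosine(resetPassword, cantLogIn));
        System.out.printf("reset vs. pizza recipe: %.3f%n", cosine(resetPassword, pizzaRecipe));
    }
}
```

With these made-up vectors, the password/login pair scores close to 1 and the password/pizza pair close to 0; a similarity search simply ranks stored vectors by a score like this.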
1. Embeddings
Inject EmbeddingModel and call embed:
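import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Service;

@Service
public class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    // Spring injects whichever EmbeddingModel bean the starter on the classpath provides
    public EmbeddingService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    // Spring AI 1.0 returns float[]; the 0.8.x line returned List<Double>
    public float[] embed(String text) {
        return embeddingModel.embed(text);
    }
}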
Add the OpenAI or Ollama embedding starter and set the model under spring.ai.openai.embedding.options.model (spring.ai.ollama.embedding.options.model for Ollama). Same pattern as chat: one interface, swap the provider in config.
Version heads-up
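The embed(...) return type has changed across Spring AI versions (List<Double> in the 0.8.x line, float[] in 1.0), and so has the SearchRequest API used below (SearchRequest.query(...).withTopK(...) in 0.8.x, SearchRequest.builder().query(...).topK(...).build() in 1.0). Match the signatures your pinned version exposes rather than copying the literal types here.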
2. Vector store
Implementations range from in-memory (great for tests) to Redis, Pgvector, Chroma, and Pinecone. Example with Pgvector (Postgres plus the pgvector extension):
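<dependency>
    <groupId>org.springframework.ai</groupId>
    <!-- Version comes from the spring-ai-bom; in Spring AI 1.0 this starter is named spring-ai-starter-vector-store-pgvector -->
    <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>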
Choosing one:
- In-memory / SimpleVectorStore — zero setup, perfect for prototypes and tests; not persistent unless you explicitly save it to a file.
- Pgvector — you already run Postgres; one extension turns it into a vector DB. Great default for most teams.
- Redis / Chroma / Pinecone — when you need dedicated scale, managed hosting, or very large indexes.
For development with no API key at all, pair a local embeddings model (e.g. the transformers starter) with an in-memory or file-based store — fully offline.
3. Add and query
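import java.util.List;
import java.util.Map;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
// vectorStore is an injected VectorStore; add() embeds each Document via the configured EmbeddingModel
vectorStore.add(List.of(
    new Document("Your document text here.", Map.of("source", "doc1"))
));
// Spring AI 1.0 builder API; 0.8.x used SearchRequest.query("your query").withTopK(5)
List<Document> similar = vectorStore.similaritySearch(
    SearchRequest.builder().query("your query").topK(5).build()
);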
You add Document objects (with optional metadata); the store uses the configured EmbeddingModel to embed them automatically. similaritySearch embeds the query the same way and returns the closest documents.
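Conceptually, add and similaritySearch boil down to the flow below: embed on write, embed the query with the same function, rank by cosine similarity, return the top K. This is a stdlib-only sketch, not Spring AI’s implementation. ToyVectorStore, Doc, and letterCounts are hypothetical names, and letterCounts is a crude letter-frequency vector that merely stands in for a real EmbeddingModel (it captures spelling, not meaning).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, stdlib-only stand-in for a vector store; NOT Spring AI's VectorStore API.
final class ToyVectorStore {

    record Doc(String text, Map<String, Object> metadata) {}

    private record Entry(Doc doc, float[] vector) {}

    private final Function<String, float[]> embedder; // plays the role of EmbeddingModel.embed
    private final List<Entry> entries = new ArrayList<>();

    ToyVectorStore(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    // Like vectorStore.add(...): each document is embedded once, at write time
    void add(List<Doc> docs) {
        for (Doc doc : docs) {
            entries.add(new Entry(doc, embedder.apply(doc.text())));
        }
    }

    // Like similaritySearch(...): embed the query with the SAME embedder, rank by cosine, keep topK
    List<Doc> similaritySearch(String query, int topK) {
        float[] q = embedder.apply(query);
        return entries.stream() // linear scan; real stores use an index instead
                .sorted(Comparator.comparingDouble((Entry e) -> cosine(q, e.vector())).reversed())
                .limit(topK)
                .map(Entry::doc)
                .toList();
    }

    // Crude placeholder "embedding": a 26-dim letter histogram. Captures spelling, not meaning.
    static float[] letterCounts(String text) {
        float[] v = new float[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                v[c - 'a']++;
            }
        }
        return v;
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Real stores replace the linear scan with an index (pgvector, for instance, offers HNSW and IVFFlat indexes) so search stays fast as the collection grows.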
Two things to internalize:
- Metadata is leverage. Store source, tenant, date, etc. on each Document so you can later filter searches (“only this customer’s docs”) and cite where an answer came from.
- topK is a dial. It’s how many neighbors to return. Too few and you miss context; too many and you flood the prompt downstream. Start around 3–5 and measure.
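In Spring AI you would typically express the tenant filter as a filter expression on the SearchRequest (for example filterExpression("tenant == 'acme'") on the 1.0 builder; check the method name for your version). The sketch below only illustrates the order of operations in plain Java: FilteredTopK and Hit are hypothetical, and each score is assumed to be a precomputed similarity to the query.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a metadata-filtered top-K search; not Spring AI's API.
final class FilteredTopK {

    // A stored chunk, its metadata, and its (precomputed) similarity score to the current query
    record Hit(String text, Map<String, String> metadata, double score) {}

    static List<Hit> search(List<Hit> candidates, String tenant, int topK) {
        return candidates.stream()
                .filter(h -> tenant.equals(h.metadata().get("tenant"))) // "only this customer's docs"
                .sorted(Comparator.comparingDouble(Hit::score).reversed()) // most similar first
                .limit(topK) // the topK dial: how many neighbors reach the prompt
                .toList();
    }
}
```

Because each Hit keeps its metadata through ranking, the source field is still available afterwards, which is what lets you cite where an answer came from.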
Chunking: the part that decides quality
You rarely embed whole documents — you split them into chunks first, then embed each chunk. This matters more than your choice of vector store. Chunks that are too large dilute relevance and waste the model’s context window; too small and they lose the surrounding meaning. A few hundred tokens with a little overlap is a sane starting point. If your RAG answers feel vague later, fix chunking before anything else.
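The sliding-window idea fits in a few lines. This sketch treats whitespace-separated words as a rough stand-in for tokens and is not Spring AI’s splitter (Spring AI ships splitters such as TokenTextSplitter for real use); Chunker and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sliding-window chunker; words approximate tokens. Not Spring AI's splitter.
final class Chunker {

    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Need chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        if (text.isBlank()) {
            return List.of();
        }
        String[] words = text.trim().split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // each window starts this many words after the previous one
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // this window reached the end; a further one would only repeat the overlap
            }
        }
        return chunks;
    }
}
```

For example, chunk("a b c d e f g h i j", 4, 1) returns ["a b c d", "d e f g", "g h i j"]: the word at each boundary appears in both neighboring chunks, so text cut at a boundary keeps some of its context.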
Common gotchas
- Changing the embedding model after indexing. Vectors from different models aren’t comparable — switch models and you must re-embed everything.
- No metadata. Without it you can’t filter by source/tenant or tell users where an answer came from.
- Embedding huge blobs. Whole-document vectors retrieve poorly. Chunk first.
- Assuming in-memory persists. SimpleVectorStore is gone on restart unless you save it to a file; use Pgvector or Redis for anything real.
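One cheap safeguard against the first gotcha is to record which model built the index and check it before every query. This is a hypothetical sketch (IndexGuard and IndexInfo are illustrative names, and where you persist IndexInfo is up to you). Checking dimensions alone is not enough: two different models can produce vectors of the same size that still aren’t comparable.

```java
// Hypothetical guard: remember which embedding model built the index and reject mismatched queries.
final class IndexGuard {

    // Stored alongside the index, e.g. in a config table or the store's metadata
    record IndexInfo(String modelName, int dimensions) {}

    static void checkCompatible(IndexInfo index, String queryModel, float[] queryVector) {
        // Same dimension count is NOT enough: two models can share a size but live in different spaces
        if (!index.modelName().equals(queryModel)) {
            throw new IllegalStateException("Index was built with " + index.modelName()
                    + " but the query uses " + queryModel + "; re-embed the corpus first");
        }
        if (queryVector.length != index.dimensions()) {
            throw new IllegalStateException("Dimension mismatch: index has " + index.dimensions()
                    + ", query has " + queryVector.length);
        }
    }
}
```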
FAQ
What's the difference between an embedding and a vector store?
EmbeddingModel produces vectors from text; the VectorStore saves those vectors and finds the nearest ones to a query. You need both for similarity search.
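Which vector store should I pick?
SimpleVectorStore for prototypes and tests, Pgvector if you already run Postgres (a good default for most teams), and Redis, Chroma, or Pinecone when you need dedicated scale, managed hosting, or very large indexes.
Can I generate embeddings without a cloud API?
Yes. A local embedding model, such as the transformers starter or a model served by Ollama, works fully offline; pair it with an in-memory or file-based store for development.
How big should my chunks be?
Start with a few hundred tokens and a little overlap, then adjust based on retrieval quality. If answers feel vague, revisit chunking before anything else.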
Key takeaway: Spring AI gives you EmbeddingModel (text → vectors) and VectorStore (store + similarity search) behind swappable implementations. Store metadata, tune topK, and treat chunking as your main quality dial. With these in place, you’re ready for the RAG pipeline in Part 4.