With Spring AI on the classpath, calling a chat model comes down to three things: add the right starter, set an API key or base URL in config, and inject ChatClient. The same Java code then works whether you’re hitting OpenAI in the cloud or a local Ollama model on your laptop — you swap the dependency and the config, not the logic. This is Part 2 of the Spring AI series.
1. Dependencies
For OpenAI:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
For Ollama (local, no API key):
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
Version heads-up
[needs source]: starter artifact IDs have changed across Spring AI versions (e.g. the -spring-boot-starter suffix used here gave way to names of the form spring-ai-starter-model-openai in the 1.0 line). Check the exact coordinates for the version you pinned in Part 1.
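You can even include both starters and choose between them at runtime, which is handy for “Ollama in dev, OpenAI in prod.” With both on the classpath there are two candidate chat models, so you have to tell Spring which one to use, for example through profile-specific config; how you do that depends on your Spring AI version.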
2. Configuration
OpenAI in application.yml — note the key comes from an environment variable, never hardcoded:
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini
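To make the placeholder behaviour concrete, here is a small plain-Java sketch of how a `${NAME}` or `${NAME:default}` value could be resolved against the environment. It is a conceptual illustration only, not Spring's implementation; the class name `PlaceholderSketch` and the `OLLAMA_BASE_URL` variable are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual sketch: mimics ${NAME} / ${NAME:default} resolution.
// This is NOT Spring's own resolver, just an illustration of the idea.
class PlaceholderSketch {

    // group 1 = variable name, group 2 = optional default after the first ':'
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2); // null when no ":default" part is present
            String resolved = env.getOrDefault(name, fallback);
            if (resolved == null) {
                // Fail fast instead of sending an empty key to the provider.
                throw new IllegalStateException("Could not resolve placeholder: " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // OLLAMA_BASE_URL is a hypothetical variable; the default applies when it is unset.
        System.out.println(resolve("${OLLAMA_BASE_URL:http://localhost:11434}", System.getenv()));
    }
}
```

The point of the fail-fast branch: a missing variable should stop the app at startup, not surface later as an authentication error from the provider.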
Ollama (default base URL is http://localhost:11434):
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3.2
The options block is where you set per-request defaults like model and temperature. Anything you set here applies globally; you can override per call in code.
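The "config sets the default, code overrides it" rule can be pictured as a simple merge. The sketch below uses hypothetical types (`ChatDefaults`, `CallOverrides`) rather than Spring AI's own option classes, and the temperature values are made up for illustration.

```java
import java.util.Optional;

// Hypothetical types for illustration; these are not Spring AI classes.
class OptionsMergeSketch {

    // Defaults as they would come from chat.options in application.yml.
    record ChatDefaults(String model, double temperature) {}

    // Per-call overrides; an empty Optional means "keep the configured default".
    record CallOverrides(Optional<String> model, Optional<Double> temperature) {}

    // The per-call value wins when present, otherwise the default applies.
    static ChatDefaults effective(ChatDefaults defaults, CallOverrides overrides) {
        return new ChatDefaults(
                overrides.model().orElse(defaults.model()),
                overrides.temperature().orElse(defaults.temperature()));
    }

    public static void main(String[] args) {
        ChatDefaults fromConfig = new ChatDefaults("gpt-4o-mini", 0.7);
        // Override only the temperature for this one request.
        CallOverrides perCall = new CallOverrides(Optional.empty(), Optional.of(0.0));
        System.out.println(effective(fromConfig, perCall));
    }
}
```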
3. Use the client
Inject ChatClient.Builder, build once, and call:
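import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class ChatService {

    private final ChatClient chatClient;

    // Spring Boot auto-configures a ChatClient.Builder; build the client once and reuse it.
    public ChatService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // Sends a single user message and returns the model's reply as plain text.
    public String ask(String userMessage) {
        return chatClient.prompt()
                .user(userMessage)
                .call()
                .content();
    }
}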
ChatClient is provider-agnostic: this exact code runs against OpenAI or Ollama. Switch by changing the starter and config — the service doesn’t know or care which model answered.
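The design idea behind that is plain dependency inversion, and it can be shown without any Spring types. In this sketch `ChatBackend`, the stub backends, and the provider keys are all stand-ins invented for illustration; the real integrations are supplied by the starters.

```java
import java.util.Map;
import java.util.function.Function;

// Illustration of the design using stand-in types, not Spring AI's ChatClient.
class ProviderSwapSketch {

    // The only thing the service knows about: text in, text out.
    interface ChatBackend extends Function<String, String> {}

    static class ChatService {
        private final ChatBackend backend;

        ChatService(ChatBackend backend) {
            this.backend = backend;
        }

        String ask(String userMessage) {
            return backend.apply(userMessage);
        }
    }

    // Stub backends standing in for the real OpenAI and Ollama integrations.
    static final Map<String, ChatBackend> BACKENDS = Map.<String, ChatBackend>of(
            "openai", msg -> "[openai stub] " + msg,
            "ollama", msg -> "[ollama stub] " + msg);

    // Picks a backend from a config value; the service code never changes.
    static ChatService serviceFor(String provider) {
        ChatBackend backend = BACKENDS.get(provider);
        if (backend == null) {
            throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        return new ChatService(backend);
    }

    public static void main(String[] args) {
        System.out.println(serviceFor("ollama").ask("hello"));
        System.out.println(serviceFor("openai").ask("hello"));
    }
}
```

A side benefit of this shape is testability: a unit test can hand the service a stub backend and never touch the network.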
System prompts and parameters
Real prompts usually set a system message (the model’s “role”) and tune parameters. The fluent API handles both:
public String ask(String userMessage) {
    return chatClient.prompt()
            .system("You are a terse senior engineer. Answer in at most three sentences.")
            .user(userMessage)
            .call()
            .content();
}
A clear system prompt is the cheapest quality lever you have — it shapes tone, format, and guardrails before the user ever types anything. For tips on writing them, see prompt engineering that actually works.
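Chat-style APIs generally receive a prompt as an ordered list of role-tagged messages, with the system message ahead of the user's text. The sketch below models that shape with a hypothetical `Message` record; it is a simplified picture, not Spring AI's internal message types.

```java
import java.util.List;

// Simplified model of a chat prompt: role-tagged messages in order.
// Message is a hypothetical record, not a Spring AI class.
class PromptMessagesSketch {

    record Message(String role, String content) {}

    // The system text is fixed by the application; only the user text varies per request.
    static List<Message> buildPrompt(String systemText, String userText) {
        return List.of(
                new Message("system", systemText),
                new Message("user", userText));
    }

    public static void main(String[] args) {
        buildPrompt("You are a terse senior engineer.", "What is a Flux?")
                .forEach(m -> System.out.println(m.role() + ": " + m.content()));
    }
}
```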
Streaming responses
For chat UIs you don’t want to wait for the whole answer. Use stream() instead of call() and consume a reactive Flux of tokens:
public Flux<String> askStreaming(String userMessage) {
    return chatClient.prompt()
            .user(userMessage)
            .stream()
            .content();
}
Return that Flux from a controller (or push it over Server-Sent Events) and the response renders token-by-token, exactly like the chat apps you’ve used.
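For a sense of what travels over the wire with Server-Sent Events, here is a minimal sketch of the framing: each chunk becomes one or more `data:` lines followed by a blank line. In a real app the web framework does this for you; `SseFrameSketch` is a hypothetical helper and the sample chunks are made up.

```java
import java.util.List;

// Minimal sketch of Server-Sent Events framing for streamed chunks.
// A web framework normally handles this; shown here only to explain the format.
class SseFrameSketch {

    // One event = a "data:" line per line of payload, terminated by a blank line.
    static String toSseEvent(String chunk) {
        StringBuilder frame = new StringBuilder();
        // limit -1 keeps trailing empty lines so newlines inside a chunk survive
        for (String line : chunk.split("\n", -1)) {
            frame.append("data: ").append(line).append('\n');
        }
        return frame.append('\n').toString();
    }

    public static void main(String[] args) {
        // Made-up chunks standing in for what the model streams back.
        for (String chunk : List.of("Hel", "lo", " world")) {
            System.out.print(toSseEvent(chunk));
        }
    }
}
```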
Structured output
Need JSON or a typed object back instead of a string? Spring AI can map the model’s response straight onto a Java record via its structured-output support (.entity(MyRecord.class)), so you skip manual parsing. The exact method name varies by version [needs source], but the capability is there — lean on it instead of regexing model output.
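Roughly speaking, structured output works by telling the model what shape to produce and then parsing the reply into the target type. The sketch below covers only the first half, deriving a shape description from a record via reflection. It is a simplified illustration, not how Spring AI generates its format instructions, and `ActorFilms` is a hypothetical record.

```java
import java.lang.reflect.RecordComponent;
import java.util.List;
import java.util.StringJoiner;

// Simplified illustration: derive a "reply in this shape" instruction from a record.
// Not Spring AI's implementation; parsing the reply is left out entirely.
class FormatInstructionsSketch {

    // Hypothetical target type for the model's answer.
    record ActorFilms(String actor, List<String> movies) {}

    static String formatInstructions(Class<? extends Record> type) {
        StringJoiner fields = new StringJoiner(", ", "{ ", " }");
        // Record components are returned in declaration order.
        for (RecordComponent component : type.getRecordComponents()) {
            fields.add("\"" + component.getName() + "\": " + component.getType().getSimpleName());
        }
        return "Respond with JSON only, matching this shape: " + fields;
    }

    public static void main(String[] args) {
        System.out.println(formatInstructions(ActorFilms.class));
    }
}
```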
Common gotchas
- Hardcoded API keys. Use ${OPENAI_API_KEY} and an env var. A key in source is a key in your git history forever.
- Wrong/unavailable model name. gpt-4o-mini or llama3.2 must exist for your provider/account; for Ollama you must ollama pull the model first.
- Ollama not running. The Ollama starter expects the daemon at localhost:11434, so start it before your app.
- Blocking on stream(). The streaming API returns a Flux; consume it reactively. Collecting it and blocking until you have the full string defeats the point.
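The "Ollama not running" case is easy to catch early with a plain TCP check before the first request. This is a minimal sketch using only the JDK; `OllamaPreflightSketch` is a hypothetical helper, and it only confirms that something is listening on the port, not that the model has been pulled.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Preflight sketch: is anything listening where the Ollama daemon is expected?
// It does not verify that the listener is Ollama or that a model is available.
class OllamaPreflightSketch {

    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host all count as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isReachable("localhost", 11434, 500);
        System.out.println(up
                ? "Something is listening on localhost:11434"
                : "Nothing on localhost:11434; start the Ollama daemon first");
    }
}
```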
FAQ
Can I switch from OpenAI to Ollama without changing code?
Yes. Swap the starter dependency and the application.yml config; your ChatClient code stays identical.
How do I set temperature or max tokens?
Set defaults in application.yml under the provider’s chat.options, or per call via the prompt builder’s options. Config sets the default; code overrides it for a specific request.
How do I stream the response token by token?
Use .stream().content() instead of .call().content(). It returns a Flux<String> you can return from a controller or push over SSE for a live-typing UI.
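Can the model return a typed Java object?
Yes. Spring AI’s structured-output support (.entity(MyRecord.class)) maps the response onto a Java record, so you skip manual parsing. Check the exact method name for your version.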
Key takeaway: Spring AI chat completions = a starter + application.yml config + an injected ChatClient. The same code targets OpenAI or Ollama; add a system prompt for quality, use stream() for live UIs, and keep keys in env vars. Next: embeddings and vector stores, the foundation for RAG.