[{"content":"A Spring AI RAG pipeline lets an LLM answer using your documents instead of just its training data. RAG — retrieval-augmented generation — is three moves: take the user\u0026rsquo;s question, find the most relevant chunks from your data (using embeddings and a vector store), and send those chunks plus the question to the model. The model answers grounded in what you gave it. Spring AI\u0026rsquo;s ChatClient and document/vector abstractions make this surprisingly little code. This is Part 4 of the Spring AI series and it ties the previous parts together.\nWhy RAG instead of fine-tuning? Before writing code, know why you\u0026rsquo;re here. RAG keeps the model fixed and injects fresh context at query time, so you update knowledge by re-indexing documents — no retraining. That makes it ideal for changing internal docs, FAQs, and product data. (For the full trade-off, see To RAG or to Fine-Tune?.) Fine-tuning changes behavior and style; RAG changes what the model knows right now. Most production apps start with RAG.\n1. Assemble the pipeline You need three pieces:\nA vector store preloaded with embedded documents (from Part 3 on embeddings and vector stores). A ChatClient to call the model (from Part 2 on chat completions). A prompt with a placeholder for the retrieved context. The flow at runtime is always the same shape: embed the query → similarity search → stuff the top chunks into the prompt → generate.\n2. 
Retrieve and prompt @Service public class RagService { private final VectorStore vectorStore; private final ChatClient chatClient; public RagService(VectorStore vectorStore, ChatClient.Builder chatBuilder) { this.vectorStore = vectorStore; this.chatClient = chatBuilder.build(); } public String ask(String question) { var similar = vectorStore.similaritySearch( SearchRequest.query(question).withTopK(5) ); String context = similar.stream() .map(Document::getContent) .collect(Collectors.joining(\u0026#34;\\n\\n\u0026#34;)); return chatClient.prompt() .user(u -\u0026gt; u.text(\u0026#34;\u0026#34;\u0026#34; Answer based only on the following context. If the answer is not in the context, say so. Context: {context} Question: {question} \u0026#34;\u0026#34;\u0026#34;) .param(\u0026#34;context\u0026#34;, context) .param(\u0026#34;question\u0026#34;, question)) .call() .content(); } } Version heads-up [needs source]: the SearchRequest API has shifted across Spring AI releases (the fluent SearchRequest.query(...).withTopK(...) vs. a SearchRequest.builder()...build() style). Match whatever your dependency version exposes. Spring AI also offers a built-in QuestionAnswerAdvisor that does retrieve-and-augment for you — handy once you outgrow the manual version above.\n3. What you get The user asks a question; you search the vector store, concatenate the top chunks into context, and pass both into the prompt. The reply is grounded in your documents instead of the model\u0026rsquo;s general training. From here you can tune topK, add metadata filters (e.g. only search a given tenant or document set), or move the prompt into one of Spring AI\u0026rsquo;s resource-based templates for consistency across services.\nThe instruction \u0026ldquo;answer based only on the context, and say so if it\u0026rsquo;s not there\u0026rdquo; matters more than it looks — it\u0026rsquo;s your main lever against hallucination. Without it, the model happily fills gaps from training data.\n4. 
Preloading the store (ingestion) A pipeline is only as good as what\u0026rsquo;s in the store. Ingestion is a one-time (or scheduled) job:\nRead files with a document reader (PDF, text, Markdown, etc.). Split them into chunks with a text splitter — chunks that are too big dilute relevance and blow your context budget; too small lose meaning. A few hundred tokens with slight overlap is a sane starting point. Embed and store — vectorStore.add(documents) uses the configured EmbeddingModel to vectorize and persist each chunk. Run that once at startup or via an admin job, and every question afterward flows through the clean retrieve → augment → generate sequence.\nCommon pitfalls Bad chunking. This is the single biggest quality lever. If answers feel vague, fix chunk size and overlap before touching the model. Retrieving too much. A huge topK floods the prompt, costs more tokens, and can lower answer quality. Start small (3–5) and measure. No grounding instruction. Always tell the model to answer only from context — otherwise RAG and hallucination coexist. Embedding/model mismatch on re-index. If you change the embedding model, you must re-embed everything; old and new vectors aren\u0026rsquo;t comparable. Ignoring metadata. Storing source/tenant/date as metadata lets you filter and cite — skip it and you can\u0026rsquo;t tell users where an answer came from. FAQ What\u0026#39;s the minimum I need for RAG in Spring Boot? An EmbeddingModel, a VectorStore (even an in-memory one for dev), and a ChatClient. That\u0026rsquo;s it — everything above is built on those three beans. Do I need a dedicated vector database? No. In-memory works for prototypes; Pgvector, Redis, Chroma, or Pinecone are for production scale and persistence. Swap via dependency + config. How do I stop the model from making things up? Ground it: instruct it to answer only from the retrieved context, and improve retrieval quality (chunking, topK, filters). 
RAG reduces hallucination but doesn\u0026rsquo;t eliminate it. Can Spring AI handle retrieval for me? Yes — the QuestionAnswerAdvisor wires retrieval into the ChatClient so you don\u0026rsquo;t assemble the prompt by hand. The manual version here is worth understanding first. Key takeaway: A Spring AI RAG pipeline is retrieve → augment → generate over three beans — EmbeddingModel, VectorStore, ChatClient. Quality lives in ingestion (chunking) and a strict grounding instruction, not in clever model settings.\nThis wraps the Spring AI series: Intro → Chat Completions → Embeddings \u0026amp; Vector Stores → RAG.\n","permalink":"https://coderboi.com/posts/spring-ai-rag-pipeline/","summary":"\u003cp\u003eA Spring AI RAG pipeline lets an LLM answer using \u003cem\u003eyour\u003c/em\u003e documents instead of just its training data. RAG — retrieval-augmented generation — is three moves: take the user\u0026rsquo;s question, find the most relevant chunks from your data (using embeddings and a vector store), and send those chunks plus the question to the model. The model answers grounded in what you gave it. Spring AI\u0026rsquo;s \u003ccode\u003eChatClient\u003c/code\u003e and document/vector abstractions make this surprisingly little code. This is Part 4 of the Spring AI series and it ties the previous parts together.\u003c/p\u003e","title":"Part 4: Spring AI Building a RAG Pipeline"},{"content":"RAG rests on two primitives: embeddings (turning text into vectors) and a vector store (saving those vectors and finding the nearest ones to a query). Spring AI gives you one interface for each — EmbeddingModel and VectorStore — and you choose the implementation with a dependency and config, exactly like the chat client in Part 2. This is Part 3 of the Spring AI series, and it\u0026rsquo;s the groundwork for the RAG pipeline in Part 4.\nWhat embeddings actually are An embedding is a list of numbers (a vector) that captures the meaning of a piece of text. 
Texts with similar meaning land close together in that vector space, even if they share no exact words — \u0026ldquo;reset my password\u0026rdquo; and \u0026ldquo;I can\u0026rsquo;t log in\u0026rdquo; end up near each other. That\u0026rsquo;s the whole trick behind semantic search: you compare meaning, not keywords. The number of dimensions depends on the embedding model.\n1. Embeddings Inject EmbeddingModel and call embed:\n@Service public class EmbeddingService { private final EmbeddingModel embeddingModel; public EmbeddingService(EmbeddingModel embeddingModel) { this.embeddingModel = embeddingModel; } public List\u0026lt;Double\u0026gt; embed(String text) { return embeddingModel.embed(text); } } Use the OpenAI or Ollama embedding starter and set the model under spring.ai.openai.embedding.options.model (or the Ollama equivalent). Same pattern as chat: one interface, swap the provider in config.\nVersion heads-up [needs source]: the embed(...) return type has differed across Spring AI versions (e.g. float[] vs List\u0026lt;Double\u0026gt;), as has the SearchRequest builder API below. Match the signatures your pinned version exposes rather than the literal types here.\n2. Vector store Implementations range from in-memory (great for tests) to Redis, Pgvector, Chroma, and Pinecone. Example with Pgvector (Postgres plus the pgvector extension):\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.springframework.ai\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;spring-ai-pgvector-store-spring-boot-starter\u0026lt;/artifactId\u0026gt; \u0026lt;/dependency\u0026gt; Choosing one:\nIn-memory / SimpleVectorStore — zero setup, perfect for prototypes and tests; not persistent. Pgvector — you already run Postgres; one extension turns it into a vector DB. Great default for most teams. Redis / Chroma / Pinecone — when you need dedicated scale, managed hosting, or very large indexes. For development with no API key at all, pair a local embeddings model (e.g. 
the transformers starter) with an in-memory or file-based store — fully offline.\n3. Add and query vectorStore.add(List.of( new Document(\u0026#34;Your document text here.\u0026#34;, Map.of(\u0026#34;source\u0026#34;, \u0026#34;doc1\u0026#34;)) )); List\u0026lt;Document\u0026gt; similar = vectorStore.similaritySearch( SearchRequest.query(\u0026#34;your query\u0026#34;).withTopK(5) ); You add Document objects (with optional metadata); the store uses the configured EmbeddingModel to embed them automatically. similaritySearch embeds the query the same way and returns the closest documents.\nTwo things to internalize:\nMetadata is leverage. Store source, tenant, date, etc. on each Document so you can later filter searches (\u0026ldquo;only this customer\u0026rsquo;s docs\u0026rdquo;) and cite where an answer came from. topK is a dial. It\u0026rsquo;s how many neighbors to return. Too few and you miss context; too many and you flood the prompt downstream. Start around 3–5 and measure. Chunking: the part that decides quality You rarely embed whole documents — you split them into chunks first, then embed each chunk. This matters more than your choice of vector store. Chunks that are too large dilute relevance and waste the model\u0026rsquo;s context window; too small and they lose the surrounding meaning. A few hundred tokens with a little overlap is a sane starting point. If your RAG answers feel vague later, fix chunking before anything else.\nCommon gotchas Changing the embedding model after indexing. Vectors from different models aren\u0026rsquo;t comparable — switch models and you must re-embed everything. No metadata. Without it you can\u0026rsquo;t filter by source/tenant or tell users where an answer came from. Embedding huge blobs. Whole-document vectors retrieve poorly. Chunk first. Assuming in-memory persists. SimpleVectorStore is gone on restart — use Pgvector/Redis for anything real. FAQ What\u0026#39;s the difference between an embedding and a vector store? 
The EmbeddingModel produces vectors from text; the VectorStore saves those vectors and finds the nearest ones to a query. You need both for similarity search. Which vector store should I pick? In-memory for prototypes/tests, Pgvector if you already run Postgres (the easiest production default), and Redis/Chroma/Pinecone when you need dedicated scale or managed hosting. Can I generate embeddings without a cloud API? Yes — use a local embedding model (e.g. the transformers starter, or Ollama) with an in-memory or file-based store. No key, no cost, fully offline. How big should my chunks be? A few hundred tokens with slight overlap is a good starting point. It\u0026rsquo;s the biggest lever on retrieval quality, so tune it empirically for your content. Key takeaway: Spring AI gives you EmbeddingModel (text → vectors) and VectorStore (store + similarity search) behind swappable implementations. Store metadata, tune topK, and treat chunking as your main quality dial. With these in place, you\u0026rsquo;re ready for the RAG pipeline in Part 4.\n","permalink":"https://coderboi.com/posts/spring-ai-embeddings-vector-stores/","summary":"\u003cp\u003eRAG rests on two primitives: \u003cstrong\u003eembeddings\u003c/strong\u003e (turning text into vectors) and a \u003cstrong\u003evector store\u003c/strong\u003e (saving those vectors and finding the nearest ones to a query). Spring AI gives you one interface for each — \u003ccode\u003eEmbeddingModel\u003c/code\u003e and \u003ccode\u003eVectorStore\u003c/code\u003e — and you choose the implementation with a dependency and config, exactly like the \u003ca href=\"/posts/spring-ai-chat-completions/\"\u003echat client in Part 2\u003c/a\u003e. 
This is Part 3 of the Spring AI series, and it\u0026rsquo;s the groundwork for the \u003ca href=\"/posts/spring-ai-rag-pipeline/\"\u003eRAG pipeline in Part 4\u003c/a\u003e.\u003c/p\u003e","title":"Part 3: Spring AI Embeddings and Vector Stores"},{"content":"When you need an LLM to use your knowledge or behave a specific way, two approaches dominate the conversation: RAG (retrieval-augmented generation) and fine-tuning. They sound interchangeable and they\u0026rsquo;re not — they solve different problems and have very different cost, complexity, and maintenance profiles. Getting RAG vs fine-tuning right early saves you a lot of wasted GPU budget. Here\u0026rsquo;s the honest comparison.\nThe one-line difference RAG changes what the model knows right now by injecting relevant documents into the prompt at query time. The model\u0026rsquo;s weights never change. Fine-tuning changes how the model behaves by updating its weights on your examples. Knowledge problem → reach for RAG. Behavior/format/style problem → consider fine-tuning. Most \u0026ldquo;the AI doesn\u0026rsquo;t know our stuff\u0026rdquo; issues are knowledge problems.\nHow RAG works (and when to use it) RAG keeps the base model fixed and, for each question, retrieves the most relevant chunks from your data (via embeddings and a vector store), adds them to the prompt, and lets the model answer grounded in that context.\nChoose RAG when:\nYour knowledge changes often — re-index documents instead of retraining. You have lots of internal docs, FAQs, or product data to answer from. You need citations — you can show which source an answer came from. You want to avoid retraining entirely. The trade-offs: the model\u0026rsquo;s context window caps how much you can retrieve, and answer quality depends heavily on your chunking and retrieval quality. Bad retrieval, bad answers — no model setting fixes that.\nWant to build one? 
See the Spring AI RAG pipeline.\nHow fine-tuning works (and when to use it) Fine-tuning continues training the model on your curated examples, baking a behavior into the weights.\nChoose fine-tuning when:\nYou need a specific style, tone, or output format the base model doesn\u0026rsquo;t reliably produce. You want to teach a specialized task or domain phrasing. You\u0026rsquo;d like to shrink prompts — behavior learned in weights doesn\u0026rsquo;t need re-explaining every call, which can cut token cost and latency. The trade-offs: it requires curated training data, compute, and versioning discipline. And crucially, it does not reliably teach the model new facts — and when your knowledge changes, you may need to retrain or re-evaluate. People constantly try to fine-tune in facts and end up with a confident, outdated model.\nCost and maintenance at a glance\nAspect | RAG | Fine-tuning\nChanges | The prompt (retrieved context) | The model weights\nBest for | Knowledge, freshness, citations | Style, format, specialized tasks\nUpdate when data changes | Re-index documents | Retrain the model\nUpfront cost | Vector store + retrieval setup | Training data + compute\nCitations | Yes (you know the source) | No\nRisk | Retrieval quality | Stale facts, overfitting\nIn practice: use both This isn\u0026rsquo;t a religious war. A lot of production systems use RAG for knowledge and light fine-tuning for style or a narrow task on top. The pragmatic order:\nStart with prompt engineering. Often a good prompt is enough. Add RAG when the problem is \u0026ldquo;the model doesn\u0026rsquo;t know our data.\u0026rdquo; Fine-tune only when you have clear training data and a behavior prompts-plus-RAG can\u0026rsquo;t deliver. Begin with the cheapest lever and only escalate when you\u0026rsquo;ve proven you need to.\nCommon gotchas Fine-tuning to inject facts. Use RAG for knowledge; fine-tuning teaches behavior, not reliable, up-to-date facts. Skipping retrieval quality. 
Blaming the model when the real problem is chunking/topK/filters. Fine-tuning too early. Expensive and slow to iterate; exhaust prompting and RAG first. No evaluation. Either approach needs a test set — \u0026ldquo;it seems better\u0026rdquo; isn\u0026rsquo;t a metric. FAQ RAG or fine-tuning for a Q\u0026amp;A bot over my docs? RAG. It\u0026rsquo;s a knowledge problem: retrieve the relevant chunks and ground the answer, and you can update by re-indexing instead of retraining. Can I use both together? Yes, and many production systems do — RAG supplies current knowledge while light fine-tuning enforces a consistent style or a narrow task. Is fine-tuning always more expensive than RAG? Usually higher upfront (data + compute) and costlier to update, but it can lower per-request token cost by shrinking prompts. RAG shifts cost to retrieval infrastructure. Match it to your change frequency. Why not just fine-tune the facts in? Fine-tuning is poor at reliably storing and updating facts, and it bakes in a knowledge cutoff. RAG keeps facts external and current, and lets you cite sources. Key takeaway: RAG vs fine-tuning comes down to knowledge vs behavior. Use RAG for changing, citable knowledge; fine-tune for style, format, or specialized tasks. Start with prompting, add RAG when the model lacks your data, and fine-tune last — often the best system uses both.\n","permalink":"https://coderboi.com/posts/rag-vs-finetuning/","summary":"\u003cp\u003eWhen you need an LLM to use \u003cem\u003eyour\u003c/em\u003e knowledge or behave a \u003cem\u003especific\u003c/em\u003e way, two approaches dominate the conversation: \u003cstrong\u003eRAG (retrieval-augmented generation)\u003c/strong\u003e and \u003cstrong\u003efine-tuning\u003c/strong\u003e. They sound interchangeable and they\u0026rsquo;re not — they solve different problems and have very different cost, complexity, and maintenance profiles. Getting RAG vs fine-tuning right early saves you a lot of wasted GPU budget. 
Here\u0026rsquo;s the honest comparison.\u003c/p\u003e\n\u003ch2 id=\"the-one-line-difference\"\u003eThe one-line difference\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eRAG changes what the model \u003cem\u003eknows\u003c/em\u003e right now\u003c/strong\u003e by injecting relevant documents into the prompt at query time. The model\u0026rsquo;s weights never change.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eFine-tuning changes how the model \u003cem\u003ebehaves\u003c/em\u003e\u003c/strong\u003e by updating its weights on your examples.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eKnowledge problem → reach for RAG. Behavior/format/style problem → consider fine-tuning. Most \u0026ldquo;the AI doesn\u0026rsquo;t know our stuff\u0026rdquo; issues are knowledge problems.\u003c/p\u003e","title":"To RAG or to Fine-Tune? Picking the Right Tool for the AI Job"},{"content":"With Spring AI on the classpath, calling a chat model comes down to three things: add the right starter, set an API key or base URL in config, and inject ChatClient. The same Java code then works whether you\u0026rsquo;re hitting OpenAI in the cloud or a local Ollama model on your laptop — you swap the dependency and the config, not the logic. This is Part 2 of the Spring AI series.\n1. Dependencies For OpenAI:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.springframework.ai\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;spring-ai-openai-spring-boot-starter\u0026lt;/artifactId\u0026gt; \u0026lt;/dependency\u0026gt; For Ollama (local, no API key):\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.springframework.ai\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;spring-ai-ollama-spring-boot-starter\u0026lt;/artifactId\u0026gt; \u0026lt;/dependency\u0026gt; Version heads-up [needs source]: starter artifact IDs have changed across Spring AI versions (e.g. the -spring-boot-starter suffix). 
Check the exact coordinates for the version you pinned in Part 1.\nYou can even include both starters and choose at runtime — handy for \u0026ldquo;Ollama in dev, OpenAI in prod.\u0026rdquo;\n2. Configuration OpenAI in application.yml — note the key comes from an environment variable, never hardcoded:\nspring: ai: openai: api-key: ${OPENAI_API_KEY} chat: options: model: gpt-4o-mini Ollama (default base URL is http://localhost:11434):\nspring: ai: ollama: base-url: http://localhost:11434 chat: options: model: llama3.2 The options block is where you set per-request defaults like model and temperature. Anything you set here applies globally; you can override per call in code.\n3. Use the client Inject ChatClient.Builder, build once, and call:\n@Service public class ChatService { private final ChatClient chatClient; public ChatService(ChatClient.Builder builder) { this.chatClient = builder.build(); } public String ask(String userMessage) { return chatClient.prompt() .user(userMessage) .call() .content(); } } ChatClient is provider-agnostic: this exact code runs against OpenAI or Ollama. Switch by changing the starter and config — the service doesn\u0026rsquo;t know or care which model answered.\nSystem prompts and parameters Real prompts usually set a system message (the model\u0026rsquo;s \u0026ldquo;role\u0026rdquo;) and tune parameters. The fluent API handles both:\npublic String ask(String userMessage) { return chatClient.prompt() .system(\u0026#34;You are a terse senior engineer. Answer in at most three sentences.\u0026#34;) .user(userMessage) .call() .content(); } A clear system prompt is the cheapest quality lever you have — it shapes tone, format, and guardrails before the user ever types anything. For tips on writing them, see prompt engineering that actually works.\nStreaming responses For chat UIs you don\u0026rsquo;t want to wait for the whole answer. 
Use stream() instead of call() and consume a reactive Flux of tokens:\npublic Flux\u0026lt;String\u0026gt; askStreaming(String userMessage) { return chatClient.prompt() .user(userMessage) .stream() .content(); } Return that Flux from a controller (or push it over Server-Sent Events) and the response renders token-by-token, exactly like the chat apps you\u0026rsquo;ve used.\nStructured output Need JSON or a typed object back instead of a string? Spring AI can map the model\u0026rsquo;s response straight onto a Java record via its structured-output support (.entity(MyRecord.class)), so you skip manual parsing. The exact method name varies by version [needs source], but the capability is there — lean on it instead of regexing model output.\nCommon gotchas Hardcoded API keys. Use ${OPENAI_API_KEY} and an env var. A key in source is a key in your git history forever. Wrong/unavailable model name. gpt-4o-mini or llama3.2 must exist for your provider/account; for Ollama you must ollama pull the model first. Ollama not running. The Ollama starter expects the daemon at localhost:11434 — start it before your app. Blocking on stream(). The streaming API returns a Flux; consume it reactively, don\u0026rsquo;t .block() it back into a string and lose the point. FAQ Can I switch from OpenAI to Ollama without changing code? Yes — that\u0026rsquo;s the main selling point. Swap the starter dependency and the application.yml config; your ChatClient code stays identical. How do I set temperature or max tokens? In application.yml under the provider\u0026rsquo;s chat.options, or per call via the prompt builder\u0026rsquo;s options. Config sets the default; code overrides it for a specific request. How do I stream the response token by token? Use .stream().content() instead of .call().content(). It returns a Flux\u0026lt;String\u0026gt; you can return from a controller or push over SSE for a live-typing UI. Can the model return a typed Java object? 
Yes — Spring AI\u0026rsquo;s structured-output support maps a response onto a record/class so you don\u0026rsquo;t parse JSON by hand. Check your version\u0026rsquo;s exact API. Key takeaway: Spring AI chat completions = a starter + application.yml config + an injected ChatClient. The same code targets OpenAI or Ollama; add a system prompt for quality, use stream() for live UIs, and keep keys in env vars. Next: embeddings and vector stores, the foundation for RAG.\n","permalink":"https://coderboi.com/posts/spring-ai-chat-completions/","summary":"\u003cp\u003eWith \u003ca href=\"/posts/spring-ai-intro/\"\u003eSpring AI on the classpath\u003c/a\u003e, calling a chat model comes down to three things: add the right starter, set an API key or base URL in config, and inject \u003ccode\u003eChatClient\u003c/code\u003e. The same Java code then works whether you\u0026rsquo;re hitting OpenAI in the cloud or a local Ollama model on your laptop — you swap the dependency and the config, not the logic. This is Part 2 of the Spring AI series.\u003c/p\u003e","title":"Part 2: Spring AI Chat Completions with OpenAI or Ollama"},{"content":"Spring AI brings AI capabilities into the Spring ecosystem as a first-class citizen. Instead of hand-rolling HTTP clients for OpenAI, Anthropic, or Ollama and wiring JSON parsing, retries, and secrets yourself, you get a consistent abstraction over chat models, embeddings, and vector stores — with the usual Spring benefits: dependency injection, configuration properties, auto-configuration, and optional observability. If you already think in @Service and application.yml, Spring AI will feel immediately familiar. This is Part 1 of a four-part series that ends with a working RAG app.\nWhat it gives you Chat models — one API whether you call OpenAI, Anthropic, Azure OpenAI, or a local Ollama model. Swap providers via config, not code. (Part 2 covers this.) 
Embeddings — turn text into vectors for similarity search, with multiple backends (cloud or local). (Part 3.) Vector stores — persist and query embeddings, with adapters for Pgvector, Redis, Chroma, Pinecone, and more. (Part 3.) RAG — retrieval-augmented generation building blocks: load documents, embed them, store them, and augment prompts with retrieved context. (Part 4.) The unifying idea is the portable abstraction: ChatClient, EmbeddingModel, and VectorStore are interfaces. The provider is an implementation you select with a dependency and configure in YAML.\nWhy not just call the APIs directly? You can — and for a one-off script, a raw HTTP client is perfectly fine. Spring AI earns its place when you\u0026rsquo;re building an application rather than a script:\nProvider portability. Prototype against local Ollama (free, private), ship against a hosted model — same code. Config, not constants. API keys and base URLs live in application.yml / env vars, not scattered string literals. Spring-native ergonomics. Constructor injection, @ConfigurationProperties, testing slices, and Micrometer observability all just work. Higher-level building blocks. Prompt templates, structured output mapping to Java objects, advisors, and RAG plumbing you\u0026rsquo;d otherwise write by hand. The trade-off: it\u0026rsquo;s another abstraction layer, and APIs have shifted across early versions (see the version note below). For most Spring teams, the consistency is worth it.\nWhat you need Java 17+ and Spring Boot 3.x (Spring AI is built on Spring Framework 6). A model starter for your provider, e.g. spring-ai-openai-spring-boot-starter for OpenAI or spring-ai-ollama-spring-boot-starter for local models. For RAG: a vector store dependency and, optionally, document readers/splitters. Version heads-up [needs source]: Spring AI\u0026rsquo;s APIs and starter artifact names changed in the run-up to and after its 1.0 GA (and some docs reference a 2.x line aligned with Spring Boot 4). 
Pin a version in your build and follow the docs for that version — copy-pasting across versions is the main source of \u0026ldquo;it doesn\u0026rsquo;t compile.\u0026rdquo;\nHow the series fits together Each part builds on the last:\nIntro (you are here) — the why and the setup. Chat completions — call a model with ChatClient. Embeddings \u0026amp; vector stores — turn data into searchable vectors. RAG pipeline — retrieve relevant chunks and feed them into a prompt so the model answers from your data. If you only care about \u0026ldquo;make the model answer using my documents,\u0026rdquo; that\u0026rsquo;s RAG — and it\u0026rsquo;s the most common production pattern. (Not sure whether you need RAG or fine-tuning? Here\u0026rsquo;s the comparison.)\nFAQ Is Spring AI production-ready? It reached a 1.0 GA and is actively developed. As with any young framework, pin your version and read release notes before upgrading — the API moved quickly in early releases. [needs source] Which model providers does Spring AI support? Many, through provider-specific starters — including OpenAI, Anthropic, Azure OpenAI, and local models via Ollama, plus several vector stores. You select one with a dependency and configure it in application.yml. Do I need a cloud API key to try it? No. Run a local model with Ollama (see running LLMs locally) and point Spring AI at http://localhost:11434 — no key, no cost, fully offline. Spring AI or LangChain4j? Both wrap LLM providers for Java. Spring AI is the natural fit if you\u0026rsquo;re already in the Spring ecosystem (auto-config, properties, DI); LangChain4j is framework-agnostic. Choose based on your stack. [needs source] Key takeaway: Spring AI is a portable abstraction over chat models, embeddings, and vector stores that fits the Spring Boot programming model. Use it (over raw HTTP clients) when you want provider portability, config-driven setup, and ready-made RAG building blocks. 
Next up: calling a chat model.\n","permalink":"https://coderboi.com/posts/spring-ai-intro/","summary":"\u003cp\u003eSpring AI brings AI capabilities into the Spring ecosystem as a first-class citizen. Instead of hand-rolling HTTP clients for OpenAI, Anthropic, or Ollama and wiring JSON parsing, retries, and secrets yourself, you get a consistent abstraction over chat models, embeddings, and vector stores — with the usual Spring benefits: dependency injection, configuration properties, auto-configuration, and optional observability. If you already think in \u003ccode\u003e@Service\u003c/code\u003e and \u003ccode\u003eapplication.yml\u003c/code\u003e, Spring AI will feel immediately familiar. This is Part 1 of a four-part series that ends with a working RAG app.\u003c/p\u003e","title":"Part 1: Introduction to Spring AI"},{"content":"Running LLMs locally with Ollama means an open-weight model lives on your machine — no API keys, no per-token billing, no sending your data to someone else\u0026rsquo;s servers. You install Ollama, pull a model with one command, and chat from the terminal or hit a local API. For prototyping, privacy-sensitive work, or just learning how these models behave without a credit card attached, it\u0026rsquo;s the simplest on-ramp there is.\nInstall and run a model Installation is straightforward: grab the Ollama binary for macOS, Linux, or Windows from the project site and run it. Then, from the command line:\nollama run llama3.2 That pulls the model (one-time download) and drops you into an interactive chat. 
Other useful commands:\nollama pull mistral # download a model without chatting ollama list # show installed models ollama rm llama3.2 # free up disk space Models are stored on disk and loaded into RAM when you use them — which is why memory is the thing that matters most.\nHardware: what you actually need Model sizes are measured in billions of parameters, and size drives RAM/VRAM needs:\nSmall models (a few billion params, e.g. Phi, small Llama/Mistral variants) run on modest laptops — think 8–16 GB RAM. Larger models want 16 GB or more, and a GPU makes them dramatically faster. Quantized versions trade a little quality for much smaller memory use — usually the right call on a laptop. [needs source] If a model feels painfully slow, drop to a smaller or more aggressively quantized variant before blaming Ollama.\nThe OpenAI-compatible local API Here\u0026rsquo;s the part that makes Ollama genuinely useful for developers: it exposes a local HTTP API at http://localhost:11434, including an OpenAI-compatible endpoint. That means an app written against the OpenAI API can be pointed at your local Ollama by changing the base URL and the model name — no other code changes.\nThis is why Spring AI works the same against Ollama or OpenAI: in Part 2 of the Spring AI series you literally just set base-url: http://localhost:11434 and a local model name. Develop offline against Ollama, deploy against a hosted model — identical code. The project supports many open-weight models, with new ones added regularly.\nWhen local beats the cloud (and when it doesn\u0026rsquo;t) Run locally when:\nPrivacy matters — data never leaves your machine. Great for sensitive documents or regulated work. You\u0026rsquo;re prototyping — no metered billing while you experiment. You want to learn — poke at temperature, prompts, and model differences freely. You need to work offline — no internet dependency after the download. 
Stick with the cloud when:\nYou need the frontier-quality answers only the biggest hosted models give. You need to serve many concurrent users — your laptop is not a fleet. You don\u0026rsquo;t want to manage hardware, memory, and model updates. A common, pragmatic pattern: Ollama for local dev and CI, a hosted model in production — and because the API is compatible, switching is a config change.\nCommon gotchas Out-of-memory / crawling speed. The model is too big for your RAM/VRAM — use a smaller or quantized variant. \u0026ldquo;Model not found.\u0026rdquo; You have to ollama pull (or run) a model before an app can use it. App can\u0026rsquo;t reach Ollama. The daemon must be running and listening on localhost:11434 before your app starts. Expecting GPT-4-class output from a 3B model. Small local models are capable but not magic — calibrate expectations to model size. FAQ Is Ollama free? Yes — Ollama is free and runs open-weight models locally. There are no API or token charges; your only cost is the hardware it runs on. Can I use Ollama as a drop-in for the OpenAI API? Largely, yes. It exposes an OpenAI-compatible endpoint at localhost:11434, so many OpenAI clients work by changing the base URL and model name. Verify the specific features you use are supported. What hardware do I need? Small/quantized models run on 8–16 GB RAM; larger models want 16 GB+ and benefit a lot from a GPU. Pick a model size that fits your machine. Can I use Ollama with Spring Boot? Yes — Spring AI has an Ollama starter. Point it at http://localhost:11434, set a pulled model name, and the same ChatClient code runs locally or against the cloud. See the Spring AI chat post. Key takeaway: Ollama lets you run LLMs locally with one command — free, private, and offline. 
Match model size to your RAM, use its OpenAI-compatible API to develop against local models and deploy to the cloud unchanged, and reach for hosted models when you need frontier quality or real concurrency.\n","permalink":"https://coderboi.com/posts/running-llms-locally-ollama/","summary":"\u003cp\u003eRunning LLMs locally with Ollama means an open-weight model lives on \u003cem\u003eyour\u003c/em\u003e machine — no API keys, no per-token billing, no sending your data to someone else\u0026rsquo;s servers. You install Ollama, pull a model with one command, and chat from the terminal or hit a local API. For prototyping, privacy-sensitive work, or just \u003cem\u003elearning how these models behave\u003c/em\u003e without a credit card attached, it\u0026rsquo;s the simplest on-ramp there is.\u003c/p\u003e\n\u003ch2 id=\"install-and-run-a-model\"\u003eInstall and run a model\u003c/h2\u003e\n\u003cp\u003eInstallation is straightforward: grab the Ollama binary for macOS, Linux, or Windows from the project site and run it. Then, from the command line:\u003c/p\u003e","title":"Taming the AI Beast on Your Own Laptop with Ollama"},{"content":"Prompt engineering is just the craft of phrasing your request so the model gives you what you actually want — the right format, tone, and level of detail. It\u0026rsquo;s less \u0026ldquo;magic words\u0026rdquo; and more \u0026ldquo;clear communication with a literal-minded intern.\u0026rdquo; These prompt engineering tips work across most modern LLMs (GPT, Claude, Llama, and friends), and none of them require yelling, emojis, or threatening the model into compliance.\nBe explicit about task and format Vague in, vague out. \u0026ldquo;Tell me about APIs\u0026rdquo; can mean anything; the model guesses, and you get a rambling essay. Spell out task, audience, length, and format:\nIn 3 short paragraphs, explain what a REST API is for a beginner. 
Use simple language and exactly one example.\nThat short prompt pins down structure (3 paragraphs), audience (beginner), and constraints (one example). For code, name the language, version, and style so the model doesn\u0026rsquo;t pick for you. The more decisions you make, the fewer the model makes badly.\nShow, don\u0026rsquo;t just tell: few-shot examples When a task is nuanced, demonstrate it. Few-shot prompting means including one or two input→output pairs before the real request so the model mimics the pattern:\nClassify the sentiment as POSITIVE, NEGATIVE, or NEUTRAL. Review: \u0026ldquo;Shipping was fast.\u0026rdquo; → POSITIVE Review: \u0026ldquo;It broke in a week.\u0026rdquo; → NEGATIVE Review: \u0026ldquo;It\u0026rsquo;s fine, I guess.\u0026rdquo; →\nFor structured output (say, JSON with specific keys), a small example is worth a paragraph of description — the model copies the shape. Two good examples usually beat a long, abstract spec.\nGive it room to think For anything involving reasoning, asking the model to work step by step before answering measurably improves accuracy on multi-step problems. A simple \u0026ldquo;think through it step by step, then give the final answer\u0026rdquo; often does it. For cleaner output, have it reason first and then return only the final answer in a specified format, so the user doesn\u0026rsquo;t see the scratch work.\nSet the role with a system prompt Most APIs separate a system prompt (the model\u0026rsquo;s persona and rules) from the user message. Use it to fix tone and guardrails once, instead of repeating yourself every turn:\nYou are a terse senior engineer. Prefer code over prose. If a question is ambiguous, ask one clarifying question first.\nIf you\u0026rsquo;re wiring this up in code, Spring AI\u0026rsquo;s chat client exposes .system(...) for exactly this.\nConstrain and ground to cut hallucination LLMs hallucinate when they lack information. Two defenses:\nConstrain. 
\u0026ldquo;If you\u0026rsquo;re not sure, say you don\u0026rsquo;t know\u0026rdquo; gives the model permission to not bluff. Ground. Provide the source material in the prompt and instruct it to answer only from that. At scale, that\u0026rsquo;s RAG — retrieval feeding the prompt automatically. Iterate — prompts are code Your first prompt is a draft. When the output is off, don\u0026rsquo;t start over — change one thing: add a constraint, split the task into steps, tighten the format, or add an example. Keep a small library of prompts that work for your recurring tasks; that reuse is where the real time savings live. Treat prompts like code: version them, test them on edge cases, and refine.\nCommon gotchas Asking for too much at once. Break a mega-prompt into steps or separate calls; quality drops as you pile on demands. Placeholder ambiguity. \u0026ldquo;Make it better\u0026rdquo; — better how? Specify the dimension (shorter, friendlier, more formal). Ignoring the system prompt. Stuffing persona instructions into every user message instead of setting them once. No output contract. If you need JSON, say so and show the shape — don\u0026rsquo;t hope. FAQ What is prompt engineering, really? Designing the input to an LLM — task, context, examples, and format — so it reliably produces the output you want. It\u0026rsquo;s structured communication, not secret keywords. What\u0026#39;s zero-shot vs few-shot prompting? Zero-shot gives the instruction with no examples; few-shot includes one or more input→output examples so the model copies the pattern. Few-shot usually wins on nuanced or format-specific tasks. Does telling the model to \u0026#39;think step by step\u0026#39; actually help? For multi-step reasoning, yes — it generally improves accuracy. Have it reason first, then return the final answer in your required format. How do I get consistent JSON out of an LLM? Specify the exact schema, show a small example, and instruct it to return only JSON. 
In code, use structured-output features (e.g. Spring AI mapping to a record) instead of parsing free text. Key takeaway: Effective prompt engineering = be explicit about task/format, show examples for nuance, let the model reason step by step, set the role in a system prompt, and iterate one change at a time. Ground it with provided context to keep it honest.\n","permalink":"https://coderboi.com/posts/prompt-engineering-tips/","summary":"\u003cp\u003ePrompt engineering is just the craft of phrasing your request so the model gives you what you actually want — the right format, tone, and level of detail. It\u0026rsquo;s less \u0026ldquo;magic words\u0026rdquo; and more \u0026ldquo;clear communication with a literal-minded intern.\u0026rdquo; These prompt engineering tips work across most modern LLMs (GPT, Claude, Llama, and friends), and none of them require yelling, emojis, or threatening the model into compliance.\u003c/p\u003e\n\u003ch2 id=\"be-explicit-about-task-and-format\"\u003eBe explicit about task and format\u003c/h2\u003e\n\u003cp\u003eVague in, vague out. \u0026ldquo;Tell me about APIs\u0026rdquo; can mean anything; the model guesses, and you get a rambling essay. Spell out \u003cstrong\u003etask, audience, length, and format\u003c/strong\u003e:\u003c/p\u003e","title":"Stop Yelling at the AI: Prompt Engineering That Actually Works"},{"content":"A large language model (LLM) is a neural network trained on enormous amounts of text to do one deceptively simple thing: predict the next token (roughly, the next word-piece) in a sequence. Do that well enough, at billions of parameters, and something surprising falls out — the model can answer questions, summarize documents, translate, and write code. Models like GPT, Claude, and Llama are all LLMs. 
This is the no-hype, human-friendly explanation of what they are and why they matter to anyone building software.\nHow LLMs actually work Under the hood is the transformer architecture, whose key move is self-attention — the ability to weigh how much each word relates to every other word in the input, capturing long-range context. Training happens in stages:\nPre-training. The model reads a massive text corpus (books, articles, code) and learns grammar, facts, and reasoning patterns purely by predicting the next token, over and over. Fine-tuning \u0026amp; alignment. Techniques like instruction tuning and RLHF (reinforcement learning from human feedback) teach it to follow instructions and behave helpfully and safely, rather than just autocomplete. You interact with the result by sending a prompt and getting a completion back, usually via an API or a chat interface. Everything fancy — chatbots, copilots, RAG systems — is built on that prompt-in, text-out loop.\nTokens, context windows, and why they matter Two concepts you\u0026rsquo;ll bump into constantly:\nTokens. Models don\u0026rsquo;t see words; they see tokens (word fragments). Cost and limits are measured in tokens, so \u0026ldquo;be concise\u0026rdquo; is also \u0026ldquo;be cheaper.\u0026rdquo; Context window. The maximum number of tokens a model can consider at once — your prompt plus its answer. Everything the model \u0026ldquo;knows\u0026rdquo; in a conversation has to fit in that window, which is exactly why RAG exists: to feed in only the relevant chunks instead of an entire knowledge base. Strengths and limitations Because they\u0026rsquo;re trained on broad data, LLMs are remarkably versatile — translation, classification, extraction, and generation, often with little or no task-specific training. That generality is the superpower.\nThe flip side, which you must design around:\nHallucination. They can state false things confidently. The model optimizes for plausible-sounding text, not truth. 
Prompt sensitivity. Small wording changes can shift output quality a lot — hence prompt engineering. Knowledge cutoff. A base model doesn\u0026rsquo;t know about events after its training data ends, and has no live access to your private data. No real-time facts. Out of the box it can\u0026rsquo;t look things up — you have to give it the information. The fixes are practical: write careful prompts, use RAG to inject current or private knowledge, keep a human in the loop for high-stakes output, and evaluate results instead of trusting them blindly.\nWhy a mental model matters You don\u0026rsquo;t need to derive attention math to use LLMs well, but a solid mental model pays off immediately. It tells you why a prompt failed (ambiguous instructions, missing context, too much asked at once), when to reach for RAG versus fine-tuning, and which model to pick for a task. Treat the model as a brilliant, eager intern with no memory of your business and a tendency to bluff — and you\u0026rsquo;ll design far more reliable systems.\nWant to go from theory to code? The Spring AI series builds a real chat-and-RAG app on Spring Boot, and you can run a model locally with Ollama for free while you learn.\nFAQ What does LLM stand for? Large Language Model — a neural network with many parameters trained on large text corpora to predict the next token, which lets it generate and understand natural language. How is an LLM different from a chatbot? The LLM is the underlying model. A chatbot is an application built on top of it — adding a chat UI, system prompts, memory, and often retrieval (RAG) so the model answers usefully. Why do LLMs hallucinate? They\u0026rsquo;re trained to produce plausible text, not verified truth, so when they lack the right information they fill the gap convincingly. Grounding them with retrieved context and evaluating outputs reduces this. Do I need a powerful machine to use an LLM? Not to use hosted models — that\u0026rsquo;s an API call. 
To run one locally you need decent RAM/GPU, though small models run on modest hardware. See running LLMs locally with Ollama. Key takeaway: An LLM is a transformer trained to predict the next token; scale that up and you get a versatile language engine that\u0026rsquo;s powerful but prone to hallucination, prompt-sensitive, and limited to its training cutoff. Build around those limits with good prompts, RAG, and evaluation — and you\u0026rsquo;ll ship reliable AI features.\n","permalink":"https://coderboi.com/posts/intro-large-language-models/","summary":"\u003cp\u003eA large language model (LLM) is a neural network trained on enormous amounts of text to do one deceptively simple thing: predict the next token (roughly, the next word-piece) in a sequence. Do that well enough, at billions of parameters, and something surprising falls out — the model can answer questions, summarize documents, translate, and write code. Models like GPT, Claude, and Llama are all LLMs. This is the no-hype, human-friendly explanation of what they are and why they matter to anyone building software.\u003c/p\u003e","title":"WTF is an LLM? A Human-Friendly Guide to AI Brains"},{"content":"\u0026ldquo;It works on my machine\u0026rdquo; stops being funny the moment it has to run on someone else\u0026rsquo;s. Dockerizing a Spring Boot app gives you one artifact that runs identically on your laptop, in CI, and on Kubernetes or AWS — same JDK, same dependencies, same behavior. The trick is doing it so the image is small, secure, and cached well, not a 600 MB blob that rebuilds from scratch on every code change. Here\u0026rsquo;s the approach I actually use.\nThe multi-stage Dockerfile A multi-stage build compiles in one stage and ships from a slim runtime in another, so build tools never end up in the final image:\n# --- Stage 1: build --- FROM eclipse-temurin:21-jdk AS build WORKDIR /app COPY . . 
RUN ./mvnw -q clean package -DskipTests # --- Stage 2: runtime --- FROM eclipse-temurin:21-jre WORKDIR /app COPY --from=build /app/target/*.jar app.jar EXPOSE 8080 ENTRYPOINT [\u0026#34;java\u0026#34;, \u0026#34;-jar\u0026#34;, \u0026#34;app.jar\u0026#34;] The runtime stage uses a JRE, not a JDK, so you\u0026rsquo;re not shipping a compiler to production. Two refinements worth making:\nRun as a non-root user. Add a user and USER directive so a container breakout doesn\u0026rsquo;t land as root. Pin the base image to a digest or specific tag for reproducible builds. Layered JARs for faster rebuilds Here\u0026rsquo;s the gotcha: copy a single fat JAR and every code change busts the Docker layer that contains all your dependencies — so each build re-downloads/re-layers ~50 MB of libraries that didn\u0026rsquo;t change. Spring Boot\u0026rsquo;s layered JAR splits the archive into dependencies, spring-boot-loader, snapshot-dependencies, and application, so Docker caches the dependency layers and only the tiny application layer changes when you edit code:\nFROM eclipse-temurin:21-jdk AS build WORKDIR /app COPY . . RUN ./mvnw -q clean package -DskipTests RUN java -Djarmode=layertools -jar target/*.jar extract FROM eclipse-temurin:21-jre WORKDIR /app COPY --from=build /app/dependencies/ ./ COPY --from=build /app/spring-boot-loader/ ./ COPY --from=build /app/snapshot-dependencies/ ./ COPY --from=build /app/application/ ./ EXPOSE 8080 ENTRYPOINT [\u0026#34;java\u0026#34;, \u0026#34;org.springframework.boot.loader.launch.JarLauncher\u0026#34;] Version heads-up [needs source]: the loader launcher class moved to org.springframework.boot.loader.launch.JarLauncher in Spring Boot 3.2+. Older versions use org.springframework.boot.loader.JarLauncher. Match your Boot version.\nOrder matters: copy the layers that change least (dependencies) first so they cache.\nOr skip the Dockerfile with Buildpacks Don\u0026rsquo;t want to maintain a Dockerfile at all? 
Spring Boot\u0026rsquo;s build plugin produces an OCI image via Cloud Native Buildpacks with one command:\n./mvnw spring-boot:build-image No Dockerfile required. The resulting image is layered, runs as non-root, and follows sensible defaults out of the box. For many projects this is the better default — you only drop to a hand-written Dockerfile when you need custom OS packages or fine-grained control.\nConfiguration and local stacks A container should get its configuration from the environment, not baked into the image. Externalize database URLs, secrets, and feature flags as env vars (Spring maps SPRING_DATASOURCE_URL to spring.datasource.url automatically) or a config server. The same image then runs in dev, staging, and prod with different env.\nFor local development, use Docker Compose to bring up your app plus its dependencies in one command:\nservices: app: build: . ports: [\u0026#34;8080:8080\u0026#34;] environment: SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/app depends_on: [db] db: image: postgres:16 environment: POSTGRES_DB: app POSTGRES_PASSWORD: secret docker compose up and you have the full stack — no \u0026ldquo;install Postgres locally\u0026rdquo; instructions in your README.\nCommon gotchas Shipping a JDK at runtime. Use a JRE base image; it\u0026rsquo;s smaller and has less attack surface. No layered JAR. Without it, every code change re-layers all your dependencies — slow builds, slow pushes. Running as root. Add a non-root USER. Many base images still default to root. Hardcoded config. Baking URLs/secrets into the image breaks the \u0026ldquo;one image, many environments\u0026rdquo; promise (and leaks secrets). Ignoring .dockerignore. Without it you copy target/, .git, and IDE files into the build context, bloating it and busting caches. FAQ Dockerfile or Spring Boot Buildpacks? Buildpacks (spring-boot:build-image) are the easiest sane default — layered, non-root, no file to maintain. 
Write a Dockerfile when you need custom OS packages, a specific base image, or tighter control. Why is my image so large? You\u0026rsquo;re probably shipping a JDK instead of a JRE, copying the whole build context, or not using a slim base. Multi-stage build + JRE runtime + .dockerignore usually cuts it dramatically. What\u0026#39;s a layered JAR and why use it? It splits the Boot JAR into layers (dependencies, loader, app) so Docker caches the rarely-changing dependency layers. Result: code-only changes produce tiny, fast rebuilds. How do I pass config into the container? Environment variables. Spring relaxed binding maps SPRING_DATASOURCE_URL to spring.datasource.url, etc. Keep secrets in the orchestrator\u0026rsquo;s secret store, not the image. Key takeaway: Dockerize Spring Boot with a multi-stage build on a JRE base, use the layered-JAR layout for fast cached rebuilds (or skip the Dockerfile entirely with spring-boot:build-image), run as non-root, and feed config through environment variables.\nGot the app built first? See Surviving Day 1 with Spring Boot 3.\n","permalink":"https://coderboi.com/posts/spring-boot-docker/","summary":"\u003cp\u003e\u0026ldquo;It works on my machine\u0026rdquo; stops being funny the moment it has to run on someone else\u0026rsquo;s. Dockerizing a Spring Boot app gives you one artifact that runs identically on your laptop, in CI, and on Kubernetes or AWS — same JDK, same dependencies, same behavior. The trick is doing it so the image is \u003cstrong\u003esmall, secure, and cached well\u003c/strong\u003e, not a 600 MB blob that rebuilds from scratch on every code change. Here\u0026rsquo;s the approach I actually use.\u003c/p\u003e","title":"Dockerizing Spring Boot (Because \"It Works on My Machine\" Isn’t Enough)"},{"content":"Spring Boot JWT authentication is the default way most teams secure a REST API today. 
Instead of keeping a session in server memory, the client carries a signed JSON Web Token on every request, and the server verifies it. No session table, no sticky load balancing, no \u0026ldquo;which node has my session?\u0026rdquo; — which is exactly why it fits SPAs, mobile apps, and microservices. Let\u0026rsquo;s wire it up properly, and talk about the parts that bite you later.\nThe authentication flow The whole dance is four steps:\nThe client POSTs credentials to a login endpoint (e.g. /auth/login). You validate them, then mint a signed JWT containing claims like the subject (username), roles, issued-at, and expiry. The client stores that token and sends it on every subsequent request as Authorization: Bearer \u0026lt;token\u0026gt;. A filter on the server validates the signature and expiry, builds an Authentication, and drops it into the SecurityContext. The key property: the token is signed, not encrypted. Anyone can decode and read the claims (they\u0026rsquo;re just base64url) — the signature only proves the server issued it and nobody tampered with it. So never put secrets in a JWT, and always serve it over HTTPS.\nCreating the token at login The core Spring Security starter doesn\u0026rsquo;t include a JWT encoder/decoder (that support lives in the separate OAuth2 resource-server module), so for a hand-rolled setup most people reach for jjwt or Nimbus JOSE. A token service typically wraps a signing key and exposes generateToken(user), extractUsername(token), and extractAuthorities(token).\nVersion heads-up [needs source]: jjwt\u0026rsquo;s builder API changed between 0.11.x (setSubject, signWith(key, SignatureAlgorithm.HS256)) and 0.12.x (subject, signWith(key)). 
Check the version in your pom.xml before copying snippets from older tutorials — that mismatch is the #1 reason these examples don\u0026rsquo;t compile.\nWhatever library you use, the rules are the same: sign with a strong key (256-bit minimum for HS256), set a short expiry on the access token (15–60 minutes is typical), and store the signing secret in config/secret manager, never in source.\nValidating tokens with a filter This is the heart of it. Add a OncePerRequestFilter that runs before Spring\u0026rsquo;s username/password filter, pulls the bearer token, validates it, and authenticates the request:\n@Component public class JwtAuthFilter extends OncePerRequestFilter { private final JwtService jwtService; public JwtAuthFilter(JwtService jwtService) { this.jwtService = jwtService; } @Override protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain chain) throws ServletException, IOException { String header = request.getHeader(\u0026#34;Authorization\u0026#34;); if (header != null \u0026amp;\u0026amp; header.startsWith(\u0026#34;Bearer \u0026#34;)) { String token = header.substring(7); try { String username = jwtService.extractUsername(token); // throws if bad/expired if (username != null \u0026amp;\u0026amp; SecurityContextHolder.getContext().getAuthentication() == null) { var auth = new UsernamePasswordAuthenticationToken( username, null, jwtService.extractAuthorities(token)); auth.setDetails(new WebAuthenticationDetailsSource().buildDetails(request)); SecurityContextHolder.getContext().setAuthentication(auth); } } catch (JwtException ex) { // Invalid or expired token: leave the context empty. // The entry point will answer with 401 for protected routes. } } chain.doFilter(request, response); } } Note what it does not do: it doesn\u0026rsquo;t reject the request itself. It just authenticates when it can and steps aside when it can\u0026rsquo;t. 
Authorization rules decide the rest.\nWiring it into Spring Security On Spring Boot 3 (Spring Security 6) you configure a SecurityFilterChain bean with the lambda DSL. Make the API stateless and slot your filter in:\n@Configuration @EnableWebSecurity public class SecurityConfig { @Bean SecurityFilterChain filterChain(HttpSecurity http, JwtAuthFilter jwtAuthFilter) throws Exception { http .csrf(csrf -\u0026gt; csrf.disable()) // safe for a token API with no cookies .sessionManagement(sm -\u0026gt; sm.sessionCreationPolicy(SessionCreationPolicy.STATELESS)) .authorizeHttpRequests(auth -\u0026gt; auth .requestMatchers(\u0026#34;/auth/**\u0026#34;).permitAll() .anyRequest().authenticated()) .addFilterBefore(jwtAuthFilter, UsernamePasswordAuthenticationFilter.class); return http.build(); } } STATELESS is the line that matters — it tells Spring not to create or use an HttpSession, which is the entire point of going token-based.\nUsing it in your controllers Once the context is populated, your controllers behave like any secured Spring app. Method security gives you fine-grained control:\n@GetMapping(\u0026#34;/admin/reports\u0026#34;) @PreAuthorize(\u0026#34;hasRole(\u0026#39;ADMIN\u0026#39;)\u0026#34;) public List\u0026lt;Report\u0026gt; reports() { ... } Add @EnableMethodSecurity to turn on @PreAuthorize. When a token is missing or invalid, the request reaches the authorization rules unauthenticated and is rejected with a 401/403. Those rejections are raised inside the security filter chain, so register an AuthenticationEntryPoint and an AccessDeniedHandler that return the same JSON error shape as the global exception handler you use for everything else; clients then get a consistent JSON error instead of a stack trace.\nCommon pitfalls Long-lived access tokens. You can\u0026rsquo;t revoke a JWT once issued. Keep access tokens short and add refresh tokens (stored server-side, revocable) for longer sessions. No logout story. \u0026ldquo;Logout\u0026rdquo; with stateless JWTs means deleting the token client-side and/or maintaining a short server-side denylist for the access-token window. Weak or hardcoded secrets. 
A leaked HS256 key lets anyone forge tokens. Rotate keys and externalize config. Trusting unverified claims. Always validate signature and expiry before reading any claim. CSRF confusion. Disabling CSRF is fine for a bearer-token API, but not if you store the token in a cookie — then you need CSRF protection again. FAQ Is a JWT encrypted? No. It\u0026rsquo;s signed and base64url-encoded, so the payload is readable by anyone. Don\u0026rsquo;t store sensitive data in it; rely on HTTPS for confidentiality in transit. JWT or server sessions? Sessions are simpler and instantly revocable — great for a classic monolith with a browser. JWTs shine when you have multiple clients or services and want to avoid shared session state. How do I revoke a token? You don\u0026rsquo;t, directly. Use short expiries plus refresh tokens, or keep a denylist of token IDs (jti) until they expire. Where should the client store the token? An HttpOnly cookie (with CSRF protection) or in-memory. Avoid localStorage if you can — it\u0026rsquo;s reachable by any XSS on the page. Key takeaway: Spring Boot JWT authentication = a signed token on every request + a stateless SecurityFilterChain + a OncePerRequestFilter that validates and authenticates. Keep access tokens short, sign with a strong externalized key, add refresh tokens for real sessions, and serve everything over HTTPS.\nNew to securing APIs? Start with a plain endpoint first — see Your First Spring Boot REST Controller in 5 Minutes — then layer security on top.\n","permalink":"https://coderboi.com/posts/spring-boot-jwt-security/","summary":"\u003cp\u003eSpring Boot JWT authentication is the default way most teams secure a REST API today. Instead of keeping a session in server memory, the client carries a signed JSON Web Token on every request, and the server verifies it. No session table, no sticky load balancing, no \u0026ldquo;which node has my session?\u0026rdquo; — which is exactly why it fits SPAs, mobile apps, and microservices. 
Let\u0026rsquo;s wire it up properly, and talk about the parts that bite you later.\u003c/p\u003e","title":"No Ticket, No Entry: Securing Spring Boot with JWTs"},{"content":"Spring WebFlux is the reactive, non-blocking sibling of Spring MVC. Instead of one thread per request blocking on I/O, WebFlux runs on a small event-loop and frees the thread while it waits for the database or another service to answer. Built on Project Reactor, it\u0026rsquo;s a great fit for high-concurrency APIs and reactive data sources — but it\u0026rsquo;s not a free upgrade, and that nuance is the most important thing to get right.\nMVC vs WebFlux in one paragraph Spring MVC uses the classic thread-per-request model: each request holds a thread until it\u0026rsquo;s done, blocking on I/O along the way. That\u0026rsquo;s simple and fine for most apps. WebFlux uses a non-blocking event loop: a handful of threads juggle thousands of in-flight requests, never blocking — when work is waiting on I/O, the thread goes off and serves someone else. More concurrency, fewer threads, less memory under load. The cost is a different, harder programming model.\nReturning Mono and Flux A WebFlux controller returns reactive types instead of plain objects:\n@RestController public class UserController { private final UserRepository users; // reactive repository public UserController(UserRepository users) { this.users = users; } @GetMapping(\u0026#34;/users/{id}\u0026#34;) public Mono\u0026lt;User\u0026gt; byId(@PathVariable String id) { return users.findById(id); // 0 or 1 element } @GetMapping(\u0026#34;/users\u0026#34;) public Flux\u0026lt;User\u0026gt; all() { return users.findAll(); // 0..N elements, streamed } } Mono\u0026lt;T\u0026gt; = a stream of zero or one item (a single user, a save result). Flux\u0026lt;T\u0026gt; = a stream of zero to many items (a list, a feed, an SSE stream). You return the pipeline, not the value. 
The framework subscribes, handles backpressure (so a fast producer can\u0026rsquo;t overwhelm a slow client), and writes the response as data arrives. The golden rule: never call a blocking method inside the chain — that stalls the event loop and defeats the entire model.\nAnnotated vs functional routing WebFlux gives you two equivalent styles on the same reactive engine:\nAnnotated controllers — look just like Spring MVC (the example above). Familiar, easy to adopt.\nFunctional endpoints — define routes as data with a router function:\n@Bean RouterFunction\u0026lt;ServerResponse\u0026gt; routes(UserHandler handler) { return route(GET(\u0026#34;/users/{id}\u0026#34;), handler::byId) .andRoute(GET(\u0026#34;/users\u0026#34;), handler::all); } Both integrate with the same stack — validation, exception handling, and security all work with reactive types — so the choice is team preference. Annotated is the gentler on-ramp; functional keeps routing explicit and centralized.\nWhen (and when not) to use it This is the part most tutorials skip. WebFlux is only worth it if your whole I/O path is non-blocking. One blocking call poisons the benefit:\nReach for WebFlux when:\nYou have very high concurrency with lots of I/O wait (many slow downstream calls). Your data layer is reactive — R2DBC, reactive Mongo, reactive Redis. You\u0026rsquo;re streaming (Server-Sent Events, large datasets, chat). Stick with Spring MVC when:\nYour database driver is the classic blocking JDBC (most apps). A blocking driver behind a reactive API gives you the complexity of reactive with none of the throughput. Your team isn\u0026rsquo;t fluent in reactive debugging (stack traces are harder, and a stray .block() can deadlock). Concurrency is moderate — MVC on modern hardware (and Java 21 virtual threads) handles a lot. 
Virtual threads, in particular, now give blocking-style MVC code much of the scalability that used to require reactive — so \u0026ldquo;I need to handle more connections\u0026rdquo; is no longer an automatic vote for WebFlux.\nCommon gotchas A blocking call in the chain. JDBC, RestTemplate, Thread.sleep, file I/O — any of them stalls the event loop. Use reactive clients (WebClient, R2DBC) end to end. Calling .block(). It \u0026ldquo;works\u0026rdquo; in tests and then throws an IllegalStateException or deadlocks on an event-loop thread in production. Treat it as a code smell outside of narrow, well-understood cases. Expecting reactive to be faster per request. It isn\u0026rsquo;t — latency per request is similar or slightly worse. The win is throughput and resource usage under concurrency, not raw speed. Mixing blocking JPA with WebFlux. The classic mistake. If you\u0026rsquo;re on JPA/JDBC, you probably want MVC. FAQ What\u0026#39;s the difference between Mono and Flux? Mono\u0026lt;T\u0026gt; emits zero or one item; Flux\u0026lt;T\u0026gt; emits zero to many. Use Mono for a single result (one entity, a save), Flux for collections or streams. Is WebFlux faster than Spring MVC? Not per request — latency is comparable. WebFlux wins on throughput and memory under high concurrency with non-blocking I/O. For ordinary CRUD on a blocking database, MVC is simpler and just as fast. Can I use WebFlux with a regular SQL database? Only meaningfully via a reactive driver like R2DBC. Classic JDBC/JPA is blocking, which cancels out WebFlux\u0026rsquo;s benefits — for that stack, use Spring MVC. Do virtual threads make WebFlux obsolete? No, but they narrow the gap. Java 21 virtual threads let blocking-style MVC scale to many concurrent connections, so reactive is now mostly for streaming and genuinely reactive stacks rather than raw connection counts. Key takeaway: Spring WebFlux returns Mono/Flux on a non-blocking event loop for high-concurrency, I/O-heavy, or streaming workloads. 
It\u0026rsquo;s only a win if your entire stack is non-blocking — with classic JDBC/JPA, stay on Spring MVC.\nBuilding plain REST first? Start with a Spring Boot REST controller.\n","permalink":"https://coderboi.com/posts/spring-webflux-rest-apis/","summary":"\u003cp\u003eSpring WebFlux is the reactive, non-blocking sibling of Spring MVC. Instead of one thread per request blocking on I/O, WebFlux runs on a small event-loop and frees the thread while it waits for the database or another service to answer. Built on Project Reactor, it\u0026rsquo;s a great fit for high-concurrency APIs and reactive data sources — but it\u0026rsquo;s not a free upgrade, and that nuance is the most important thing to get right.\u003c/p\u003e","title":"Going Async: Painless REST APIs with Spring WebFlux"},{"content":"Getting started with Spring Boot 3 is mostly the familiar Spring experience with a few hard requirements you can\u0026rsquo;t skip. The big two: Java 17 is the new minimum, and the whole framework moved from the javax.* namespace to jakarta.*. If you\u0026rsquo;re starting fresh, you barely notice. If you\u0026rsquo;re upgrading from Boot 2, those two facts explain 90% of the migration pain. Let\u0026rsquo;s get you running, then cover what changed.\nWhat\u0026rsquo;s actually new in Spring Boot 3 Java 17 baseline. Boot 3 won\u0026rsquo;t run on Java 8 or 11. You also get to use records, sealed classes, pattern matching, and text blocks without apology. Jakarta EE 9+ namespace. Every javax.persistence, javax.servlet, javax.validation import becomes jakarta.*. This is the single most common upgrade gotcha. Spring Framework 6 under the hood, with first-class observability (Micrometer tracing) and support for GraalVM native images for fast-starting, low-memory binaries. If you\u0026rsquo;re new to Spring, don\u0026rsquo;t memorize this — just know your imports say jakarta, not javax.\nGenerate a project with Spring Initializr The easiest way to start is start.spring.io. 
Choose Maven or Gradle, Java 17+, and add dependencies up front:\nSpring Web — REST APIs (see your first REST controller). Spring Data JPA — database persistence (pair it with a database driver such as H2, or the app fails at startup because there is no DataSource to configure). Spring Boot DevTools — auto-restart on code changes during development. Generate, unzip, and open it in your IDE. You get a standard project with a single @SpringBootApplication main class — run that and you have a live app.\nRun it Start the app (./mvnw spring-boot:run, or run the main class) and you\u0026rsquo;ll see the embedded Tomcat server come up on port 8080. This is Spring Boot\u0026rsquo;s whole philosophy in action: convention over configuration. Auto-configuration inspects your classpath — saw spring-boot-starter-web? It configures Tomcat, Jackson, and MVC. Saw a JDBC driver and spring-boot-starter-data-jpa? It wires a DataSource and JPA. You write business logic; Boot wires the plumbing.\nA few first-day conventions worth knowing:\nConfiguration lives in src/main/resources/application.properties (or application.yml). Spring scans for components in the package of your main class and below — keep your code under that package or things won\u0026rsquo;t be found. spring-boot-starter-test is included by default, with JUnit 5, AssertJ, and Mockito ready to go. Where to go next Once the app runs, the natural progression is: add a controller, a service, and a repository following standard Spring layering. From there, the features that make Boot production-ready:\nProfiles (application-dev.yml, application-prod.yml) for environment-specific config. Externalized configuration so secrets and URLs come from env vars, not source. Spring Boot Actuator for health checks, metrics, and readiness/liveness probes — essential before you put anything on Kubernetes. When you\u0026rsquo;re ready to ship it, containerize it — see Dockerizing Spring Boot.\nCommon gotchas javax imports won\u0026rsquo;t resolve. You\u0026rsquo;re on Boot 3 — change them to jakarta. 
Most IDEs can do this with a global find-replace on imports. \u0026ldquo;Unsupported class file version.\u0026rdquo; You\u0026rsquo;re building with an older JDK. Boot 3 needs JDK 17+. App starts then exits immediately. Usually means no web starter on the classpath — add spring-boot-starter-web if you expect a running server. Beans not found. Component scanning only covers the main class\u0026rsquo;s package downward; misplaced packages are the usual culprit. FAQ What Java version does Spring Boot 3 require? Java 17 is the minimum. Java 21 is fully supported and a good choice for new projects (virtual threads, pattern matching). Java 8/11 are not supported. What\u0026#39;s the biggest change from Spring Boot 2? The javax.* → jakarta.* namespace migration. Combined with the Java 17 baseline, that\u0026rsquo;s what most Boot 2 → 3 upgrade effort goes into. Maven or Gradle? Either works identically with Spring Boot. Maven is more common in enterprise/Spring tutorials; Gradle builds are faster and more flexible. Pick what your team knows. Do I need to deploy to a separate server? No. Boot apps embed their server (Tomcat by default) and run as a plain java -jar executable. That\u0026rsquo;s also what makes them easy to containerize. Key takeaway: Getting started with Spring Boot 3 = Java 17+, jakarta.* imports, and a project from start.spring.io. Auto-configuration handles the plumbing; you add controllers/services/repositories and lean on profiles, externalized config, and Actuator when it\u0026rsquo;s time for production.\n","permalink":"https://coderboi.com/posts/spring-boot-3-getting-started/","summary":"\u003cp\u003eGetting started with Spring Boot 3 is mostly the familiar Spring experience with a few hard requirements you can\u0026rsquo;t skip. 
The big two: \u003cstrong\u003eJava 17 is the new minimum\u003c/strong\u003e, and the whole framework moved from the \u003ccode\u003ejavax.*\u003c/code\u003e namespace to \u003cstrong\u003e\u003ccode\u003ejakarta.*\u003c/code\u003e\u003c/strong\u003e. If you\u0026rsquo;re starting fresh, you barely notice. If you\u0026rsquo;re upgrading from Boot 2, those two facts explain 90% of the migration pain. Let\u0026rsquo;s get you running, then cover what changed.\u003c/p\u003e","title":"Surviving \u0026 Thriving on Day 1 with Spring Boot 3"},{"content":"Good Spring Boot exception handling means your API fails predictably. Out of the box, an unhandled exception gives the client either a wall of HTML (the Whitelabel Error Page) or a JSON blob that leaks your package names and stack trace. Both are bad: ugly for your frontend, and a small gift to attackers. The fix is to handle errors in one place with @RestControllerAdvice and return a consistent error body. Here\u0026rsquo;s the version I actually ship.\n1. The error response body Start with a DTO your frontend can rely on — same shape for every error, every time:\npackage com.coderboi.demo.api; import java.time.Instant; public record ApiError(String message, int status, String path, Instant timestamp) {} A record is perfect here: immutable, no boilerplate, serializes straight to JSON. The contract is what matters — your client should be able to read message, status, and path without guessing.\n2. The global exception handler @RestControllerAdvice applies across every controller. 
Catch what you care about specifically, and keep one catch-all so nothing escapes as a stack trace:\npackage com.coderboi.demo.web; import com.coderboi.demo.api.ApiError; import jakarta.servlet.http.HttpServletRequest; import org.springframework.http.HttpStatus; import org.springframework.http.ResponseEntity; import org.springframework.web.bind.annotation.ExceptionHandler; import org.springframework.web.bind.annotation.RestControllerAdvice; import java.time.Instant; @RestControllerAdvice public class GlobalExceptionHandler { @ExceptionHandler(IllegalArgumentException.class) public ResponseEntity\u0026lt;ApiError\u0026gt; handleBadRequest( IllegalArgumentException ex, HttpServletRequest request) { var error = new ApiError( ex.getMessage(), HttpStatus.BAD_REQUEST.value(), request.getRequestURI(), Instant.now() ); return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(error); } @ExceptionHandler(Exception.class) public ResponseEntity\u0026lt;ApiError\u0026gt; handleEverythingElse( Exception ex, HttpServletRequest request) { var error = new ApiError( \u0026#34;Something went wrong. We logged it. Maybe.\u0026#34;, HttpStatus.INTERNAL_SERVER_ERROR.value(), request.getRequestURI(), Instant.now() ); return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(error); } } Now a bad request returns {\u0026quot;message\u0026quot;:\u0026quot;...\u0026quot;,\u0026quot;status\u0026quot;:400,\u0026quot;path\u0026quot;:\u0026quot;/api/...\u0026quot;,\u0026quot;timestamp\u0026quot;:\u0026quot;...\u0026quot;} instead of chaos. Two rules make this robust:\nThe catch-all message is generic on purpose. Never echo ex.getMessage() from the 500 handler — that\u0026rsquo;s how internal details leak. Log the real exception server-side; tell the client only \u0026ldquo;something went wrong.\u0026rdquo; Specific beats general. Spring picks the handler for the most specific matching exception type, so your IllegalArgumentException handler wins over the Exception catch-all. 3. 
Handle validation errors properly The most common \u0026ldquo;400\u0026rdquo; isn\u0026rsquo;t a thrown IllegalArgumentException — it\u0026rsquo;s bean validation failing on an @Valid request body, which throws MethodArgumentNotValidException. Handle it so clients learn which field was wrong:\n@ExceptionHandler(MethodArgumentNotValidException.class) public ResponseEntity\u0026lt;Map\u0026lt;String, String\u0026gt;\u0026gt; handleValidation(MethodArgumentNotValidException ex) { Map\u0026lt;String, String\u0026gt; errors = new HashMap\u0026lt;\u0026gt;(); ex.getBindingResult().getFieldErrors() .forEach(err -\u0026gt; errors.put(err.getField(), err.getDefaultMessage())); return ResponseEntity.badRequest().body(errors); } This is the difference between \u0026ldquo;400 Bad Request\u0026rdquo; (useless) and \u0026ldquo;email must be a valid address\u0026rdquo; (actionable).\n4. Custom exceptions for your domain Throw meaningful exceptions from your services and map each to a status:\npublic class ResourceNotFoundException extends RuntimeException { public ResourceNotFoundException(String message) { super(message); } } @ExceptionHandler(ResourceNotFoundException.class) public ResponseEntity\u0026lt;ApiError\u0026gt; handleNotFound(ResourceNotFoundException ex, HttpServletRequest request) { var error = new ApiError(ex.getMessage(), HttpStatus.NOT_FOUND.value(), request.getRequestURI(), Instant.now()); return ResponseEntity.status(HttpStatus.NOT_FOUND).body(error); } Your service does throw new ResourceNotFoundException(\u0026quot;Order \u0026quot; + id + \u0026quot; not found\u0026quot;) and the advice turns it into a clean 404. Business logic stays readable; HTTP concerns live in one place.\n5. The standards-based option: ProblemDetail Spring Boot 3 ships ProblemDetail and ErrorResponse based on RFC 9457 (Problem Details for HTTP APIs). 
If you\u0026rsquo;d rather follow the standard than roll your own DTO, extend ResponseEntityExceptionHandler and return ProblemDetail objects with type, title, status, detail, and instance. It interops well with clients that already understand the format. A custom DTO (like ApiError above) gives you full control; ProblemDetail gives you a standard. Pick one and be consistent.\nCommon pitfalls Leaking ex.getMessage() on 500s. Generic message to the client; full detail to the logs. No catch-all. One unmapped exception and you\u0026rsquo;re back to the Whitelabel page. Swallowing exceptions silently. Always log before returning — a clean 500 with no log entry is a debugging nightmare. Inconsistent error shapes. If /orders and /users return different error JSON, every client integration hurts. One DTO, everywhere. Wrong status codes. Validation = 400, auth = 401/403, missing = 404, your-fault = 500. Returning 200 with an error body breaks every HTTP client. FAQ @ControllerAdvice vs @RestControllerAdvice — what\u0026#39;s the difference? @RestControllerAdvice = @ControllerAdvice + @ResponseBody, so handler return values are serialized to JSON automatically. Use it for REST APIs. Does the advice apply to every controller? Yes, by default it\u0026rsquo;s global. You can scope it with @RestControllerAdvice(basePackages = ...) or by annotation if you need per-area handling. How do I hide stack traces but still debug? Return a generic body to the client and log the full exception (with a correlation/trace id) server-side. Never put stack traces in the response. Should I use ProblemDetail or a custom DTO? ProblemDetail if you want the RFC 9457 standard and client interop; a custom record if you want a bespoke shape. Don\u0026rsquo;t mix both in one API. 
Key takeaway: Centralize Spring Boot exception handling in a single @RestControllerAdvice: specific handlers for known cases (validation, not-found, auth), a generic catch-all that logs internally and never leaks details, and one consistent error body. Pair it with proper status codes and you\u0026rsquo;ll never debug a Whitelabel page again.\nSecuring the same API? The clean 401/403 responses from Spring Boot JWT auth flow through this exact handler.\n","permalink":"https://coderboi.com/posts/spring-boot-global-exception-handling/","summary":"\u003cp\u003eGood Spring Boot exception handling means your API fails \u003cem\u003epredictably\u003c/em\u003e. Out of the box, an unhandled exception gives the client either a wall of HTML (the Whitelabel Error Page) or a JSON blob that leaks your package names and stack trace. Both are bad: ugly for your frontend, and a small gift to attackers. The fix is to handle errors in \u003cstrong\u003eone place\u003c/strong\u003e with \u003ccode\u003e@RestControllerAdvice\u003c/code\u003e and return a consistent error body. Here\u0026rsquo;s the version I actually ship.\u003c/p\u003e","title":"Spring Boot Exception Handling That Doesn't Make You Cry"},{"content":"You want to ship an API. You don\u0026rsquo;t want to hand-configure Tomcat, write a web.xml, or fight a servlet container. A Spring Boot REST controller gets you from zero to a working HTTP endpoint in about five minutes — and this guide is the copy-paste-run version, plus just enough explanation that you actually understand what you pasted.\nWhat you need first One dependency: spring-boot-starter-web. Add it to your pom.xml (or build.gradle) and Spring Boot pulls in Spring MVC, Jackson (for JSON), and an embedded Tomcat. That last part is the magic — there\u0026rsquo;s no external server to install or deploy to. Your app is the server. Generate a skeleton at start.spring.io if you don\u0026rsquo;t have one (more on that in Surviving Day 1 with Spring Boot 3).\n1. 
The main class Nothing fancy. Just a class with main and @SpringBootApplication:\npackage com.coderboi.demo; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.SpringBootApplication; @SpringBootApplication public class DemoApplication { public static void main(String[] args) { SpringApplication.run(DemoApplication.class, args); } } @SpringBootApplication is three annotations in one: @Configuration, @EnableAutoConfiguration, and @ComponentScan. That last one means Spring scans this package (and below) for components — which is why your controller gets picked up automatically as long as it lives under com.coderboi.demo.\n2. The controller One annotation, one method, one endpoint. That\u0026rsquo;s it:\npackage com.coderboi.demo.web; import org.springframework.web.bind.annotation.GetMapping; import org.springframework.web.bind.annotation.RequestParam; import org.springframework.web.bind.annotation.RestController; @RestController public class HelloController { @GetMapping(\u0026#34;/hello\u0026#34;) public String hello(@RequestParam(defaultValue = \u0026#34;coderboi\u0026#34;) String name) { return \u0026#34;Hey %s, the API is alive.\u0026#34;.formatted(name); } } Hit http://localhost:8080/hello and you get plain text back. Hit http://localhost:8080/hello?name=world and you get \u0026ldquo;Hey world, the API is alive.\u0026rdquo; No XML, no web.xml, no tears.\nThe annotations that matter @RestController — marks the class as a web controller and tells Spring every method returns the response body directly (it\u0026rsquo;s @Controller + @ResponseBody). Without the @ResponseBody half, Spring would try to resolve your return value to a view template. @GetMapping(\u0026quot;/hello\u0026quot;) — maps HTTP GET /hello to this method. There\u0026rsquo;s a sibling for each verb: @PostMapping, @PutMapping, @DeleteMapping, @PatchMapping. @RequestParam — binds a query-string parameter. 
defaultValue makes it optional; drop that and the param becomes required (a missing one returns 400). Returning JSON instead of text Real APIs return JSON. Return an object (or a record) and Jackson serializes it automatically — no extra config:\npublic record Greeting(String message, long timestamp) {} @GetMapping(\u0026#34;/greeting\u0026#34;) public Greeting greeting(@RequestParam(defaultValue = \u0026#34;coderboi\u0026#34;) String name) { return new Greeting(\u0026#34;Hey %s 👋\u0026#34;.formatted(name), System.currentTimeMillis()); } GET /greeting?name=world now returns {\u0026quot;message\u0026quot;:\u0026quot;Hey world 👋\u0026quot;,\u0026quot;timestamp\u0026quot;:...}. To grab a value from the URL path instead of the query string, use @PathVariable:\n@GetMapping(\u0026#34;/users/{id}\u0026#34;) public Greeting user(@PathVariable Long id) { ... } And to accept a JSON request body on a POST, take a @RequestBody parameter — Jackson deserializes the incoming JSON into your object.\nCommon gotchas Controller not found (404 on everything). It\u0026rsquo;s almost always component scanning — your controller lives outside the package of the @SpringBootApplication class. Move it under that package tree. Required param missing → 400. Add a defaultValue or required = false if the param is optional. Returning a String when you meant JSON. A raw String return is sent as text/plain. Return an object/record for JSON. Port 8080 already in use. Set server.port in application.properties, or stop whatever\u0026rsquo;s squatting on the port. FAQ @RestController vs @Controller — which do I use? Use @RestController for APIs that return data (JSON/text). @Controller is for server-rendered views (Thymeleaf, etc.) and needs @ResponseBody on methods that should return data instead of a view name. Do I need to install Tomcat? No. spring-boot-starter-web bundles an embedded Tomcat, so java -jar yourapp.jar (or running the main class) starts the server. 
There\u0026rsquo;s nothing separate to install or deploy to. How do I read the request body? Add a @RequestBody parameter of your DTO/record type to a @PostMapping method. Jackson deserializes the JSON automatically. How do I change the port? Set server.port=9090 in application.properties (or application.yml), or pass --server.port=9090 on startup. Key takeaway: A Spring Boot REST controller is one @RestController class with @GetMapping/@PostMapping methods; add spring-boot-starter-web, keep the controller under your main package, and return objects to get JSON for free. Five minutes, no servlet config.\nNext steps: handle errors cleanly with a global exception handler, then lock it down with JWT authentication.\n","permalink":"https://coderboi.com/posts/spring-boot-rest-controller-5-min/","summary":"\u003cp\u003eYou want to ship an API. You don\u0026rsquo;t want to hand-configure Tomcat, write a \u003ccode\u003eweb.xml\u003c/code\u003e, or fight a servlet container. A Spring Boot REST controller gets you from zero to a working HTTP endpoint in about five minutes — and this guide is the copy-paste-run version, plus just enough explanation that you actually understand what you pasted.\u003c/p\u003e\n\u003ch2 id=\"what-you-need-first\"\u003eWhat you need first\u003c/h2\u003e\n\u003cp\u003eOne dependency: \u003ccode\u003espring-boot-starter-web\u003c/code\u003e. Add it to your \u003ccode\u003epom.xml\u003c/code\u003e (or \u003ccode\u003ebuild.gradle\u003c/code\u003e) and Spring Boot pulls in Spring MVC, Jackson (for JSON), and an embedded Tomcat. That last part is the magic — there\u0026rsquo;s no external server to install or deploy to. Your app \u003cem\u003eis\u003c/em\u003e the server. 
Generate a skeleton at \u003ca href=\"https://start.spring.io\"\u003estart.spring.io\u003c/a\u003e if you don\u0026rsquo;t have one (more on that in \u003ca href=\"/posts/spring-boot-3-getting-started/\"\u003eSurviving Day 1 with Spring Boot 3\u003c/a\u003e).\u003c/p\u003e","title":"Your First Spring Boot REST Controller in 5 Minutes"},{"content":"I\u0026rsquo;m Touheed Khan, Staff Engineer with 11 years of experience delivering IT solutions in telecom, finance, and SEO. I use open-source tech to build industry-grade applications and reliable systems. My stack includes Java, Spring Boot, Angular, Microservices, and AI.\nI\u0026rsquo;m currently Tech Lead at Nagarro, designing and developing modules and helping run scrum. I like solving real-world problems with tech, whether that\u0026rsquo;s scalable microservices, AI solutions, or leading teams to ship.\nThis blog (CoderBoi) is where I dump my brain when my IDE inevitably crashes. It’s a messy mix of Spring Boot wizardry, AI tinkering, and desperately trying to keep Docker containers alive. I write this stuff down so I don\u0026rsquo;t have to Google my own problems 6 months from now. If you learn something, great. If you copy-paste my code and take down prod\u0026hellip; well, you didn\u0026rsquo;t hear it from me.\nWant to connect?\nPortfolio: touheedkhan.com Contact / collabs: touheedkhan.com/contact You’ll find my LinkedIn, GitHub, and email there. I’m open to new opportunities and collaborations.\n","permalink":"https://coderboi.com/about/","summary":"\u003cp\u003eI\u0026rsquo;m \u003cstrong\u003eTouheed Khan\u003c/strong\u003e, Staff Engineer with 11 years of experience delivering IT solutions in telecom, finance, and SEO. I use open-source tech to build industry-grade applications and reliable systems. 
My stack includes \u003cstrong\u003eJava\u003c/strong\u003e, \u003cstrong\u003eSpring Boot\u003c/strong\u003e, \u003cstrong\u003eAngular\u003c/strong\u003e, \u003cstrong\u003eMicroservices\u003c/strong\u003e, and \u003cstrong\u003eAI\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;m currently \u003cstrong\u003eTech Lead at Nagarro\u003c/strong\u003e, designing and developing modules and helping run scrum. I like solving real-world problems with tech, whether that\u0026rsquo;s scalable microservices, AI solutions, or leading teams to ship.\u003c/p\u003e","title":"About"}]