Spring AI 2.0: Build a RAG Application with Spring Boot
Spring AI reached 1.0 GA in May 2025 and brings the Spring programming model to AI development: a unified ChatClient API that works across Claude, OpenAI, Gemini, Ollama, and Azure OpenAI. Switching AI providers is usually a single dependency change.
This guide builds a complete RAG (Retrieval-Augmented Generation) application that answers questions about your documentation using any AI provider.
What Is RAG?
A large language model (LLM) knows everything in its training data but nothing about your specific documents, code, or business data. RAG combines a retrieval step with generation:
User question
│
▼
1. Embed the question → vector
2. Search vector store → top-K similar document chunks
3. Add chunks to LLM prompt as context
4. LLM generates answer grounded in your documents
│
▼
Answer with citations
Without RAG, the LLM guesses. With RAG, it reasons over your actual data.
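The retrieval step (2) boils down to ranking stored chunk embeddings by similarity to the query embedding and keeping the top-K. Here is a toy, dependency-free sketch of that ranking using cosine similarity; the 3-dimensional vectors and chunk names are made up (real embedding models produce hundreds or thousands of dimensions):

```java
import java.util.*;

// Toy illustration of vector search: rank stored chunks by cosine similarity
// to the query embedding, keep the top-K. Vectors here are invented.
public class TopKDemo {
    // cosine similarity = dot(a, b) / (|a| * |b|)
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        Map<String, double[]> chunks = Map.of(
            "chunk-about-billing", new double[]{0.9, 0.1, 0.0},
            "chunk-about-login",   new double[]{0.1, 0.9, 0.1},
            "chunk-about-api",     new double[]{0.8, 0.2, 0.1});
        double[] query = {0.85, 0.15, 0.05}; // pretend embedding of "how do I pay?"

        // sort by descending similarity, keep top 2
        List<String> topK = chunks.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> -cosine(query, e.getValue())))
            .limit(2)
            .map(Map.Entry::getKey)
            .toList();
        System.out.println(topK); // billing and api rank above login
    }
}
```

A vector store like pgvector does exactly this ranking, just with an index instead of a linear scan.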
Setup
Dependencies
<dependencyManagement>
    <dependencies>
        <!-- Spring AI BOM: manages all Spring AI versions (BOM imports belong in dependencyManagement) -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>2.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
<!-- AI provider — pick one (or multiple) -->
<!-- Claude (Anthropic) -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>
<!-- Or: OpenAI -->
<!-- <dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency> -->
<!-- Or: local Ollama (no API key) -->
<!-- <dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency> -->
<!-- Vector store — pgvector (PostgreSQL) -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>
<!-- PDF + document readers -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-markdown-document-reader</artifactId>
</dependency>
<!-- Tika reader (HTML and many other formats), used for URL ingestion below -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>
</dependencies>
Configuration
# application.yaml — Claude
spring:
ai:
anthropic:
api-key: ${ANTHROPIC_API_KEY}
chat:
options:
model: claude-sonnet-4-5 # or claude-opus-4-1, claude-haiku-4-5
max-tokens: 4096
vectorstore:
pgvector:
initialize-schema: true
        dimensions: 1536 # must match your embedding model's output dimensions
  # Note: Anthropic provides no embedding model, so pair Claude chat with an
  # embedding provider (e.g. OpenAI or Ollama) for the vector store.
  datasource: # PostgreSQL connection for pgvector (same spring: block)
    url: jdbc:postgresql://localhost:5432/ragdb
    username: raguser
    password: ragpass
Step 1: Document Ingestion
The ingestion pipeline reads documents, splits them into chunks, embeds each chunk, and stores embeddings in the vector store.
@Component
public class DocumentIngestionService {
    private static final Logger log = LoggerFactory.getLogger(DocumentIngestionService.class);
    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;
public DocumentIngestionService(VectorStore vectorStore) {
this.vectorStore = vectorStore;
this.textSplitter = new TokenTextSplitter(
        512,   // defaultChunkSize: target size of each chunk, in tokens
        100,   // minChunkSizeChars: minimum chunk size, in characters
        5,     // minChunkLengthToEmbed: drop chunks shorter than this
        10000, // maxNumChunks: cap on chunks produced per document
        true   // keepSeparator: keep separators (e.g. newlines) in chunk text
);
}
public void ingestPdf(Resource pdfResource) {
// Read PDF
PagePdfDocumentReader reader = new PagePdfDocumentReader(pdfResource,
PdfDocumentReaderConfig.builder()
.withPageExtractedTextFormatter(
ExtractedTextFormatter.builder().withNumberOfTopPagesToSkipBeforeDelete(0).build()
)
.build());
List<Document> documents = reader.get();
// Split into chunks
List<Document> chunks = textSplitter.apply(documents);
// Add metadata
chunks.forEach(chunk -> {
chunk.getMetadata().put("source", pdfResource.getFilename());
chunk.getMetadata().put("ingestedAt", Instant.now().toString());
});
// Embed and store (VectorStore handles embedding automatically)
vectorStore.add(chunks);
log.info("Ingested {} chunks from {}", chunks.size(), pdfResource.getFilename());
}
public void ingestMarkdown(Resource markdownResource) {
MarkdownDocumentReader reader = new MarkdownDocumentReader(markdownResource,
MarkdownDocumentReaderConfig.builder()
.withHorizontalRuleCreateDocument(true)
.withIncludeCodeBlock(true)
.build());
List<Document> chunks = textSplitter.apply(reader.get());
vectorStore.add(chunks);
}
public void ingestUrl(String url) {
    // TikaDocumentReader handles HTML pages (and many other formats via Apache Tika)
    TikaDocumentReader tikaReader = new TikaDocumentReader(url);
    List<Document> chunks = textSplitter.apply(tikaReader.get());
    vectorStore.add(chunks);
}
}
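To build intuition for what the splitter produces, here is a deliberately simplified, word-based chunker. This is not TokenTextSplitter's actual algorithm (which counts encoded tokens, not words); it only illustrates the idea of cutting a document into fixed-size pieces:

```java
import java.util.*;

// Simplified chunking demo: split text into fixed-size chunks of words.
// The real TokenTextSplitter operates on encoded tokens; this is a sketch.
public class ChunkDemo {
    static List<String> chunk(String text, int wordsPerChunk) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < words.length; i += wordsPerChunk) {
            int end = Math.min(words.length, i + wordsPerChunk);
            chunks.add(String.join(" ", Arrays.asList(words).subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "one two three four five six seven";
        System.out.println(chunk(doc, 3));
        // → [one two three, four five six, seven]
    }
}
```

Each resulting chunk is what gets embedded and stored; smaller chunks give more precise retrieval, larger chunks give more context per hit.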
Step 2: RAG Query Pipeline
@Service
public class RagService {
private final ChatClient chatClient;
private final VectorStore vectorStore;
public RagService(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
this.chatClient = chatClientBuilder.build();
this.vectorStore = vectorStore;
}
public String query(String question) {
// Retrieve relevant documents
List<Document> relevantDocs = vectorStore.similaritySearch(
SearchRequest.builder()
.query(question)
.topK(5) // retrieve top 5 most similar chunks
.similarityThreshold(0.7) // minimum similarity score (0-1)
.build()
);
// Build context from retrieved documents
String context = relevantDocs.stream()
.map(Document::getFormattedContent)
.collect(Collectors.joining("\n\n---\n\n"));
// Generate answer with context
return chatClient.prompt()
.system("""
You are a helpful assistant that answers questions based on the provided context.
Answer only based on the context provided. If the context doesn't contain enough
information to answer the question, say so clearly.
Do not make up information.
""")
.user(u -> u.text("""
Context:
{context}
Question: {question}
Answer the question based on the context above.
""")
.param("context", context)
.param("question", question))
.call()
.content();
}
public RagResponse queryWithCitations(String question) {
List<Document> relevantDocs = vectorStore.similaritySearch(
SearchRequest.builder().query(question).topK(5).build()
);
String context = buildContext(relevantDocs);
String answer = chatClient.prompt()
.system("Answer based on the context. Format citations as [Source: filename].")
.user(u -> u.text("Context:\n{context}\n\nQuestion: {question}")
.param("context", context)
.param("question", question))
.call()
.content();
List<String> sources = relevantDocs.stream()
.map(doc -> (String) doc.getMetadata().get("source"))
.distinct()
.toList();
return new RagResponse(answer, sources);
}
private String buildContext(List<Document> docs) {
return IntStream.range(0, docs.size())
.mapToObj(i -> String.format("[%d] %s\nSource: %s",
i + 1,
docs.get(i).getFormattedContent(),
docs.get(i).getMetadata().get("source")))
.collect(Collectors.joining("\n\n"));
}
}
Step 3: REST API
@RestController
@RequestMapping("/rag")
public class RagController {
    private final RagService ragService;
    private final DocumentIngestionService ingestionService;

    public RagController(RagService ragService, DocumentIngestionService ingestionService) {
        this.ragService = ragService;
        this.ingestionService = ingestionService;
    }
@PostMapping("/ingest")
public ResponseEntity<Map<String, String>> ingest(@RequestParam("file") MultipartFile file) {
Resource resource = file.getResource();
if (file.getOriginalFilename().endsWith(".pdf")) {
ingestionService.ingestPdf(resource);
} else if (file.getOriginalFilename().endsWith(".md")) {
ingestionService.ingestMarkdown(resource);
} else {
return ResponseEntity.badRequest()
.body(Map.of("error", "Unsupported file type. Use PDF or Markdown."));
}
return ResponseEntity.ok(Map.of(
"status", "ingested",
"file", file.getOriginalFilename()
));
}
@GetMapping("/query")
public RagResponse query(@RequestParam String question) {
return ragService.queryWithCitations(question);
}
}
Spring AI ChatClient: Core Patterns
Basic chat
String response = chatClient.prompt()
.user("Explain Spring Boot auto-configuration in two sentences")
.call()
.content();
Streaming response (for UI)
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String message) {
return chatClient.prompt()
.user(message)
.stream()
.content();
}
Structured output (parse to Java objects)
public record OrderSummary(
String orderId,
String customerName,
BigDecimal total,
List<String> items
) {}
OrderSummary summary = chatClient.prompt()
.user("Extract order details from: " + orderText)
.call()
.entity(OrderSummary.class);
Spring AI automatically generates the JSON schema from the record and instructs the model to return matching JSON.
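This works because Java records expose their components reflectively, so a field list (and from it a JSON schema) can be derived at runtime. A minimal, Spring-free illustration of that introspection (not Spring AI's actual schema generator):

```java
import java.lang.reflect.RecordComponent;
import java.math.BigDecimal;
import java.util.List;

// Shows the reflection that structured output builds on: a record's
// components (names and types) are available at runtime, in declaration order.
public class RecordIntrospection {
    public record OrderSummary(String orderId, String customerName,
                               BigDecimal total, List<String> items) {}

    public static void main(String[] args) {
        for (RecordComponent rc : OrderSummary.class.getRecordComponents()) {
            System.out.println(rc.getName() + " : " + rc.getType().getSimpleName());
        }
        // → orderId : String
        //   customerName : String
        //   total : BigDecimal
        //   items : List
    }
}
```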
Conversation with memory
@Bean
public ChatMemory chatMemory() {
    // sliding-window conversation history, kept in memory by default
    return MessageWindowChatMemory.builder().maxMessages(20).build();
}

@Service
public class ConversationService {
    private final ChatClient chatClient;

    public ConversationService(ChatClient.Builder builder, ChatMemory chatMemory) {
        this.chatClient = builder
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .build();
    }

    public String chat(String conversationId, String message) {
        return chatClient.prompt()
                .user(message)
                .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
                .call()
                .content();
    }
}
Switching AI Providers
One of Spring AI’s biggest advantages: switch from Claude to OpenAI or Ollama by changing one dependency and two config lines.
# Use Claude
spring.ai.anthropic.api-key: ${ANTHROPIC_API_KEY}
spring.ai.anthropic.chat.options.model: claude-sonnet-4-6
# Use OpenAI (change dependency to spring-ai-starter-model-openai)
spring.ai.openai.api-key: ${OPENAI_API_KEY}
spring.ai.openai.chat.options.model: gpt-4o
# Use Ollama (local, no API key; change dependency to spring-ai-starter-model-ollama)
spring.ai.ollama.base-url: http://localhost:11434
spring.ai.ollama.chat.options.model: llama3.1
No code changes. The ChatClient interface is the same regardless of provider.
Vector Store Options
| Vector Store | Best for | Spring AI starter |
|---|---|---|
| pgvector (PostgreSQL) | Apps already using PostgreSQL | spring-ai-starter-vector-store-pgvector |
| Redis | Low-latency lookups, in-memory | spring-ai-starter-vector-store-redis |
| Pinecone | Managed, large scale | spring-ai-starter-vector-store-pinecone |
| Qdrant | Self-hosted, high performance | spring-ai-starter-vector-store-qdrant |
| Chroma | Development and small projects | spring-ai-starter-vector-store-chroma |
| Weaviate | Hybrid search (vector + keyword) | spring-ai-starter-vector-store-weaviate |
For most Spring Boot applications already using PostgreSQL, pgvector is the simplest choice — no extra infrastructure needed.
MCP (Model Context Protocol) Integration
Spring AI can run your Spring Boot application as an MCP server or client (via the spring-ai-starter-mcp-server and spring-ai-starter-mcp-client starters). Your Spring Boot service can expose tools that Claude or other AI models call:
@Configuration
public class McpToolsConfig {

    public static class OrderTools {
        private final OrderService orderService;

        public OrderTools(OrderService orderService) {
            this.orderService = orderService;
        }

        @Tool(description = "Find orders by customer ID and date range")
        public List<Order> findOrders(String customerId, LocalDate startDate, LocalDate endDate) {
            return orderService.findOrders(customerId, startDate, endDate);
        }
    }

    @Bean
    public ToolCallbackProvider orderTools(OrderService orderService) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(new OrderTools(orderService))
                .build();
    }
}
spring:
ai:
mcp:
server:
enabled: true
name: order-service-mcp
AI models connected via MCP can now call findOrders to retrieve real data — combining AI reasoning with live business data.
Local Development with Ollama
Run AI models locally for development (no API costs, no internet required):
# Install Ollama
brew install ollama # macOS
# Pull a model
ollama pull llama3.1 # 8B model, ~5 GB
ollama pull nomic-embed-text # embedding model
# Start Ollama server (usually starts automatically)
ollama serve
# application-dev.yaml — use Ollama locally
spring:
ai:
ollama:
base-url: http://localhost:11434
chat:
options:
model: llama3.1
embedding:
options:
model: nomic-embed-text
Switch to Claude for production by activating the prod profile — zero code changes.
Quick Reference
// Basic RAG query
chatClient.prompt()
.user(u -> u.text("Context: {ctx}\nQuestion: {q}")
.param("ctx", context)
.param("q", question))
.call()
.content();
// Vector search
vectorStore.similaritySearch(
SearchRequest.builder().query(question).topK(5).similarityThreshold(0.7).build()
);
// Ingest document
vectorStore.add(textSplitter.apply(new PagePdfDocumentReader(resource).get()));
// Stream response
chatClient.prompt().user(message).stream().content(); // returns Flux<String>
// Structured output
chatClient.prompt().user(text).call().entity(MyRecord.class);
Summary
Spring AI 2.0 brings RAG to Spring Boot with a clean, provider-neutral API. Build a RAG pipeline in three steps: ingest documents into a vector store, retrieve relevant chunks via similarity search, and prompt the LLM with retrieved context. Use pgvector for PostgreSQL teams (no extra infrastructure). Switch between Claude, OpenAI, and Ollama with a dependency swap. Spring AI’s MCP integration lets AI models call your Spring Boot services as tools — combining AI reasoning with live business data.
