Spring AI 2.0: Build a RAG Application with Spring Boot
Spring AI reached 1.0 GA in May 2025 and brings the Spring programming model to AI development: a unified ChatClient API that works across Claude, OpenAI, Gemini, Ollama, and Azure OpenAI. Switching AI providers is usually a single dependency change.
This guide builds a complete RAG (Retrieval-Augmented Generation) application that answers questions about your documentation using any AI provider.
What Is RAG?
A large language model (LLM) knows everything in its training data but nothing about your specific documents, code, or business data. RAG combines a retrieval step with generation:
User question
│
▼
1. Embed the question → vector
2. Search vector store → top-K similar document chunks
3. Add chunks to LLM prompt as context
4. LLM generates answer grounded in your documents
│
▼
Answer with citations
Without RAG, the LLM guesses. With RAG, it reasons over your actual data.
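The retrieval step (2) boils down to ranking stored chunk embeddings by similarity to the query embedding and keeping the top-K. Here is a toy, dependency-free sketch of that ranking using cosine similarity; the 3-dimensional vectors and chunk names are made up (real embedding models produce hundreds or thousands of dimensions):

```java
import java.util.*;

// Toy illustration of vector search: rank stored chunks by cosine similarity
// to the query embedding, keep the top-K. Vectors here are invented.
public class TopKDemo {
    // cosine similarity = dot(a, b) / (|a| * |b|)
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        Map<String, double[]> chunks = Map.of(
            "chunk-about-billing", new double[]{0.9, 0.1, 0.0},
            "chunk-about-login",   new double[]{0.1, 0.9, 0.1},
            "chunk-about-api",     new double[]{0.8, 0.2, 0.1});
        double[] query = {0.85, 0.15, 0.05}; // pretend embedding of "how do I pay?"

        // sort by descending similarity, keep top 2
        List<String> topK = chunks.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> -cosine(query, e.getValue())))
            .limit(2)
            .map(Map.Entry::getKey)
            .toList();
        System.out.println(topK); // billing and api rank above login
    }
}
```

A vector store like pgvector does exactly this ranking, just with an index instead of a linear scan.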
Setup
Dependencies
<dependencyManagement>
    <dependencies>
        <!-- Spring AI BOM: manages all Spring AI versions (BOM imports belong in dependencyManagement) -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>2.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
<!-- AI provider — pick one (or multiple) -->
<!-- Claude (Anthropic) -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>
<!-- Or: OpenAI -->
<!-- <dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency> -->
<!-- Or: local Ollama (no API key) -->
<!-- <dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency> -->
<!-- Vector store — pgvector (PostgreSQL) -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>
<!-- PDF + document readers -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-markdown-document-reader</artifactId>
</dependency>
<!-- Tika reader (HTML and many other formats), used for URL ingestion below -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>
</dependencies>
Configuration
# application.yaml — Claude
spring:
ai:
anthropic:
api-key: ${ANTHROPIC_API_KEY}
chat:
options:
model: claude-sonnet-4-5 # or claude-opus-4-1, claude-haiku-4-5
max-tokens: 4096
vectorstore:
pgvector:
initialize-schema: true
        dimensions: 1536 # must match your embedding model's output dimensions
  # Note: Anthropic provides no embedding model, so pair Claude chat with an
  # embedding provider (e.g. OpenAI or Ollama) for the vector store.
  datasource: # PostgreSQL connection for pgvector (same spring: block)
    url: jdbc:postgresql://localhost:5432/ragdb
    username: raguser
    password: ragpass
Step 1: Document Ingestion
The ingestion pipeline reads documents, splits them into chunks, embeds each chunk, and stores embeddings in the vector store.
@Component
public class DocumentIngestionService {
    private static final Logger log = LoggerFactory.getLogger(DocumentIngestionService.class);
    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;
public DocumentIngestionService(VectorStore vectorStore) {
this.vectorStore = vectorStore;
this.textSplitter = new TokenTextSplitter(
        512,   // defaultChunkSize: target size of each chunk, in tokens
        100,   // minChunkSizeChars: minimum chunk size, in characters
        5,     // minChunkLengthToEmbed: drop chunks shorter than this
        10000, // maxNumChunks: cap on chunks produced per document
        true   // keepSeparator: keep separators (e.g. newlines) in chunk text
);
}
public void ingestPdf(Resource pdfResource) {
// Read PDF
PagePdfDocumentReader reader = new PagePdfDocumentReader(pdfResource,
PdfDocumentReaderConfig.builder()
.withPageExtractedTextFormatter(
ExtractedTextFormatter.builder().withNumberOfTopPagesToSkipBeforeDelete(0).build()
)
.build());
List<Document> documents = reader.get();
// Split into chunks
List<Document> chunks = textSplitter.apply(documents);
// Add metadata
chunks.forEach(chunk -> {
chunk.getMetadata().put("source", pdfResource.getFilename());
chunk.getMetadata().put("ingestedAt", Instant.now().toString());
});
// Embed and store (VectorStore handles embedding automatically)
vectorStore.add(chunks);
log.info("Ingested {} chunks from {}", chunks.size(), pdfResource.getFilename());
}
public void ingestMarkdown(Resource markdownResource) {
MarkdownDocumentReader reader = new MarkdownDocumentReader(markdownResource,
MarkdownDocumentReaderConfig.builder()
.withHorizontalRuleCreateDocument(true)
.withIncludeCodeBlock(true)
.build());
List<Document> chunks = textSplitter.apply(reader.get());
vectorStore.add(chunks);
}
public void ingestUrl(String url) {
    // TikaDocumentReader handles HTML pages (and many other formats via Apache Tika)
    TikaDocumentReader tikaReader = new TikaDocumentReader(url);
    List<Document> chunks = textSplitter.apply(tikaReader.get());
    vectorStore.add(chunks);
}
}
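To build intuition for what the splitter produces, here is a deliberately simplified, word-based chunker. This is not TokenTextSplitter's actual algorithm (which counts encoded tokens, not words); it only illustrates the idea of cutting a document into fixed-size pieces:

```java
import java.util.*;

// Simplified chunking demo: split text into fixed-size chunks of words.
// The real TokenTextSplitter operates on encoded tokens; this is a sketch.
public class ChunkDemo {
    static List<String> chunk(String text, int wordsPerChunk) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < words.length; i += wordsPerChunk) {
            int end = Math.min(words.length, i + wordsPerChunk);
            chunks.add(String.join(" ", Arrays.asList(words).subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "one two three four five six seven";
        System.out.println(chunk(doc, 3));
        // → [one two three, four five six, seven]
    }
}
```

Each resulting chunk is what gets embedded and stored; smaller chunks give more precise retrieval, larger chunks give more context per hit.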
Step 2: RAG Query Pipeline
@Service
public class RagService {
private final ChatClient chatClient;
private final VectorStore vectorStore;
public RagService(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
this.chatClient = chatClientBuilder.build();
this.vectorStore = vectorStore;
}
public String query(String question) {
// Retrieve relevant documents
List<Document> relevantDocs = vectorStore.similaritySearch(
SearchRequest.builder()
.query(question)
.topK(5) // retrieve top 5 most similar chunks
.similarityThreshold(0.7) // minimum similarity score (0-1)
.build()
);
// Build context from retrieved documents
String context = relevantDocs.stream()
.map(Document::getFormattedContent)
.collect(Collectors.joining("\n\n---\n\n"));
// Generate answer with context
return chatClient.prompt()
.system("""
You are a helpful assistant that answers questions based on the provided context.
Answer only based on the context provided. If the context doesn't contain enough
information to answer the question, say so clearly.
Do not make up information.
""")
.user(u -> u.text("""
Context:
{context}
Question: {question}
Answer the question based on the context above.
""")
.param("context", context)
.param("question", question))
.call()
.content();
}
public RagResponse queryWithCitations(String question) {
List<Document> relevantDocs = vectorStore.similaritySearch(
SearchRequest.builder().query(question).topK(5).build()
);
String context = buildContext(relevantDocs);
String answer = chatClient.prompt()
.system("Answer based on the context. Format citations as [Source: filename].")
.user(u -> u.text("Context:\n{context}\n\nQuestion: {question}")
.param("context", context)
.param("question", question))
.call()
.content();
List<String> sources = relevantDocs.stream()
.map(doc -> (String) doc.getMetadata().get("source"))
.distinct()
.toList();
return new RagResponse(answer, sources);
}
private String buildContext(List<Document> docs) {
return IntStream.range(0, docs.size())
.mapToObj(i -> String.format("[%d] %s\nSource: %s",
i + 1,
docs.get(i).getFormattedContent(),
docs.get(i).getMetadata().get("source")))
.collect(Collectors.joining("\n\n"));
}
}
Step 3: REST API
@RestController
@RequestMapping("/rag")
public class RagController {
    private final RagService ragService;
    private final DocumentIngestionService ingestionService;

    public RagController(RagService ragService, DocumentIngestionService ingestionService) {
        this.ragService = ragService;
        this.ingestionService = ingestionService;
    }
@PostMapping("/ingest")
public ResponseEntity<Map<String, String>> ingest(@RequestParam("file") MultipartFile file) {
Resource resource = file.getResource();
if (file.getOriginalFilename().endsWith(".pdf")) {
ingestionService.ingestPdf(resource);
} else if (file.getOriginalFilename().endsWith(".md")) {
ingestionService.ingestMarkdown(resource);
} else {
return ResponseEntity.badRequest()
.body(Map.of("error", "Unsupported file type. Use PDF or Markdown."));
}
return ResponseEntity.ok(Map.of(
"status", "ingested",
"file", file.getOriginalFilename()
));
}
@GetMapping("/query")
public RagResponse query(@RequestParam String question) {
return ragService.queryWithCitations(question);
}
}
Spring AI ChatClient: Core Patterns
Basic chat
String response = chatClient.prompt()
.user("Explain Spring Boot auto-configuration in two sentences")
.call()
.content();
Streaming response (for UI)
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String message) {
return chatClient.prompt()
.user(message)
.stream()
.content();
}
Structured output (parse to Java objects)
public record OrderSummary(
String orderId,
String customerName,
BigDecimal total,
List<String> items
) {}
OrderSummary summary = chatClient.prompt()
.user("Extract order details from: " + orderText)
.call()
.entity(OrderSummary.class);
Spring AI automatically generates the JSON schema from the record and instructs the model to return matching JSON.
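This works because Java records expose their components reflectively, so a field list (and from it a JSON schema) can be derived at runtime. A minimal, Spring-free illustration of that introspection (not Spring AI's actual schema generator):

```java
import java.lang.reflect.RecordComponent;
import java.math.BigDecimal;
import java.util.List;

// Shows the reflection that structured output builds on: a record's
// components (names and types) are available at runtime, in declaration order.
public class RecordIntrospection {
    public record OrderSummary(String orderId, String customerName,
                               BigDecimal total, List<String> items) {}

    public static void main(String[] args) {
        for (RecordComponent rc : OrderSummary.class.getRecordComponents()) {
            System.out.println(rc.getName() + " : " + rc.getType().getSimpleName());
        }
        // → orderId : String
        //   customerName : String
        //   total : BigDecimal
        //   items : List
    }
}
```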
Conversation with memory
@Bean
public ChatMemory chatMemory() {
    // sliding-window conversation history, kept in memory by default
    return MessageWindowChatMemory.builder().maxMessages(20).build();
}

@Service
public class ConversationService {
    private final ChatClient chatClient;

    public ConversationService(ChatClient.Builder builder, ChatMemory chatMemory) {
        this.chatClient = builder
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .build();
    }

    public String chat(String conversationId, String message) {
        return chatClient.prompt()
                .user(message)
                .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
                .call()
                .content();
    }
}
Switching AI Providers
One of Spring AI’s biggest advantages: switch from Claude to OpenAI or Ollama by changing one dependency and two config lines.
# Use Claude
spring.ai.anthropic.api-key: ${ANTHROPIC_API_KEY}
spring.ai.anthropic.chat.options.model: claude-sonnet-4-6
# Use OpenAI (change dependency to spring-ai-starter-model-openai)
spring.ai.openai.api-key: ${OPENAI_API_KEY}
spring.ai.openai.chat.options.model: gpt-4o
# Use Ollama (local, no API key; change dependency to spring-ai-starter-model-ollama)
spring.ai.ollama.base-url: http://localhost:11434
spring.ai.ollama.chat.options.model: llama3.1
No code changes. The ChatClient interface is the same regardless of provider.
Vector Store Options
| Vector Store | Best for | Spring AI starter |
|---|---|---|
| pgvector (PostgreSQL) | Apps already using PostgreSQL | spring-ai-starter-vector-store-pgvector |
| Redis | Low-latency lookups, in-memory | spring-ai-starter-vector-store-redis |
| Pinecone | Managed, large scale | spring-ai-starter-vector-store-pinecone |
| Qdrant | Self-hosted, high performance | spring-ai-starter-vector-store-qdrant |
| Chroma | Development and small projects | spring-ai-starter-vector-store-chroma |
| Weaviate | Hybrid search (vector + keyword) | spring-ai-starter-vector-store-weaviate |
For most Spring Boot applications already using PostgreSQL, pgvector is the simplest choice — no extra infrastructure needed.
MCP (Model Context Protocol) Integration
Spring AI can run your Spring Boot application as an MCP server or client (via the spring-ai-starter-mcp-server and spring-ai-starter-mcp-client starters). Your Spring Boot service can expose tools that Claude or other AI models call:
@Configuration
public class McpToolsConfig {

    public static class OrderTools {
        private final OrderService orderService;

        public OrderTools(OrderService orderService) {
            this.orderService = orderService;
        }

        @Tool(description = "Find orders by customer ID and date range")
        public List<Order> findOrders(String customerId, LocalDate startDate, LocalDate endDate) {
            return orderService.findOrders(customerId, startDate, endDate);
        }
    }

    @Bean
    public ToolCallbackProvider orderTools(OrderService orderService) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(new OrderTools(orderService))
                .build();
    }
}
spring:
ai:
mcp:
server:
enabled: true
name: order-service-mcp
AI models connected via MCP can now call findOrders to retrieve real data — combining AI reasoning with live business data.
Local Development with Ollama
Run AI models locally for development (no API costs, no internet required):
# Install Ollama
brew install ollama # macOS
# Pull a model
ollama pull llama3.1 # 8B model, ~5 GB
ollama pull nomic-embed-text # embedding model
# Start Ollama server (usually starts automatically)
ollama serve
# application-dev.yaml — use Ollama locally
spring:
ai:
ollama:
base-url: http://localhost:11434
chat:
options:
model: llama3.1
embedding:
options:
model: nomic-embed-text
Switch to Claude for production by activating the prod profile — zero code changes.
Quick Reference
// Basic RAG query
chatClient.prompt()
.user(u -> u.text("Context: {ctx}\nQuestion: {q}")
.param("ctx", context)
.param("q", question))
.call()
.content();
// Vector search
vectorStore.similaritySearch(
SearchRequest.builder().query(question).topK(5).similarityThreshold(0.7).build()
);
// Ingest document
vectorStore.add(textSplitter.apply(new PagePdfDocumentReader(resource).get()));
// Stream response
chatClient.prompt().user(message).stream().content(); // returns Flux<String>
// Structured output
chatClient.prompt().user(text).call().entity(MyRecord.class);
Summary
Spring AI 2.0 brings RAG to Spring Boot with a clean, provider-neutral API. Build a RAG pipeline in three steps: ingest documents into a vector store, retrieve relevant chunks via similarity search, and prompt the LLM with retrieved context. Use pgvector for PostgreSQL teams (no extra infrastructure). Switch between Claude, OpenAI, and Ollama with a dependency swap. Spring AI’s MCP integration lets AI models call your Spring Boot services as tools — combining AI reasoning with live business data.
