Performance Tuning: Chunk Size, Connection Pools, and Memory Management
Introduction
A poorly tuned batch job can be 10–100x slower than a well-tuned one. The biggest gains come from a handful of settings — chunk size, MySQL JDBC rewrite, connection pool alignment, and avoiding unnecessary object creation. This article covers each systematically.
Chunk Size — The Most Impactful Setting
Chunk size determines how many items are processed per transaction. Too small = too many round trips to the database. Too large = long transactions, high memory pressure, slower rollback on failure.
Finding the optimal chunk size
There is no universal answer — benchmark with your data and hardware. A practical starting process:
Start at chunk size 100
→ measure throughput (items/second)
→ double to 200 → measure again
→ double to 500 → measure again
→ stop when throughput plateaus or memory pressure increases
Typical sweet spots:
- Simple inserts (no processor): 500–2000
- With processing and enrichment: 100–500
- With external API calls: 50–200
- With large BLOBs or TEXT: 10–50
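Benchmarking by doubling is easier when the chunk size is a property instead of a hard-coded constant. A minimal Spring Batch 5 sketch (the step name, the `Order` type, and the `batch.chunk-size` property are assumptions; reader and writer beans are assumed to be defined elsewhere):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class OrderStepConfig {

    // Re-run each benchmark without recompiling:
    //   java -jar batch-app.jar --batch.chunk-size=200
    @Bean
    public Step orderStep(JobRepository jobRepository,
                          PlatformTransactionManager txManager,
                          ItemReader<Order> reader,
                          ItemWriter<Order> writer,
                          @Value("${batch.chunk-size:100}") int chunkSize) {
        return new StepBuilder("orderStep", jobRepository)
                .<Order, Order>chunk(chunkSize, txManager)
                .reader(reader)
                .writer(writer)
                .build();
    }
}
```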
What chunk size controls
Items per transaction = chunk size
Commits per step = total_items / chunk_size
Network round trips to DB ≈ commits × (1 read batch + 1 write batch per commit)
Doubling chunk size halves the commit count and roughly halves network overhead — up to the point where transaction lock time or memory becomes the bottleneck.
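The commit arithmetic is easy to verify directly (the 1,000,000-item figure is just an illustration):

```java
public class ChunkMath {
    // Commits per step = ceil(total_items / chunk_size)
    static long commits(long totalItems, int chunkSize) {
        return (totalItems + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        // 1,000,000 items: each doubling of chunk size halves the commit count
        System.out.println(commits(1_000_000, 100)); // 10000
        System.out.println(commits(1_000_000, 200)); // 5000
        System.out.println(commits(1_000_000, 400)); // 2500
    }
}
```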
MySQL JDBC Tuning
The single most important MySQL performance setting for batch inserts:
spring.datasource.url=jdbc:mysql://localhost:3306/batch_db\
?rewriteBatchedStatements=true\
&useServerPrepStmts=false\
&cachePrepStmts=true\
&prepStmtCacheSize=250\
&prepStmtCacheSqlLimit=2048
| Parameter | Effect |
|---|---|
| `rewriteBatchedStatements=true` | Rewrites INSERT batches into multi-row INSERTs — 10–50x faster for bulk inserts |
| `useServerPrepStmts=false` | Required with `rewriteBatchedStatements=true` — server-side prepared statements conflict with batch rewriting |
| `cachePrepStmts=true` | Client-side prepared statement cache — avoids re-parsing SQL on every chunk |
| `prepStmtCacheSize=250` | Cache up to 250 distinct statements |
| `prepStmtCacheSqlLimit=2048` | Cache statements up to 2048 characters |
Verify rewriting is active
Enable the MySQL general query log in a test environment and confirm you see multi-row INSERTs:
-- Without rewrite:
INSERT INTO orders (...) VALUES (...)
INSERT INTO orders (...) VALUES (...)
-- With rewriteBatchedStatements=true:
INSERT INTO orders (...) VALUES (...), (...), (...)
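For context, this is the JDBC pattern that the rewrite optimizes; `JdbcBatchItemWriter` does the equivalent internally. A sketch assuming a configured `DataSource` and a hypothetical `orders` table with `order_id` and `status` columns:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

public class OrderBatchInsert {

    record Order(long id, String status) {}

    // One executeBatch() per chunk. With rewriteBatchedStatements=true the
    // driver sends a single multi-row INSERT instead of N separate statements.
    static void writeChunk(DataSource dataSource, List<Order> chunk) throws SQLException {
        String sql = "INSERT INTO orders (order_id, status) VALUES (?, ?)";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Order order : chunk) {
                ps.setLong(1, order.id());
                ps.setString(2, order.status());
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```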
HikariCP Connection Pool Tuning
# Match pool size to your concurrency model
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=5
# Fail fast on connection exhaustion
spring.datasource.hikari.connection-timeout=3000
# Keep connections alive during long-running steps
spring.datasource.hikari.keepalive-time=60000
spring.datasource.hikari.max-lifetime=1800000
# Validate connections before use. Connector/J is JDBC4-compliant, so HikariCP
# validates via Connection.isValid() automatically; a test query is only
# needed for legacy drivers
# spring.datasource.hikari.connection-test-query=SELECT 1
Pool size formula
pool_size = thread_count × connections_per_thread
For a 4-thread step where each thread holds one writer connection:
pool_size = 4 writers + 4 readers (paging queries) + 2 metadata = 10
Over-sizing the pool wastes MySQL file descriptors and memory. Under-sizing causes `HikariPool-1 - Connection is not available` timeouts.
Hibernate Batch Settings (for JPA-based steps)
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.generate_statistics=false
spring.jpa.properties.hibernate.cache.use_second_level_cache=false
spring.jpa.properties.hibernate.cache.use_query_cache=false
Without `hibernate.jdbc.batch_size`, Hibernate issues one INSERT per entity — identical to `batch_size=1`. Always set this for `JpaItemWriter` steps. Be aware that Hibernate silently disables JDBC insert batching for entities using `GenerationType.IDENTITY` (the usual auto-increment mapping on MySQL), since it must read the generated key back after every insert.
Memory Management
JVM heap sizing
# Minimum recommended for a batch application
java -Xms512m -Xmx2g -XX:+UseG1GC \
-XX:MaxGCPauseMillis=200 \
-jar batch-app.jar
G1GC is the right choice for batch workloads — it handles large heaps and mixed allocation patterns better than Parallel GC.
What drives memory in batch jobs
- Chunk buffer: all items in the current chunk are held in memory during read→process→write. At chunk size 1000 with 10KB objects, that’s 10MB per chunk thread.
- JPA first-level cache: if using `JpaItemWriter`, each entity is tracked in the `EntityManager`. Use `clear()` between chunks (covered in Article 7).
- Processor caches: `ConcurrentHashMap` caches for enrichment grow unbounded unless evicted.
- FlatFileItemReader line buffer: streaming readers hold one line at a time — not a concern.
- JdbcPagingItemReader page buffer: one full page of rows. At page size 1000 and 1KB rows = 1MB per reader.
Limit processor cache size
// Use Caffeine for bounded caching with TTL
@Bean
public Cache<Long, String> customerTierCache() {
return Caffeine.newBuilder()
.maximumSize(50_000)
.expireAfterWrite(Duration.ofMinutes(30))
.build();
}
<dependency>
<groupId>com.github.ben-manes.caffeine</groupId>
<artifactId>caffeine</artifactId>
</dependency>
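One way to wire the cache into a processor is via Caffeine's `get(key, mappingFunction)`, which computes on a miss. A sketch where `CustomerService`, `lookupTier`, and the `Order`/`EnrichedOrder` types are assumptions, not from this series:

```java
import com.github.benmanes.caffeine.cache.Cache;
import org.springframework.batch.item.ItemProcessor;

public class TierEnrichmentProcessor implements ItemProcessor<Order, EnrichedOrder> {

    private final Cache<Long, String> customerTierCache;
    private final CustomerService customerService; // hypothetical lookup service

    public TierEnrichmentProcessor(Cache<Long, String> cache, CustomerService service) {
        this.customerTierCache = cache;
        this.customerService = service;
    }

    @Override
    public EnrichedOrder process(Order order) {
        // Computed on miss; Caffeine evicts by size and TTL, so the cache
        // cannot grow unbounded the way a raw ConcurrentHashMap can
        String tier = customerTierCache.get(order.customerId(),
                id -> customerService.lookupTier(id));
        return new EnrichedOrder(order, tier);
    }
}
```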
Profiling Slow Jobs
Query 1: Step throughput per minute
SELECT
se.STEP_NAME,
se.READ_COUNT,
se.WRITE_COUNT,
TIMESTAMPDIFF(SECOND, se.START_TIME, se.END_TIME) AS duration_sec,
ROUND(se.WRITE_COUNT / NULLIF(TIMESTAMPDIFF(SECOND, se.START_TIME, se.END_TIME), 0)) AS rows_per_sec
FROM BATCH_STEP_EXECUTION se
WHERE se.JOB_EXECUTION_ID = ?
ORDER BY se.STEP_EXECUTION_ID;
Query 2: Rollback rate (indicates contention)
SELECT
STEP_NAME,
WRITE_COUNT,
ROLLBACK_COUNT,
ROUND(ROLLBACK_COUNT * 100.0 / NULLIF(COMMIT_COUNT, 0), 2) AS rollback_pct
FROM BATCH_STEP_EXECUTION
WHERE JOB_EXECUTION_ID = ?;
A rollback rate above 5% indicates write contention (deadlocks) or retry storms. Reduce chunk size or add retry with backoff.
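"Retry with backoff" can be expressed on a fault-tolerant step. A sketch for Spring Batch 5 with spring-retry; the step name, chunk size, and the choice of `DeadlockLoserDataAccessException` are assumptions to adapt:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.dao.DeadlockLoserDataAccessException;
import org.springframework.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class RetryStepConfig {

    @Bean
    public Step resilientStep(JobRepository jobRepository,
                              PlatformTransactionManager txManager,
                              ItemReader<Order> reader,
                              ItemWriter<Order> writer) {
        ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
        backOff.setInitialInterval(100); // 100ms before the first retry
        backOff.setMultiplier(2.0);      // then 200ms, 400ms, ...
        backOff.setMaxInterval(5_000);

        return new StepBuilder("resilientStep", jobRepository)
                .<Order, Order>chunk(100, txManager)
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                .retry(DeadlockLoserDataAccessException.class)
                .retryLimit(3)
                .backOffPolicy(backOff)
                .build();
    }
}
```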
Query 3: Skip rate
SELECT
STEP_NAME,
READ_COUNT,
(READ_SKIP_COUNT + WRITE_SKIP_COUNT + PROCESS_SKIP_COUNT) AS total_skips,
ROUND((READ_SKIP_COUNT + WRITE_SKIP_COUNT + PROCESS_SKIP_COUNT) * 100.0 / NULLIF(READ_COUNT, 0), 2) AS skip_pct
FROM BATCH_STEP_EXECUTION
WHERE JOB_EXECUTION_ID = ?;
Enable slow query log on MySQL
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- log queries over 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
Check for:
- Missing indexes on filters like `WHERE status = 'PENDING' AND order_id BETWEEN ...`
- Full table scans on the batch metadata tables (`BATCH_JOB_EXECUTION`, `BATCH_STEP_EXECUTION`)
- Lock waits during concurrent chunk writes
Add indexes for batch queries
-- For paging readers that filter by status
ALTER TABLE orders ADD INDEX idx_status_order (status, order_id);
-- For batch metadata queries (auto-created by Spring Batch, verify they exist)
SHOW INDEX FROM BATCH_JOB_EXECUTION;
SHOW INDEX FROM BATCH_STEP_EXECUTION;
Performance Tuning Checklist
| Area | Setting | Recommended value |
|---|---|---|
| JDBC | rewriteBatchedStatements | true |
| JDBC | useServerPrepStmts | false (with rewrite) |
| JDBC | cachePrepStmts | true |
| Chunk size | Start | 100–500, tune up |
| HikariCP | maximum-pool-size | threads × 2 + metadata connections |
| HikariCP | connection-timeout | 3000ms |
| Hibernate | jdbc.batch_size | 50 |
| Hibernate | order_inserts | true |
| JVM | GC | G1GC |
| JVM | Heap | -Xmx = 2–4 GB for typical jobs |
| MySQL | Slow query log | Enable during profiling |
| Indexes | Status + PK | Add composite index on frequently filtered columns |
Key Takeaways
- `rewriteBatchedStatements=true` in the MySQL JDBC URL is the single biggest throughput lever for write-heavy jobs — do this first.
- Find your chunk size by doubling from 100 until throughput plateaus; the typical optimum is 200–1000 for database jobs.
- Match HikariCP pool size to your thread count — under-sizing causes timeouts, over-sizing wastes resources.
- Profile with the `BATCH_STEP_EXECUTION` rows_per_sec query before adding complexity. Often the bottleneck is a missing index or a missing `rewriteBatchedStatements`.
What’s Next
Article 25 is the final article — a complete Spring Batch best practices reference and checklist covering everything you have learned in this series.