Performance Tuning: Chunk Size, Connection Pools, and Memory Management

Introduction

A poorly tuned batch job can be 10–100x slower than a well-tuned one. The biggest gains come from a handful of settings — chunk size, MySQL JDBC rewrite, connection pool alignment, and avoiding unnecessary object creation. This article covers each systematically.


Chunk Size — The Most Impactful Setting

Chunk size determines how many items are processed per transaction. Too small = too many round trips to the database. Too large = long transactions, high memory pressure, slower rollback on failure.

Finding the optimal chunk size

There is no universal answer — benchmark with your data and hardware. A practical starting process:

Start at chunk size 100
→ measure throughput (items/second)
→ double to 200 → measure again
→ double to 500 → measure again
→ stop when throughput plateaus or memory pressure increases

Typical sweet spots:

  • Simple inserts (no processor): 500–2000
  • With processing and enrichment: 100–500
  • With external API calls: 50–200
  • With large BLOBs or TEXT: 10–50
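A convenient way to run the doubling loop is to expose the chunk size as a job parameter so each benchmark run needs no rebuild. A sketch, assuming Spring Batch 5 builder APIs, a hypothetical Order item type, and a jobParameter named chunkSize:

```java
// Sketch only: chunk size comes from a job parameter (default 500),
// so benchmark runs can vary it from the launch command.
@Bean
@JobScope
public Step loadOrdersStep(JobRepository jobRepository,
                           PlatformTransactionManager txManager,
                           ItemReader<Order> reader,
                           ItemWriter<Order> writer,
                           @Value("#{jobParameters['chunkSize'] ?: 500}") Integer chunkSize) {
    return new StepBuilder("loadOrdersStep", jobRepository)
            .<Order, Order>chunk(chunkSize, txManager)
            .reader(reader)
            .writer(writer)
            .build();
}
```

Launch with, for example, chunkSize=200 as a job parameter, measure rows_per_sec (query shown later in this article), then double and repeat.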

What chunk size controls

Items per transaction = chunk size
Commits per step = total_items / chunk_size
Network round trips to DB ≈ commits × 2 (one page read plus one batched write per commit)

Doubling chunk size halves the commit count and roughly halves network overhead — up to the point where transaction lock time or memory becomes the bottleneck.
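The arithmetic is worth making concrete. A minimal sketch, assuming a hypothetical run of 1,000,000 items:

```java
// Worked example of the commit math above: commits per step is
// total_items / chunk_size, rounded up for a partial final chunk.
public class CommitMath {

    static long commits(long totalItems, int chunkSize) {
        return (totalItems + chunkSize - 1) / chunkSize; // ceiling division
    }

    public static void main(String[] args) {
        long items = 1_000_000;
        // Each doubling of the chunk size halves the commit count:
        System.out.println(commits(items, 100)); // 10000
        System.out.println(commits(items, 200)); // 5000
        System.out.println(commits(items, 400)); // 2500
    }
}
```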


MySQL JDBC Tuning

The single most important MySQL performance setting for batch inserts:

spring.datasource.url=jdbc:mysql://localhost:3306/batch_db\
  ?rewriteBatchedStatements=true\
  &useServerPrepStmts=false\
  &cachePrepStmts=true\
  &prepStmtCacheSize=250\
  &prepStmtCacheSqlLimit=2048

What each parameter does:

  • rewriteBatchedStatements=true: rewrites INSERT batches into multi-row INSERTs, 10–50x faster for bulk inserts
  • useServerPrepStmts=false: required with rewriteBatchedStatements=true; server-side prepared statements conflict with batch rewriting
  • cachePrepStmts=true: client-side prepared statement cache, avoids re-parsing SQL on every chunk
  • prepStmtCacheSize=250: cache up to 250 distinct statements
  • prepStmtCacheSqlLimit=2048: cache statements up to 2048 characters
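
The rewrite only pays off if the writer actually submits JDBC batches. A JdbcBatchItemWriter does: it issues one addBatch()/executeBatch() cycle per chunk, which rewriteBatchedStatements=true then collapses into multi-row INSERTs. A sketch, assuming a hypothetical orders table and an Order bean with matching getters:

```java
// Sketch: named parameters are mapped from Order bean properties;
// the whole chunk goes out as a single JDBC batch.
@Bean
public JdbcBatchItemWriter<Order> orderWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Order>()
            .dataSource(dataSource)
            .sql("INSERT INTO orders (order_id, status, amount) VALUES (:orderId, :status, :amount)")
            .beanMapped()
            .build();
}
```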

Verify rewriting is active

Enable the MySQL general query log in a test environment and confirm you see multi-row INSERTs:

-- Without rewrite:
INSERT INTO orders (...) VALUES (...)
INSERT INTO orders (...) VALUES (...)

-- With rewriteBatchedStatements=true:
INSERT INTO orders (...) VALUES (...), (...), (...)

HikariCP Connection Pool Tuning

# Match pool size to your concurrency model
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=5

# Fail fast on connection exhaustion
spring.datasource.hikari.connection-timeout=3000

# Keep connections alive during long-running steps
spring.datasource.hikari.keepalive-time=60000
spring.datasource.hikari.max-lifetime=1800000

# Validate connections before use. With a JDBC4 driver such as MySQL
# Connector/J, prefer omitting this and letting Hikari use Connection.isValid();
# keep a test query only if the driver's isValid() is unreliable.
spring.datasource.hikari.connection-test-query=SELECT 1

Pool size formula

pool_size = (thread_count × connections_per_thread) + metadata_connections

For a 4-thread step where each thread holds one writer connection:

pool_size = 4 writers + 4 readers (paging queries) + 2 metadata = 10

Over-sizing the pool wastes MySQL file descriptors and memory. Under-sizing causes "HikariPool-1 - Connection is not available" timeouts.
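
One way to keep the thread count and pool size from drifting apart is to define the thread count once and derive everything from it. A sketch for a multi-threaded step; the constant and bean names are illustrative:

```java
// Sketch: a single constant drives the step's executor size, and the
// comment documents how the Hikari pool size was derived from it.
private static final int STEP_THREADS = 4;

@Bean
public TaskExecutor stepTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(STEP_THREADS);
    executor.setMaxPoolSize(STEP_THREADS);
    executor.setThreadNamePrefix("batch-step-");
    executor.initialize();
    return executor;
}
// maximum-pool-size = STEP_THREADS writers + STEP_THREADS readers + 2 metadata = 10
```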


Hibernate Batch Settings (for JPA-based steps)

spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.generate_statistics=false
spring.jpa.properties.hibernate.cache.use_second_level_cache=false
spring.jpa.properties.hibernate.cache.use_query_cache=false

Without hibernate.jdbc.batch_size, Hibernate issues one INSERT per entity — identical to batch_size=1. Always set this for JpaItemWriter steps.
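
One caveat worth checking first: Hibernate silently disables JDBC insert batching for entities whose ids use GenerationType.IDENTITY, because the generated key must be read back after every row. A sketch of an entity that keeps batching effective, assuming a hypothetical OrderRecord whose id can be assigned by the application:

```java
// Hypothetical entity: an application-assigned id (or a table-style
// generator) lets hibernate.jdbc.batch_size take effect; IDENTITY would
// force one INSERT round trip per entity.
@Entity
@Table(name = "orders")
public class OrderRecord {

    @Id
    private Long orderId; // assigned before persist(), e.g. from the source row

    private String status;

    protected OrderRecord() { } // required by JPA

    public OrderRecord(Long orderId, String status) {
        this.orderId = orderId;
        this.status = status;
    }
}
```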


Memory Management

JVM heap sizing

# Minimum recommended for a batch application
java -Xms512m -Xmx2g -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -jar batch-app.jar

G1GC is a sound default for batch workloads: it handles large heaps and mixed allocation patterns better than Parallel GC in most cases.

What drives memory in batch jobs

  1. Chunk buffer: all items in the current chunk are held in memory during read→process→write. At chunk size 1000 with 10KB objects, that’s 10MB per chunk thread.
  2. JPA first-level cache: if using JpaItemWriter, each entity is tracked in the EntityManager. Use clear() between chunks (covered in Article 7).
  3. Processor caches: ConcurrentHashMap caches for enrichment grow unbounded unless evicted.
  4. FlatFileItemReader line buffer: streaming readers hold one line at a time — not a concern.
  5. JdbcPagingItemReader page buffer: one full page of rows. At page size 1000 and 1KB rows = 1MB per reader.
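
Items 1 and 5 can be estimated up front. A back-of-the-envelope sketch; the item sizes are assumptions, not measurements:

```java
// Rough chunk-buffer memory estimate: items held in flight per chunk,
// times average object size, times concurrent chunk threads.
public class MemoryEstimate {

    static long chunkBufferBytes(int chunkSize, long avgItemBytes, int threads) {
        return (long) chunkSize * avgItemBytes * threads;
    }

    public static void main(String[] args) {
        // 1000-item chunks of ~10KB objects on 4 threads:
        long bytes = chunkBufferBytes(1000, 10_000, 4);
        System.out.println(bytes / 1_000_000 + " MB"); // prints "40 MB"
    }
}
```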

Limit processor cache size

// Use Caffeine for bounded caching with TTL
@Bean
public Cache<Long, String> customerTierCache() {
    return Caffeine.newBuilder()
            .maximumSize(50_000)
            .expireAfterWrite(Duration.ofMinutes(30))
            .build();
}

Add the dependency:

<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
</dependency>
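
Reading through the cache from a processor keeps the expensive lookup on the miss path only. A sketch, assuming the cache above is injected into the processor and a hypothetical customerRepository.findTierById lookup:

```java
// Sketch: Caffeine's get(key, mappingFunction) runs the loader only on
// a miss; entries evicted by size or TTL are simply re-fetched.
public Order process(Order order) {
    String tier = customerTierCache.get(order.getCustomerId(),
            id -> customerRepository.findTierById(id)); // hypothetical lookup
    order.setCustomerTier(tier);
    return order;
}
```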

Profiling Slow Jobs

Query 1: Step throughput per minute

SELECT
    se.STEP_NAME,
    se.READ_COUNT,
    se.WRITE_COUNT,
    TIMESTAMPDIFF(SECOND, se.START_TIME, se.END_TIME) AS duration_sec,
    ROUND(se.WRITE_COUNT / NULLIF(TIMESTAMPDIFF(SECOND, se.START_TIME, se.END_TIME), 0)) AS rows_per_sec
FROM BATCH_STEP_EXECUTION se
WHERE se.JOB_EXECUTION_ID = ?
ORDER BY se.STEP_EXECUTION_ID;

Query 2: Rollback rate (indicates contention)

SELECT
    STEP_NAME,
    WRITE_COUNT,
    ROLLBACK_COUNT,
    ROUND(ROLLBACK_COUNT * 100.0 / NULLIF(COMMIT_COUNT, 0), 2) AS rollback_pct
FROM BATCH_STEP_EXECUTION
WHERE JOB_EXECUTION_ID = ?;

A rollback rate above 5% indicates write contention (deadlocks) or retry storms. Reduce chunk size or add retry with backoff.
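
Retry with backoff can be sketched with the fault-tolerant step builder. Assumes Spring Batch 5 builders and in-scope jobRepository, txManager, reader, and writer beans; the step name and tuning values are illustrative:

```java
// Sketch: retry deadlock victims with exponential backoff instead of
// failing the chunk outright.
ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
backOff.setInitialInterval(200);  // ms before first retry
backOff.setMultiplier(2.0);       // double the wait each attempt
backOff.setMaxInterval(5_000);    // cap the wait at 5s

Step step = new StepBuilder("writeOrdersStep", jobRepository)
        .<Order, Order>chunk(200, txManager)
        .reader(reader)
        .writer(writer)
        .faultTolerant()
        .retry(DeadlockLoserDataAccessException.class)
        .retryLimit(3)
        .backOffPolicy(backOff)
        .build();
```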

Query 3: Skip rate

SELECT
    STEP_NAME,
    READ_COUNT,
    (READ_SKIP_COUNT + WRITE_SKIP_COUNT + PROCESS_SKIP_COUNT) AS total_skips,
    ROUND((READ_SKIP_COUNT + WRITE_SKIP_COUNT + PROCESS_SKIP_COUNT) * 100.0 / NULLIF(READ_COUNT, 0), 2) AS skip_pct
FROM BATCH_STEP_EXECUTION
WHERE JOB_EXECUTION_ID = ?;

Enable slow query log on MySQL

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- log queries over 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

Check for:

  • Missing indexes on WHERE status = 'PENDING' AND order_id BETWEEN ...
  • Full table scans on the batch metadata tables (BATCH_JOB_EXECUTION, BATCH_STEP_EXECUTION)
  • Lock waits during concurrent chunk writes

Add indexes for batch queries

-- For paging readers that filter by status
ALTER TABLE orders ADD INDEX idx_status_order (status, order_id);

-- For batch metadata queries (auto-created by Spring Batch, verify they exist)
SHOW INDEX FROM BATCH_JOB_EXECUTION;
SHOW INDEX FROM BATCH_STEP_EXECUTION;

Performance Tuning Checklist

Area         Setting                    Recommended value
JDBC         rewriteBatchedStatements   true
JDBC         useServerPrepStmts         false (with rewrite)
JDBC         cachePrepStmts             true
Chunk size   Start                      100–500, tune up
HikariCP     maximum-pool-size          threads × 2 + metadata connections
HikariCP     connection-timeout         3000ms
Hibernate    jdbc.batch_size            50
Hibernate    order_inserts              true
JVM          GC                         G1GC
JVM          Heap                       -Xmx = 2–4 GB for typical jobs
MySQL        Slow query log             Enable during profiling
Indexes      Status + PK                Add composite index on frequently filtered columns

Key Takeaways

  • rewriteBatchedStatements=true in the MySQL JDBC URL is the single biggest throughput lever for write-heavy jobs — do this first.
  • Find your chunk size by doubling from 100 until throughput plateaus; typical optimum is 200–1000 for database jobs.
  • Match HikariCP pool size to your thread count — under-sizing causes timeouts, over-sizing wastes resources.
  • Profile with the BATCH_STEP_EXECUTION rows_per_sec query before adding complexity. Often the bottleneck is a missing index or missing rewriteBatchedStatements.

What’s Next

Article 25 is the final article — a complete Spring Batch best practices reference and checklist covering everything you have learned in this series.