Parallel Streams: ForkJoinPool, Spliterators, and When NOT to Parallelize
How Parallel Streams Work
A parallel stream splits its source into sub-sequences, processes each sub-sequence on a separate thread, and merges the results. The mechanism is ForkJoin — specifically ForkJoinPool.commonPool(), a shared thread pool managed by the JVM.
```java
// Sequential — processes on the calling thread
List<String> seq = names.stream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Parallel — splits work across ForkJoinPool.commonPool()
List<String> par = names.parallelStream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Convert an existing stream to parallel
List<String> par2 = names.stream()
    .parallel() // switch the whole pipeline to parallel
    .map(String::toUpperCase)
    .collect(Collectors.toList());
```
A single call to .parallel() or .sequential() anywhere in a pipeline applies to the entire pipeline.
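To see that the mode is a single pipeline-wide flag, `isParallel()` can be checked on the stream; when both calls appear, the last one wins. A minimal sketch (the class name is just for the demo):

```java
import java.util.List;
import java.util.stream.Stream;

public class ParallelFlagDemo {
    public static void main(String[] args) {
        Stream<String> s = List.of("a", "b", "c").stream()
                .parallel()     // mark the pipeline parallel...
                .sequential();  // ...then back to sequential: the last call wins

        System.out.println(s.isParallel()); // false — the whole pipeline runs sequentially
    }
}
```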
ForkJoinPool.commonPool
The common pool is shared across the entire JVM process. Its thread count defaults to Runtime.getRuntime().availableProcessors() - 1 (one fewer than the core count, because the thread that submits the work also participates in processing it).
```java
System.out.println(ForkJoinPool.commonPool().getParallelism());
// e.g., 7 on an 8-core machine
```
You can change the pool size with a system property:
```
-Djava.util.concurrent.ForkJoinPool.common.parallelism=4
```
The Shared Pool Problem
Because the common pool is shared, one slow parallel stream can starve all others. A blocking I/O call inside a parallel stream blocks a ForkJoin worker thread, which can cascade:
```java
// DANGEROUS: blocking inside a parallel stream consumes ForkJoin threads
orders.parallelStream()
    .map(order -> httpClient.fetchDetails(order.getId())) // blocks!
    .collect(Collectors.toList());
```
Fix: Use a custom pool for blocking operations:
```java
ForkJoinPool customPool = new ForkJoinPool(8);
try {
    List<OrderDetail> results = customPool.submit(() ->
        orders.parallelStream()
            .map(order -> httpClient.fetchDetails(order.getId()))
            .collect(Collectors.toList())
    ).get(); // get() throws InterruptedException / ExecutionException
} finally {
    customPool.shutdown(); // always release the pool, even on failure
}
```
Spliterators
The engine behind stream splitting is Spliterator<T> — an iterator that knows how to split itself for parallel processing.
```java
public interface Spliterator<T> {   // core methods (default methods omitted)
    boolean tryAdvance(Consumer<? super T> action); // process the next element
    Spliterator<T> trySplit();                      // split off roughly half
    long estimateSize();                            // estimated remaining elements
    int characteristics();                          // ORDERED, SIZED, DISTINCT, etc.
}
```
When a stream goes parallel:
- trySplit() is called recursively to create sub-tasks down to a threshold
- Each sub-task processes its split and produces partial results
- Results are combined using the pipeline's combiner
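The splitting step can be observed directly by calling trySplit() by hand on an ArrayList's spliterator; a rough sketch:

```java
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SplitDemo {
    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 100).boxed().collect(Collectors.toList());

        Spliterator<Integer> right = data.spliterator();
        Spliterator<Integer> left = right.trySplit(); // splits off the first half

        System.out.println(left.estimateSize());  // 50 — exact, because the source is SIZED
        System.out.println(right.estimateSize()); // 50
    }
}
```

Each half could be split again; the framework repeats this recursively until the chunks are small enough to process directly.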
Spliterator Characteristics
Characteristics tell the framework what guarantees the data source provides, enabling optimisations:
| Characteristic | Meaning | Example sources |
|---|---|---|
| ORDERED | Elements have a defined encounter order | List, LinkedList, Arrays.stream |
| SORTED | Elements are sorted | TreeSet, sorted stream |
| SIZED | estimateSize() is exact | ArrayList, HashSet |
| DISTINCT | No duplicate elements | Set |
| NONNULL | No null elements | ConcurrentHashMap |
| IMMUTABLE | Source cannot be modified | List.of() (Java 9+) |
| SUBSIZED | Sub-spliterators are also SIZED | Arrays |
ArrayList is ORDERED + SIZED + SUBSIZED — it splits perfectly. HashSet is SIZED + DISTINCT but not ORDERED — it can’t guarantee encounter order.
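These characteristics can be inspected at runtime with hasCharacteristics(); a quick check confirming the ArrayList vs HashSet contrast above:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

public class CharacteristicsDemo {
    public static void main(String[] args) {
        Spliterator<String> listSplit = new ArrayList<>(List.of("a", "b")).spliterator();
        Spliterator<String> setSplit = new HashSet<>(Set.of("a", "b")).spliterator();

        System.out.println(listSplit.hasCharacteristics(Spliterator.ORDERED)); // true
        System.out.println(listSplit.hasCharacteristics(Spliterator.SIZED));   // true
        System.out.println(setSplit.hasCharacteristics(Spliterator.ORDERED));  // false
        System.out.println(setSplit.hasCharacteristics(Spliterator.DISTINCT)); // true
    }
}
```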
When Parallel Is Actually Faster
Parallel streams have overhead: task splitting, thread coordination, result merging. They are only faster when the computational savings outweigh this overhead.
Conditions for parallel wins
- Large data set — typically 10,000+ elements; below that, overhead dominates
- Computationally expensive per-element operation — CPU-bound work that takes non-trivial time
- Splittable source — ArrayList, arrays, and IntStream.range split evenly; LinkedList does not
- No ordering requirement — or the ordering is cheap to restore
- No shared mutable state — thread-safe operations only
Quick benchmark: CPU-bound work
```java
long N = 10_000_000L;

// Sequential
long start = System.nanoTime();
long sumSeq = LongStream.range(0, N)
    .map(n -> n * n % 1_000_000_007L)
    .sum();
long seqMs = (System.nanoTime() - start) / 1_000_000;

// Parallel
start = System.nanoTime();
long sumPar = LongStream.range(0, N)
    .parallel()
    .map(n -> n * n % 1_000_000_007L)
    .sum();
long parMs = (System.nanoTime() - start) / 1_000_000;

System.out.printf("Sequential: %dms, Parallel: %dms%n", seqMs, parMs);
// Often a 3–7x speedup on an 8-core machine
// (naive timing — use JMH for serious benchmarking)
```
Sources with good split behaviour
| Source | Parallelism quality | Reason |
|---|---|---|
| ArrayList | Excellent | O(1) split, exact size |
| int[] / long[] | Excellent | O(1) split, exact size |
| IntStream.range | Excellent | O(1) arithmetic split |
| TreeSet / TreeMap | Good | Balanced tree splits reasonably |
| HashSet / HashMap | Moderate | Splits by bucket; uneven splits possible |
| LinkedList | Poor | O(n) split |
| Files.lines() | Poor | Sequential read only |
When NOT to Use Parallel Streams
Small data sets
```java
// Bad: 10 elements — overhead far exceeds savings
List<Integer> small = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
small.parallelStream().map(n -> n * 2).collect(Collectors.toList());

// Better: stay sequential
small.stream().map(n -> n * 2).collect(Collectors.toList());
```
Order matters and the source is ordered
```java
// Order not guaranteed: forEach on a parallel stream visits elements
// in whatever order the worker threads happen to complete
List<String> ordered = new ArrayList<>(names);
ordered.parallelStream()
    .forEach(System.out::println); // order not guaranteed

// Fix: forEachOrdered restores encounter order
// (but this negates most of the parallelism benefit)
ordered.parallelStream()
    .forEachOrdered(System.out::println);
```
Shared mutable state
```java
// BROKEN: ArrayList.add is not thread-safe
List<Integer> unsafe = new ArrayList<>();
numbers.parallelStream()
    .filter(n -> n > 5)
    .forEach(unsafe::add); // race condition — elements may be lost or the list corrupted!

// Fix: let collect accumulate the results safely
List<Integer> results = numbers.parallelStream()
    .filter(n -> n > 5)
    .collect(Collectors.toList()); // thread-safe
```
I/O-bound operations (blocking)
Use CompletableFuture with a custom thread pool instead of parallel streams for HTTP calls, database queries, or file I/O.
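A rough sketch of that pattern, with a hypothetical fetchDetails standing in for the blocking call (real code would call an HTTP client or DAO here):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class AsyncFetch {
    // Hypothetical blocking call standing in for HTTP / DB / file I/O
    static String fetchDetails(int id) {
        return "detail-" + id;
    }

    public static void main(String[] args) {
        ExecutorService ioPool = Executors.newFixedThreadPool(16); // sized for blocking I/O
        List<Integer> ids = List.of(1, 2, 3);

        // Fan out: one future per call, all running on the dedicated pool
        List<CompletableFuture<String>> futures = ids.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> fetchDetails(id), ioPool))
                .collect(Collectors.toList());

        // Fan in: join preserves the original order of ids
        List<String> results = futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());

        System.out.println(results); // [detail-1, detail-2, detail-3]
        ioPool.shutdown();
    }
}
```

The key difference from a parallel stream: the pool is sized for blocking work (here 16 threads), and blocked threads never starve the JVM-wide common pool.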
Short pipelines
If the pipeline has only one or two operations and the element count is modest, the overhead of splitting and merging will dominate.
Ordering Guarantees
| Stream type | forEach | collect(toList()) | findFirst |
|---|---|---|---|
| Sequential, ordered | ✓ ordered | ✓ ordered | first element |
| Parallel, ordered | ✗ any order | ✓ ordered (re-ordered) | first element (expensive) |
| Parallel, unordered | ✗ any order | ✗ any order | any element (fast) |
For parallel streams on ordered sources (lists), collect(Collectors.toList()) preserves encounter order — Java guarantees this even for parallel streams. It is forEach (and streams made explicitly unordered) that give up ordering.
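The guarantee is easy to verify: a parallel collect over an ordered source produces exactly the same list as the sequential version.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderDemo {
    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 1000).boxed().collect(Collectors.toList());

        List<Integer> seq = source.stream().map(n -> n * 2).collect(Collectors.toList());
        List<Integer> par = source.parallelStream().map(n -> n * 2).collect(Collectors.toList());

        System.out.println(seq.equals(par)); // true — collect preserves encounter order
    }
}
```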
To tell the stream “I don’t care about order” and enable optimisations:
```java
names.parallelStream()
    .unordered() // removes the ORDERED characteristic
    .filter(s -> s.length() > 3)
    .collect(Collectors.toList()); // may be in any order — faster
```
Practical Guide
```java
// Template for deciding stream mode
// (the boolean flags are illustrative placeholders, not real APIs)
Stream<T> stream = source.stream();
if (source.size() > 10_000            // large enough
        && operationIsCpuBound        // not I/O
        && !needsOrderedSideEffects   // no System.out::println in forEach
        && operationIsThreadSafe) {   // no shared mutable state
    stream = stream.parallel();
}
```
Common patterns that are safe to parallelise
```java
// CPU-heavy computation, result in a list
List<Result> results = items.parallelStream()
    .map(item -> expensiveCompute(item))
    .collect(Collectors.toList());

// Sum / reduce over a large numeric range
long total = LongStream.range(0, 1_000_000).parallel().sum();

// Filtering large collections
List<Order> bigOrders = orders.parallelStream()
    .filter(o -> o.getTotal() > 10_000)
    .collect(Collectors.toList());
```
Summary
| Concept | Key point |
|---|---|
| How it works | ForkJoin splits source → process in parallel → merge results |
| Default pool | ForkJoinPool.commonPool(), size = CPU count - 1 |
| Spliterator | Enables splitting; ArrayList / arrays split best |
| Ordering | collect(toList()) preserves order; forEach does not |
| Safe use | Large + CPU-bound + stateless + no ordering side effects |
| Avoid when | Small data, I/O-bound, shared mutable state, order matters |
Next Step
Optional: Eliminating NullPointerException the Right Way →
Part of the DevOps Monk Java tutorial series: Java 8 → Java 11 → Java 17 → Java 21