Part 8 of 16

Parallel Streams: ForkJoinPool, Spliterators, and When NOT to Parallelize

How Parallel Streams Work

A parallel stream splits its source into sub-sequences, processes each sub-sequence on a separate thread, and merges the results. The mechanism is the fork/join framework — specifically ForkJoinPool.commonPool(), a shared thread pool managed by the JVM.

// Sequential — processes on the calling thread
List<String> seq = names.stream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Parallel — splits work across ForkJoinPool.commonPool()
List<String> par = names.parallelStream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Convert existing stream to parallel
List<String> par2 = names.stream()
    .parallel()           // switch to parallel
    .map(String::toUpperCase)
    .collect(Collectors.toList());

A single call to .parallel() or .sequential() anywhere in a pipeline applies to the entire pipeline.
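This means a pipeline is either wholly sequential or wholly parallel; if both methods are called, the last call wins. A quick check (using isParallel() to inspect the mode):

```java
import java.util.List;

public class PipelineMode {
    public static void main(String[] args) {
        List<String> names = List.of("ada", "alan", "grace");

        // .parallel() then .sequential(): the last call wins,
        // and it governs the whole pipeline, not just later stages
        boolean parallel = names.stream()
            .parallel()
            .map(String::toUpperCase)
            .sequential()
            .isParallel();

        System.out.println(parallel); // false — the final .sequential() won
    }
}
```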


ForkJoinPool.commonPool

The common pool is shared across the entire JVM process. Its thread count defaults to Runtime.getRuntime().availableProcessors() - 1 — one fewer than the CPU count, because the thread that invokes the terminal operation also participates in the work.

System.out.println(ForkJoinPool.commonPool().getParallelism());
// e.g., 7 on an 8-core machine

You can change the pool size with a system property:

-Djava.util.concurrent.ForkJoinPool.common.parallelism=4

The Shared Pool Problem

Because the common pool is shared, one slow parallel stream can starve all others. A blocking I/O call inside a parallel stream blocks a ForkJoin worker thread, which can cascade:

// DANGEROUS: blocking inside parallel stream consumes ForkJoin threads
orders.parallelStream()
    .map(order -> httpClient.fetchDetails(order.getId()))  // blocks!
    .collect(Collectors.toList());

Fix: Run the parallel stream from inside a custom pool. A parallel stream executes in the ForkJoinPool of the thread that invokes its terminal operation, so submitting the pipeline to your own pool keeps blocking work off the common pool (note: this behaviour is a long-standing implementation detail, not a documented guarantee):

ForkJoinPool customPool = new ForkJoinPool(8);
try {
    List<OrderDetail> results = customPool.submit(() ->
        orders.parallelStream()
            .map(order -> httpClient.fetchDetails(order.getId()))
            .collect(Collectors.toList())
    ).get();  // get() throws InterruptedException / ExecutionException
} finally {
    customPool.shutdown();
}

Spliterators

The engine behind stream splitting is Spliterator<T> — an iterator that knows how to split itself for parallel processing.

public interface Spliterator<T> {
    boolean tryAdvance(Consumer<? super T> action);  // process next element
    Spliterator<T> trySplit();                        // split off half
    long estimateSize();                              // estimated remaining elements
    int characteristics();                            // ORDERED, SIZED, DISTINCT, etc.
}
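To see splitting in action, you can take an array's spliterator and split it by hand. A minimal sketch (array spliterators happen to split in half; the exact split point is the implementation's choice):

```java
import java.util.Arrays;
import java.util.Spliterator;

public class SplitDemo {
    public static void main(String[] args) {
        Integer[] data = {1, 2, 3, 4, 5, 6, 7, 8};

        Spliterator<Integer> rest = Arrays.spliterator(data);
        Spliterator<Integer> firstHalf = rest.trySplit(); // splits off the first half

        System.out.println(firstHalf.estimateSize()); // 4
        System.out.println(rest.estimateSize());      // 4

        // Each half can now be processed independently, e.g. on different threads
        firstHalf.forEachRemaining(n -> System.out.print(n + " ")); // 1 2 3 4
    }
}
```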

When a stream goes parallel:

  1. trySplit() is called recursively to create sub-tasks down to a threshold
  2. Each sub-task processes its split and produces partial results
  3. Results are combined using the pipeline’s combiner
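These three steps are exactly what a hand-written fork/join task does. A minimal sketch of a parallel sum (the THRESHOLD value and split-in-half strategy are illustrative choices, not what the stream framework uses verbatim):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class ManualForkJoin {
    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000;   // illustrative cut-off
        private final long[] data;
        private final int from, to;

        SumTask(long[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {             // small enough: compute directly
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) >>> 1;              // 1. split (like trySplit)
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                              // 2. process halves in parallel
            return right.compute() + left.join();     // 3. combine partial results
        }
    }

    public static void main(String[] args) {
        long[] data = LongStream.rangeClosed(1, 100_000).toArray();
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 5000050000
    }
}
```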

Spliterator Characteristics

Characteristics tell the framework what guarantees the data source provides, enabling optimisations:

| Characteristic | Meaning | Example sources |
|----------------|---------|-----------------|
| ORDERED | Elements have a defined encounter order | List, LinkedList, Arrays.stream |
| SORTED | Elements are sorted | TreeSet, sorted stream |
| SIZED | estimateSize() is exact | ArrayList, HashSet |
| DISTINCT | No duplicate elements | Set |
| NONNULL | No null elements | ConcurrentHashMap |
| IMMUTABLE | Source cannot be modified | List.of() (Java 9+) |
| SUBSIZED | Sub-spliterators are also SIZED | Arrays |

ArrayList is ORDERED + SIZED + SUBSIZED — it splits perfectly. HashSet is SIZED + DISTINCT but not ORDERED — it can’t guarantee encounter order.
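You can inspect these flags directly with hasCharacteristics:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

public class CharacteristicsDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(1, 2, 3));
        Spliterator<Integer> ls = list.spliterator();
        System.out.println(ls.hasCharacteristics(Spliterator.ORDERED));  // true
        System.out.println(ls.hasCharacteristics(Spliterator.SIZED));    // true
        System.out.println(ls.hasCharacteristics(Spliterator.SUBSIZED)); // true

        Set<Integer> set = new HashSet<>(List.of(1, 2, 3));
        Spliterator<Integer> ss = set.spliterator();
        System.out.println(ss.hasCharacteristics(Spliterator.ORDERED));  // false
        System.out.println(ss.hasCharacteristics(Spliterator.DISTINCT)); // true
    }
}
```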


When Parallel Is Actually Faster

Parallel streams have overhead: task splitting, thread coordination, result merging. They are only faster when the computational savings outweigh this overhead.

Conditions for parallel wins

  1. Large data set — typically 10,000+ elements; below that, overhead dominates
  2. Computationally expensive per-element operation — CPU-bound work that takes non-trivial time
  3. Splittable source — ArrayList, arrays, and IntStream.range split evenly; LinkedList does not
  4. No ordering requirement — or the ordering is cheap to restore
  5. No shared mutable state — thread-safe operations only

Quick benchmark: CPU-bound work

long N = 10_000_000L;

// Sequential
long start = System.nanoTime();
long sumSeq = LongStream.range(0, N)
    .map(n -> n * n % 1000000007L)
    .sum();
long seqMs = (System.nanoTime() - start) / 1_000_000;

// Parallel
start = System.nanoTime();
long sumPar = LongStream.range(0, N)
    .parallel()
    .map(n -> n * n % 1000000007L)
    .sum();
long parMs = (System.nanoTime() - start) / 1_000_000;

System.out.printf("Sequential: %dms, Parallel: %dms%n", seqMs, parMs);
// Roughly 3–7x speedup on an 8-core machine. (Single-shot nanoTime timing
// ignores JIT warmup — use JMH for trustworthy benchmarks.)

Sources with good split behaviour

| Source | Parallelism quality | Reason |
|--------|---------------------|--------|
| ArrayList | Excellent | O(1) split, exact size |
| int[] / long[] | Excellent | O(1) split, exact size |
| IntStream.range | Excellent | O(1) arithmetic split |
| TreeSet / TreeMap | Good | Balanced tree splits reasonably |
| HashSet / HashMap | Moderate | Splits by bucket, uneven possible |
| LinkedList | Poor | O(n) split |
| Files.lines() | Poor | Sequential read only |

When NOT to Use Parallel Streams

Small data sets

// Bad: 10 elements — overhead far exceeds savings
List<Integer> small = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
small.parallelStream().map(n -> n * 2).collect(Collectors.toList());

// Use sequential
small.stream().map(n -> n * 2).collect(Collectors.toList());

Order matters and the source is ordered

// forEach on a parallel stream ignores encounter order
List<String> ordered = new ArrayList<>(names);
ordered.parallelStream()
    .forEach(System.out::println); // prints in arbitrary order

// Fix: forEachOrdered restores encounter order — but it serialises the
// terminal step and negates most of the parallelism benefit
ordered.parallelStream()
    .forEachOrdered(System.out::println);

Shared mutable state

// BROKEN: ArrayList.add is not thread-safe
List<Integer> results = new ArrayList<>();
numbers.parallelStream()
    .filter(n -> n > 5)
    .forEach(results::add);  // race condition!

// Fix: let collect do the accumulation — safe even for parallel streams
List<Integer> collected = numbers.parallelStream()
    .filter(n -> n > 5)
    .collect(Collectors.toList());

I/O-bound operations (blocking)

Use CompletableFuture with a custom thread pool instead of parallel streams for HTTP calls, database queries, or file I/O.
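A sketch of that pattern: the blocking calls land on a dedicated pool sized for I/O, leaving the common pool free. Here fetchDetails is a stub I have invented to stand in for a real HTTP client:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class IoFanOut {
    // Stand-in for a blocking remote call (hypothetical)
    static String fetchDetails(int id) {
        try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "detail-" + id;
    }

    public static void main(String[] args) {
        // Sized for waiting on I/O, not for CPU count
        ExecutorService ioPool = Executors.newFixedThreadPool(32);
        List<Integer> orderIds = List.of(1, 2, 3, 4, 5);

        // Fan out: start all calls without blocking the caller
        List<CompletableFuture<String>> futures = orderIds.stream()
            .map(id -> CompletableFuture.supplyAsync(() -> fetchDetails(id), ioPool))
            .collect(Collectors.toList());

        // Fan in: join blocks the caller, not the common pool's workers
        List<String> details = futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());

        ioPool.shutdown();
        System.out.println(details); // [detail-1, detail-2, detail-3, detail-4, detail-5]
    }
}
```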

Short pipelines

If the pipeline has only one or two operations and the element count is modest, the overhead of splitting and merging will dominate.


Ordering Guarantees

| Stream type | forEach | collect(toList()) | findFirst |
|-------------|---------|-------------------|-----------|
| Sequential, ordered | ✓ ordered | ✓ ordered | first element |
| Parallel, ordered | ✗ any order | ✓ ordered (re-ordered) | first element (expensive) |
| Parallel, unordered | ✗ any order | ✗ any order | any element (fast) |

For parallel streams on ordered sources (lists), collect(Collectors.toList()) always preserves encounter order — Java guarantees this even for parallel streams. Only forEach loses order.
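A quick demonstration — however the work was scheduled across threads, the collected list comes back in encounter order:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderDemo {
    public static void main(String[] args) {
        List<Integer> squares = IntStream.rangeClosed(1, 100)
            .parallel()
            .map(n -> n * n)
            .boxed()
            .collect(Collectors.toList());

        // Always in encounter order: 1, 4, 9, ..., 10000
        System.out.println(squares.get(0) + " " + squares.get(99)); // 1 10000
    }
}
```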

To tell the stream “I don’t care about order” and enable optimisations:

names.parallelStream()
    .unordered()  // removes ORDERED characteristic
    .filter(s -> s.length() > 3)
    .collect(Collectors.toList()); // may be in any order — faster

Practical Guide

// Template for deciding stream mode (the boolean flags are placeholders
// for checks you make about your own workload)
Stream<T> stream = source.stream();

if (source.size() > 10_000           // large enough
        && operationIsCpuBound        // not I/O
        && !needsOrderedSideEffects   // no System.out::println in forEach
        && operationIsThreadSafe) {   // no shared mutable state
    stream = stream.parallel();
}

Common patterns that are safe to parallelise

// CPU-heavy computation, result in a list
List<Result> results = items.parallelStream()
    .map(item -> expensiveCompute(item))
    .collect(Collectors.toList());

// Sum / reduce over large numeric arrays
long total = LongStream.range(0, 1_000_000).parallel().sum();

// Filtering large collections
List<Order> bigOrders = orders.parallelStream()
    .filter(o -> o.getTotal() > 10_000)
    .collect(Collectors.toList());

Summary

| Concept | Key point |
|---------|-----------|
| How it works | Fork/join splits source → process in parallel → merge results |
| Default pool | ForkJoinPool.commonPool(), size = CPU count - 1 |
| Spliterator | Enables splitting; ArrayList / arrays split best |
| Ordering | collect(toList()) preserves order; forEach does not |
| Safe use | Large + CPU-bound + stateless + no ordering side effects |
| Avoid when | Small data, I/O-bound, shared mutable state, order matters |

Next Step

Optional: Eliminating NullPointerException the Right Way →

Part of the DevOps Monk Java tutorial series: Java 8 · Java 11 · Java 17 · Java 21