Part 8 of 16

Parallel Streams: ForkJoinPool, Spliterators, and When NOT to Parallelize

How Parallel Streams Work

A parallel stream splits its source into sub-sequences, processes each sub-sequence on a separate thread, and merges the results. The mechanism is the fork/join framework — specifically ForkJoinPool.commonPool(), a shared thread pool managed by the JVM.

// Sequential — processes on the calling thread
List<String> seq = names.stream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Parallel — splits work across ForkJoinPool.commonPool()
List<String> par = names.parallelStream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());

// Convert existing stream to parallel
List<String> par2 = names.stream()
    .parallel()           // switch to parallel
    .map(String::toUpperCase)
    .collect(Collectors.toList());

A single call to .parallel() or .sequential() anywhere in a pipeline applies to the entire pipeline.
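This means a pipeline is either wholly sequential or wholly parallel; if both methods are called, the last call wins. A quick check (using isParallel() to inspect the mode):

```java
import java.util.List;

public class PipelineMode {
    public static void main(String[] args) {
        List<String> names = List.of("ada", "alan", "grace");

        // .parallel() then .sequential(): the last call wins,
        // and it governs the whole pipeline, not just later stages
        boolean parallel = names.stream()
            .parallel()
            .map(String::toUpperCase)
            .sequential()
            .isParallel();

        System.out.println(parallel); // false — the final .sequential() won
    }
}
```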


ForkJoinPool.commonPool

The common pool is shared across the entire JVM process. Its thread count defaults to Runtime.getRuntime().availableProcessors() - 1 — one fewer than the CPU count, because the thread that invokes the terminal operation also participates in the work.

System.out.println(ForkJoinPool.commonPool().getParallelism());
// e.g., 7 on an 8-core machine

You can change the pool size with a system property:

-Djava.util.concurrent.ForkJoinPool.common.parallelism=4

The Shared Pool Problem

Because the common pool is shared, one slow parallel stream can starve all others. A blocking I/O call inside a parallel stream blocks a ForkJoin worker thread, which can cascade:

// DANGEROUS: blocking inside parallel stream consumes ForkJoin threads
orders.parallelStream()
    .map(order -> httpClient.fetchDetails(order.getId()))  // blocks!
    .collect(Collectors.toList());

Fix: Run the parallel stream from inside a custom pool. A parallel stream executes in the ForkJoinPool of the thread that invokes its terminal operation, so submitting the pipeline to your own pool keeps blocking work off the common pool (note: this behaviour is a long-standing implementation detail, not a documented guarantee):

ForkJoinPool customPool = new ForkJoinPool(8);
try {
    List<OrderDetail> results = customPool.submit(() ->
        orders.parallelStream()
            .map(order -> httpClient.fetchDetails(order.getId()))
            .collect(Collectors.toList())
    ).get();  // get() throws InterruptedException / ExecutionException
} finally {
    customPool.shutdown();
}

Spliterators

The engine behind stream splitting is Spliterator<T> — an iterator that knows how to split itself for parallel processing.

public interface Spliterator<T> {
    boolean tryAdvance(Consumer<? super T> action);  // process next element
    Spliterator<T> trySplit();                        // split off half
    long estimateSize();                              // estimated remaining elements
    int characteristics();                            // ORDERED, SIZED, DISTINCT, etc.
}
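To see splitting in action, you can take an array's spliterator and split it by hand. A minimal sketch (array spliterators happen to split in half; the exact split point is the implementation's choice):

```java
import java.util.Arrays;
import java.util.Spliterator;

public class SplitDemo {
    public static void main(String[] args) {
        Integer[] data = {1, 2, 3, 4, 5, 6, 7, 8};

        Spliterator<Integer> rest = Arrays.spliterator(data);
        Spliterator<Integer> firstHalf = rest.trySplit(); // splits off the first half

        System.out.println(firstHalf.estimateSize()); // 4
        System.out.println(rest.estimateSize());      // 4

        // Each half can now be processed independently, e.g. on different threads
        firstHalf.forEachRemaining(n -> System.out.print(n + " ")); // 1 2 3 4
    }
}
```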

When a stream goes parallel:

  1. trySplit() is called recursively to create sub-tasks down to a threshold
  2. Each sub-task processes its split and produces partial results
  3. Results are combined using the pipeline’s combiner
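These three steps are exactly what a hand-written fork/join task does. A minimal sketch of a parallel sum (the THRESHOLD value and split-in-half strategy are illustrative choices, not what the stream framework uses verbatim):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class ManualForkJoin {
    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000;   // illustrative cut-off
        private final long[] data;
        private final int from, to;

        SumTask(long[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {             // small enough: compute directly
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) >>> 1;              // 1. split (like trySplit)
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                              // 2. process halves in parallel
            return right.compute() + left.join();     // 3. combine partial results
        }
    }

    public static void main(String[] args) {
        long[] data = LongStream.rangeClosed(1, 100_000).toArray();
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 5000050000
    }
}
```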

Spliterator Characteristics

Characteristics tell the framework what guarantees the data source provides, enabling optimisations:

| Characteristic | Meaning | Example sources |
|----------------|---------|-----------------|
| ORDERED | Elements have a defined encounter order | List, LinkedList, Arrays.stream |
| SORTED | Elements are sorted | TreeSet, sorted stream |
| SIZED | estimateSize() is exact | ArrayList, HashSet |
| DISTINCT | No duplicate elements | Set |
| NONNULL | No null elements | ConcurrentHashMap |
| IMMUTABLE | Source cannot be modified | List.of() (Java 9+) |
| SUBSIZED | Sub-spliterators are also SIZED | Arrays |

ArrayList is ORDERED + SIZED + SUBSIZED — it splits perfectly. HashSet is SIZED + DISTINCT but not ORDERED — it can’t guarantee encounter order.
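You can inspect these flags directly with hasCharacteristics:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

public class CharacteristicsDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(1, 2, 3));
        Spliterator<Integer> ls = list.spliterator();
        System.out.println(ls.hasCharacteristics(Spliterator.ORDERED));  // true
        System.out.println(ls.hasCharacteristics(Spliterator.SIZED));    // true
        System.out.println(ls.hasCharacteristics(Spliterator.SUBSIZED)); // true

        Set<Integer> set = new HashSet<>(List.of(1, 2, 3));
        Spliterator<Integer> ss = set.spliterator();
        System.out.println(ss.hasCharacteristics(Spliterator.ORDERED));  // false
        System.out.println(ss.hasCharacteristics(Spliterator.DISTINCT)); // true
    }
}
```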


When Parallel Is Actually Faster

Parallel streams have overhead: task splitting, thread coordination, result merging. They are only faster when the computational savings outweigh this overhead.

Conditions for parallel wins

  1. Large data set — typically 10,000+ elements; below that, overhead dominates
  2. Computationally expensive per-element operation — CPU-bound work that takes non-trivial time
  3. Splittable source — ArrayList, arrays, and IntStream.range split evenly; LinkedList does not
  4. No ordering requirement — or the ordering is cheap to restore
  5. No shared mutable state — thread-safe operations only

Quick benchmark: CPU-bound work

long N = 10_000_000L;

// Sequential
long start = System.nanoTime();
long sumSeq = LongStream.range(0, N)
    .map(n -> n * n % 1000000007L)
    .sum();
long seqMs = (System.nanoTime() - start) / 1_000_000;

// Parallel
start = System.nanoTime();
long sumPar = LongStream.range(0, N)
    .parallel()
    .map(n -> n * n % 1000000007L)
    .sum();
long parMs = (System.nanoTime() - start) / 1_000_000;

System.out.printf("Sequential: %dms, Parallel: %dms%n", seqMs, parMs);
// Roughly 3–7x speedup on an 8-core machine. (Single-shot nanoTime timing
// ignores JIT warmup — use JMH for trustworthy benchmarks.)

Sources with good split behaviour

| Source | Parallelism quality | Reason |
|--------|---------------------|--------|
| ArrayList | Excellent | O(1) split, exact size |
| int[] / long[] | Excellent | O(1) split, exact size |
| IntStream.range | Excellent | O(1) arithmetic split |
| TreeSet / TreeMap | Good | Balanced tree splits reasonably |
| HashSet / HashMap | Moderate | Splits by bucket, uneven possible |
| LinkedList | Poor | O(n) split |
| Files.lines() | Poor | Sequential read only |

When NOT to Use Parallel Streams

Small data sets

// Bad: 10 elements — overhead far exceeds savings
List<Integer> small = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
small.parallelStream().map(n -> n * 2).collect(Collectors.toList());

// Use sequential
small.stream().map(n -> n * 2).collect(Collectors.toList());

Order matters and the source is ordered

// forEach on a parallel stream ignores encounter order
List<String> ordered = new ArrayList<>(names);
ordered.parallelStream()
    .forEach(System.out::println); // prints in arbitrary order

// Fix: forEachOrdered restores encounter order — but it serialises the
// terminal step and negates most of the parallelism benefit
ordered.parallelStream()
    .forEachOrdered(System.out::println);

Shared mutable state

// BROKEN: ArrayList.add is not thread-safe
List<Integer> results = new ArrayList<>();
numbers.parallelStream()
    .filter(n -> n > 5)
    .forEach(results::add);  // race condition!

// Fix: let collect do the accumulation — safe even for parallel streams
List<Integer> collected = numbers.parallelStream()
    .filter(n -> n > 5)
    .collect(Collectors.toList());

I/O-bound operations (blocking)

Use CompletableFuture with a custom thread pool instead of parallel streams for HTTP calls, database queries, or file I/O.
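A sketch of that pattern: the blocking calls land on a dedicated pool sized for I/O, leaving the common pool free. Here fetchDetails is a stub I have invented to stand in for a real HTTP client:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class IoFanOut {
    // Stand-in for a blocking remote call (hypothetical)
    static String fetchDetails(int id) {
        try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "detail-" + id;
    }

    public static void main(String[] args) {
        // Sized for waiting on I/O, not for CPU count
        ExecutorService ioPool = Executors.newFixedThreadPool(32);
        List<Integer> orderIds = List.of(1, 2, 3, 4, 5);

        // Fan out: start all calls without blocking the caller
        List<CompletableFuture<String>> futures = orderIds.stream()
            .map(id -> CompletableFuture.supplyAsync(() -> fetchDetails(id), ioPool))
            .collect(Collectors.toList());

        // Fan in: join blocks the caller, not the common pool's workers
        List<String> details = futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());

        ioPool.shutdown();
        System.out.println(details); // [detail-1, detail-2, detail-3, detail-4, detail-5]
    }
}
```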

Short pipelines

If the pipeline has only one or two operations and the element count is modest, the overhead of splitting and merging will dominate.


Ordering Guarantees

| Stream type | forEach | collect(toList()) | findFirst |
|-------------|---------|-------------------|-----------|
| Sequential, ordered | ✓ ordered | ✓ ordered | first element |
| Parallel, ordered | ✗ any order | ✓ ordered (re-ordered) | first element (expensive) |
| Parallel, unordered | ✗ any order | ✗ any order | any element (fast) |

For parallel streams on ordered sources (lists), collect(Collectors.toList()) always preserves encounter order — Java guarantees this even for parallel streams. Only forEach loses order.
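A quick demonstration — however the work was scheduled across threads, the collected list comes back in encounter order:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderDemo {
    public static void main(String[] args) {
        List<Integer> squares = IntStream.rangeClosed(1, 100)
            .parallel()
            .map(n -> n * n)
            .boxed()
            .collect(Collectors.toList());

        // Always in encounter order: 1, 4, 9, ..., 10000
        System.out.println(squares.get(0) + " " + squares.get(99)); // 1 10000
    }
}
```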

To tell the stream “I don’t care about order” and enable optimisations:

names.parallelStream()
    .unordered()  // removes ORDERED characteristic
    .filter(s -> s.length() > 3)
    .collect(Collectors.toList()); // may be in any order — faster

Practical Guide

// Template for deciding stream mode (the boolean flags are placeholders
// for checks you make about your own workload)
Stream<T> stream = source.stream();

if (source.size() > 10_000           // large enough
        && operationIsCpuBound        // not I/O
        && !needsOrderedSideEffects   // no System.out::println in forEach
        && operationIsThreadSafe) {   // no shared mutable state
    stream = stream.parallel();
}

Common patterns that are safe to parallelise

// CPU-heavy computation, result in a list
List<Result> results = items.parallelStream()
    .map(item -> expensiveCompute(item))
    .collect(Collectors.toList());

// Sum / reduce over large numeric arrays
long total = LongStream.range(0, 1_000_000).parallel().sum();

// Filtering large collections
List<Order> bigOrders = orders.parallelStream()
    .filter(o -> o.getTotal() > 10_000)
    .collect(Collectors.toList());

Summary

| Concept | Key point |
|---------|-----------|
| How it works | Fork/join splits source → process in parallel → merge results |
| Default pool | ForkJoinPool.commonPool(), size = CPU count - 1 |
| Spliterator | Enables splitting; ArrayList / arrays split best |
| Ordering | collect(toList()) preserves order; forEach does not |
| Safe use | Large + CPU-bound + stateless + no ordering side effects |
| Avoid when | Small data, I/O-bound, shared mutable state, order matters |

Next Step

Optional: Eliminating NullPointerException the Right Way →

Part of the DevOps Monk Java tutorial series: Java 8 · Java 11 · Java 17 · Java 21