// Deep Technical Guide · Java 21+

Multi-Threading &
High-Concurrency
Patterns

A comprehensive reference covering the Java Memory Model, thread lifecycle, classic and modern concurrency primitives, virtual threads, and production-grade patterns.

Java 21+ JMM java.util.concurrent Virtual Threads Lock-Free Reactive Fork/Join

Java Memory Model (JMM)

The JMM defines the rules by which threads interact through memory. It specifies when writes by one thread become visible to reads in another thread, preventing subtle bugs that arise from CPU caches, out-of-order execution, and compiler optimisations.

happens-before

A guarantee that memory writes of a specific statement are observable to another statement. The ordering relation that makes visibility predictable.

visibility

Without synchronisation, a thread may see stale cached values. volatile and locks force a flush-and-refresh of the CPU's write buffers.

atomicity

64-bit reads/writes (long, double) are NOT atomic on 32-bit JVMs without volatile. Always use volatile or atomics for shared primitives.

reordering

The JVM and CPU can reorder instructions as long as single-thread semantics hold. Barriers (volatile, locks) prevent cross-thread reordering hazards.

Rule of thumb: Every shared mutable variable needs either volatile, synchronized, an Atomic* wrapper, or a happens-before relationship established by a lock/thread-start/thread-join.

volatile — lightweight visibility

VolatileFlag.java
/**
 * volatile guarantees:
 *  1. Visibility: writes are immediately flushed to main memory.
 *  2. Atomicity of single read/write (NOT compound read-modify-write).
 *  3. Prevents reordering across the volatile access (memory fence).
 */
public class VolatileFlag {

    private volatile boolean running = true;  // fence on each access

    public void worker() {
        while (running) {          // always reads fresh value from memory
            doWork();
        }
        System.out.println("Worker stopped cleanly.");
    }

    public void stop() {
        running = false;          // write flushed; visible to worker thread
    }

    // ❌ WRONG — volatile does NOT protect compound ops:
    private volatile int counter = 0;
    public void unsafeIncrement() { counter++; }  // read-modify-write: NOT atomic!

    // ✅ Use AtomicInteger instead for compound operations
}

Thread Lifecycle & Interruption

StateDescriptionTransition From
NEWCreated but not yet started
RUNNABLEExecuting or ready to execute on CPUstart(), unblocked
BLOCKEDWaiting to acquire a monitor locksynchronized entry
WAITINGIndefinitely waiting (wait, join, park)Object.wait(), LockSupport.park()
TIMED_WAITINGWaiting with a timeoutsleep(), wait(ms), join(ms)
TERMINATEDrun() returned or threwcompletion or uncaught exception
InterruptibleWorker.java
public class InterruptibleWorker implements Runnable {

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                processChunk();

                // sleep() clears the interrupt flag and throws InterruptedException
                Thread.sleep(100);
            }
        } catch (InterruptedException e) {
            // Restore interrupt status — never silently swallow!
            Thread.currentThread().interrupt();
            cleanup();
        }
    }

    // Blocking I/O that respects interruption
    private void processChunk() throws InterruptedException {
        if (Thread.currentThread().isInterrupted())
            throw new InterruptedException("Task cancelled by caller");
        // ... actual work ...
    }
}
⚠ Anti-pattern: catch (InterruptedException e) { /* ignore */ } — This silently swallows the signal. Always restore the interrupt flag with Thread.currentThread().interrupt() unless you are definitively terminating the thread.

Synchronization Primitives

Java's intrinsic locks (synchronized) use the object's monitor. Every Java object carries a monitor; synchronized methods lock this; static methods lock the Class object.

MonitorPatterns.java
public class BoundedBuffer<T> {
    private final Object[] items;
    private int count, putIdx, takeIdx;

    public BoundedBuffer(int capacity) {
        items = new Object[capacity];
    }

    // wait/notify: canonical producer-consumer on a single condition
    public synchronized void put(T item) throws InterruptedException {
        while (count == items.length)   // ALWAYS loop — guard against spurious wakeups
            wait();
        items[putIdx] = item;
        putIdx = (putIdx + 1) % items.length;
        ++count;
        notifyAll();                     // wake all waiting consumers
    }

    @SuppressWarnings("unchecked")
    public synchronized T take() throws InterruptedException {
        while (count == 0)              // loop — not if!
            wait();
        T item = (T) items[takeIdx];
        items[takeIdx] = null;          // help GC
        takeIdx = (takeIdx + 1) % items.length;
        --count;
        notifyAll();
        return item;
    }
}
Key rule: Always use while (never if) around wait(). Spurious wakeups can occur on any JVM platform per the POSIX spec. The loop re-checks the condition after every wake.

Double-Checked Locking (DCL) — Singleton

SafeSingleton.java
/**
 * Safe DCL requires volatile on the field (Java 5+).
 * volatile prevents partially-constructed object from being observed
 * because it establishes a happens-before for the write to instance.
 */
public class SafeSingleton {
    private static volatile SafeSingleton instance;

    private SafeSingleton() { }

    public static SafeSingleton getInstance() {
        if (instance == null) {               // first check (no lock)
            synchronized (SafeSingleton.class) {
                if (instance == null)           // second check (under lock)
                    instance = new SafeSingleton();
            }
        }
        return instance;
    }

    // ✅ Prefer: Initialization-on-demand holder (zero synchronisation overhead)
    private static class Holder {
        static final SafeSingleton INSTANCE = new SafeSingleton();
    }
    public static SafeSingleton getInstanceIdiomatic() {
        return Holder.INSTANCE;             // class-loading guarantees safe publish
    }
}

Explicit Locks & Conditions

java.util.concurrent.locks.ReentrantLock offers more flexibility than intrinsic locks: timed acquisition, interruptible acquisition, fairness policy, and multiple associated Condition objects.

ExplicitLockBuffer.java
import java.util.concurrent.locks.*;

public class ExplicitLockBuffer<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition     notFull  = lock.newCondition();
    private final Condition     notEmpty = lock.newCondition();
    private final Object[]      items;
    private int count, putIdx, takeIdx;

    public ExplicitLockBuffer(int cap) { items = new Object[cap]; }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (count == items.length) notFull.await();
            items[putIdx] = item;
            putIdx = (putIdx + 1) % items.length;
            ++count;
            notEmpty.signal();    // wake exactly ONE consumer
        } finally {
            lock.unlock();        // ALWAYS in finally
        }
    }

    // ReadWriteLock — many readers, exclusive writer
    private final ReadWriteLock  rwLock = new ReentrantReadWriteLock();
    private final Lock           readLock  = rwLock.readLock();
    private final Lock           writeLock = rwLock.writeLock();
    private Map<String,String>   cache = new HashMap<>();

    public String get(String key) {
        readLock.lock();
        try   { return cache.get(key); }
        finally { readLock.unlock(); }
    }
    public void put(String k, String v) {
        writeLock.lock();
        try   { cache.put(k, v); }
        finally { writeLock.unlock(); }
    }
}

StampedLock — Optimistic Reads

StampedLockExample.java
import java.util.concurrent.locks.StampedLock;

public class Point {
    private double x, y;
    private final StampedLock sl = new StampedLock();

    public double distanceFromOrigin() {
        long stamp = sl.tryOptimisticRead();     // no lock acquired
        double cx = x, cy = y;
        if (!sl.validate(stamp)) {              // check for concurrent write
            stamp = sl.readLock();              // fall back to real read lock
            try { cx = x; cy = y; }
            finally { sl.unlockRead(stamp); }
        }
        return Math.sqrt(cx * cx + cy * cy);
    }

    public void move(double dx, double dy) {
        long stamp = sl.writeLock();
        try { x += dx; y += dy; }
        finally { sl.unlockWrite(stamp); }
    }
}

Atomic Variables & Lock-Free Algorithms

The java.util.concurrent.atomic package uses CPU-native Compare-And-Swap (CAS) instructions, eliminating OS scheduling overhead for many concurrent operations.

AtomicPatterns.java
import java.util.concurrent.atomic.*;

public class AtomicPatterns {

    // ── Basic counter ──────────────────────────────────────────
    private final AtomicLong counter = new AtomicLong(0);

    public long nextId()        { return counter.incrementAndGet(); }
    public long addAndGetStat(long delta) { return counter.addAndGet(delta); }

    // ── CAS spin loop — optimistic update ─────────────────────
    private final AtomicInteger max = new AtomicInteger(0);

    public void updateMax(int candidate) {
        int cur;
        do {
            cur = max.get();
            if (candidate <= cur) return;
        } while (!max.compareAndSet(cur, candidate));  // retry if lost race
    }

    // ── AtomicReference — immutable snapshot swap ──────────────
    record State(List<String> items, long version) {}
    private final AtomicReference<State> stateRef =
        new AtomicReference<>(new State(List.of(), 0));

    public void addItem(String item) {
        stateRef.updateAndGet(old -> {
            var newItems = new ArrayList<>(old.items());
            newItems.add(item);
            return new State(List.copyOf(newItems), old.version() + 1);
        });
    }

    // ── LongAdder — high-throughput counter (Java 8+) ─────────
    // Outperforms AtomicLong under high contention by striping cells
    private final LongAdder hits = new LongAdder();
    public void recordHit()   { hits.increment(); }
    public long totalHits()   { return hits.sum(); }  // only accurate at quiescence
}

Concurrent Collections

ClassInternalsBest Use
ConcurrentHashMapSegment/bin CAS (Java 8+)High-concurrency key-value store
CopyOnWriteArrayListSnapshot on writeRead-heavy, rare writes
LinkedBlockingQueueTwo-lock linked listUnbounded producer-consumer
ArrayBlockingQueueSingle lock + conditionsBounded, back-pressure
PriorityBlockingQueueHeap + single lockPriority scheduling
SynchronousQueueRendezvous transferDirect hand-off (no capacity)
DelayQueuePQ + delay expiryScheduled tasks, TTL caches
ConcurrentLinkedQueueLock-free Michael-Scott queueNon-blocking FIFO
ConcurrentMapPatterns.java
ConcurrentHashMap<String, AtomicInteger> freq = new ConcurrentHashMap<>();

// computeIfAbsent is atomic — safe for lazy initialisation
public void recordWord(String word) {
    freq.computeIfAbsent(word, k -> new AtomicInteger(0))
        .incrementAndGet();
}

// merge — atomic read-modify-write without explicit locking
ConcurrentHashMap<String, Long> scores = new ConcurrentHashMap<>();
public void addScore(String user, long points) {
    scores.merge(user, points, Long::sum);   // atomic accumulate
}

// Parallel aggregation via streams on ConcurrentHashMap
long total = scores.reduceValuesToLong(1, Long::longValue, 0, Long::sum);

// CopyOnWriteArrayList — event listener registry
private final List<Runnable> listeners = new CopyOnWriteArrayList<>();
public void addListener(Runnable r) { listeners.add(r); }
public void fireAll()                { listeners.forEach(Runnable::run); } // snapshot

Executors & Thread Pools

Raw Thread creation is expensive. Thread pools amortise the cost and bound resource usage. ThreadPoolExecutor is the central tunable implementation.

ExecutorPatterns.java
import java.util.concurrent.*;

// ── Tunable pool ──────────────────────────────────────────────
ThreadPoolExecutor pool = new ThreadPoolExecutor(
    4,                                    // corePoolSize
    16,                                   // maximumPoolSize
    60, TimeUnit.SECONDS,                 // keepAliveTime
    new LinkedBlockingQueue<>(1000),      // bounded work queue
    new ThreadFactory() {
        private final AtomicInteger n = new AtomicInteger();
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "worker-" + n.getAndIncrement());
            t.setDaemon(true);             // don't prevent JVM shutdown
            return t;
        }
    },
    new ThreadPoolExecutor.CallerRunsPolicy() // back-pressure: caller executes
);

// ── Scheduled execution ──────────────────────────────────────
ScheduledExecutorService scheduler =
    Executors.newScheduledThreadPool(2);

scheduler.scheduleAtFixedRate(
    () -> publishMetrics(),
    0, 30, TimeUnit.SECONDS        // fixed-rate: next run = start + period
);
scheduler.scheduleWithFixedDelay(
    () -> pollExternalService(),
    0, 5, TimeUnit.SECONDS          // fixed-delay: next run = end + delay
);

// ── Graceful shutdown ────────────────────────────────────────
pool.shutdown();                         // no new tasks; drain queue
if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
    pool.shutdownNow();                  // interrupt running tasks
    if (!pool.awaitTermination(5, TimeUnit.SECONDS))
        log.error("Pool did not terminate");
}

CallerRunsPolicy

Queue full → caller thread executes task. Natural back-pressure; slows producers automatically.

AbortPolicy (default)

Queue full → throw RejectedExecutionException. Caller must handle or circuit-break.

DiscardOldestPolicy

Queue full → discard the head (oldest unstarted task) and retry submission.

DiscardPolicy

Silently drops new submissions. Use only when loss is acceptable (metrics, fire-and-forget logs).

CompletableFuture & Async Pipelines

AsyncPipeline.java
import java.util.concurrent.*;

public class OrderService {

    // ── Linear async pipeline ─────────────────────────────────
    public CompletableFuture<Invoice> processOrder(Order order) {
        return CompletableFuture
            .supplyAsync(() -> validateOrder(order),    ioPool)   // async on I/O pool
            .thenApplyAsync(valid -> chargePayment(valid), ioPool)
            .thenApplyAsync(paid  -> fulfil(paid),         cpuPool)
            .thenApply(Invoice::from)                            // same thread
            .exceptionally(ex -> Invoice.failed(ex.getMessage()));
    }

    // ── Fan-out / fan-in ──────────────────────────────────────
    public CompletableFuture<AggregatedResult> fanOut(List<String> ids) {
        List<CompletableFuture<Data>> futures = ids.stream()
            .map(id -> CompletableFuture.supplyAsync(() -> fetch(id), ioPool))
            .toList();

        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .thenApply(_ ->
                futures.stream()
                    .map(CompletableFuture::join)   // all done — join is safe
                    .collect(collectingAndThen(toList(), AggregatedResult::new))
            );
    }

    // ── Race: first non-null response wins ─────────────────────
    public CompletableFuture<String> firstAvailable() {
        return CompletableFuture.anyOf(
            CompletableFuture.supplyAsync(() -> callRegionA(), ioPool),
            CompletableFuture.supplyAsync(() -> callRegionB(), ioPool)
        ).thenApply(o -> (String) o);
    }

    // ── Timeout (Java 9+) ─────────────────────────────────────
    public CompletableFuture<Data> withTimeout() {
        return CompletableFuture.supplyAsync(this::slowOperation, ioPool)
            .orTimeout(5, TimeUnit.SECONDS)
            .exceptionally(ex -> Data.FALLBACK);
    }
}

Fork/Join Framework

Fork/Join uses work-stealing: idle threads steal tasks from the back of other threads' deques, keeping CPUs saturated. Designed for divide-and-conquer CPU-bound tasks.

ParallelMergeSort.java
import java.util.concurrent.*;

public class ParallelMergeSort extends RecursiveAction {
    private static final int THRESHOLD = 1024;
    private final int[] array;
    private final int  lo, hi;

    public ParallelMergeSort(int[] a, int lo, int hi) {
        this.array = a; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo < THRESHOLD) {          // base case: sequential sort
            Arrays.sort(array, lo, hi);
            return;
        }
        int mid = (lo + hi) >>> 1;
        var left  = new ParallelMergeSort(array, lo, mid);
        var right = new ParallelMergeSort(array, mid, hi);

        left.fork();                         // push left to deque
        right.compute();                     // run right in-thread (avoid overhead)
        left.join();                          // steal-compatible wait

        merge(array, lo, mid, hi);
    }

    // Usage: ForkJoinPool.commonPool().invoke(new ParallelMergeSort(arr, 0, arr.length))
    // Or custom pool: new ForkJoinPool(Runtime.getRuntime().availableProcessors())
}

Virtual Threads — Project Loom (Java 21)

Virtual threads are lightweight JVM-managed threads. Millions can exist simultaneously because they unmount from their carrier (OS) thread during blocking operations — no OS thread is pinned.

VirtualThreads.java
import java.util.concurrent.*;

public class VirtualThreadDemo {

    // ── Spawn 1 million virtual threads ──────────────────────
    public static void millionRequests() throws Exception {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 1_000_000).forEach(i ->
                executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(100));  // mounts; unblocks cheaply
                    return i;
                })
            );
        } // auto-shutdown + await
    }

    // ── Structured Concurrency (Java 21 preview) ──────────────
    public Response handleRequest(String id) throws Exception {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            var user  = scope.fork(() -> fetchUser(id));
            var order = scope.fork(() -> fetchOrder(id));

            scope.join().throwIfFailed();      // cancel both if either fails

            return new Response(user.get(), order.get());
        }
    }

    // ── Pinning pitfall: avoid synchronized inside virtual thread ──
    // synchronized blocks pin the carrier thread — use ReentrantLock instead
    private final ReentrantLock lock = new ReentrantLock();

    public void safeVirtualWork() {
        lock.lock();
        try   { doBlockingIo(); }     // carrier can unmount here ✅
        finally { lock.unlock(); }
    }
}
When to use virtual threads: I/O-bound concurrency (HTTP, JDBC, file I/O). For CPU-bound work, use a bounded ForkJoinPool tuned to available processors. Virtual threads do not improve CPU-bound throughput.

High-Concurrency Design Patterns

Producer-Consumer with Back-Pressure

ProducerConsumer.java
public class Pipeline<T> {
    private final BlockingQueue<T> queue;
    private volatile boolean       running = true;

    public Pipeline(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);   // bounded = back-pressure
    }

    public void produce(T item) throws InterruptedException {
        queue.put(item);    // BLOCKS when full — slows producer naturally
    }

    public void startConsumers(int n, Consumer<T> fn) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            pool.execute(() -> {
                while (running || !queue.isEmpty()) {
                    try {
                        T item = queue.poll(200, TimeUnit.MILLISECONDS);
                        if (item != null) fn.accept(item);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); break;
                    }
                }
            });
        }
    }
}

Thread-Safe Lazy Cache (Memoization)

Memoizer.java
public class Memoizer<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> cache =
        new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public Memoizer(Function<K, V> loader) { this.loader = loader; }

    public V get(K key) throws Exception {
        return cache.computeIfAbsent(key, k ->
            CompletableFuture.supplyAsync(() -> loader.apply(k))
        ).get();
        // computeIfAbsent is atomic — only one future per key is created
        // All concurrent requests for same key await the SAME future
    }
}

Read-Write Striped Lock Pattern

StripedLock.java
/**
 * Striped locking: instead of one global lock, use N locks keyed by hash.
 * Reduces contention by factor N with no correctness change.
 * Google Guava: Striped<Lock> / Striped<ReadWriteLock>
 */
public class StripedCounter {
    private static final int  STRIPES = 64;
    private final ReentrantLock[] locks  = new ReentrantLock[STRIPES];
    private final long[]           counts = new long[STRIPES];

    public StripedCounter() {
        Arrays.setAll(locks, _ -> new ReentrantLock());
    }

    private int stripe(Object key) {
        return (key.hashCode() & 0x7FFF_FFFF) % STRIPES;
    }

    public void increment(String key) {
        int s = stripe(key);
        locks[s].lock();
        try   { counts[s]++; }
        finally { locks[s].unlock(); }
    }

    public long total() {
        long sum = 0;
        for (long c : counts) sum += c;
        return sum;
    }
}

Pitfalls, Deadlocks & Diagnostics

Classic Deadlock

DeadlockExample.java
// ❌ Deadlock: T1 holds A, waits for B; T2 holds B, waits for A
void transfer1(Account from, Account to) {
    synchronized(from) { synchronized(to) { move(from, to); } }
}
void transfer2(Account from, Account to) {
    synchronized(to) { synchronized(from) { move(from, to); } }
}

// ✅ Fix: consistent lock ordering by account ID
void transferSafe(Account a, Account b, long amount) {
    Account first  = a.id() < b.id() ? a : b;
    Account second = a.id() < b.id() ? b : a;
    synchronized(first) {
        synchronized(second) {
            first.debit(amount); second.credit(amount);
        }
    }
}

// ✅ Alternative: tryLock with back-off
boolean transferTryLock(ReentrantLock l1, ReentrantLock l2, long amount)
    throws InterruptedException {
    while (true) {
        if (l1.tryLock(10, TimeUnit.MILLISECONDS)) {
            try {
                if (l2.tryLock(10, TimeUnit.MILLISECONDS)) {
                    try   { move(); return true; }
                    finally { l2.unlock(); }
                }
            } finally { l1.unlock(); }
        }
        Thread.sleep(random(1,5));  // randomised back-off
    }
}
01

Livelock — Threads respond to each other but make no progress. Solution: randomised back-off or priority ordering.

02

Starvation — Low-priority threads never get CPU. Use fair locks (new ReentrantLock(true)) or bounded waiting queues.

03

False Sharing — Two fields on the same cache line ping-pong between CPU caches. Pad fields with @Contended (JVM flag: -XX:-RestrictContended).

04

ThreadLocal Leaks — In thread pools, ThreadLocal values outlive the task. Always call remove() in a finally block.

05

Missed Signalnotify() sent before wait() is lost forever. Always check state condition under the lock before waiting.

Diagnostic Commands

diagnostics.sh
# Thread dump — analyse BLOCKED/WAITING stacks for deadlock chains
jstack <pid>
jcmd  <pid> Thread.print -l    # with locked monitors

# Flight Recorder — low-overhead production profiling
jcmd <pid> JFR.start duration=60s filename=recording.jfr
jfr  print --events jdk.VirtualThreadPinned recording.jfr

# Check for virtual thread pinning events
-Djdk.tracePinnedThreads=full   # JVM flag: prints pinning stack traces

# GC impact on locks (STW pauses affect timed waits)
-Xlog:gc*:file=gc.log:time,uptime

# Detect data races with ThreadSanitizer-equivalent
# Use: -ea -Djava.util.concurrent.ForkJoinPool.common.parallelism=N