Implementation Principle of ConcurrentHashMap in Java
Description
ConcurrentHashMap is a thread-safe HashMap implementation in the Java Collections Framework that supports high-concurrency access. Unlike Hashtable and Collections.synchronizedMap(), which use a "coarse-grained" locking mechanism that synchronizes the entire collection, ConcurrentHashMap achieves higher concurrency performance through finer-grained locking strategies (segmented locking in JDK 1.7 and earlier, and CAS + synchronized in JDK 1.8 and later). It is a high-frequency and in-depth knowledge point in interviews.
Knowledge Explanation
Part 1: Evolution Background and Core Objectives
- Why is ConcurrentHashMap needed?
- Problem 1: HashMap is not thread-safe. In a multi-threaded environment, multiple threads performing put operations on a HashMap simultaneously may cause internal linked lists to form a cycle, leading to an infinite loop with 100% CPU usage, or result in data overwrite and loss.
- Problem 2: Hashtable is inefficient. Hashtable is thread-safe, but it marks every method with the synchronized keyword, which is equivalent to locking the entire table object. Only one thread can operate on it at a time, making it safe but with extremely poor concurrency performance.
- Core Objective: To ensure high-throughput concurrent access while guaranteeing thread safety. The key lies in making the "granularity" of locking finer, so that multiple threads do not block each other when accessing different data.
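The thread-safety gap described in Problem 1 is easy to observe. Below is a small, illustrative demo (the class name UnsafeMapDemo and the thread/entry counts are chosen arbitrarily, not taken from the original text): several threads write disjoint keys into a plain HashMap and into a ConcurrentHashMap. The HashMap run will often report fewer entries than expected (lost updates) and, depending on timing and JDK version, can even throw or hang, while the ConcurrentHashMap run always ends with the full count.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UnsafeMapDemo {
    static final int THREADS = 8;
    static final int PER_THREAD = 10_000;

    public static void main(String[] args) throws InterruptedException {
        System.out.println("HashMap size:           " + fill(new HashMap<>()));
        System.out.println("ConcurrentHashMap size: " + fill(new ConcurrentHashMap<>()));
    }

    static int fill(Map<Integer, Integer> map) throws InterruptedException {
        Thread[] threads = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int offset = t * PER_THREAD;          // each thread writes its own key range
            threads[t] = new Thread(() -> {
                for (int i = 0; i < PER_THREAD; i++) {
                    map.put(offset + i, i);             // concurrent puts: unsafe for HashMap
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        // Expected: THREADS * PER_THREAD = 80,000. The HashMap result is
        // unpredictable; the ConcurrentHashMap result is always 80,000.
        return map.size();
    }
}
```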
Part 2: Implementation in JDK 1.7 (Segmented Locking Mechanism)
ConcurrentHashMap in JDK 1.7 uses a design called "Segmented Locking."
- Data Structure: Segment + HashEntry
- ConcurrentHashMap internally consists of a Segment array. You can think of each Segment as a small Hashtable.
- Each Segment internally contains a HashEntry array, which is where key-value pairs are actually stored. Each HashEntry is a node in a linked list.
- "Segmentation" Concept:
- The entire Map is divided into many segments (16 by default). When putting an element, the Segment to which the element belongs is first calculated based on the hash value of the key.
- Then, only that specific Segment is locked. Other threads can simultaneously access other unlocked Segments.
- Analogy: This is like a large library with 16 reading rooms (Segments), each with its own door lock. A reader (thread) wanting to enter Reading Room A to read (modify data) only needs to lock the door of Reading Room A. Other readers can still enter Reading Rooms B, C, D, etc., simultaneously. This is much more efficient than locking the entire library door (the Hashtable approach).
- Operation Steps (Taking put as an example):
- Step 1: Locate the Segment. Re-hash the key's hashCode and use the high-order hash value to determine which Segment the element belongs to.
- Step 2: Lock the Segment. Acquire the lock (ReentrantLock) corresponding to this Segment object. If acquisition fails, the thread enters a wait queue.
- Step 3: Locate the HashEntry. Within the locked Segment, hash the key's hashCode again to determine its position in the HashEntry array (i.e., which bucket).
- Step 4: Operate on the Linked List. Perform insertion, update, or traversal operations on the linked list in the found bucket.
- Step 5: Release the Lock. Release the Segment lock after the operation is complete.
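To make the five steps concrete, here is a heavily simplified sketch of the segmented-locking idea. It is not the JDK 1.7 source: the names (MiniSegmentedMap, SEGMENT_COUNT, the fixed bucket count) are invented for illustration, there is no resizing, and the segment index is derived from the hash with a simple shift-and-mask rather than the real re-hash. Like the real JDK 1.7 Segment, the sketch's Segment extends ReentrantLock so locking a segment is just lock()/unlock() on it.

```java
import java.util.concurrent.locks.ReentrantLock;

public class MiniSegmentedMap<K, V> {

    static final int SEGMENT_COUNT = 16;               // default concurrency level

    @SuppressWarnings("unchecked")
    private final Segment<K, V>[] segments = new Segment[SEGMENT_COUNT];

    public MiniSegmentedMap() {
        for (int i = 0; i < SEGMENT_COUNT; i++) {
            segments[i] = new Segment<>();
        }
    }

    public V put(K key, V value) {
        int h = key.hashCode();
        // Step 1: pick a segment from the high-order bits of the hash.
        Segment<K, V> seg = segments[(h >>> 28) & (SEGMENT_COUNT - 1)];
        return seg.put(key, value, h);
    }

    // Each segment is "a small Hashtable" with its own lock and bucket array.
    static final class Segment<K, V> extends ReentrantLock {
        @SuppressWarnings("unchecked")
        private final Node<K, V>[] table = new Node[8]; // fixed size: no resize in this sketch

        V put(K key, V value, int hash) {
            lock();                                     // Step 2: lock only this segment
            try {
                int idx = (hash & 0x7fffffff) % table.length;       // Step 3: locate the bucket
                for (Node<K, V> n = table[idx]; n != null; n = n.next) {
                    if (n.hash == hash && n.key.equals(key)) {      // Step 4: update existing key
                        V old = n.value;
                        n.value = value;
                        return old;
                    }
                }
                table[idx] = new Node<>(hash, key, value, table[idx]); // Step 4: insert at head
                return null;
            } finally {
                unlock();                               // Step 5: release the segment lock
            }
        }
    }

    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }
}
```

Because each Segment has its own ReentrantLock, two threads writing keys that hash to different segments never contend, which is exactly the "16 reading rooms" analogy above.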
Part 3: Implementation in JDK 1.8 and Later (CAS + synchronized)
JDK 1.8 underwent a major refactoring of ConcurrentHashMap, abandoning segmented locking and adopting more optimized techniques.
- Simplified Data Structure: Node Array
- The structure became similar to that of HashMap in JDK 1.8, directly using a Node array (replacing HashEntry). Nodes can form linked lists or red-black trees (when a linked list's length exceeds a threshold).
- New Core Mechanism: CAS + synchronized
- CAS (Compare-And-Swap): An optimistic, lock-free update primitive. It involves three operands: a memory location (V), an expected original value (A), and a new value (B). If the value at V equals A, V is updated to B; otherwise nothing is done, and the entire check-and-update is atomic. In ConcurrentHashMap, operations such as initializing the array or inserting a node into an empty bucket use CAS, avoiding locks entirely on those paths and achieving high efficiency (see the sketch after the operation steps below).
- synchronized: Used to lock the head node of a linked list (or the root node of a tree). Note that the lock granularity has been refined from a Segment (containing multiple buckets) in JDK 1.7 to a single bucket.
- Detailed Operation Steps (Taking put as an example):
- Step 1: Calculate Hash. Compute the hash value of the key. This hash value is specially processed (spread) to ensure it is non-negative, since negative hash values are reserved for special node types.
- Step 2: Check if the Table is Null. If the Node array is not yet initialized, initialize it first (ensuring only one thread succeeds via CAS).
- Step 3: Locate the Bucket and Check Whether it is Empty. Locate the array index (bucket) based on the hash value. If this bucket is empty (null), use a CAS operation to try to place the new node into this bucket.
- Success: the put operation is essentially complete, with no lock taken, which is very efficient.
- Failure: another thread got there first and inserted a node at this position, causing the current thread's CAS to fail. Proceed to the next step.
- Step 4: Handle a Non-Empty Bucket. If the bucket is not empty, or if Step 3's CAS failed, use the synchronized keyword to lock the head node of this bucket.
- While locked, check whether the bucket currently holds a linked list or a red-black tree.
- If it is a linked list: Traverse the linked list, update an existing node, or insert a new node at the end of the list.
- If it is a red-black tree: Insert the node according to the red-black tree method.
- Step 5: Determine if Tree Conversion is Needed. After inserting into the linked list, if the linked list length reaches a threshold (default 8) and the total array length reaches the minimum treeification capacity (64), convert the linked list to a red-black tree to improve query efficiency.
- Step 6: Count the Size. Use a variable named baseCount together with a CounterCell array to assist in counting (a kind of sharded counting) so that updating the size does not become a performance bottleneck (a simplified sketch of this idea appears below).
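The sketch below ties Steps 1 through 4 together under simplifying assumptions: a fixed table size, no resizing, no treeification, and no size counting. The class name MiniCasMap and its helpers are invented for illustration, and an AtomicReferenceArray stands in for the volatile reads and CAS that the real implementation performs directly on its Node[] table. The control flow, however, mirrors the description: CAS into an empty bucket, otherwise synchronize on that bucket's head node.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class MiniCasMap<K, V> {

    private final AtomicReferenceArray<Node<K, V>> table =
            new AtomicReferenceArray<>(16);          // fixed capacity for the sketch

    public V put(K key, V value) {
        int h = spread(key.hashCode());              // Step 1: compute a non-negative hash
        int idx = h & (table.length() - 1);          // Step 3: locate the bucket
        while (true) {
            Node<K, V> head = table.get(idx);
            if (head == null) {
                // Empty bucket: lock-free CAS insert; it succeeds only if the
                // slot is still null, otherwise we loop and retry.
                if (table.compareAndSet(idx, null, new Node<>(h, key, value, null))) {
                    return null;                     // Step 3 success: no lock taken
                }
            } else {
                // Step 4: non-empty bucket, lock only this bucket's head node.
                synchronized (head) {
                    if (table.get(idx) == head) {    // re-check the head did not change
                        for (Node<K, V> n = head; ; n = n.next) {
                            if (n.hash == h && n.key.equals(key)) {
                                V old = n.value;     // existing key: update in place
                                n.value = value;
                                return old;
                            }
                            if (n.next == null) {    // reached the tail: append new node
                                n.next = new Node<>(h, key, value, null);
                                return null;
                            }
                        }
                    }
                    // head changed under us: fall through and retry
                }
            }
        }
    }

    // Mirrors the idea of the spread step: mix the high bits into the low bits
    // and force the result to be non-negative.
    private static int spread(int h) {
        return (h ^ (h >>> 16)) & 0x7fffffff;
    }

    static final class Node<K, V> {
        final int hash;
        final K key;
        volatile V value;
        volatile Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }
}
```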
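Step 6's sharded counting can likewise be sketched in a few lines. The ShardedCounter class below is an invented illustration of the idea behind baseCount and CounterCell (the same idea as java.util.concurrent.atomic.LongAdder): try a CAS on a base counter first, and spread contended updates across multiple cells so that size bookkeeping never funnels every writer through a single hot variable.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

public class ShardedCounter {

    private final AtomicLong base = new AtomicLong();              // fast path, like baseCount
    private final AtomicLongArray cells = new AtomicLongArray(16); // like the CounterCell[] array

    public void increment() {
        long b = base.get();
        // Fast path: uncontended CAS on the base counter.
        if (!base.compareAndSet(b, b + 1)) {
            // Contention: fall back to a randomly chosen cell (the real code
            // hashes a per-thread probe), so writers spread across slots.
            int idx = ThreadLocalRandom.current().nextInt(cells.length());
            cells.getAndIncrement(idx);
        }
    }

    // Sum the base and all cells; like ConcurrentHashMap.size(), the result is
    // only a moving estimate while updates are in flight.
    public long sum() {
        long sum = base.get();
        for (int i = 0; i < cells.length(); i++) {
            sum += cells.get(i);
        }
        return sum;
    }
}
```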
Summary and Comparison
- JDK 1.7: Segmented locking technology. Lock granularity is the "Segment," which contains multiple buckets. Write operations require acquiring the segment lock.
- JDK 1.8+: CAS + synchronized. Lock granularity is a single bucket. synchronized is used to lock the head node only when a hash collision occurs (the bucket is non-empty), while efficient lock-free CAS is used when there is no collision (the bucket is empty), greatly improving concurrency performance. Red-black trees are also introduced to solve the degraded query efficiency caused by excessively long linked lists when collisions are severe.
Through this gradual evolution, ConcurrentHashMap achieves nearly optimal concurrency performance while ensuring thread safety, making it the most commonly used Map implementation in high-concurrency scenarios.
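As a closing note, here is a brief usage example (the class name WordCountDemo is arbitrary) of the kind of high-concurrency access ConcurrentHashMap is designed for: merge() performs its read-modify-write atomically per key, so no external locking is needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WordCountDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        String[] words = {"a", "b", "a", "c", "b", "a"};

        Runnable task = () -> {
            for (String w : words) {
                counts.merge(w, 1, Integer::sum);   // atomic increment per key
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();

        System.out.println(counts);                 // e.g. {a=6, b=4, c=2}; iteration order may vary
    }
}
```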