The Underlying Implementation Principle of HashMap in Java

The Underlying Implementation Principle of HashMap in Java

Description:
HashMap is one of the most commonly used data structures in the Java Collections Framework, used to store key-value pair mappings. Understanding its underlying implementation principle is crucial for efficient usage and handling interviews. Its core lies in how to quickly locate the corresponding value (value) through the key (key).

Solution/Explanation Process:

We will start from the basic structure and gradually delve into the core methods.

Step 1: Basic Structure - Array + Linked List/Red-Black Tree

You can think of a HashMap as a cabinet with many drawers (this cabinet is the "array").

Array (cabinet): Each position is called a "bucket". The initial length of the array is fixed (default 16), which is the foundation for HashMap's fast lookup.
Linked List/Red-Black Tree (compartments inside the drawer): When you put something into a drawer, if the drawer holds only one item, you find it directly. But if there are multiple items, you need to organize them further within the drawer. Before Java 8, this "organization" used a "linked list". In Java 8 and later, when the linked list becomes too long (exceeds 8), to improve query efficiency, it automatically converts to a more efficient data structure—a "red-black tree".

Therefore, the underlying structure of a HashMap is an array, where each element may be the head node of a linked list or the root node of a red-black tree.

Step 2: Key Process - How to Store Data (put method)

Suppose we execute map.put("apple", 10). This process consists of the following core steps:

Calculate Index (deciding which drawer to put it in):
- First, HashMap calls the hashCode() method of the key "apple" to obtain an integer hash value. This value can be very large (e.g., 123456789).
- Then, HashMap does not directly use this huge hash value as the array index. Instead, it performs a secondary calculation on this hash value using a "perturbation function". The purpose of the perturbation function is to allow every bit of the hash value to participate in subsequent calculations, reducing "hash collisions" (different keys calculating the same array index).
- Finally, the perturbed hash value is combined with (array length - 1) using a & (AND) operation to obtain the final array index for storage.
- Simplified formula: index = HashCode(key) & (ArrayLength - 1)
- Assuming the calculated index is 2, the key-value pair will attempt to be placed at position 2 in the array.
Handle Collision (solving the problem of multiple items in a drawer):
- Scenario 1: The current position is empty. This is the simplest case; directly create a node (containing key, value, hash value, etc.) and place it in this position.
- Scenario 2: The current position is not empty (a hash collision occurs). At this point, comparisons are needed:
  - a. First, compare whether the hash value of the new key equals the hash value of the existing node's key.
  - b. If the hash values are equal, then compare whether the two keys are "equal" (using the equals() method).
  - If both are equal: It means it's the same key, so the new value overwrites the old value.
  - If the keys are not equal: It means they are different keys but calculated the same index, which is a true "hash collision". In this case, the "separate chaining" method is used, adding the new key-value pair as a node to the end of the linked list (or into the red-black tree) of the current bucket.

Step 3: Key Process - How to Retrieve Data (get method)

The process of map.get("apple") is the reverse of put, following the same logic:

Calculate the hash value of the key "apple", use the same perturbation function and & operation to locate the array index (e.g., still 2).
Check the bucket at that index position:
- If it is empty, return null.
- If it is not empty, traverse the linked list or red-black tree in that bucket. First compare hash values; if the hash values are the same, then use the equals() method to compare the keys. If an equal key is found, return its corresponding value.

Step 4: Dynamic Resizing - What to do when the cabinet is full

HashMap is not fixed in size. When the number of stored key-value pairs exceeds a threshold (threshold = array capacity × load factor, default load factor is 0.75), HashMap performs "resizing" to reduce hash collisions and maintain efficiency.

Trigger Condition: Resizing is triggered when size > capacity * loadFactor (e.g., default 16*0.75=12).
Resize Operation:
- Create a new array, twice the size of the old one (default 16 -> 32).
- Iterate through every bucket and every node within each bucket in the old array, recalculating the index position of each node in the new array. Because the array length has changed, the result of index = HashCode(key) & (NewArrayLength - 1) is likely different from before.
- "Migrate" all nodes to the correct positions in the new array. This is a performance-intensive operation.

Step 5: Java 8 Optimization - Red-Black Tree

Before Java 8, when severe hash collisions occurred, the linked list in a HashMap could become very long, causing query efficiency to degrade from O(1) to O(n). Java 8 introduced the red-black tree to solve this problem:

Conversion Condition: When the number of nodes in the same bucket exceeds a certain threshold (TREEIFY_THRESHOLD, default 8) and the current array length reaches a certain minimum value (MIN_TREEIFY_CAPACITY, default 64), the linked list is converted to a red-black tree.
Degradation Condition: When, due to resizing or deletion operations, the number of nodes in the red-black tree becomes too low (less than UNTREEIFY_THRESHOLD, default 6), the red-black tree degenerates back to a linked list.
Purpose: To improve the worst-case query time complexity from O(n) to O(log n), significantly enhancing performance.

Summary:
The core principle of HashMap is a structure of "array + linked list/red-black tree". It quickly locates the array index through the key's hash value. It stores data using the separate chaining method to resolve hash collisions, and ensures efficient query and insertion performance (approximately O(1)) in most cases through dynamic resizing and the introduction of the red-black tree. Understanding the role of the hashCode() and equals() methods (determining the index position and judging key equality) is key to mastering HashMap.