Specific Implementation of Hash Tables in C++ (unordered_map)
The primary implementation of hash tables in the C++ Standard Library is std::unordered_map. It is an associative container that stores key-value pairs and provides average constant-time complexity for lookup, insertion, and deletion operations.
1. Basic Structure and Template Parameters
std::unordered_map is a class template, declared roughly as follows:
template<
class Key,
class T,
class Hash = std::hash<Key>,
class KeyEqual = std::equal_to<Key>,
class Allocator = std::allocator<std::pair<const Key, T>>
> class unordered_map;
- Key: The type of the key.
- T: The type of the value.
- Hash: The type of the hash function object used to compute the hash value of a key. Defaults to std::hash<Key>, for which the standard library provides specializations for fundamental types and common library types (like int and std::string).
- KeyEqual: The type of the key equality comparison function object, used to compare keys for equality when hash collisions occur. Defaults to std::equal_to<Key> (i.e., using operator==).
- Allocator: The memory allocator.
2. Internal Data Structure: Bucket Array and Linked List
The typical implementation of unordered_map combines a dynamic array (bucket array) and singly linked lists.
- Bucket Array: A dynamically allocated array of pointers. Each array element is called a "bucket". The size of the bucket array is typically a prime number (as in libstdc++'s implementation), which helps distribute hash values more uniformly across buckets.
- Linked List Nodes: Each bucket does not directly store key-value pairs but points to the head node of a linked list. This list stores all key-value pairs that hash to the same bucket. In common C++ standard library implementations (like GCC's libstdc++ and LLVM's libc++), this list is a singly linked list.
A simplified node structure might look like this:
struct node {
std::pair<const Key, T> data; // Stores the key-value pair, the key is const
node* next; // Pointer to the next node
// ... Possibly other implementation-specific metadata
};
The overall structure can be visualized as an "array + linked list" form:
Bucket Array: [0] -> nullptr
[1] -> node{K1,V1} -> node{K2,V2} -> nullptr
[2] -> nullptr
...
[n] -> node{K3,V3} -> nullptr
3. Core Operation Processes
Insertion Operation (insert/emplace/operator[])
- Calculate Hash Value: For a given key key, use the hash function object Hash to compute its hash value: hash_value = Hash{}(key).
- Map to Bucket Index: Map the hash value to an index in the bucket array via a modulo operation (or other compression function): bucket_index = hash_value % bucket_count. This index determines which bucket's linked list the key-value pair should be placed into.
- Handle Collisions and Check for Duplicate Keys:
  - Traverse the linked list corresponding to the calculated bucket index.
  - For each node in the list, use the KeyEqual comparison function to check whether the node's key is equal to the key to be inserted.
  - If an equal key is found: The operation concludes based on the insertion policy (e.g., insert fails and returns an iterator to the existing element, while operator[] overwrites its value).
  - If no equal key is found: Create a new node to store the key-value pair, then insert this new node at the head (or tail) of the linked list (head insertion is more common for speed). This is a constant-time operation.
Lookup Operation (find)
- Calculate Hash Value and Bucket Index: Exactly the same as the first two steps of the insertion operation.
- Traverse the Linked List: Traverse the linked list corresponding to the calculated bucket index, comparing keys for equality node by node.
- Return Result: If an equal key is found, return an iterator pointing to that key-value pair; if the entire list is traversed without finding it, return the end() iterator.
Deletion Operation (erase)
- Locate the Node: First perform steps similar to lookup to find the node to be deleted and its predecessor node (because it is a singly linked list, the predecessor is needed to modify the next pointer).
- Modify the Linked List: Set the predecessor node's next pointer to point to the node after the one being deleted.
- Free Memory: Delete the node and free the memory it occupied.
4. Dynamic Resizing (Rehashing)
To maintain operational efficiency, unordered_map needs to resize (rehash) when the number of elements becomes too large, causing linked lists to become too long.
- Trigger Condition: Automatic resizing is triggered when the load factor (i.e., size() / bucket_count(), the number of elements divided by the number of buckets) would exceed the max_load_factor (default 1.0). Resizing can also be triggered manually by calling rehash.
- Resizing Process:
  - Allocate a new, larger bucket array (the new size is typically a prime number larger than the current bucket count).
  - Traverse all buckets in the old bucket array and all nodes on the linked list attached to each bucket.
  - For each node, recalculate its bucket index based on the new bucket array size (new_index = hash_value % new_bucket_count).
  - Remove each node from its old list and insert it into the linked list of the corresponding bucket in the new bucket array.
- Performance Impact: Resizing is an O(n) operation (where n is the number of elements), so a single insertion operation can be O(n) in the worst case. However, using amortized analysis, the average time complexity over many insertions remains O(1).
5. Iterators
The iterator of unordered_map must be able to traverse all elements. Internally, it needs to maintain two pointers:
- One pointing to the current node (within some linked list).
- One pointing to the current bucket's position in the bucket array.
When iteration reaches the end of the current linked list, the iterator must move to the head of the next non-empty bucket's linked list in the bucket array. Therefore, the traversal order of an unordered_map is unspecified, depending on the distribution of key hash values and the bucket array size.
Summary
std::unordered_map in C++ resolves hash collisions via a "bucket array + singly linked list" structure, providing efficient key-value pair storage. Its core principle is quickly locating a bucket via the hash function and then performing precise operations within a short list. The dynamic resizing mechanism ensures stable performance. Understanding its internal structure helps in using this container more effectively in C++.