Principles and Collision Resolution of Hash Tables
A hash table is a data structure that uses a hash function to map keys to array indices, enabling efficient lookup. Its core idea is to convert a key into an array subscript, making the time complexity of search, insertion, and deletion operations close to O(1).
1. Hash Function
The hash function is the core of a hash table, responsible for converting keys of any size into array indices within a fixed range. An ideal hash function should satisfy:
- Fast computation
- Uniform distribution (to reduce collisions)
For example, hashing the string "key": calculate the sum of the ASCII values of each character, then take the modulo of the array size.
2. Hash Collision
A collision occurs when different keys produce the same index through the hash function. For example:
- Array size: 10
- ASCII sum of key "ab": 97+98=195 → 195%10=5
- ASCII sum of key "ba": 98+97=195 → 195%10=5
Both map to the same index 5.
3. Collision Resolution Methods
3.1 Separate Chaining
- Each array position maintains a linked list (or another container)
- When a collision occurs, the new element is added to the end of the corresponding list
- Example: After inserting "ab" and "ba", the linked list at index 5 contains two nodes
3.2 Open Addressing
When a collision occurs, search for the next available position according to a specific rule:
- Linear Probing: Sequentially check the next positions (index+1, index+2...)
- Quadratic Probing: Check index+1², index+2²... to avoid clustering
- During lookup, compare sequentially along the probe sequence until found or an empty slot is encountered
4. Load Factor and Resizing
Load factor = number of elements / array size. When the load factor is too high (e.g., >0.75):
- Collision probability increases, performance degrades
- A larger array needs to be created, and all elements must be rehashed to new positions
5. Time Complexity Analysis
- Ideal case (no collisions): O(1)
- Worst case (all collisions): O(n)
- Average case (with a good hash function): O(1)