Principles and Collision Resolution of Hash Tables

Principles and Collision Resolution of Hash Tables

A hash table is a data structure that uses a hash function to map keys to array indices, enabling efficient lookup. Its core idea is to convert a key into an array subscript, making the time complexity of search, insertion, and deletion operations close to O(1).

1. Hash Function
The hash function is the core of a hash table, responsible for converting keys of any size into array indices within a fixed range. An ideal hash function should satisfy:

  • Fast computation
  • Uniform distribution (to reduce collisions)
    For example, hashing the string "key": calculate the sum of the ASCII values of each character, then take the modulo of the array size.

2. Hash Collision
A collision occurs when different keys produce the same index through the hash function. For example:

  • Array size: 10
  • ASCII sum of key "ab": 97+98=195 → 195%10=5
  • ASCII sum of key "ba": 98+97=195 → 195%10=5
    Both map to the same index 5.

3. Collision Resolution Methods
3.1 Separate Chaining

  • Each array position maintains a linked list (or another container)
  • When a collision occurs, the new element is added to the end of the corresponding list
  • Example: After inserting "ab" and "ba", the linked list at index 5 contains two nodes

3.2 Open Addressing
When a collision occurs, search for the next available position according to a specific rule:

  • Linear Probing: Sequentially check the next positions (index+1, index+2...)
  • Quadratic Probing: Check index+1², index+2²... to avoid clustering
  • During lookup, compare sequentially along the probe sequence until found or an empty slot is encountered

4. Load Factor and Resizing
Load factor = number of elements / array size. When the load factor is too high (e.g., >0.75):

  • Collision probability increases, performance degrades
  • A larger array needs to be created, and all elements must be rehashed to new positions

5. Time Complexity Analysis

  • Ideal case (no collisions): O(1)
  • Worst case (all collisions): O(n)
  • Average case (with a good hash function): O(1)