Cuckoo Hashing Principles and Implementation

Cuckoo Hashing Principles and Implementation

Problem Description
Cuckoo hashing is an open-addressing strategy for resolving hash table collisions. It uses two hash functions and two hash tables (or two parts of a single table) and finds empty slots for new elements by "kicking out" existing elements. Explain its core principles, insertion process, lookup and deletion operations, and analyze its advantages and disadvantages.


1. Core Concept
Cuckoo hashing is named after the brood parasitic behavior of cuckoos (kicking other eggs out of the nest). Its core design is as follows:

  • Two Hash Tables: Usually two parts of one table (Table1 and Table2), corresponding to two hash functions \(h_1(x)\) and \(h_2(x)\).
  • Two Candidate Positions per Key: A key \(x\) can only reside at the position \(h_1(x)\) or \(h_2(x)\).
  • Insertion Strategy: If both candidate positions are occupied, randomly kick out an existing element and re-insert it into its other candidate position. This process is performed recursively until all elements are stable or a maximum loop count is reached.

2. Detailed Insertion Process
Assume the current table size is \(m\), and the hash functions are \(h_1(x) = x \mod m\) and \(h_2(x) = (x \div m) \mod m\) (where \(\div\) denotes integer division).
Step Example: Insert key sequence [20, 50, 53, 75, 100, 67, 105] (table size \(m=8\)):

  1. Insert 20:

    • \(h_1(20)=4\), \(h_2(20)=2\), position 4 is empty → place directly.
    • Table state: Table[4]=20.
  2. Insert 50:

    • \(h_1(50)=2\), \(h_2(50)=6\), position 2 is empty → place.
    • Table state: Table[2]=50, Table[4]=20.
  3. Insert 53:

    • \(h_1(53)=5\), \(h_2(53)=6\), position 5 is empty → place.
    • Table state: Table[2]=50, Table[4]=20, Table[5]=53.
  4. Insert 75:

    • \(h_1(75)=3\), \(h_2(75)=9\mod8=1\), positions 3 and 1 are both empty → place at position 3.
  5. Insert 100:

    • \(h_1(100)=4\) (conflict, 20 is there), \(h_2(100)=12\mod8=4\) (conflict) → need to kick out an element.
    • Kick out 20 (at position 4), place 100 at position 4.
    • Re-insert 20: \(h_1(20)=4\) (occupied by 100), \(h_2(20)=2\) (occupied by 50) → kick out 50.
    • Re-insert 50: \(h_1(50)=2\) (empty), place directly.
    • Final table state: Table[2]=50, Table[4]=100, Table[5]=53, Table[3]=75.
  6. Insert 67:

    • \(h_1(67)=3\) (conflict, 75 is there), \(h_2(67)=8\mod8=0\) (empty) → kick 75 to its other position \(h_2(75)=1\) (empty) → place 67 at 3, place 75 at 1.
  7. Insert 105:

    • May trigger multiple rounds of kicking (detailed process omitted). If a cycle occurs (e.g., two keys repeatedly kick each other), resizing or rehashing is required.

3. Lookup and Deletion Operations

  • Lookup: Compute \(h_1(x)\) and \(h_2(x)\), check if \(x\) is stored at either position. Time complexity is \(O(1)\).
  • Deletion: Similar to lookup. After locating, mark as deleted (must avoid breaking subsequent lookup chains).

4. Key Issues and Optimizations

  • Kick Cycles: Insertion may enter an infinite loop (e.g., keys A and B repeatedly kick each other). Solutions:
    • Set a maximum kick count (e.g., 500). Exceeding this triggers resizing or rehashing.
    • Use more hash functions (as in variants of cuckoo hashing that support multiple hash functions).
  • Load Factor: Traditional hash tables often use a load factor of 0.7. Cuckoo hashing can support higher loads (e.g., above 0.9), but insertion efficiency decreases as load increases.
  • Space Overhead: Typically requires a larger table than linear probing to ensure stability.

5. Advantages and Disadvantages Analysis

  • Advantages:
    • Worst-case lookup time is \(O(1)\) (only need to check two fixed positions).
    • Maintains efficient lookups even under high load factors.
  • Disadvantages:
    • Insertion time is unpredictable; worst-case is \(O(n)\) (requiring rehashing).
    • Deletion operations require careful handling (need tombstone markers or lazy deletion).
    • Sensitive to hash function quality; poor functions lead to frequent conflicts.

Summary
Cuckoo hashing trades increased insertion complexity for stable lookup performance via dual hash functions and a kick-out mechanism. It is suitable for read-heavy, write-light scenarios sensitive to lookup latency (e.g., forwarding tables in network devices). Practical implementations must control kick counts and design collision-resistant hash functions.