Principles and Implementation of Trie (Prefix Tree)

Principles and Implementation of Trie (Prefix Tree)

Problem Description
Trie (also known as a prefix tree or digital tree) is a highly efficient tree-like data structure specifically designed for handling collections of strings. Its core concept is to utilize the common prefixes among strings to reduce query time. Typical applications include search engine autocomplete, spell checking, and IP routing tables. In interviews, the principles of Trie, implementation of its basic operations, and complexity analysis are often tested.


1. Basic Concepts of Trie

  • Node Structure: Each node contains two key components:
    • Child Node Pointer Array (or Map): Used to store references from the current character to the next level of nodes. For instance, if only lowercase letters are supported, the array length is 26.
    • End Flag (boolean type): Marks whether the path from the root node to the current node forms a complete word.
  • Root Node: Does not store a character; serves as the starting point for all words. For example, when inserting the word "cat", the path is: root → c → a → t (with the 't' node marked as the end).

2. Detailed Steps for Trie Operations
(1) Insertion Operation
Objective: Insert a word character by character into the Trie, ensuring that common prefixes share the same path.
Steps:

  1. Start from the root node and traverse each character of the word.
  2. If a child node corresponding to the current character exists, move to that child node.
  3. If the child node does not exist, create a new node and link it to the pointer array of the current node.
  4. After processing the last character, set the end flag at that node (e.g., isEnd=true).

Example: Inserting "cat" and "can"

  • Insert "cat": The root node creates a child node c → the c node creates a child node a → the a node creates a child node t, marking t as the end.
  • Insert "can": The root node already has a child node c → move to the c node; the c node already has a child node a → move to the a node; the a node creates a child node n, marking n as the end.

(2) Search Operation
Objective: Determine whether a word exists in the Trie.
Steps:

  1. Start from the root node and traverse downward character by character.
  2. If a child node corresponding to a character does not exist, immediately return false.
  3. After traversing all characters, check the end flag of the current node: if it is true, return true; otherwise, return false (indicating the word is a prefix rather than a complete word).

Example: Searching for "can"

  • root → c → a → n, the end flag of the n node is true, return true.
    Searching for "ca":
  • root → c → a, but the end flag of the a node is false, return false.

(3) Prefix Search
Objective: Determine whether any word in the Trie starts with a specified prefix (the difference from a full search is that it does not require checking the end flag).
Steps:

  1. Similar to the search operation, but after traversing all characters of the prefix, return true as long as the path exists.

Example: Prefix search for "ca"

  • root → c → a, the path exists, return true.

3. Complexity Analysis

  • Time Complexity:
    • Insertion/Search/Prefix Search: O(L), where L is the length of the word. Independent of the total number of words stored in the Trie.
  • Space Complexity:
    • Worst-case: O(N × L × K), where N is the number of words, L is the average length, and K is the size of the character set. However, due to prefix sharing, the actual space is typically less than that of a hash table.

4. Key Points for Code Implementation (Using Lowercase Letters as an Example)

class TrieNode:
    def __init__(self):
        self.children = [None] * 26  # 26 lowercase letters
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def _char_to_index(self, ch):
        return ord(ch) - ord('a')  # Map character to index 0-25
    
    def insert(self, word):
        node = self.root
        for ch in word:
            index = self._char_to_index(ch)
            if not node.children[index]:
                node.children[index] = TrieNode()  # Create new node
            node = node.children[index]
        node.is_end = True  # Mark word end
    
    def search(self, word):
        node = self.root
        for ch in word:
            index = self._char_to_index(ch)
            if not node.children[index]:
                return False
            node = node.children[index]
        return node.is_end  # Must be a complete word
    
    def startsWith(self, prefix):
        node = self.root
        for ch in prefix:
            index = self._char_to_index(ch)
            if not node.children[index]:
                return False
            node = node.children[index]
        return True  # Prefix existence is sufficient

5. Practical Application Scenarios

  • Input Method Suggestions: Quickly recommend candidate words when a prefix is entered.
  • Routing Table Matching: Longest prefix matching for IP addresses.
  • Spell Checking: Combined with DFS, it can provide error correction suggestions (e.g., whether a word becomes valid after modifying one character).

Through the above steps, the core principles and implementation of Trie are fully covered. Its advantage lies in the efficiency of prefix sharing, but note the potential space overhead when the character set is large (consider using a hash table to store child nodes instead).