Principles and Implementation of Trie Tree (Prefix Tree)

Principles and Implementation of Trie Tree (Prefix Tree)

1. Knowledge Point Description
A Trie tree (pronounced like "try"), also known as a dictionary tree or prefix tree, is a highly efficient tree-shaped data structure specifically designed for handling sets of strings. Its core idea is to utilize the common prefixes of strings to reduce query time. It is commonly used in scenarios such as search engine autocomplete, spell checking, IP routing tables, and more. The characteristics of a Trie tree are:

  • The root node does not contain a character; each node other than the root contains only one character.
  • The characters along the path from the root node to a particular node, when concatenated, form the string corresponding to that node.
  • All child nodes of any node contain different characters.

2. Basic Structure of a Trie Tree

  1. Node Structure Design
    Each Trie node contains two core components:
  • An array (or map) of child node pointers: Stores pointers to the next character.
  • An end flag: Marks whether the path from the root to the current node forms a complete word.

Taking words composed of lowercase letters as an example, a node can be defined as follows:

class TrieNode:
    def __init__(self):
        self.children = {}  # Mapping from character to child node
        self.is_end = False  # Flag indicating end of a word

3. Basic Operations of a Trie Tree

  1. Insert Operation (Building Character by Character)
    Steps:
  • Start from the root node.
  • Traverse each character of the word to be inserted:
    • If the current character exists among the child nodes of the current node, move to the corresponding child node.
    • If it does not exist, create a new node and add it to the child nodes of the current node.
  • After processing all characters, mark the last node as the end of a word.

Example: Inserting "cat"

Root node → Check 'c': Not present → Create node for 'c'
Node 'c' → Check 'a': Not present → Create node for 'a'
Node 'a' → Check 't': Not present → Create node for 't'
Node 't' → Set is_end=True
  1. Search Operation (Exact Match)
    Steps:
  • Start from the root node.
  • Traverse each character of the word to be searched:
    • If the current character is not among the child nodes, return False.
    • Otherwise, move to the corresponding child node.
  • Check the is_end flag of the final node; it must be True to indicate the word exists.
  1. Prefix Search Operation
    The steps are similar to the search operation, but there is no need to check the is_end flag. As long as all prefix characters can be traversed, it indicates the prefix exists.

4. Detailed Implementation of Trie Tree

Complete code implementation:

class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
    
    def search(self, word: str) -> bool:
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end
    
    def startsWith(self, prefix: str) -> bool:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

5. Complexity Analysis of Trie Tree

  1. Time Complexity
  • Insertion: O(L), where L is the length of the word.
  • Search: O(L)
  • Prefix search: O(L)
    The time complexity of all operations depends only on the word length and is independent of the number of words in the dictionary.
  1. Space Complexity
  • Worst case: O(N × L), where N is the number of words and L is the average word length.
  • In practical applications, due to prefix sharing, space consumption is typically better than the theoretical worst case.

6. Optimized Variants of Trie Tree

  1. Compressed Trie Tree
    Reduces space usage by merging nodes that have only one child.

  2. Double-Array Trie
    Uses two arrays, base and check, to represent state transitions, saving memory space.

  3. Triple-Array Trie
    Further optimization based on the double-array Trie by adding a tail array to handle suffixes.

7. Application Scenarios of Trie Tree

  1. Autocomplete Systems
    Quickly find all words starting with a given prefix when the prefix is entered.

  2. Spell Checking
    Quickly determine whether a word exists in a dictionary.

  3. IP Routing Lookup
    Longest prefix matching for router forwarding decisions.

  4. Word Games
    Word validation in text games such as Boggle.

The Trie tree employs a space-for-time strategy, providing efficient query performance in string processing scenarios, particularly for operations related to prefixes.