Event Sourcing Pattern in Distributed Systems

Event Sourcing Pattern in Distributed Systems

Description:
Event Sourcing is an architectural pattern whose core idea is to not store the current state of an application directly, but rather to store the series of events that led to state changes. When a state query is needed, the current state is rebuilt by replaying these events in sequence. This pattern is commonly used in distributed systems for scenarios involving auditing, data lineage, and complex business logic.

Key Concepts:

  • Event: Represents a state change operation that has already occurred in the system (e.g., "User Registered", "Balance Deducted"). Events are immutable.
  • Event Store: A database specifically designed to store the event sequence, typically supporting sequential reads by aggregate ID.
  • Aggregate: A logical unit grouping related events (e.g., all events for a user account).
  • Snapshot: An intermediate state saved periodically to avoid replaying a large number of events.

Problem-Solving Process:

1. Limitations of Traditional Architecture

  • The traditional CRUD model updates the current state directly, with drawbacks including:
    • Loss of historical change records, making auditing difficult.
    • Potential data overwrites during concurrent modifications (e.g., optimistic lock conflicts).
  • Example: Deducting 30 yuan from an account balance of 100 yuan. The traditional method directly updates it to 70 yuan, making it impossible to trace the reason for the deduction.

2. Basic Principles of Event Sourcing

  • Do not save the state directly; instead, save the event stream.
  • The state is calculated by replaying events:
    Initial state: Balance = 100 yuan  
    Event 1: User Registered (balance initialized to 100)  
    Event 2: Deducted 30 yuan → Balance = 100 - 30 = 70  
    Event 3: Deposited 50 yuan → Balance = 70 + 50 = 120  
    
  • To query the current balance, all events must be replayed from the initial event.

3. Key Design Points of the Event Store

  • The event store must guarantee event order and idempotency.
  • Each event contains:
    • Aggregate ID (e.g., account ID)
    • Event Type (e.g., "BalanceDeducted")
    • Event Data (e.g., {amount: 30})
    • Version Number (used to detect concurrent conflicts).
  • Example Event Store Table Structure:
    Aggregate ID Version Event Type Event Data
    acc-001 1 Created {balance:100}
    acc-001 2 Deducted {amount:30}

4. Performance Optimization: Introducing Snapshots

  • Problem: Replaying all events is inefficient when there are too many events.
  • Solution: Periodically save snapshots (e.g., save the current state every 100 events).
  • For queries, start from the latest snapshot and only replay events after that snapshot.

5. Handling External Queries: The Projection Pattern

  • Problem: Complex queries (e.g., "query all accounts with balance > 50") cannot be implemented directly through the event stream.
  • Solution: Use projections to synchronize events in real-time to a read model (e.g., an SQL database):
    • The event store serves as the single source of truth.
    • Consumers subscribe to events and update the read model (e.g., generating a balance table).
    • The read model supports flexible queries and is eventually consistent with the event stream.

6. Fault Tolerance and Time Travel

  • Event Sourcing naturally supports:
    • Auditing: The state at any point in time can be restored by replaying events.
    • Failure Recovery: The system state can be rebuilt by replaying events.
    • Business Rollback: Simulating scenarios like "what if a certain event did not occur" (e.g., undoing an operation).

7. Applicable Scenarios and Challenges

  • Applicable Scenarios:
    • Systems requiring complete audit logs (e.g., finance, compliance domains).
    • Businesses needing rollback or compensation operations (e.g., order processes).
  • Challenges:
    • Version management for event schema changes.
    • Eventual consistency between the read model and the event stream.

Summary:
Event Sourcing provides powerful auditing and time-travel capabilities by storing a sequence of events rather than the current state. Core steps include designing the event store, calculating state through replay, and using snapshots and projections for performance optimization. Its complexity must be weighed against business requirements, making it suitable for scenarios requiring high traceability.