Idempotence Design and Implementation in Distributed Systems

Problem Description
In distributed systems, due to network latency, retry mechanisms, or duplicate client submissions, the same request may be sent to the server multiple times. Idempotence refers to the property that multiple executions of the same operation have the same effect as a single execution. For example, repeated calls to a payment interface should not result in the user being charged twice. Interviews often require designing a solution to guarantee idempotence and analyzing its applicable scenarios and limitations.

1. Why is Idempotence Needed?

Example Scenario: When a user clicks the "Submit Order" button, they might resubmit due to network jitter causing delayed responses; in microservices architecture, service A calling service B might retry due to timeouts, causing B to receive duplicate requests.
Core Issue: Repeated execution of non-idempotent operations (such as order creation, payment deduction) can lead to data inconsistency or business logic errors.

2. Core Principles of Idempotence

Mathematical Definition: If a function satisfies \(f(f(x)) = f(x)\), it is idempotent. In distributed systems, it can be understood as:
- First request: Execute the operation and return the result.
- Subsequent duplicate requests: Directly return the result of the first request without performing the actual operation.
Key Point: Idempotence is enforced by state recorded on the server (which requests have already been processed), not by trusting clients to avoid sending duplicates.
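
The two cases above (first request executes, duplicates return the stored result) can be sketched as follows. This is a minimal in-memory illustration, not a production design: the class name IdempotentExecutor, the key "req-1", and the charge function are hypothetical, and a real service would keep the results in a shared store such as a database or cache.

```python
import threading


class IdempotentExecutor:
    """Runs an operation at most once per idempotency key (in-memory sketch)."""

    def __init__(self):
        self._results = {}  # idempotency key -> result of the first execution
        self._lock = threading.Lock()

    def execute(self, key, operation):
        # The lock is held while the operation runs, so concurrent duplicates
        # wait for the first execution instead of running a second one.
        # This serializes all requests, which is acceptable only for a sketch.
        with self._lock:
            if key in self._results:
                return self._results[key]  # duplicate: replay stored result
            result = operation()
            self._results[key] = result
            return result


charges = []  # records every real execution of the side effect


def charge():
    charges.append(100)
    return {"status": "charged", "amount": 100}


executor = IdempotentExecutor()
first = executor.execute("req-1", charge)
second = executor.execute("req-1", charge)  # same key: not executed again
```

After both calls, charges contains a single entry and second is the same result object that the first call returned.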

3. How to Implement Idempotence?
Step 1: Identify Idempotent vs. Non-idempotent Operations

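Naturally Idempotent: Queries (GET), deletions (DELETE), and updates that assign an absolute value (e.g., SET status = 'paid') are usually idempotent.
Non-idempotent: Creation (POST), relative updates (e.g., balance = balance + 100), and partial updates (PATCH) are not guaranteed to be idempotent and require additional design.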
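
The difference can be seen in a small sketch. The function names and the dictionary-based account below are hypothetical and serve only to contrast an absolute assignment with a relative update.

```python
def set_balance(account, value):
    """Absolute assignment: repeating the call leaves the same final state."""
    account["balance"] = value
    return account


def add_to_balance(account, delta):
    """Relative update: every repeated call changes the state again."""
    account["balance"] += delta
    return account


assigned = {"balance": 0}
set_balance(assigned, 50)
set_balance(assigned, 50)  # duplicate request, state unchanged

incremented = {"balance": 0}
add_to_balance(incremented, 50)
add_to_balance(incremented, 50)  # duplicate request, applied a second time
```

Here assigned ends with a balance of 50, while incremented ends with 100, which is why the second kind of operation needs one of the solutions below.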

Step 2: Common Implementation Solutions
Solution 1: Token Mechanism (Anti-duplicate Token)

Process:
1. The client first requests a unique Token (e.g., UUID) from the server. The server stores the Token in a cache with a short validity period (e.g., 5 minutes).
2. The client initiates a business request carrying the Token. The server checks if the Token exists in the cache:
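  - If it exists: Delete the Token and execute the operation. The check and the delete must be a single atomic step; otherwise two concurrent duplicates can both pass the check.
  - If it does not exist: Reject the request (the Token has already been used or has expired).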
Applicable Scenarios: Frequent frontend interaction scenarios (e.g., order submission).
Limitations: Requires additional maintenance of Token state and adds one round of interaction.
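
A minimal sketch of this flow, assuming an in-memory store in place of a real cache. The names TokenStore, issue, consume, and submit_order are hypothetical; in a real deployment the atomic check-and-delete would be done by the cache itself rather than by a process-local lock.

```python
import threading
import time
import uuid


class TokenStore:
    """In-memory stand-in for a cache holding short-lived one-time tokens."""

    def __init__(self, ttl_seconds=300):
        self._ttl = ttl_seconds
        self._expiry = {}  # token -> monotonic time at which it expires
        self._lock = threading.Lock()

    def issue(self):
        token = uuid.uuid4().hex
        with self._lock:
            self._expiry[token] = time.monotonic() + self._ttl
        return token

    def consume(self, token):
        """Atomically check and delete; True only for the first valid use."""
        with self._lock:
            expires_at = self._expiry.pop(token, None)
        return expires_at is not None and time.monotonic() < expires_at


store = TokenStore()
orders = []


def submit_order(token, item):
    if not store.consume(token):
        return "rejected"  # token already used, expired, or never issued
    orders.append(item)
    return "created"


token = store.issue()
results = [submit_order(token, "book"), submit_order(token, "book")]
```

The second call with the same token is rejected, so orders contains only one entry. Note that in this sketch the token is consumed before the operation runs; if the operation then fails, the client has to request a new token.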

Solution 2: Unique Index Constraint

Principle: Utilize database unique indexes to prevent duplicate data insertion.
- Example: Create a unique index for "order number + business field" in the order table. Duplicate insertion will throw an error, and the server catches the exception and directly returns the existing result.
Applicable Scenarios: Database write operations with unique identifiers (e.g., transaction serial numbers).
Limitations: Only applicable to insertion operations and requires the business itself to have a unique key.
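
The following sketch uses Python's built-in sqlite3 module with an in-memory database to illustrate the idea. The table layout, the index name ux_orders_order_no, and the create_order function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " order_no TEXT NOT NULL,"
    " amount INTEGER NOT NULL)"
)
conn.execute("CREATE UNIQUE INDEX ux_orders_order_no ON orders(order_no)")


def create_order(order_no, amount):
    """Insert the order once; on a duplicate, return the existing row."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO orders(order_no, amount) VALUES (?, ?)",
                (order_no, amount),
            )
        created = True
    except sqlite3.IntegrityError:
        created = False  # the unique index rejected the duplicate insert
    row = conn.execute(
        "SELECT order_no, amount FROM orders WHERE order_no = ?",
        (order_no,),
    ).fetchone()
    return created, row


first = create_order("ORD-1", 100)
second = create_order("ORD-1", 100)  # duplicate submission
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Both calls return the same row, but only the first reports created as True, and the table holds a single record.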

Solution 3: State Machine Idempotence

Principle: Design business states as a one-way flow (e.g., "pending payment → paid → completed"). Check if the current state allows the operation before each execution.
- Example: After successful payment, the state changes to "paid". Duplicate payment requests are rejected because the state does not match.
Applicable Scenarios: Businesses with clear state transitions (e.g., ticket approval).
Limitations: Requires a carefully designed state machine and persistent state storage.
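
One way to sketch the state check is a conditional update, where the expected current state is part of the WHERE clause so that the check and the transition happen in one statement. The example below uses sqlite3 with an in-memory database; the TRANSITIONS map, the table layout, and the advance function are hypothetical.

```python
import sqlite3

# Allowed one-way transitions: current state -> next state.
TRANSITIONS = {"pending_payment": "paid", "paid": "completed"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO orders(id, status) VALUES (1, 'pending_payment')")
conn.commit()


def advance(order_id, expected_status):
    """Move the order forward only if it is still in expected_status."""
    next_status = TRANSITIONS.get(expected_status)
    if next_status is None:
        return False  # terminal or unknown state: nothing to do
    with conn:
        cur = conn.execute(
            "UPDATE orders SET status = ? WHERE id = ? AND status = ?",
            (next_status, order_id, expected_status),
        )
    # rowcount is 0 when the stored state no longer matches the expectation.
    return cur.rowcount == 1


pay_first = advance(1, "pending_payment")
pay_again = advance(1, "pending_payment")  # duplicate payment request
status = conn.execute("SELECT status FROM orders WHERE id = 1").fetchone()[0]
```

The first call moves the order to "paid"; the duplicate matches no row and changes nothing.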

Solution 4: Pessimistic Locking / Optimistic Locking

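Pessimistic Locking: Lock the row with SELECT ... FOR UPDATE so that concurrent requests are serialized; the request holding the lock then checks whether the operation has already been applied before modifying the data.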
Optimistic Locking: Add a version number field to the data. Validate the version number during updates (e.g., UPDATE table SET value=new_value, version=version+1 WHERE id=1 AND version=old_version).
Applicable Scenarios: Scenarios with frequent concurrent write operations (e.g., inventory deduction).
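
The version check can be sketched with sqlite3 and an in-memory database. The stock table, the deduct function, and the extra "quantity >= ?" guard against overselling are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock ("
    " id INTEGER PRIMARY KEY,"
    " quantity INTEGER NOT NULL,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO stock(id, quantity, version) VALUES (1, 10, 0)")
conn.commit()


def read_stock(item_id):
    return conn.execute(
        "SELECT quantity, version FROM stock WHERE id = ?", (item_id,)
    ).fetchone()


def deduct(item_id, amount, seen_version):
    """Apply the deduction only if nobody has updated the row since we read it."""
    with conn:
        cur = conn.execute(
            "UPDATE stock SET quantity = quantity - ?, version = version + 1 "
            "WHERE id = ? AND version = ? AND quantity >= ?",
            (amount, item_id, seen_version, amount),
        )
    return cur.rowcount == 1


quantity, version = read_stock(1)
first = deduct(1, 3, version)
stale = deduct(1, 3, version)  # same version replayed: no row matches
final = read_stock(1)
```

The replayed request carries a stale version and updates nothing, so the stock is deducted only once. A caller that legitimately wants a second deduction has to re-read the row and retry with the new version.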

4. Solution Comparison and Selection Suggestions

Solution	Applicable Scenarios	Pros and Cons
Token Mechanism	Frontend interaction business (e.g., form submission)	Simple to implement, but requires maintaining Token state
Unique Index	Data insertion operations (e.g., order generation)	Relies on database capabilities and needs little extra code (only handling the duplicate-key error), but applies only to inserts
State Machine	Stateful flow business (e.g., order process)	Highly coupled with business logic, requires state logic design
Optimistic Locking	High-concurrency updates (e.g., flash sale inventory)	Good performance, but requires retry or alert on conflicts

5. Practical Case: Idempotence Design for a Payment Interface

Requirement: User payment requests may be sent repeatedly due to network issues.
Design:
1. Generate a payment transaction number (unique ID) as the idempotence key.
2. Before payment, check if the transaction number has been processed:
  - If processed: Directly return the payment result.
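  - If not processed: Execute the deduction and store the transaction number and result in the database in the same transaction (with a unique index on the transaction number, so that two concurrent duplicates cannot both pass the check).
3. If processed transaction numbers are also cached to speed up the check, set an expiration time on the cache entries to avoid long-term storage occupation. The expiration must be longer than the longest possible retry window, and the database record remains the source of truth.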
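
The design above can be sketched end to end with sqlite3 and an in-memory database. Everything named here (the accounts and payments tables, the pay function, the 'processing' placeholder, and the choice to record an insufficient-funds outcome as a final result) is an assumption made for the illustration; the cache layer from step 3 is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE payments ("
    " txn_no TEXT PRIMARY KEY,"  # primary key acts as the unique index
    " account_id INTEGER NOT NULL,"
    " amount INTEGER NOT NULL,"
    " result TEXT NOT NULL)"
)
conn.execute("INSERT INTO accounts(id, balance) VALUES (1, 500)")
conn.commit()


def pay(txn_no, account_id, amount):
    """Deduct at most once per transaction number and return the stored result."""
    try:
        with conn:  # one database transaction: all statements or none
            # Claim the transaction number first. A duplicate raises
            # IntegrityError here, before any money is moved.
            conn.execute(
                "INSERT INTO payments(txn_no, account_id, amount, result) "
                "VALUES (?, ?, ?, 'processing')",
                (txn_no, account_id, amount),
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?",
                (amount, account_id, amount),
            )
            result = "success" if cur.rowcount == 1 else "insufficient_funds"
            conn.execute(
                "UPDATE payments SET result = ? WHERE txn_no = ?",
                (result, txn_no),
            )
    except sqlite3.IntegrityError:
        pass  # already processed: fall through and return the stored result
    row = conn.execute(
        "SELECT result FROM payments WHERE txn_no = ?", (txn_no,)
    ).fetchone()
    return row[0]


first = pay("TXN-1", 1, 200)
retry = pay("TXN-1", 1, 200)  # network retry with the same transaction number
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```

Because the claim and the deduction share one database transaction, a failure between them rolls back both, and the retry returns the stored result without charging the account again.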

6. Summary

Idempotence is the cornerstone of fault tolerance in distributed systems. The appropriate solution should be selected based on business scenarios.
Core idea: Distinguish requests through unique identifiers and combine storage layers (databases, caches) or business logic (state machines) to avoid duplicate effects.
Interview assessment points: Ability to clearly distinguish between idempotent and non-idempotent operations and flexibly combine various technologies to solve problems.