Clock Synchronization and Logical Clocks in Distributed Systems

Clock Synchronization and Logical Clocks in Distributed Systems

Problem Description
In a distributed system, due to the physical separation of nodes, each node has its own independent hardware clock (physical clock). Even after calibration, these hardware clocks gradually drift apart due to slight differences in crystal oscillation frequencies (clock drift), leading to inconsistencies in each node's perception of the "current time." However, many distributed applications (such as determining event ordering, implementing distributed locks, fault diagnosis, etc.) rely on a consistent order of events. The clock synchronization problem explores how to make all nodes agree on the order of event occurrences in such a system.

Core Concepts and Challenges

Physical Clock Synchronization: The goal is to make the readings of all nodes' hardware clocks as close as possible to a standard time (e.g., UTC). However, achieving high-precision absolute time synchronization is very difficult and costly due to the uncertainty of network latency.
Logical Clocks: If an application only cares about the causal relationship between events (i.e., "event A happened before event B") and not the specific absolute time, then logical clocks can be used. Logical clocks assign logical timestamps to events that reflect causal relationships by exchanging information between nodes.

Now, let's explain the solutions step by step.

Step 1: Naive Idea for Physical Clock Synchronization and Its Challenges

The most intuitive idea is to have one node act as a time server, with other nodes requesting the time from it.

Process:
- Client A sends a time request to time server S at time T1 (according to A's local clock).
- Server S receives the request at time T2 (according to S's local clock) and immediately includes time T2 in the reply.
- Client A receives the reply at time T3 (according to A's local clock).
Calculating Offset: Assuming the network transmission times for the request and reply are approximately equal, both denoted as t. Then, client A can estimate the offset between its clock and the server's clock. T2 is approximately between T1 + t and T3 - t. Therefore, a simple offset θ can be estimated as: θ = T2 - [(T1 + T3) / 2]. Then A adjusts its clock forward or backward by θ.
Challenge: The core problem with this method is that network delay t is variable and uncertain. If the network is congested, t can be large, leading to significant estimation errors. Therefore, this simple method has limited accuracy, typically only reaching 10-100 milliseconds, making it difficult to meet the needs of applications requiring high-precision timing.

Step 2: More Precise Physical Clock Synchronization Protocols (e.g., NTP)

The Network Time Protocol (NTP) is a widely used clock synchronization protocol on the Internet. It uses more complex algorithms to counteract the uncertainty of network delay.

Multiple Measurements: An NTP client performs multiple round-trip communications with several time servers, collecting a large number of (T1, T2, T3) data pairs.
Filtering and Selection: The NTP algorithm filters out measurements with abnormally large delays (possibly caused by network congestion) because they are the least reliable.
Clock Filtering and Combination: For the filtered valid data, more precise statistical algorithms (such as the least squares method) are used to estimate clock offset and frequency drift (i.e., the trend of the local clock running fast or slow). It can not only correct the current time but also predict future drift, thereby fine-tuning the local clock's tick frequency.
Hierarchical Architecture: NTP uses a hierarchical (Stratum) structure. Stratum 0 represents the highest-precision clock sources (e.g., atomic clocks, GPS receivers). Stratum 1 servers synchronize directly with Stratum 0 sources, Stratum 2 servers synchronize with Stratum 1, and so on. This prevents all clients from accessing the limited number of high-precision sources, creating a scalable architecture.
In this way, NTP can achieve millisecond or even sub-millisecond synchronization accuracy under good network conditions.

Step 3: Logical Clocks—When Causality Matters More Than Absolute Time

For many distributed applications (such as distributed databases, version control systems), we only care about the order of events (causality), not whether they occurred at "3:01 PM" or "3:02 PM." Logical clocks are designed for this purpose.

Lamport Logical Clocks (1978)
- Core Idea: Each process maintains a monotonically increasing counter as its logical clock. Rules update this counter so that if event A happens before event B (denoted A -> B), then A's logical timestamp is guaranteed to be less than B's logical timestamp. Note: The converse is not true (a smaller timestamp does not necessarily mean it happened earlier).
- Rules:
  - Rule 1: Intra-process rule. Within the same process, for every event that occurs (e.g., computation, sending a message, receiving a message), the local logical clock counter C is incremented by 1.
  - Rule 2: Inter-process rule. When sending a message, the process sends the message m along with its current logical timestamp C. When the receiving process receives message m, it adjusts its logical clock to max(local clock, received message timestamp C_m) + 1.
- Example:
  - Process P1 sends message m at timestamp 1.
  - Process P2 receives message m at local timestamp 5. According to Rule 2, P2 adjusts its clock to max(5, 1) + 1 = 6.
- Limitation: Lamport clocks can only guarantee causal order (if A->B then L(A) < L(B)), but cannot determine causality from timestamps alone (L(A) < L(B) does not imply A->B). For example, two unrelated events may be assigned timestamps with a size relationship.

Step 4: Vector Clocks—Fully Capturing Causality

To overcome the limitations of Lamport clocks, vector clocks were proposed. They can truly capture causal relationships between events.

Core Idea: In a system with N processes, each process maintains a vector V of length N. V[i] represents the number of events that process Pi knows have occurred at process Pj.
Rules:
- Rule 1: Intra-process rule. Every time an event occurs at process Pi, it increments its own vector component V[i] by 1.
- Rule 2: Inter-process rule. When Pi sends a message, it attaches its current entire vector V to the message. When process Pj receives the message, for each component k in the vector, it executes Vj[k] = max(Vj[k], V_received[k]). Then, Pj applies Rule 1, incrementing Vj[j] by 1 (representing the event of receiving the message itself).
Comparison Rules: For two events A and B with vector clocks V_A and V_B respectively.
- A happens before B (A->B): If and only if all components of V_A are less than or equal to the corresponding components of V_B, and at least one component is strictly less.
- A and B are concurrent: If and only if V_A and V_B are incomparable. That is, there exist some components where V_A[i] > V_B[i] and others where V_A[j] < V_B[j].
Example:
- The system has two processes, P1 and P2.
- P1 has event A, changing its vector clock from (0,0) to (1,0).
- P1 sends message m to P2, attaching vector (1,0).
- P2 receives m after its local event B (vector (0,1)). Upon receipt, P2 first merges vectors: max((0,1), (1,0)) = (1,1), then increments for its receiving event C: (1, 2).
- Now, compare A(1,0) and B(0,1): (1,0) and (0,1) are incomparable (1>0 but 0<1), so events A and B are concurrent. Comparing A(1,0) and C(1,2): all components (1<=1, 0<=2) and at least one is strictly less (0<2), so A->C.

Conclusion
Clock synchronization is a fundamental problem in distributed systems.

Physical Clock Synchronization (e.g., NTP) addresses the question of "when did it happen," providing an absolute time reference, but its precision is limited by the network.
Logical Clocks (Lamport Clocks) provide a lightweight method to establish a partial order of events, guaranteeing causality, but cannot detect concurrency.
Vector Clocks fully capture causal relationships between events, able to definitively determine whether events are causally related or concurrent, but their overhead grows linearly with system size (number of processes).

In practical systems, the choice of which scheme to use depends on the specific needs of the application: whether high-precision absolute time is required, or only reliable event causality is needed.