Detailed Explanation of TCP Fast Recovery Mechanism

Knowledge Point Description
TCP's Fast Recovery mechanism is a key component of TCP congestion control and works in tandem with the Fast Retransmit mechanism. Its goal is to resume data transmission efficiently when packet loss is detected through duplicate ACKs, which signal that the network is still delivering data. It keeps the TCP connection from falling back to the time-consuming Slow Start phase, thereby improving bandwidth utilization.

Core Problem
With only the Fast Retransmit mechanism (the behavior of TCP Tahoe), when the sender receives 3 duplicate ACKs and retransmits the lost packet, it reduces the congestion window (cwnd) to 1 Maximum Segment Size (MSS) and enters the Slow Start phase. This approach is overly conservative: duplicate ACKs show that the network is still able to carry data, because the receiver only sends them when later packets keep arriving. Cutting the window to 1 MSS at this point severely degrades throughput.

Solution: Fast Recovery
Fast Recovery starts immediately after Fast Retransmit. Its core idea is that, while the lost packet is being retransmitted, the sender does not shrink the window to 1 MSS and enter Slow Start. Instead, it adjusts the window more gently so that the number of packets "in flight" in the network stays relatively stable and the data flow resumes smoothly.

Step-by-Step Explanation

Step 1: Trigger Condition – Fast Retransmit

The sender transmits multiple packets sequentially, e.g., with sequence numbers 1, 2, 3, 4, 5.
Assume packet #2 is lost in the network, while packets #3, #4, #5 successfully reach the receiver.
For each packet that arrives while packet #2 is still missing (#3, #4, #5), the receiver replies with ACK #2, a cumulative acknowledgment indicating that it still expects packet #2. The first ACK #2 was sent when packet #1 arrived; each later repetition of it is a "duplicate ACK."
When the sender receives the 3rd duplicate ACK (the 4th ACK #2 overall), it infers that packet #2 is likely lost rather than merely reordered. At this moment, Fast Retransmit is triggered: the sender immediately retransmits packet #2 without waiting for the retransmission timer to expire.
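The trigger condition can be sketched as a simple duplicate-ACK counter. This is an illustrative model only: the function name `find_fast_retransmit` and the list-of-integers representation of ACKs are assumptions made for the example, with each ACK number meaning "the next packet the receiver expects."

```python
DUP_ACK_THRESHOLD = 3  # duplicates needed to trigger Fast Retransmit


def find_fast_retransmit(acks):
    """Return the index in `acks` at which Fast Retransmit fires, or None.

    `acks` is a list of cumulative ACK numbers; each number is the next
    packet the receiver expects (hypothetical representation).
    """
    last_ack = None
    dup_count = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            # Same ACK number as before: this one is a duplicate.
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                return i
        else:
            # A new ACK number resets the duplicate counter.
            last_ack = ack
            dup_count = 0
    return None


# Packet #1 arrives (ACK 2), packet #2 is lost, and packets #3, #4, #5
# each produce another ACK 2.
print(find_fast_retransmit([2, 2, 2, 2]))  # 3
```

The returned index 3 corresponds to the fourth ACK 2 in the list, which is the third duplicate.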

Step 2: Enter Fast Recovery State and Adjust Windows

Simultaneously with performing Fast Retransmit (retransmitting packet #2), TCP enters the Fast Recovery state.
The key operations involve setting new values for the congestion window (cwnd) and the slow start threshold (ssthresh):
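- ssthresh = cwnd / 2: Sets the slow start threshold to half of the current congestion window. This records a "safe" window size at the moment congestion is detected. (RFC 5681 computes this value from the amount of data in flight and applies a floor of 2 MSS.)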
- cwnd = ssthresh + 3 * MSS: Why this formula?
  - ssthresh: Represents our new estimate of the current network capacity.
  - + 3 * MSS: This accounts for the 3 duplicate ACKs received. Each duplicate ACK implies that one packet has left the network and has been buffered by the receiver. Those 3 packets no longer occupy network capacity, so the window is temporarily "inflated" by 3 MSS to keep the number of packets in the network roughly constant.
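The two assignments above can be written as a small helper. This is a minimal sketch, not a real TCP stack: the function name `enter_fast_recovery` is hypothetical, window sizes are counted in whole MSS units, and the floor of 2 MSS on ssthresh is an addition taken from RFC 5681 rather than from the formulas above.

```python
def enter_fast_recovery(cwnd, mss=1):
    """Return (ssthresh, cwnd) after the 3rd duplicate ACK.

    All values are in the same unit as `mss` (MSS units by default).
    """
    # Halve the window; the 2 * mss floor is an assumption from RFC 5681.
    ssthresh = max(cwnd // 2, 2 * mss)
    # Inflate by 3 segments, one for each duplicate ACK already received.
    new_cwnd = ssthresh + 3 * mss
    return ssthresh, new_cwnd


print(enter_fast_recovery(20))  # (10, 13)
```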

Step 3: Data Transmission During Fast Recovery

At this point, the congestion window (cwnd) is set to a new value (e.g., changing from 20 MSS to 10 + 3 = 13 MSS).
Under this new cwnd limit, the sender can continue sending new packets as long as the window allows. This is fundamentally different from falling back to Slow Start, where the window drops to 1 MSS and new transmissions nearly stop after a retransmission; Fast Recovery keeps the data flowing.
For each additional duplicate ACK received by the sender (the 4th, 5th, ...), it increases cwnd by 1 MSS. This is because each extra duplicate ACK similarly indicates another packet successfully reached the receiver, meaning the network can accommodate one more packet. This helps maintain the amount of data in the pipeline.
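The effect of window inflation on sending can be illustrated with a short simulation. This is a sketch under stated assumptions: the function name `inflate_on_dup_acks` is hypothetical, sizes are in MSS units, and `outstanding` models the amount of sent-but-unacknowledged data, which does not shrink while only duplicate ACKs arrive. Under that model, a new packet goes out only once the inflated cwnd exceeds the outstanding data.

```python
def inflate_on_dup_acks(cwnd, outstanding, extra_dup_acks, mss=1):
    """Simulate duplicate ACKs beyond the 3rd during Fast Recovery.

    Returns (cwnd, new_packets_sent).
    """
    sent = 0
    for _ in range(extra_dup_acks):
        cwnd += mss  # each extra duplicate ACK inflates the window
        if cwnd - outstanding >= mss:
            # The window now has room for one more segment: send it.
            outstanding += mss
            sent += 1
    return cwnd, sent


# Window was 20 MSS and full when loss was detected; cwnd is now 13.
print(inflate_on_dup_acks(cwnd=13, outstanding=20, extra_dup_acks=10))  # (23, 3)
```

In this example the first seven extra duplicate ACKs only rebuild the window up to the outstanding data; the 8th, 9th, and 10th each release one new packet.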

Step 4: Exiting Fast Recovery – Receiving a New ACK

When the sender's retransmitted packet (packet #2) finally arrives at the receiver, the receiver may have already buffered packets #3, #4, #5.
The receiver checks its buffer and finds it can now acknowledge up to packet #5 consecutively. Thus, it replies to the sender with a cumulative acknowledgment, e.g., ACK #6 (indicating it expects packet with sequence number 6).
This ACK #6 is a new ACK: it advances the cumulative acknowledgment point and covers previously unacknowledged data, so it is not a duplicate ACK. It is the signal to exit the Fast Recovery state.
Upon receiving this new ACK, the sender performs the following operations:
- cwnd = ssthresh: Sets the congestion window to the slow start threshold calculated in Step 2, which removes the temporary inflation added for the duplicate ACKs.
- Exits the Fast Recovery state.
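The exit step can be sketched as an ACK handler that distinguishes duplicate ACKs from a new ACK. The `RenoSender` class and `handle_ack` function are hypothetical names for this example, sizes are in MSS units, and the model covers only the simple case described above, in which a single new ACK ends Fast Recovery.

```python
from dataclasses import dataclass


@dataclass
class RenoSender:
    cwnd: int
    ssthresh: int
    in_fast_recovery: bool
    highest_ack: int  # highest cumulative ACK number seen so far


def handle_ack(sender, ack_no, mss=1):
    """Update the sender state for one incoming ACK."""
    if ack_no > sender.highest_ack:
        # New ACK: it acknowledges previously unacknowledged data.
        sender.highest_ack = ack_no
        if sender.in_fast_recovery:
            sender.cwnd = sender.ssthresh  # deflate the window
            sender.in_fast_recovery = False
    elif sender.in_fast_recovery:
        # Duplicate ACK during Fast Recovery: inflate by one segment.
        sender.cwnd += mss
    return sender


s = RenoSender(cwnd=13, ssthresh=10, in_fast_recovery=True, highest_ack=2)
handle_ack(s, 2)  # 4th duplicate ACK
handle_ack(s, 6)  # new ACK after the retransmission arrives
print(s.cwnd, s.in_fast_recovery)  # 10 False
```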

Step 5: Post-Recovery Phase – Congestion Avoidance

After exiting Fast Recovery with cwnd equal to ssthresh, TCP enters the Congestion Avoidance phase.
In the Congestion Avoidance phase, cwnd no longer grows exponentially as it does in Slow Start. It grows linearly, by about 1 MSS per Round-Trip Time (RTT). This is known as "Additive Increase" and allows more cautious probing of the remaining bandwidth.
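The difference between the two growth modes can be seen in a per-RTT sketch. The function name `grow_window` is hypothetical, sizes are in MSS units, and the model is simplified: it updates cwnd once per RTT rather than per ACK, and it caps Slow Start at ssthresh so the switch to linear growth is easy to see.

```python
def grow_window(cwnd, ssthresh, rtts, mss=1):
    """Return the cwnd value at the start and after each of `rtts` RTTs."""
    history = [cwnd]
    for _ in range(rtts):
        if cwnd < ssthresh:
            # Slow Start: double per RTT (capped at ssthresh, a simplification).
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion Avoidance: additive increase of 1 MSS per RTT.
            cwnd += mss
        history.append(cwnd)
    return history


print(grow_window(cwnd=10, ssthresh=10, rtts=4))  # [10, 11, 12, 13, 14]
print(grow_window(cwnd=1, ssthresh=10, rtts=5))   # [1, 2, 4, 8, 10, 11]
```

The first call shows the state right after Fast Recovery, where growth is linear from the start. The second shows a restart from 1 MSS, which needs four RTTs just to get back to ssthresh.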

Summary
The strength of the Fast Recovery mechanism lies in its distinction between timeout-triggered retransmission and duplicate-ACK-triggered retransmission. A timeout typically indicates more severe congestion (little or no data may be getting through), so a conservative fallback to Slow Start is warranted. Duplicate ACKs indicate that the network is still delivering data and only a few packets were lost, which warrants a smoother recovery strategy. This significantly improves TCP performance under mild congestion. Together with Fast Retransmit, Slow Start, and Congestion Avoidance, Fast Recovery forms the TCP congestion control system.
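The contrast between the two loss signals can be summarized in one function. This is an illustrative sketch: the function name `react_to_loss` and the event strings are hypothetical, sizes are in MSS units, and the 2 MSS floor on ssthresh is an assumption taken from RFC 5681.

```python
def react_to_loss(cwnd, event, mss=1):
    """Return (ssthresh, cwnd, next_phase) for a loss signal."""
    ssthresh = max(cwnd // 2, 2 * mss)
    if event == "timeout":
        # Severe congestion: restart from 1 MSS in Slow Start.
        return ssthresh, mss, "slow_start"
    if event == "triple_dup_ack":
        # Mild loss: halve the window, inflate by 3 MSS, keep sending.
        return ssthresh, ssthresh + 3 * mss, "fast_recovery"
    raise ValueError(f"unknown loss event: {event}")


print(react_to_loss(20, "timeout"))         # (10, 1, 'slow_start')
print(react_to_loss(20, "triple_dup_ack"))  # (10, 13, 'fast_recovery')
```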