TCP Keep-alive Mechanism
Description:
The TCP keep-alive mechanism is a liveness check used to detect whether an idle TCP connection is still valid. In practice, a connection can be broken without the local end noticing: an intermediate network device (e.g., a firewall or NAT) may time out its connection state, or the peer host may crash or reboot. Keep-alive periodically sends probe packets to verify the connection, so that "half-open connections" are detected and their resources are released in a timely manner.
Detailed Explanation:
- Why is keep-alive needed?
  - Intermediate devices in the network (e.g., firewalls, NAT routers) typically set idle timeout periods for TCP connections. If a connection remains idle for an extended period, these devices may discard the connection state, causing subsequent data packets to be dropped.
  - The peer host may crash or reboot unexpectedly. The local end cannot detect this on its own and still considers the connection valid, so the failure only surfaces when it later tries to send data.
  - Servers need to promptly clean up invalid connections to free up resources (e.g., ports, memory).
- How keep-alive works:
  Keep-alive is disabled by default in most operating systems and must be manually enabled via the socket API. Its operation is controlled by three core parameters (using Linux as an example):
  - tcp_keepalive_time: The idle time before the first probe packet is sent (default: 7200 seconds).
  - tcp_keepalive_intvl: The interval between probe packets (default: 75 seconds).
  - tcp_keepalive_probes: The number of consecutive unanswered probes to send (default: 9).
  Specific Steps:
  - Step 1: When the connection idle time reaches tcp_keepalive_time, send the first probe packet (empty content, sequence number equal to the current sequence number minus 1).
  - Step 2: If an ACK response is received from the peer, the connection is considered normal. Reset the idle timer and return to Step 1.
  - Step 3: If no ACK is received, wait for tcp_keepalive_intvl and resend the probe packet. Repeat this process until tcp_keepalive_probes is reached.
  - Step 4: If no response is received for all probes, the connection is deemed invalid. Close the connection and return an error.
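As a minimal sketch of enabling keep-alive through the socket API: on Linux, the three system-wide parameters above have per-socket counterparts (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT). The helper name `enable_keepalive` and the values 60/10/5 are illustrative assumptions, not recommendations from the text; other platforms expose different option names, so the sketch only sets the options that the local `socket` module provides.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keep-alive on one socket and, where supported, override its timers.

    idle/interval/count are illustrative values (assumptions), in seconds/seconds/probes.
    """
    # Keep-alive is off by default; this turns it on for this socket only.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket counterparts of tcp_keepalive_time / _intvl / _probes.
    # The constants are only defined on platforms that support them (e.g. Linux).
    overrides = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in overrides:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keep-alive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Options set this way apply only to the given socket; the system-wide defaults remain unchanged for every other connection.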
- Difference between keep-alive and application-layer heartbeat packets:
  - Keep-alive: Implemented by the operating system kernel and transparent to the application, but lacks flexibility (e.g., the default intervals are too long).
  - Heartbeat packets: Defined at the application layer (e.g., WebSocket's Ping/Pong frames), allowing flexible control over frequency and data content, but requiring the application to implement timeout handling itself.
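To make the "application implements timeout handling" point concrete, here is a rough sketch of the bookkeeping an application-layer heartbeat needs. The class `HeartbeatMonitor`, its method names, and the 15 s / 45 s values are all hypothetical; actually sending the ping and parsing the pong are left to the application's own protocol.

```python
import time


class HeartbeatMonitor:
    """Tracks heartbeat timing for one connection (hypothetical helper)."""

    def __init__(self, interval=15.0, timeout=45.0, clock=time.monotonic):
        self.interval = interval  # seconds between pings (assumed value)
        self.timeout = timeout    # silence after which the peer is presumed dead
        self._clock = clock       # injectable so the logic can be tested
        now = clock()
        self._last_sent = now
        self._last_heard = now

    def should_send_ping(self):
        """True once `interval` seconds have passed since the last ping."""
        return self._clock() - self._last_sent >= self.interval

    def mark_ping_sent(self):
        self._last_sent = self._clock()

    def mark_pong_received(self):
        self._last_heard = self._clock()

    def is_peer_dead(self):
        """True if nothing has been heard from the peer for more than `timeout`."""
        return self._clock() - self._last_heard > self.timeout
```

An event loop would call `should_send_ping()` and `is_peer_dead()` on each tick, and close the connection itself when the latter returns true; unlike kernel keep-alive, nothing happens automatically.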
- Considerations:
  - Frequent keep-alive packets may increase network load, so parameters should be adjusted based on the scenario.
  - In HTTP/1.1, Connection: keep-alive is a mechanism for connection reuse and is unrelated to TCP keep-alive (which is a transport-layer mechanism).
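When adjusting the parameters, a back-of-the-envelope calculation can help compare the trade-off between detection speed and probe traffic. The sketch below assumes the step sequence described earlier (one idle period, then every probe going unanswered) and ignores round-trip time; `keepalive_profile` is a hypothetical helper, and the 60/10/5 settings are example values only.

```python
def keepalive_profile(idle_s, interval_s, probes):
    """Return (worst-case seconds to detect a dead peer,
    probes per hour sent on a healthy but idle connection)."""
    # Dead peer: wait out the idle time, then one interval per unanswered probe.
    detection_s = idle_s + interval_s * probes
    # Healthy peer: each answered probe resets the idle timer,
    # so roughly one probe is sent per idle period.
    probes_per_hour = 3600 / idle_s
    return detection_s, probes_per_hour


print(keepalive_profile(7200, 75, 9))  # Linux defaults -> (7875, 0.5)
print(keepalive_profile(60, 10, 5))    # example tuned values -> (110, 60.0)
```

With the Linux defaults, a dead peer can go unnoticed for more than two hours, while the example tuned values detect it in under two minutes at the cost of about one probe per minute on each idle connection.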
Summary:
TCP keep-alive is a conservative connection liveness mechanism, suited to scenarios that maintain long-lived connections with infrequent data exchange (e.g., long-lived database connections). In practical development, if finer control is needed, it is recommended to combine it with application-layer heartbeat packets.