Detailed Explanation of WebSocket Handshake Process and Heartbeat Mechanism

I. Knowledge Point Description
The WebSocket handshake is the process that establishes a WebSocket connection. It switches the protocol from HTTP to WebSocket via the HTTP upgrade mechanism, after which both sides can communicate in full duplex. The heartbeat mechanism keeps the connection alive, checks its health, and prevents silent disconnections caused by timeouts or network issues. Together, these two mechanisms ensure the stability and real-time performance of WebSocket connections.

II. Detailed Explanation of Knowledge Points

Part 1: WebSocket Handshake Process

1. Handshake Overview

  • The WebSocket handshake is an upgrade request based on HTTP/1.1. The client initiates a special HTTP request, the server responds with confirmation, and the protocol is then upgraded to WebSocket.
  • The entire handshake process is a single HTTP request-response exchange, but it's not a regular HTTP request; it's a protocol upgrade request.
  • After the handshake completes, all subsequent communication uses the WebSocket protocol's data frame format, not the HTTP protocol.

2. Client Handshake Request
The client's handshake request contains the following key elements:

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat

Detailed Explanation of Header Fields:

  • Upgrade: websocket

    • Indicates the client wishes to upgrade to the WebSocket protocol.
    • This is the signature header for protocol upgrade.
  • Connection: Upgrade

    • Indicates this is a connection upgrade request.
    • Must be used together with the Upgrade header.
  • Sec-WebSocket-Key: [16-byte random value, base64 encoded]

    • This value lets the client confirm that the server actually understood the WebSocket handshake, rather than an intermediary such as a caching proxy returning a cached or generic HTTP response.
    • The client generates a 16-byte random number, encodes it in base64, and sends it.
    • This value is random and unique for each connection.
    • Note: This is not an encryption key, but merely an anti-forgery token.
  • Sec-WebSocket-Version: 13

    • Specifies the WebSocket protocol version.
    • 13 represents RFC 6455 (the current standard).
  • Sec-WebSocket-Protocol: [Subprotocol list]

    • Optional field.
    • A comma-separated list of subprotocols supported by the client.
    • The server selects one from the list to respond with, or ignores it.
  • Origin: [Origin address]

    • Automatically added in browser environments.
    • Used for cross-origin security checks.

3. Server Handshake Response
After verifying the request, the server returns an upgrade response:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

Detailed Explanation of Header Fields:

  • Status Code: 101 Switching Protocols

    • Indicates successful protocol switching.
    • This is the only valid success response code.
  • Upgrade: websocket and Connection: Upgrade

    • Confirms the protocol upgrade.
  • Sec-WebSocket-Accept: [Calculated value]

    • This is the core verification step of the handshake.
    • The server concatenates the received Sec-WebSocket-Key with a fixed GUID, calculates the SHA-1 hash, and finally base64 encodes it.
    • Specific calculation formula: base64(sha1(Sec-WebSocket-Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"))
    • The fixed GUID "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" comes from RFC 6455.
    • The client must verify this value to ensure it's a legitimate WebSocket server.
  • Sec-WebSocket-Protocol: [Selected subprotocol]

    • Optional field.
    • The server selects one subprotocol from those supported by the client.
    • If none is selected, this field is omitted.
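
The Sec-WebSocket-Accept calculation and the subprotocol choice can be sketched in Node.js as follows. This is a minimal illustration rather than a complete server: the function names are assumptions made for the example, the "first offered subprotocol that the server supports" policy is one possible choice, and the request headers are assumed to arrive as an object with lowercase keys.

```javascript
const crypto = require('node:crypto');

// Fixed GUID defined by RFC 6455
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// base64(sha1(Sec-WebSocket-Key + GUID))
function computeAccept(secWebSocketKey) {
    return crypto
        .createHash('sha1')
        .update(secWebSocketKey + WS_GUID)
        .digest('base64');
}

// Pick the first client-offered subprotocol that the server supports (assumed policy)
function selectSubprotocol(offeredHeader, supported) {
    if (!offeredHeader) return null;
    const offered = offeredHeader.split(',').map((p) => p.trim());
    return offered.find((p) => supported.includes(p)) ?? null;
}

// Assemble the 101 response as raw header text
function buildHandshakeResponse(requestHeaders, supported = []) {
    const lines = [
        'HTTP/1.1 101 Switching Protocols',
        'Upgrade: websocket',
        'Connection: Upgrade',
        `Sec-WebSocket-Accept: ${computeAccept(requestHeaders['sec-websocket-key'])}`,
    ];
    const protocol = selectSubprotocol(requestHeaders['sec-websocket-protocol'], supported);
    // Omit the header entirely when no subprotocol was selected
    if (protocol) lines.push(`Sec-WebSocket-Protocol: ${protocol}`);
    return lines.join('\r\n') + '\r\n\r\n';
}
```

With the sample key from the request shown earlier, computeAccept('dGhlIHNhbXBsZSBub25jZQ==') yields the s3pPLMBiTxaQ9kYGzzhZRbK+xOo= value that appears in the sample response.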

4. Handshake Failure Cases

  • If the server does not support WebSocket, it returns a non-101 status code (e.g., 200, 404, etc.).
  • If Sec-WebSocket-Accept verification fails, the client must close the connection.
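  • If there's a version mismatch, the server returns an HTTP error response (RFC 6455 suggests 426 Upgrade Required; 400 Bad Request is also used) containing a Sec-WebSocket-Version header that lists the versions it supports.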
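
The client-side checks implied by these failure cases might look like the sketch below. The function name validateHandshakeResponse, the { ok, reason } result shape, and the assumption that response headers arrive as an object with lowercase keys are all choices made for this example; browsers and WebSocket libraries perform equivalent checks internally.

```javascript
const crypto = require('node:crypto');

// Fixed GUID defined by RFC 6455
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Returns { ok: true } or { ok: false, reason } for a handshake response
function validateHandshakeResponse(statusCode, headers, sentKey) {
    // 101 is the only status that means the upgrade succeeded
    if (statusCode !== 101) {
        return { ok: false, reason: `unexpected status ${statusCode}` };
    }
    if ((headers['upgrade'] || '').toLowerCase() !== 'websocket') {
        return { ok: false, reason: 'missing Upgrade: websocket' };
    }
    // Connection may carry several comma-separated tokens
    const tokens = (headers['connection'] || '')
        .toLowerCase()
        .split(',')
        .map((t) => t.trim());
    if (!tokens.includes('upgrade')) {
        return { ok: false, reason: 'missing Connection: Upgrade' };
    }
    // Recompute the expected Accept value from the key this client sent
    const expected = crypto
        .createHash('sha1')
        .update(sentKey + WS_GUID)
        .digest('base64');
    if (headers['sec-websocket-accept'] !== expected) {
        return { ok: false, reason: 'Sec-WebSocket-Accept mismatch' };
    }
    return { ok: true };
}
```

Any result with ok set to false means the client should close the connection rather than start sending WebSocket frames.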

5. In-depth Analysis of Handshake Security Mechanism
Why is the Sec-WebSocket-Key/Accept mechanism needed?

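  • Preventing Cache Attacks: The main purpose is to prevent opaque proxy servers from answering the WebSocket handshake from cache.
  • Specific Scenario: If a malicious client sends an HTTP request resembling a WebSocket handshake, a proxy might cache the response. A later genuine WebSocket request might then receive that cached response. Because Sec-WebSocket-Accept is derived from the key of the new request, the stale value fails verification and the client rejects it instead of treating the connection as upgraded.
  • Role of the GUID: The fixed GUID is a value that a server unaware of WebSocket is very unlikely to produce, so a correct Sec-WebSocket-Accept shows that the server deliberately performed the WebSocket handshake rather than echoing the key or returning a simple fixed value.
  • Randomness Requirement: The Key is different for each connection, so a response recorded from an earlier handshake cannot be replayed.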

Part 2: WebSocket Heartbeat Mechanism

1. Necessity of the Heartbeat Mechanism

  • Network Intermediate Device Timeouts: Firewalls, proxies, load balancers, etc., may automatically close idle connections.
  • Detecting Connection Health: Timely detection of network interruptions, server crashes, etc.
  • Maintaining NAT Mappings: In NAT environments, periodically sending data maintains NAT mapping entries.
  • Avoiding False Connections: If a client or server crashes, the other end might still believe the connection is valid.

2. Implementation Methods of Heartbeat

Method 1: Ping/Pong Control Frames (Recommended)

  • The WebSocket protocol defines dedicated control frames: Ping (opcode 0x9) and Pong (opcode 0xA).
  • Ping frames can contain application data (up to 125 bytes, the limit for all control frames); Pong frames must contain the same data as the corresponding Ping.
  • The specification requires that an endpoint receiving a Ping reply with a Pong; a Pong may also be sent unsolicited as a one-way heartbeat.
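
To make the control frame layout concrete, the sketch below encodes Ping and Pong frames as raw bytes. It is for illustration only: the function name encodeControlFrame is an assumption, and in practice the WebSocket library or browser writes these bytes to the underlying TCP connection itself. Application code cannot inject them through send(), which always produces data frames.

```javascript
const crypto = require('node:crypto');

const OPCODE_PING = 0x9;
const OPCODE_PONG = 0xA;

// Encode a control frame. Control frames always have FIN=1 and carry at most 125 bytes.
// Frames sent by a client must be masked; frames sent by a server must not be.
function encodeControlFrame(opcode, payload = Buffer.alloc(0), masked = false) {
    if (payload.length > 125) {
        throw new RangeError('control frame payload must be 125 bytes or less');
    }
    const firstByte = 0x80 | opcode; // FIN=1, RSV1-3=0, opcode in the low 4 bits
    if (!masked) {
        // Second byte: MASK=0 plus the 7-bit payload length
        return Buffer.concat([Buffer.from([firstByte, payload.length]), payload]);
    }
    const maskKey = crypto.randomBytes(4);
    const body = Buffer.from(payload); // Copy so the caller's buffer is left untouched
    for (let i = 0; i < body.length; i++) {
        body[i] ^= maskKey[i % 4];
    }
    // Second byte: MASK=1 plus the 7-bit payload length, then the 4-byte masking key
    return Buffer.concat([Buffer.from([firstByte, 0x80 | payload.length]), maskKey, body]);
}
```

An empty unmasked Ping is therefore the two bytes 0x89 0x00, and the matching empty Pong is 0x8A 0x00.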

Ping Frame Sending Example:

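// Send a heartbeat from either a Node.js `ws` socket or a browser WebSocket
function sendPing() {
    if (typeof ws.ping === 'function') {
        // Node.js `ws` library: sends a real Ping control frame (opcode 0x9)
        ws.ping('heartbeat');
    } else {
        // The browser WebSocket API has no ping() method, and send() always
        // produces a data frame, so raw Ping bytes cannot be sent this way.
        // Fall back to an application-layer ping message instead.
        ws.send(JSON.stringify({ type: 'ping', timestamp: Date.now() }));
    }
}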

Pong Frame Automatic Handling:
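Browsers answer an incoming Ping with a Pong automatically and expose neither frame to scripts. The Node.js ws library also replies automatically and emits a 'pong' event that you can listen for: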

ws.on('pong', (data) => {
    console.log('Received pong response');
});

Method 2: Application-layer Heartbeat

  • Custom message types, such as {type: 'ping', timestamp: Date.now()}.
  • The other party replies with {type: 'pong', timestamp: ...} upon receiving.
  • Suitable for browser clients, whose WebSocket API cannot send Ping frames or observe Pong frames, and for other environments without Ping/Pong support.
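
The receiving side of an application-layer heartbeat could be handled as in this sketch. The function name handleHeartbeatMessage and the convention of echoing the timestamp back are assumptions for the example; the message shape follows the {type: 'ping'} / {type: 'pong'} format described above.

```javascript
// Return the reply to send for an incoming text message,
// or null if the message is not a heartbeat ping.
function handleHeartbeatMessage(raw) {
    let msg;
    try {
        msg = JSON.parse(raw);
    } catch {
        return null; // Not JSON: leave it to the normal message handler
    }
    if (msg && msg.type === 'ping') {
        // Echo the sender's timestamp so it can compute the round-trip time
        return JSON.stringify({ type: 'pong', timestamp: msg.timestamp });
    }
    return null;
}
```

A message handler would call this first and, when the result is not null, send it back and skip normal processing.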

3. Heartbeat Parameter Configuration

Key Parameters:

  • Heartbeat Interval (heartbeatInterval): Interval for sending heartbeats, typically 25-30 seconds.
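  • Timeout (timeout): Maximum waiting time for a response. With a per-ping timer, as in the implementation in this section, it is typically a few seconds and shorter than the heartbeat interval; designs that count missed heartbeats instead often tolerate 2-3 intervals.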
  • Retry Count (maxRetries): Number of retries after timeout.

Configuration Considerations:

  1. Network Intermediate Device Timeout: Common values are 30-60 seconds; heartbeat interval should be less than this.
  2. Application Real-time Requirements: Shorter intervals for higher real-time requirements.
  3. Server Performance: High frequency of heartbeats affects performance with a large number of connections.
  4. Mobile Network Characteristics: Consider power-saving modes, network switching.

4. Implementation Steps of the Heartbeat Mechanism

Step 1: Initialize Timers

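class WebSocketHeartbeat {
    constructor(ws, options = {}) {
        this.ws = ws;
        // ?? keeps an explicit 0 (e.g. maxRetries: 0) instead of replacing it with the default
        this.interval = options.interval ?? 25000; // 25 seconds between heartbeats
        this.timeout = options.timeout ?? 5000;    // 5 seconds to wait for each Pong
        this.maxRetries = options.maxRetries ?? 3; // Retries before the connection is declared dead
        this.retryCount = 0;
        this.lastPingTime = null;      // Set by sendPing(), read by handlePong()
        this.reconnectAttempts = 0;    // Drives the backoff in reconnect()
        this.pingTimeout = null;       // Timer handle for the pending Pong
        this.heartbeatInterval = null; // Timer handle for the periodic Ping
    }
    // The methods shown in Steps 2-6 belong inside this class body
}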

Step 2: Start Heartbeat

start() {
    this.stop(); // First, stop any existing timers
    
    this.heartbeatInterval = setInterval(() => {
        this.sendPing();
        this.waitForPong();
    }, this.interval);
}

Step 3: Send Ping and Wait for Pong

sendPing() {
    if (this.ws.readyState === WebSocket.OPEN) {
        // Record sending time
        this.lastPingTime = Date.now();
        
        // Send Ping
        if (typeof this.ws.ping === 'function') {
            this.ws.ping();
        } else {
            // Application-layer heartbeat
            this.ws.send(JSON.stringify({type: 'ping', id: Date.now()}));
        }
    }
}

waitForPong() {
    // Clear previous timeout timer
    if (this.pingTimeout) clearTimeout(this.pingTimeout);
    
    // Set new timeout timer
    this.pingTimeout = setTimeout(() => {
        this.handleTimeout();
    }, this.timeout);
}

Step 4: Handle Pong Response

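// Run once after the socket is created, e.g. at the end of the constructor.
// Node.js `ws` library: a 'pong' event fires for each Pong control frame.
// Browsers expose no pong event, so this branch is skipped there.
if (typeof this.ws.on === 'function') {
    this.ws.on('pong', () => this.handlePong());
}

// Application-layer pong. addEventListener leaves the application's own
// message handler intact, unlike assigning to onmessage.
this.ws.addEventListener('message', (event) => {
    try {
        const data = JSON.parse(event.data);
        if (data.type === 'pong') {
            this.handlePong();
        }
    } catch (e) {
        // Non-JSON message: not a heartbeat, so ignore it here
    }
});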

handlePong() {
    // Response received, clear timeout timer
    if (this.pingTimeout) clearTimeout(this.pingTimeout);
    this.retryCount = 0; // Reset retry count
    
    // Calculate round-trip time (optional)
    if (this.lastPingTime) {
        const rtt = Date.now() - this.lastPingTime;
        console.log(`Heartbeat RTT: ${rtt}ms`);
    }
}

Step 5: Handle Timeout and Reconnection

handleTimeout() {
    this.retryCount++;
    
    if (this.retryCount > this.maxRetries) {
        // Exceeded maximum retries, close connection
        console.error('Heartbeat timeout, connection closed');
        this.ws.close();
        this.stop();
        
        // Trigger reconnection mechanism
        this.reconnect();
    } else {
        // Retry sending heartbeat
        console.warn(`Heartbeat timeout, retry ${this.retryCount}`);
        this.sendPing();
        this.waitForPong();
    }
}

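reconnect() {
    // Exponential backoff: 1 s, 2 s, 4 s, ... capped at 30 s
    this.reconnectAttempts = (this.reconnectAttempts || 0) + 1;
    const delay = Math.min(1000 * Math.pow(2, this.reconnectAttempts - 1), 30000);
    setTimeout(() => {
        // Reconnect to the same URL; pong/message listeners must be attached again
        this.ws = new WebSocket(this.ws.url);
        this.ws.addEventListener('open', () => {
            this.reconnectAttempts = 0; // Reset backoff after a successful handshake
            this.start();
        });
    }, delay);
}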
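
The backoff schedule itself can be isolated into a small pure function, which makes it easy to check. The sketch below assumes a 1-second base and a 30-second cap as in the reconnect() method; the function names and the optional jitter helper are additions for illustration and are not part of the class above.

```javascript
// Delay in milliseconds before reconnect attempt number `attempt` (0-based)
function backoffDelay(attempt, baseMs = 1000, capMs = 30000) {
    return Math.min(baseMs * 2 ** attempt, capMs);
}

// Optional "full jitter": pick a delay uniformly in [0, delayMs) so that many
// clients disconnected at the same moment do not all reconnect together.
// `random` is injectable so the behaviour can be tested deterministically.
function withJitter(delayMs, random = Math.random) {
    return Math.floor(random() * delayMs);
}
```

For example, attempts 0, 1, 2, and 3 wait 1000, 2000, 4000, and 8000 ms, and every attempt from the fifth onward is capped at 30000 ms.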

Step 6: Cleanup Resources

stop() {
    if (this.heartbeatInterval) {
        clearInterval(this.heartbeatInterval);
        this.heartbeatInterval = null;
    }
    if (this.pingTimeout) {
        clearTimeout(this.pingTimeout);
        this.pingTimeout = null;
    }
}

5. Optimization Strategies for the Heartbeat Mechanism

Strategy 1: Dynamic Heartbeat Interval

// Adjust heartbeat interval based on network condition
adjustIntervalBasedOnNetwork(connectionQuality) {
    if (connectionQuality === 'poor') {
        this.interval = 15000; // Poor network, 15 seconds
    } else if (connectionQuality === 'good') {
        this.interval = 30000; // Good network, 30 seconds
    }
    // A running setInterval keeps its old period, so restart to apply the new one
    if (this.heartbeatInterval) this.start();
}

Strategy 2: Adaptive Timeout

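// Adjust timeout based on historical RTT
updateTimeoutBasedOnRTT() {
    // calculateAverageRTT() is not defined in this article; it is assumed to
    // return the average of recent heartbeat round-trip times in milliseconds
    const avgRTT = this.calculateAverageRTT();
    this.timeout = Math.max(avgRTT * 3, 3000); // 3 times RTT, minimum 3 seconds
}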
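
One way to supply the average RTT used above is an exponentially weighted moving average, sketched below. The class name RttEstimator, its method names, and the default smoothing factor of 0.125 (the value TCP uses for its smoothed RTT) are assumptions for this example; a plain mean over the last few samples would work as well.

```javascript
// Keeps a smoothed RTT and derives a heartbeat timeout from it
class RttEstimator {
    constructor(alpha = 0.125) {
        this.alpha = alpha;   // Weight given to the newest sample
        this.average = null;  // No samples yet
    }

    // Feed one measured round-trip time in milliseconds
    addSample(rttMs) {
        this.average = this.average === null
            ? rttMs
            : (1 - this.alpha) * this.average + this.alpha * rttMs;
        return this.average;
    }

    // 3 times the smoothed RTT, but never below minMs
    timeout(minMs = 3000) {
        return this.average === null ? minMs : Math.max(this.average * 3, minMs);
    }
}
```

handlePong() could call addSample(rtt) with the value it already computes, and the timeout for the next ping would then come from timeout().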

Strategy 3: Idle Detection Optimization

  • Reset heartbeat timer when data is sent/received.
  • Avoid unnecessary heartbeats during active communication.
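
Idle detection can be reduced to tracking the time of the last traffic, as in this sketch. The class name IdleHeartbeatGate and its methods are hypothetical, and the clock is injectable only so the logic can be tested without real timers.

```javascript
// Decides when the next heartbeat is due based on the last observed traffic
class IdleHeartbeatGate {
    constructor(intervalMs, now = Date.now) {
        this.intervalMs = intervalMs;
        this.now = now;
        this.lastActivity = now();
    }

    // Call whenever any message is sent or received
    markActivity() {
        this.lastActivity = this.now();
    }

    // Milliseconds until a heartbeat is due; 0 means one should be sent now
    dueIn() {
        return Math.max(0, this.lastActivity + this.intervalMs - this.now());
    }
}
```

A heartbeat loop would send a ping only when dueIn() returns 0 and otherwise reschedule itself for dueIn() milliseconds later, so an actively used connection sends no heartbeats at all.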

Strategy 4: Mobile Network Optimization

  • Consider mobile network switching (WiFi to 4G).
  • Send heartbeat on visibilitychange event (page switching).
  • Reduce heartbeat frequency when the screen is off.

6. Server-side Heartbeat Implementation

Node.js Example (ws library):

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws) => {
    console.log('Client connected');
    
    // Initialize heartbeat
    let isAlive = true;
    
    // Periodically send Ping
    const heartbeatInterval = setInterval(() => {
        if (!isAlive) {
            clearInterval(heartbeatInterval);
            return ws.terminate();
        }
        
        isAlive = false;
        ws.ping();
    }, 30000);
    
    // Listen for Pong response
    ws.on('pong', () => {
        isAlive = true;
    });
    
    // Cleanup timer
    ws.on('close', () => {
        clearInterval(heartbeatInterval);
    });
});

7. Common Issues and Solutions

Issue 1: High CPU Usage Due to Heartbeat

  • Optimization: Use setTimeout instead of setInterval to avoid timer stacking.
  • Optimization: Skip the next heartbeat if data is sent within the interval.

Issue 2: False Pong Responses

  • Scenario: Network devices might automatically reply with Pong.
  • Solution: Include a random number in Ping, verify the random number in Pong.
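
The random-number check can be sketched as follows. The class name PingNonceTracker and the 8-byte nonce length are assumptions for the example; with the Node.js ws library, the value returned by next() would be passed to ws.ping(), and the payload delivered by the 'pong' event would be passed to verify().

```javascript
const crypto = require('node:crypto');

// Remembers the payload of the outstanding Ping and checks the Pong against it
class PingNonceTracker {
    constructor() {
        this.pending = null;
    }

    // Call before sending a Ping: returns the payload to send
    next() {
        this.pending = crypto.randomBytes(8);
        return this.pending;
    }

    // Call from the pong handler: true only if the payload matches the outstanding Ping
    verify(pongPayload) {
        const ok = this.pending !== null
            && Buffer.isBuffer(pongPayload)
            && pongPayload.equals(this.pending);
        if (ok) this.pending = null; // Each nonce is accepted once
        return ok;
    }
}
```

Only a Pong for which verify() returns true should be treated as proof that the peer is alive.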

Issue 3: Heartbeat Storm with Large-scale Connections

  • Scenario: Thousands of connections sending heartbeats simultaneously.
  • Solution: Stagger sending times, set different phase offsets for each connection.

// Set different start times for each connection
const phaseOffset = Math.random() * this.interval;
setTimeout(() => {
    this.start(); // start() is the method defined in Step 2
}, phaseOffset);

8. Relationship Between Handshake and Heartbeat

  1. Timing Relationship: Heartbeat starts only after a successful handshake.
  2. State Synchronization: Heartbeat failure may trigger reconnection, which requires a new handshake.
  3. Error Handling: Handshake failure leads directly to reconnection; heartbeat failure involves retries first, then reconnection.
  4. Resource Management: Heartbeat timers must be cleaned up when the connection closes.

III. Summary
The WebSocket handshake establishes a connection via the HTTP upgrade mechanism, with the Sec-WebSocket-Key/Accept mechanism guarding against cached or replayed responses. The heartbeat mechanism keeps the connection alive through periodic Ping/Pong frames and requires reasonable configuration of interval, timeout, and retry strategies. Together, they ensure the reliability and real-time performance of WebSocket connections, forming the foundation for real-time web applications. In practical applications, heartbeat strategies need to be adjusted dynamically based on network environment, device characteristics, and application requirements to balance real-time performance and resource consumption.