WebSocket Protocol: Principles and Implementation of Real-time Communication

I. Description
WebSocket (standardized in RFC 6455) is a protocol for full-duplex communication over a single TCP connection. It simplifies data exchange between clients and servers and lets the server push data to clients without waiting for a request. With plain HTTP request/response, the client initiates every exchange; WebSocket instead performs a single HTTP-based handshake and then keeps the connection open, enabling true bidirectional real-time communication. Its core mechanisms are the protocol handshake, the data frame format, the heartbeat mechanism, and connection management.

II. Principle Analysis and Implementation Steps

1. Protocol Handshake (Handshake)

  • Purpose: Before establishing a WebSocket connection, the client and server need to upgrade the protocol through an HTTP handshake.
  • Client Request: The client sends a special HTTP GET request, with headers including:
    • Connection: Upgrade: Indicates the connection needs to be upgraded.
    • Upgrade: websocket: Specifies the protocol to upgrade to as WebSocket.
    • Sec-WebSocket-Key: A Base64-encoded 16-byte random value used for security verification.
    • Sec-WebSocket-Version: Specifies the WebSocket protocol version; RFC 6455 requires the value 13.
  • Server Response: After validating the request, the server returns an HTTP 101 Switching Protocols response:
    • Status code is 101.
    • Includes Connection: Upgrade and Upgrade: websocket.
    • Sec-WebSocket-Accept: Computed by appending the fixed GUID "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to the client's Sec-WebSocket-Key (the Base64 string itself, not its decoded bytes), taking the SHA-1 hash, and Base64-encoding the 20-byte digest. The client recomputes this value and fails the connection if it does not match.
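The handshake rules above can be sketched in a few lines of Python using only the standard library. The helper names `make_client_key`, `compute_accept`, and `is_valid_upgrade` are hypothetical, and `is_valid_upgrade` assumes the caller has already parsed the request headers into a dict with lower-cased names; a real server also checks the request line, `Host`, and optional headers such as `Origin`.

```python
import base64
import hashlib
import os

# Fixed GUID from RFC 6455, appended to the client's key before hashing.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_client_key() -> str:
    """Return a fresh Sec-WebSocket-Key: 16 random bytes, Base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def compute_accept(client_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    # The key is hashed as its Base64 string; it is not decoded first.
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_upgrade(headers: dict) -> bool:
    """Check the upgrade headers (assumption: header names are lower-cased)."""
    key = headers.get("sec-websocket-key", "")
    try:
        # The key must decode to exactly 16 bytes.
        key_ok = len(base64.b64decode(key, validate=True)) == 16
    except ValueError:  # binascii.Error is a subclass of ValueError
        key_ok = False
    # Connection may list several tokens, e.g. "keep-alive, Upgrade".
    conn_tokens = [t.strip() for t in headers.get("connection", "").lower().split(",")]
    return (
        headers.get("upgrade", "").lower() == "websocket"
        and "upgrade" in conn_tokens
        and headers.get("sec-websocket-version") == "13"
        and key_ok
    )
```

For the sample key used in RFC 6455, `compute_accept("dGhlIHNhbXBsZSBub25jZQ==")` returns `"s3pPLMBiTxaQ9kYGzzhZRbK+xOo="`, which is the value the server places in its 101 response.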

2. Data Frame Format (Data Framing)

  • Purpose: Data transmitted by WebSocket is split into frames, each with a specific format to support features like fragmentation and masking.
  • Frame Structure (Key Fields):
    • FIN (1 bit): Indicates if this is the final frame of a message.
    • Opcode (4 bits): Defines the frame type: 0 for a continuation frame, 1 for text, 2 for binary, 8 for connection close, 9 for Ping, 10 (0xA) for Pong.
    • Mask (1 bit): Indicates whether the payload is masked. Frames from client to server must be masked; frames from server to client must not be.
    • Payload length (7, 7+16, or 7+64 bits): Values 0 to 125 give the payload length directly; 126 means the following 2 bytes hold a 16-bit unsigned length; 127 means the following 8 bytes hold a 64-bit unsigned length.
    • Masking-Key (0 or 4 bytes): Present only when Mask is 1; used to unmask the payload data.
    • Payload data: The actual transmitted data.
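  • Masking Process: Every frame sent by the client must be masked with a fresh, unpredictable 4-byte Masking-Key: payload byte i is XORed with masking_key[i % 4]. Because XOR is its own inverse, the server applies the same operation to recover the original data. Masking keeps the bytes on the wire unpredictable to scripts running in the client, which protects intermediaries such as caching proxies from cache-poisoning attacks.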
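The frame layout and masking rule above can be illustrated with a small encoder. This is a sketch, not a complete implementation: the names `encode_frame`, `apply_mask`, and `client_frame` are hypothetical, the RSV bits are always left at 0 (no extensions), and the caller is responsible for keeping control-frame payloads at 125 bytes or less.

```python
import os
import struct
from typing import Optional

# Opcodes from the list above.
OP_CONT, OP_TEXT, OP_BINARY, OP_CLOSE, OP_PING, OP_PONG = 0x0, 0x1, 0x2, 0x8, 0x9, 0xA


def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR byte i with masking_key[i % 4]; the same call masks and unmasks."""
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


def encode_frame(payload: bytes, opcode: int = OP_TEXT, fin: bool = True,
                 masking_key: Optional[bytes] = None) -> bytes:
    """Build one frame; pass a 4-byte masking_key for client-to-server frames."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)  # RSV1-3 left at 0
    mask_bit = 0x80 if masking_key is not None else 0x00
    n = len(payload)
    if n <= 125:
        header = struct.pack("!BB", first, mask_bit | n)
    elif n <= 0xFFFF:
        header = struct.pack("!BBH", first, mask_bit | 126, n)  # 16-bit length
    else:
        header = struct.pack("!BBQ", first, mask_bit | 127, n)  # 64-bit length
    if masking_key is None:
        return header + payload  # server-to-client: unmasked
    return header + masking_key + apply_mask(payload, masking_key)


def client_frame(payload: bytes, opcode: int = OP_TEXT) -> bytes:
    """Client frames use a fresh random 4-byte key for every frame."""
    return encode_frame(payload, opcode, masking_key=os.urandom(4))
```

With these definitions, `encode_frame(b"Hello")` produces `81 05 48 65 6c 6c 6f`, and masking the same text with the key `37 fa 21 3d` produces `81 85 37 fa 21 3d 7f 9f 4d 51 58`; both byte sequences match the examples in RFC 6455, section 5.7.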

3. Heartbeat Mechanism (Heartbeat)

  • Purpose: To keep the connection alive and detect if the connection is still valid.
  • Ping/Pong Frames:
    • The server or client can send a Ping frame (Opcode 9), and the receiver must reply with a Pong frame (Opcode 10).
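    • A Pong sent in reply to a Ping must carry the same application data as that Ping (e.g., a timestamp). Ping, Pong, and Close are control frames: their payload is at most 125 bytes and they must not be fragmented.
    • If no Pong arrives within a configured timeout, the connection can be treated as broken and closed actively.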
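The heartbeat rule can be expressed as a small per-connection policy object. This is a hypothetical sketch: the class name, the 30-second interval, and the 10-second timeout are illustrative assumptions, not values from the protocol, and the caller still sends the actual Ping/Pong frames. The clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class Heartbeat:
    """Hypothetical heartbeat policy for one connection.

    Send a Ping every `interval` seconds; treat the connection as dead if the
    outstanding Ping is not answered by a Pong within `timeout` seconds.
    """

    def __init__(self, interval: float = 30.0, timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.timeout = timeout
        self.clock = clock
        self.last_ping_sent = clock()
        self.awaiting_pong_since: Optional[float] = None  # unanswered Ping time

    def should_ping(self) -> bool:
        """True when no Ping is outstanding and the interval has elapsed."""
        return (self.awaiting_pong_since is None
                and self.clock() - self.last_ping_sent >= self.interval)

    def on_ping_sent(self) -> None:
        """Call right after sending a Ping frame (opcode 9)."""
        now = self.clock()
        self.last_ping_sent = now
        self.awaiting_pong_since = now

    def on_pong(self) -> None:
        """Call when a Pong frame (opcode 10) is received."""
        self.awaiting_pong_since = None

    def is_dead(self) -> bool:
        """True if the outstanding Ping has gone unanswered past `timeout`."""
        return (self.awaiting_pong_since is not None
                and self.clock() - self.awaiting_pong_since > self.timeout)
```

A server loop would call `should_ping()` periodically for each connection, send a Ping when it returns true, and close any connection for which `is_dead()` becomes true.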

4. Connection Management

  • State Maintenance: The server must track every active WebSocket connection, for example in a set or in a map keyed by connection ID.
  • Message Broadcasting: When data needs to be pushed to multiple clients, the server iterates through the connection list and sends data frames.
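  • Exception Handling: Handle abnormal closures (e.g., network interruptions) and frame-parsing errors so that sockets and per-connection state are always released.
  • Closing Handshake: On receiving a Close frame (opcode 8), an endpoint that has not already sent one replies with its own Close frame, after which the underlying TCP connection is closed.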
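State maintenance, broadcasting, and cleanup on failure can be combined in one small registry. This is a sketch under stated assumptions: `ConnectionRegistry` is a hypothetical name, and `send` is an assumed callable that writes one already-encoded frame to a connection and raises `OSError` (for example `ConnectionResetError`) when the socket is broken. Closing the socket itself is left to the caller.

```python
from typing import Callable, Dict, Hashable


class ConnectionRegistry:
    """Hypothetical registry of live connections, keyed by a connection ID."""

    def __init__(self, send: Callable[[object, bytes], None]):
        self._send = send  # assumed: writes a frame, raises OSError on failure
        self._conns: Dict[Hashable, object] = {}

    def add(self, conn_id: Hashable, conn: object) -> None:
        self._conns[conn_id] = conn

    def remove(self, conn_id: Hashable) -> None:
        # pop() with a default makes removal idempotent (safe to call twice)
        self._conns.pop(conn_id, None)

    def __len__(self) -> int:
        return len(self._conns)

    def broadcast(self, frame: bytes) -> int:
        """Send `frame` to every connection; drop those whose send fails.

        Returns the number of successful sends.
        """
        delivered = 0
        # Iterate over a snapshot so broken connections can be removed safely.
        for conn_id, conn in list(self._conns.items()):
            try:
                self._send(conn, frame)
                delivered += 1
            except OSError:
                self.remove(conn_id)  # release state for the broken connection
        return delivered
```

Iterating over a snapshot is the key design choice here: removing entries from a dict while looping over it directly would raise a `RuntimeError`.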

III. Implementation Example (Simplified Server Logic)
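The following Python sketch illustrates the core flow of a WebSocket server. The helpers used in the main loop (active_connections, broadcast_message, close_connection, send_pong) are assumed and not defined here: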

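# 1. Handshake Handling
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def handle_handshake(request):
    # request.headers is assumed to be a mapping of the client's HTTP headers
    key = request.headers['Sec-WebSocket-Key']
    # SHA-1 over the key string plus the GUID, then Base64 of the 20-byte digest
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    accept_key = base64.b64encode(digest).decode("ascii")
    response = (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Accept: " + accept_key + "\r\n\r\n"
    )
    return response.encode("ascii")  # bytes, ready to write to the socket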

# 2. Data Frame Decoding
# Assumes `data` (bytes) holds exactly one complete frame starting at index 0.
def decode_frame(data):
    fin = (data[0] & 0x80) != 0  # FIN = 0: more fragments follow (not reassembled here)
    opcode = data[0] & 0x0F
    masked = (data[1] & 0x80) != 0
    length = data[1] & 0x7F
    offset = 2
    if length == 126:
        # Next 2 bytes: 16-bit unsigned length, network byte order
        length = int.from_bytes(data[2:4], "big")
        offset += 2
    elif length == 127:
        # Next 8 bytes: 64-bit unsigned length, network byte order
        length = int.from_bytes(data[2:10], "big")
        offset += 8
    if masked:
        masking_key = data[offset:offset+4]
        offset += 4
        payload = data[offset:offset+length]
        # Unmask: XOR each byte with masking_key[i % 4]
        decoded = bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))
        return opcode, decoded
    return opcode, bytes(data[offset:offset+length])

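# 3. Main Loop (Listening for Messages)
# Busy-polls non-blocking sockets for brevity; real servers use selectors or asyncio.
# Assumes each recv() returns exactly one complete frame (TCP gives no such guarantee).
while True:
    for conn in list(active_connections):  # copy: close_connection() may remove entries
        try:
            data = conn.recv(65536)  # non-blocking read
        except BlockingIOError:
            continue  # no data available yet
        if not data:  # b"" means the peer closed the TCP connection
            close_connection(conn)
            continue
        opcode, payload = decode_frame(data)
        if opcode == 1:  # Text frame
            broadcast_message(payload)  # Broadcast to all clients
        elif opcode == 8:  # Close frame
            close_connection(conn)  # assumed to send a Close frame back first
        elif opcode == 9:  # Ping frame
            send_pong(conn, payload)  # Reply with a Pong echoing the Ping payload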
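The main loop above treats each recv() result as exactly one frame. TCP is a byte stream, so a single read may contain half a frame or several frames. A minimal sketch of a buffering parser, assuming each connection keeps its own `bytearray` of received bytes (the function name `split_frames` is hypothetical):

```python
from typing import List, Tuple


def split_frames(buffer: bytearray) -> List[Tuple[bool, int, bytes]]:
    """Remove and decode every complete frame at the front of `buffer`.

    Returns a list of (fin, opcode, payload). A trailing partial frame is
    left in `buffer` until more bytes arrive.
    """
    frames = []
    while len(buffer) >= 2:
        fin = bool(buffer[0] & 0x80)
        opcode = buffer[0] & 0x0F
        masked = bool(buffer[1] & 0x80)
        length = buffer[1] & 0x7F
        offset = 2
        if length == 126:
            if len(buffer) < 4:
                break  # extended length not fully received yet
            length = int.from_bytes(buffer[2:4], "big")
            offset = 4
        elif length == 127:
            if len(buffer) < 10:
                break
            length = int.from_bytes(buffer[2:10], "big")
            offset = 10
        key_len = 4 if masked else 0
        end = offset + key_len + length
        if len(buffer) < end:
            break  # incomplete frame: wait for the next recv()
        key = buffer[offset:offset + key_len]
        payload = bytes(buffer[offset + key_len:end])
        if masked:
            payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
        frames.append((fin, opcode, payload))
        del buffer[:end]  # consume the decoded frame
    return frames
```

After each recv(), the loop would append the new bytes to the connection's buffer and dispatch every frame returned. Reassembling fragmented messages (a frame with FIN = 0 followed by continuation frames with opcode 0) is a further step not shown here.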

IV. Summary
WebSocket upgrades an HTTP connection through a protocol handshake, transmits data in a lightweight frame structure, and uses Ping/Pong heartbeats to detect dead connections. Its core strength is bidirectional real-time communication, which makes it suitable for scenarios such as online chat, real-time gaming, and stock tickers. An implementation must follow the frame format specification strictly and manage the full connection lifecycle, from the opening handshake to the closing handshake.