Detailed Explanation of Zero-Copy Technology for Backend Performance Optimization

Knowledge Point Description
Zero-copy is a technique that avoids redundant copies of data in memory, improving I/O performance by cutting CPU copy operations and unnecessary data movement. In scenarios such as file transfer and network communication, the traditional read/write path needs 4 copies (2 by DMA, 2 by the CPU), while zero-copy techniques reduce the total to 3 or 2 and the CPU copies from 2 to 1 or even 0, significantly lowering CPU usage and memory bandwidth consumption.

Performance Bottlenecks of Traditional File Transfer

  1. Data flow in traditional methods:

    • Disk file → kernel buffer (DMA copy)
    • Kernel buffer → user buffer (CPU copy)
    • User buffer → kernel socket buffer (CPU copy)
    • Socket buffer → NIC buffer (DMA copy)
  2. Issues:

    • 4 context switches (read() and write() each switch from user mode to kernel mode and back)
    • 2 CPU copies that consume CPU cycles and memory bandwidth
    • Data copied repeatedly between kernel and user space
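The traditional path described above can be sketched in Java as a plain read/write loop. This is a minimal illustration, not a tuned implementation: the class name TraditionalCopy, the method name copy, and the 8 KB buffer size are all assumptions chosen for the example. When the input is a file stream and the output is a socket stream, every pass through the loop moves data from the kernel into the user-space buffer and then back into the kernel.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating the traditional (non-zero-copy) path.
final class TraditionalCopy {

    // Copies everything from `in` to `out` through a user-space buffer.
    // For file and socket streams, read() copies kernel -> user and
    // write() copies user -> kernel, so the CPU touches the data twice.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // user-space buffer; size is an arbitrary choice
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

In a server, `in` would typically come from Files.newInputStream(path) and `out` from socket.getOutputStream(); the method returns the number of bytes copied.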

Zero-Copy Technology Implementation Solutions

Solution 1: mmap + write

  1. Implementation principle:

    • Use mmap() to map the kernel buffer into the process's address space
    • The process accesses the mapped region directly, eliminating the kernel buffer → user buffer copy
    • Data flow: disk → kernel buffer (DMA) → socket buffer (CPU copy) → NIC (DMA)
  2. Specific process:

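    // Fragment from a method that throws IOException; host and port are placeholders
    // Imports: java.net.InetSocketAddress, java.nio.MappedByteBuffer, java.nio.channels.*, java.nio.file.*
    try (FileChannel fileChannel = FileChannel.open(Paths.get("file.txt"), StandardOpenOption.READ);
         SocketChannel socketChannel = SocketChannel.open(new InetSocketAddress("localhost", 8080))) {
        MappedByteBuffer mappedBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());
        while (mappedBuffer.hasRemaining()) {  // write() may send only part of the buffer
            socketChannel.write(mappedBuffer);
        }
    }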
    
  3. Optimization effect:

    • Eliminates 1 CPU copy (total copies 4→3; 1 CPU copy remains)
    • Still requires 4 context switches (mmap() and write() are both system calls)

Solution 2: sendfile System Call

  1. Linux 2.1+ introduces sendfile:

    • Data is transferred directly in kernel space
    • Completely avoids user space involvement
    • System call: ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
  2. Data flow:

    • Disk file → kernel buffer (DMA)
    • Kernel buffer → socket buffer (CPU copy)
    • Socket buffer → NIC (DMA)
  3. Optimization effect:

    • Reduces to 3 copies (2 DMA + 1 CPU), all within the kernel
    • 2 context switches (a single system call)

Solution 3: sendfile + DMA Gather Copy

  1. Linux 2.4+ further optimization:

    • Uses the scatter/gather DMA capability of the NIC
    • Data in the kernel buffer can be transmitted to the NIC directly
    • Only descriptors (location and length of the data) are appended to the socket buffer; the data itself is not copied
  2. Final data flow:

    • Disk file → kernel buffer (DMA)
    • Kernel buffer → NIC buffer (DMA)
    • Achieves true "zero CPU copy"
  3. Core technologies:

    • The NIC must support the gather operation
    • The socket buffer holds descriptors that point into the kernel buffer, and the NIC's DMA engine reads the data from there

Practical Application Scenarios

  1. File download servers:

    # Nginx configuration
    sendfile on;
    tcp_nopush on;
    
  2. Kafka message transmission:

    • Uses sendfile (through Java's FileChannel.transferTo) to send log segments to consumers
    • Log data goes from the page cache to the network without passing through the broker's user space
  3. Java NIO implementation:

    // transferTo is an instance method and may transfer fewer bytes than requested
    fileChannel.transferTo(0, fileChannel.size(), socketChannel);
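A slightly fuller sketch of the transferTo call follows. The class name ZeroCopySend and the method name sendFile are hypothetical, introduced only for this example. The loop is there because transferTo() returns the number of bytes actually transferred, which can be less than the number requested. The sketch assumes a blocking target channel and a file that does not shrink during the transfer; with a non-blocking channel, transferTo() can return 0 and the loop would spin. Whether the JDK ends up calling sendfile depends on the operating system and the type of the target channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating FileChannel.transferTo.
final class ZeroCopySend {

    // Sends the whole file to `target` and returns the number of bytes sent.
    // Assumes `target` is in blocking mode and the file is not truncated meanwhile.
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = source.size();
            long position = 0;
            while (position < size) {
                // transferTo may move fewer bytes than requested, so advance and retry
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```

In a server, `target` would normally be a connected SocketChannel; any WritableByteChannel works, although the JDK may then fall back to an ordinary copy.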
    

Performance Comparison Data

  • Traditional method: ~60% CPU usage, 800MB/s throughput
  • Zero-copy: ~20% CPU usage, 1600MB/s throughput
  • Performance improvement: CPU usage reduced by 2/3, throughput doubled
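Figures like these depend heavily on hardware, file size, and workload, so it is worth measuring on the target machine. The harness below is a rough sketch, not a rigorous benchmark: it copies file to file rather than file to socket (to avoid needing a network peer), it does no warm-up, and the class name CopyBenchmark, the method names, and the 64 MiB test size are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical timing harness comparing a buffered copy with transferTo.
final class CopyBenchmark {

    // Traditional path: data passes through a user-space byte[] buffer.
    static long bufferedCopy(Path src, Path dst) throws IOException {
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            return total;
        }
    }

    // transferTo path: the JDK may delegate the copy to the kernel.
    static long transferCopy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("bench-src", ".bin");
        Path dst = Files.createTempFile("bench-dst", ".bin");
        Files.write(src, new byte[64 * 1024 * 1024]); // 64 MiB of zeros (arbitrary size)

        long t0 = System.nanoTime();
        bufferedCopy(src, dst);
        long t1 = System.nanoTime();
        transferCopy(src, dst);
        long t2 = System.nanoTime();

        System.out.printf("buffered:   %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("transferTo: %d ms%n", (t2 - t1) / 1_000_000);
        Files.delete(src);
        Files.delete(dst);
    }
}
```

Because the source file is freshly written, both runs mostly read from the page cache; the timings therefore reflect copy overhead rather than disk speed.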

Notes

  1. Applicable scenarios:

    • Large file transfers (>4KB shows significant effect)
    • Network I/O-intensive applications
    • Read-only operations where data modification is not needed
  2. Limitations:

    • Small files may not show advantages
    • Requires hardware and operating system support
    • The application cannot inspect or transform the data in transit (for example, compress or encrypt it in user space)

Zero-copy, by reducing unnecessary memory copies and fully utilizing DMA capabilities and hardware features, is a key technology in modern high-performance backend systems.