Read-Write Splitting Architecture Design in Distributed Systems

Problem Description
Read-write splitting is a common database architecture optimization strategy. Its core idea is to distribute read and write operations to different server nodes. The primary goal is to improve the overall system throughput and availability by dispersing the load. You need to understand its underlying principles, applicable scenarios, key technical implementation details, and potential challenges.

Solution Process

Core Idea and Basic Principles
- Objective: Write operations (e.g., INSERT, UPDATE, DELETE) typically involve costly work such as locking and transaction management, and they define the authoritative state of the data, so they must be handled consistently. Read operations (SELECT) can in principle run concurrently and, in most workloads, occur far more frequently than writes.
- Splitting: Therefore, we set up a master server (Master) dedicated to handling all write operations. Simultaneously, we set up one or more slave servers (Slaves) that synchronize data from the master via replication technology and are dedicated to handling read operations.
- Effect: This disperses read and write pressure across different servers, avoiding resource contention on a single machine. Write operations are concentrated on the master node, while read operations can be horizontally scaled across multiple slave nodes, thereby significantly improving the system's read performance and processing capacity.
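The effect described above can be made concrete with a rough back-of-the-envelope sketch. The function name `per_node_load` and the example traffic figures are hypothetical, and the model is deliberately simplified: it assumes reads are spread evenly across slaves and that every slave replays every write from the master.

```python
def per_node_load(read_qps, write_qps, num_slaves):
    """Estimate the operations per second each node type has to handle."""
    if num_slaves < 1:
        raise ValueError("at least one slave is required")
    return {
        # All writes land on the single master.
        "master_writes": write_qps,
        # Reads are divided evenly among the slaves.
        "slave_reads_each": read_qps / num_slaves,
        # Each slave still replays every write it receives from the master.
        "slave_replayed_writes_each": write_qps,
    }


# Hypothetical workload: 9,000 reads/s and 1,000 writes/s with three slaves.
load = per_node_load(read_qps=9000, write_qps=1000, num_slaves=3)
print(load)
```

Adding slaves lowers the read load per slave, but the write load on the master and the replay load on each slave stay the same, which is why the approach suits read-heavy workloads.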

Core Components and Data Synchronization Mechanism
Master-slave replication is the technical foundation for implementing read-write splitting. The description below uses MySQL replication terminology.
- Master Database (Master):
  - Responsibility: Handles all write operations (Create, Update, Delete).
  - Key Behavior: Records data changes in its local binary log (Binlog).
- Slave Database (Slave):
  - Responsibility: Handles the vast majority of read operations (Query).
  - Key Behavior:
    1. I/O Thread: Connects to the master database, reads the master's Binlog, and writes it to the slave's local relay log (Relay Log).
    2. SQL Thread: Reads the relay log and replays the changes recorded in it (SQL statements or row changes, depending on the binlog format), thereby keeping the slave's data synchronized with the master's.
- By default this process is asynchronous: The master responds to the client immediately after completing a write operation, without waiting for any slave. Replication lag is typically in the millisecond range but can grow to seconds or more under heavy write load or long-running transactions. This is the root cause of "eventual consistency" in read-write splitting architectures.
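The replication flow can be illustrated with a small in-memory simulation. This is a sketch only, not how a real database implements replication: the `Master` and `Slave` classes are hypothetical, the "binlog" is a plain list of key/value changes, and the two slave threads are modeled as methods that are called explicitly so the stale-read window is easy to see.

```python
from collections import deque


class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered list of (key, value) change events

    def write(self, key, value):
        # Apply the change locally and record it in the binlog.
        # The call returns without waiting for any slave (asynchronous).
        self.data[key] = value
        self.binlog.append((key, value))


class Slave:
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.relay_log = deque()
        self.fetched = 0  # number of binlog events already copied

    def io_thread_step(self):
        # Copy binlog events not seen yet into the local relay log.
        new_events = self.master.binlog[self.fetched:]
        self.relay_log.extend(new_events)
        self.fetched += len(new_events)

    def sql_thread_step(self):
        # Replay every event waiting in the relay log, in order.
        while self.relay_log:
            key, value = self.relay_log.popleft()
            self.data[key] = value


master = Master()
slave = Slave(master)

master.write("balance", 100)
stale = slave.data.get("balance")  # read before replication has run

slave.io_thread_step()
slave.sql_thread_step()
fresh = slave.data.get("balance")  # read after replication has caught up

print(stale, fresh)  # prints: None 100
```

The gap between the write on the master and the two slave steps is the replication lag; any read served by the slave inside that gap returns old data.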

How the Application Layer Implements Read-Write Splitting
The database itself provides master-slave replication, but how does the application know which database to connect to? There are two main approaches:
- Method 1: Direct Separation at the Application Layer
  - Description: Explicitly configure two data sources in the application code: one pointing to the master (for writes) and one or more pointing to slaves (for reads).
  - Implementation: During business logic execution, developers manually choose which data source to use. For example, write methods in the service layer use the master data source, while read methods use a slave data source.
  - Advantages: Simple and straightforward to implement.
  - Disadvantages: Intrusive to the code, since data source selection is coupled with business logic. If there are multiple slaves, load balancing logic must also be implemented at the application layer, which makes maintenance difficult.
- Method 2: Using Middleware Proxy (Database Gateway)
  - Description: Introduce a middleware layer (e.g., ShardingSphere, MyCat, ProxySQL) between the application and the database cluster.
  - Implementation:
    1. The application no longer connects directly to specific databases but uniformly connects to this middleware.
    2. The middleware intelligently routes incoming SQL statements based on their type.
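      - If it's a write operation (INSERT/UPDATE/DELETE) or a statement explicitly opening a transaction (BEGIN/START TRANSACTION), route it to the master. Subsequent statements in that transaction, including SELECTs, are typically kept on the master as well so that the transaction sees a consistent view.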
      - If it's a read operation (SELECT), route it to a slave according to a configured load balancing strategy (e.g., random, round-robin).
  - Advantages: Transparent to the application; business code doesn't need to worry about database topology. The middleware can uniformly manage data sources, load balancing, failover, etc., greatly simplifying application development.
  - Disadvantages: Introduces a new component, increasing system architecture complexity. The middleware itself can become a performance bottleneck or a single point of failure (requires high availability setup).
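The routing rule is the same whether it lives in the application (Method 1) or in a proxy (Method 2). The following minimal sketch shows one way it might look. `ReadWriteRouter` and the node names are hypothetical, statement classification is a naive prefix check rather than real SQL parsing, and two behaviors are added as assumptions: statements inside an open transaction stay on the master, and `SELECT ... FOR UPDATE` is treated as a write because it takes locks. One router instance is assumed per client session.

```python
import itertools


class ReadWriteRouter:
    """Routes one session's SQL statements to the master or a slave."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._next_slave = itertools.cycle(slaves)  # round-robin iterator
        self.in_transaction = False

    def route(self, sql):
        statement = sql.strip().rstrip(";").strip().upper()
        if statement.startswith(("BEGIN", "START TRANSACTION")):
            self.in_transaction = True
            return self.master
        if statement.startswith(("COMMIT", "ROLLBACK")):
            self.in_transaction = False
            return self.master
        if self.in_transaction:
            # Assumption: keep the whole transaction on the master.
            return self.master
        if statement.startswith("SELECT") and not statement.endswith("FOR UPDATE"):
            return next(self._next_slave)
        # Writes and anything unrecognized go to the master to be safe.
        return self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
for sql in [
    "SELECT * FROM orders",
    "UPDATE orders SET paid = 1 WHERE id = 7",
    "SELECT * FROM orders",
]:
    print(router.route(sql), "<-", sql)
```

Defaulting unknown statements to the master trades some read scalability for safety: a misclassified read only costs master capacity, while a misclassified write sent to a slave would fail or cause the slave to diverge.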

Key Challenges and Mitigation Strategies
Read-write splitting is not a silver bullet; it introduces new problems.
- Challenge 1: Data Replication Lag (Master-Slave Synchronization Delay)
  - Problem: Because master-slave synchronization is asynchronous, a user who writes data to the master and immediately queries a slave might not see the newly written data, which violates "read-your-writes" consistency.
  - Mitigation Strategies:
    1. Force Read from Master: For scenarios requiring strong consistency reads (e.g., querying user balance, order payment status), force the query request to be sent to the master. This can be achieved by marking it in the code or configuring specific SQL hints in the middleware.
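    2. Semi-Synchronous Replication: The master waits until at least one slave has received the transaction and written it to its relay log before returning a response to the client. This reduces the risk of data loss but increases write latency. Note that the slave acknowledges receipt of the transaction, not its application, so a read from that slave immediately afterwards can still be stale.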
    3. Lag Monitoring and Routing: The middleware monitors the replication lag of slaves and only routes read requests to slaves whose lag is within an acceptable threshold.
- Challenge 2: Write Pressure Bottleneck on Master
  - Problem: While read capacity is scaled, all write operations are still concentrated on a single master. When write operations are very frequent, the master becomes a bottleneck.
  - Mitigation Strategy: Read-write splitting primarily addresses scenarios with many reads and few writes. If write pressure is also high, more advanced solutions need to be considered, such as sharding (database/table partitioning), to distribute data across multiple masters.
- Challenge 3: High Availability and Failover
  - Problem: If the master fails, the system cannot process writes.
  - Mitigation Strategy: A master-slave failover mechanism needs to be deployed.
    1. Monitoring: Use tools such as Keepalived or MHA to monitor the master's health status.
    2. Failover: When the master fails, automatically or manually promote a slave to become the new master.
    3. Service Discovery: Notify the application layer or middleware of the new master's address. This process must ensure data consistency and avoid split-brain scenarios.
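Two of the lag mitigations from Challenge 1, forcing reads to the master and lag-aware routing, can be combined in a small sketch. Everything named here is hypothetical: `LagAwareRouter`, the thresholds, and the idea of pinning a user to the master for a short window after that user's own write are illustrative assumptions, not a specific middleware's behavior. How the lag is measured is left out; in MySQL it might come from the replica status output.

```python
import random
import time


class LagAwareRouter:
    def __init__(self, master, max_lag_seconds=1.0, pin_seconds=2.0,
                 clock=time.monotonic):
        self.master = master
        self.max_lag_seconds = max_lag_seconds  # acceptable slave lag
        self.pin_seconds = pin_seconds          # master-pin window after a write
        self.clock = clock                      # injectable for testing
        self.slave_lag = {}      # slave name -> last reported lag in seconds
        self.last_write_at = {}  # user id -> time of that user's last write

    def report_lag(self, slave, lag_seconds):
        # Called by whatever monitoring component measures replication lag.
        self.slave_lag[slave] = lag_seconds

    def record_write(self, user_id):
        # Remember when this user last wrote; writes always go to the master.
        self.last_write_at[user_id] = self.clock()
        return self.master

    def route_read(self, user_id, force_master=False):
        if force_master:
            return self.master  # caller requires a strongly consistent read
        last_write = self.last_write_at.get(user_id)
        if last_write is not None and self.clock() - last_write < self.pin_seconds:
            return self.master  # user wrote recently: read their own write
        healthy = [name for name, lag in self.slave_lag.items()
                   if lag <= self.max_lag_seconds]
        if not healthy:
            return self.master  # no slave is fresh enough
        return random.choice(healthy)
```

A typical call sequence would be `record_write(user_id)` when handling a write, then `route_read(user_id)` for later queries, with `force_master=True` reserved for reads such as balance or payment status checks. The pin window is only a heuristic: it helps only if it is longer than the actual replication lag.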

Summary
Read-write splitting is an effective architecture for improving system read performance and scalability by distributing database read and write operations to different nodes. Its core lies in master-slave replication and read-write routing. During implementation, special attention must be paid to the consistency issues arising from data replication lag, and strategies like "force read from master" should be adopted to address them. It is suitable for business scenarios with many reads and few writes in which slightly stale reads are acceptable, and it is one of the foundational technologies for building high-performance, highly available data layers.