Principles of Database Master-Slave Replication and Data Synchronization Mechanisms

Problem Description
Database master-slave replication is a common data redundancy and high-availability solution. Its core concept involves synchronizing data changes from the primary database to one or more replica databases, enabling read-write splitting, load balancing, and fault recovery. Please explain in detail the core principles, synchronization process, data consistency guarantee mechanisms, and solutions to common issues (such as replication lag and data conflicts) of master-slave replication.


Solution Explanation

1. Basic Architecture of Master-Slave Replication

Objective: Understand the role division in master-slave replication.

  • Primary Database: Receives all write operations (INSERT/UPDATE/DELETE) and records data changes in the binary log.
  • Replica Database: Connects to the primary, pulls the primary's binary log, and replays these changes locally to keep data consistent with the primary.
  • Core Components:
    • Primary's Binlog: Log files recording all data changes.
    • Replica's I/O Thread: Responsible for pulling the Binlog from the primary and saving it as the local relay log.
    • Replica's SQL Thread: Reads the relay log, parses, and executes the SQL statements.

Why is a Relay Log Needed?
The relay log decouples receiving from replaying: the I/O thread can keep pulling Binlog events over the network even when the SQL thread replays slowly, and events that have been fetched but not yet applied survive a replica restart instead of having to be re-fetched from the primary.
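The pipeline above (primary Binlog → I/O thread → relay log → SQL thread → replica data) can be sketched as a minimal in-memory simulation; all names here are illustrative, not MySQL internals:

```python
from collections import deque

primary_data = {}
binlog = []          # primary's binary log: ordered list of change events
relay_log = deque()  # replica's local relay log
replica_data = {}

def primary_write(key, value):
    """A write on the primary updates its data and appends a binlog event."""
    primary_data[key] = value
    binlog.append(("SET", key, value))

def io_thread(last_fetched):
    """I/O thread: pull new binlog events and persist them to the relay log."""
    for event in binlog[last_fetched:]:
        relay_log.append(event)
    return len(binlog)  # new fetch position

def sql_thread():
    """SQL thread: replay relay-log events against the replica's data."""
    while relay_log:
        op, key, value = relay_log.popleft()
        if op == "SET":
            replica_data[key] = value

primary_write("user:1", "alice")
primary_write("user:2", "bob")
pos = io_thread(0)  # fetch both events into the relay log
sql_thread()        # replay them on the replica
```

Note how the two replica threads touch only the relay log, never each other: the I/O thread can keep appending while the SQL thread drains, which is exactly the decoupling the relay log provides.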


2. Core Replication Process (Using MySQL as an Example)

Step 1: Primary Records Changes

  • When the primary executes a write operation, it writes the change as an event into the Binlog (supporting multiple formats: Statement-Based, Row-Based, Mixed).
  • Key Points:
    • Row-Based Format: Records the values of each row before and after the change, avoiding inconsistencies due to non-deterministic functions (e.g., NOW()).
    • Binlog Commit Order: Ensures operation order through transaction IDs.
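The non-deterministic-function problem from the key points above can be shown in a small sketch (illustrative only, not MySQL internals): statement-based logging records the SQL text and lets the replica re-evaluate it, while row-based logging records the concrete values produced on the primary.

```python
import datetime

def now():
    """Stand-in for SQL NOW(): returns the current timestamp as a string."""
    return datetime.datetime.now().isoformat()

# Statement-based: log the statement; the replica would re-evaluate NOW()
# at some later time, possibly producing a different value.
stmt_event = "INSERT INTO t VALUES (NOW())"

# Row-based: log the concrete row values produced on the primary.
primary_value = now()
row_event = ("INSERT", "t", (primary_value,))

# Replaying the row event reproduces the primary's value exactly.
replayed_row = row_event[2][0]
```

Re-executing `stmt_event` on the replica would call `now()` again and can diverge; replaying `row_event` cannot.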

Step 2: Replica Pulls Logs

  • The replica's I/O thread connects to the primary using an authorized account and requests incremental Binlog content.
  • The primary sends Binlog events to the replica via a Binlog Dump Thread.
  • The replica writes the received logs to the local Relay Log.
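On a real MySQL replica, the steps above are configured with the classic position-based setup; a hedged example (host, user, password, log file, and position are all placeholders to adapt to your environment):

```sql
-- Point the replica at the primary's current Binlog coordinates
CHANGE MASTER TO
  MASTER_HOST = 'primary.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000042',
  MASTER_LOG_POS = 154;
START SLAVE;  -- starts both the I/O thread and the SQL thread
```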

Step 3: Replica Replays Logs

  • The SQL thread reads the relay log, parses the events, and sequentially executes the same SQL or row changes on the replica.
  • Key Mechanisms:
    • Single-threaded Replay: In early versions, the SQL thread executed single-threaded, potentially causing lag (later versions support multi-threaded parallel execution).
    • Position Recording: The replica records the processed Binlog position (Relay_Master_Log_File and Exec_Master_Log_Pos) for resumable synchronization.
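Position recording is what makes replay resumable; a minimal sketch (names are illustrative, not MySQL's actual bookkeeping) of checkpointing the last executed event so a restart neither re-applies nor skips anything:

```python
relay_log = [("SET", "a", 1), ("SET", "b", 2), ("SET", "c", 3)]
replica = {}
exec_pos = 0  # analogous to Exec_Master_Log_Pos: last applied position

def replay(up_to):
    """Apply relay-log events from the checkpoint up to (not including) up_to."""
    global exec_pos
    for _, key, value in relay_log[exec_pos:up_to]:
        replica[key] = value
        exec_pos += 1  # checkpoint after each applied event

replay(2)              # apply the first two events, then simulate a crash
saved_pos = exec_pos   # the persisted checkpoint survives the restart
exec_pos = saved_pos   # ...restart: resume from the saved position
replay(len(relay_log)) # remaining events applied exactly once
```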

3. Data Consistency Guarantee Mechanisms

Problem: Replication lag or network failures may lead to data inconsistency.
Solutions:

  1. Semi-Synchronous Replication

    • The primary waits until at least one replica acknowledges that it has received (written to its relay log) the transaction's Binlog events before reporting the commit as successful to the client.
    • Balances performance and consistency: avoids blocking from full synchronization while being more reliable than asynchronous replication.
  2. GTID (Global Transaction Identifier)

    • Assigns a unique ID (Primary UUID + Transaction Sequence Number) to each transaction. The replica uses the GTID to locate the synchronization position.
    • Advantages:
      • Avoids manual configuration errors in positions;
      • Supports automatic failover switching (e.g., continuing replication from the new primary after the original primary fails).
  3. Parallel Replication

    • The replica replays logs with multiple threads, executing transaction groups in parallel (WRITESET-based parallel replication can reduce conflicts).
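The GTID mechanism from point 2 can be sketched as a set of executed transaction IDs: each transaction carries a (source UUID, sequence number) pair, and a replica skips any GTID it has already applied, which is what makes attaching to a new primary after failover safe. All names and the UUID below are illustrative:

```python
PRIMARY_UUID = "3e11fa47-71ca-11e1-9e33-c80aa9429562"  # example source UUID

executed_gtids = set()  # analogous to the replica's executed GTID set
replica = {}

def apply_transaction(gtid, changes):
    """Apply a transaction once; re-delivery of the same GTID is a no-op."""
    if gtid in executed_gtids:
        return False  # already applied: skip, making replay idempotent
    for key, value in changes:
        replica[key] = value
    executed_gtids.add(gtid)
    return True

t1 = ((PRIMARY_UUID, 1), [("x", 10)])
t2 = ((PRIMARY_UUID, 2), [("y", 20)])

apply_transaction(*t1)
apply_transaction(*t2)
# After a failover, the new primary may resend t1; the GTID set catches it:
applied_again = apply_transaction(*t1)
```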

4. Common Issues and Optimization Strategies

Issue 1: Replication Lag

  • Causes:
    • Slow single-threaded replay;
    • High write concurrency on the primary, making it difficult for the replica to catch up;
    • Insufficient network bandwidth.
  • Solutions:
    • Enable multi-threaded parallel replication (set slave_parallel_workers);
    • Use row-based Binlog so the replica applies recorded row images directly instead of re-executing statements, which makes replay deterministic and often faster;
    • Limit large transactions on the primary (e.g., split bulk operations into smaller transactions).
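A hedged example of enabling multi-threaded replay on a MySQL 5.7+ replica (the worker count is illustrative; tune it to your workload):

```sql
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';  -- group-commit based parallelism
SET GLOBAL slave_parallel_workers = 4;             -- illustrative value
START SLAVE SQL_THREAD;
```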

Issue 2: Data Conflicts

  • Scenarios: Accidental writes to the replica, or replication failure due to different primary/replica configurations.
  • Solutions:
    • Set read_only=ON to prohibit write operations on the replica;
    • Use consistency checking tools (e.g., pt-table-checksum) to periodically detect differences.
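A hedged example of the read-only setting on a MySQL replica; `super_read_only` (MySQL 5.7+) also blocks users with the SUPER privilege, while the replication SQL thread is exempt from both settings and can still apply changes:

```sql
SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON;
```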

5. Extension: Application Scenarios of Master-Slave Replication

  1. Read-Write Splitting: Send write operations to the primary and read operations to replicas to improve system throughput.
  2. Backup and Disaster Recovery: Use replicas as backup nodes for quick failover if the primary fails.
  3. Data Analysis: Run heavy analytical queries on replicas to avoid impacting primary performance.
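Read-write splitting from point 1 is typically done at the application or proxy layer; a minimal sketch of a router that sends writes to the primary and spreads reads across replicas round-robin (endpoint names are placeholders):

```python
import itertools

class Router:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin over replicas

    def route(self, sql):
        """Send SELECTs to a replica, everything else to the primary."""
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

router = Router("primary:3306", ["replica1:3306", "replica2:3306"])
w  = router.route("INSERT INTO t VALUES (1)")
r1 = router.route("SELECT * FROM t")
r2 = router.route("SELECT * FROM t")
```

A real router also needs to handle the lag caveat from the Notes below it in production: a read immediately after a write may need to be pinned to the primary until replicas catch up.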

Notes:

  • The business must tolerate brief inconsistencies (e.g., data just written to the primary may not be immediately visible on replicas);
  • Regularly monitor replication status (check the Seconds_Behind_Master field in SHOW SLAVE STATUS).
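The monitoring note above can be sketched as a health check over one replica's SHOW SLAVE STATUS row, assuming it has already been fetched into a dict via a MySQL client library; the threshold is a placeholder:

```python
def check_replication(status, max_lag_seconds=30):
    """Return a list of problems found in one replica's status row."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread stopped")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread stopped")
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        problems.append("lag unknown (replication broken?)")
    elif lag > max_lag_seconds:
        problems.append(f"replica is {lag}s behind")
    return problems

healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
           "Seconds_Behind_Master": 2}
lagging = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
           "Seconds_Behind_Master": 120}
```

Note that Seconds_Behind_Master is NULL when replication is broken, so "no lag value" must be treated as an error, not as zero lag.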

Summary
The core of master-slave replication is log synchronization + data replay. Through the cooperation of the Binlog, the relay log, and the I/O and SQL threads, data flows reliably from the primary to its replicas. Understanding this process and its consistency mechanisms helps in designing highly available database architectures for practical scenarios.