Database Master-Slave Replication Principles and Practice

Database Master-Slave Replication Principles and Practice

Topic Description
Database master-slave replication is a common data redundancy and read-write separation technology. By synchronizing data changes from the primary database (Master) to one or more secondary databases (Slaves), it achieves data backup, load balancing, and high availability. Interviews often examine its core principles, synchronization processes, data consistency guarantees, and typical problem solutions.

I. Basic Architecture of Master-Slave Replication

Role Definition:
- Master: Receives all write operations and records data change logs (e.g., binary log, Binlog).
- Slave: Subscribes to the master's change logs and replays these operations asynchronously or semi-synchronously to maintain data consistency with the master.
Core Components:
- Binlog (Binary Log): Log files on the master that record data changes, supporting Statement (SQL statements), Row (row data changes), or Mixed (mixed mode) formats.
- I/O Thread: On the slave, connects to the master, pulls the Binlog, and writes it to the local relay log (Relay Log).
- SQL Thread: On the slave, reads the Relay Log, parses and executes the SQL or row change operations within.

II. Complete Process of Master-Slave Replication

Enable Binlog on Master:
- Configure the master's server-id and log_bin parameters to ensure each change operation (e.g., INSERT/UPDATE) is recorded in the Binlog.
- Example: In MySQL, enable via SET GLOBAL server_id=1 and log_bin=mysql-bin.

Configure Slave Connection to Master:

Use the CHANGE MASTER TO command to specify the master's IP, port, Binlog filename, and position (or GTID).

Example:

CHANGE MASTER TO  
MASTER_HOST='192.168.1.100',  
MASTER_USER='repl_user',  
MASTER_LOG_FILE='mysql-bin.000001',  
MASTER_LOG_POS=154;

Data Synchronization Process:
- Step 1: The slave's I/O thread initiates a connection to the master, requesting to pull the Binlog starting from a specified position.
- Step 2: The master's Binlog Dump thread sends Binlog content to the slave's I/O thread.
- Step 3: The I/O thread sequentially writes the received Binlog events into the slave's Relay Log.
- Step 4: The slave's SQL thread reads the Relay Log, parses events, and replays SQL operations (e.g., executing INSERT).
- Step 5: The slave periodically updates the current synchronized Binlog position information to ensure resumption after restart.

III. Data Consistency Guarantees in Master-Slave Replication

Asynchronous Replication (Default):
- The master returns success immediately after committing a transaction, without waiting for slave confirmation.
- Risk: Master failure may lead to data loss on slaves (weak consistency).
Semi-Synchronous Replication (Semi-Sync):
- After committing a transaction, the master waits until at least one slave receives and writes the event to its Relay Log before returning success.
- Configuration (MySQL):
```
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';  
SET GLOBAL rpl_semi_sync_master_enabled=1;  
```
Fully Synchronous Replication (Group Replication):
- Based on the Paxos protocol, requires all nodes to confirm the transaction before commit, ensuring strong consistency (but with lower performance).

IV. Typical Issues and Solutions

Replication Lag:
- Causes: Single-threaded replay by the slave's SQL thread (before MySQL 5.6), network latency, high write pressure on the master.
- Optimizations:
  - Enable parallel replication (e.g., MySQL's slave_parallel_workers).
  - Use row-based Binlog format (Row mode) to reduce lock contention.
Data Conflicts:
- Scenario: Accidental writes on the slave leading to master-slave data inconsistency.
- Prevention: Set the slave to read_only=ON, prohibiting non-replication write operations.
Master-Slave Switchover:
- Manual Switchover:
  1. Stop writes on the master, ensure the slave catches up with the Binlog.
  2. Promote the slave to be the new master, and redirect other slaves to the new master.
- Tool Assistance: Use MHA (Master High Availability) or Orchestrator for automated switchover.

V. Practical Recommendations

Monitoring Metrics:
- Seconds_Behind_Master: Replication lag in seconds.
- Slave_IO_Running/Slave_SQL_Running: Replication thread status.
Disaster Recovery Drills: Regularly simulate master failures to test switchover procedures and data integrity.

Through the above steps, the core principles, synchronization processes, and common problem solutions of master-slave replication are systematically presented, providing theoretical support for deployment in actual production environments.