Cache Design and Optimization in High-Concurrency Scenarios
Problem Description
In high-concurrency systems, caching is one of the core technologies for enhancing performance. Please elaborate on how to design a high-performance, highly available caching solution and address common issues such as cache penetration, cache breakdown, and cache avalanche.
Detailed Key Points
1. Basic Role and Design Principles of Caching
Question: Why is caching needed?
Answer:
- Reduce Database Pressure: Store frequently accessed data in memory to avoid direct database queries.
- Accelerate Data Reads: Memory access speed is several orders of magnitude faster than disk access.
- Design Principles:
- Cache only hot data (e.g., the ~20% of keys that receive ~80% of accesses, per the Pareto principle).
- Ensure consistency between cached and source data (e.g., via expiration times, update strategies).
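As a minimal sketch of this read path (the cache-aside pattern), assuming a local Redis reachable through the Jedis client; loadFromDb is a hypothetical stand-in for a real DAO call:

    import redis.clients.jedis.Jedis;

    public class CacheAsideReader {
        private static final int TTL_SECONDS = 300; // expiration bounds cache/source divergence

        private final Jedis jedis = new Jedis("localhost", 6379); // assumed local Redis

        public String getData(String key) {
            String value = jedis.get(key);   // 1. try the cache first
            if (value == null) {
                value = loadFromDb(key);     // 2. on a miss, fall back to the database
                if (value != null) {
                    jedis.setex(key, TTL_SECONDS, value); // 3. repopulate with a TTL
                }
            }
            return value;
        }

        private String loadFromDb(String key) {
            return "value-for-" + key; // hypothetical stand-in for a real DAO call
        }
    }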
2. Cache Penetration and Solutions
Problem: A large number of requests query non-existent data, so every lookup misses the cache and falls through to the database.
Solution Steps:
- Cache Empty Values:
- Cache a short-lived empty value (e.g., set to expire in 5 minutes) for queries that return no data.
- Note: Empty values consume little memory individually, but keep their TTLs short and guard against attackers flooding the cache with random non-existent keys to exhaust memory.
- Bloom Filter:
- Add a Bloom Filter layer before the cache, mapping keys of existing data to a bit array.
- Requests first pass through the Bloom Filter: if it returns "not present," reject directly; if it returns "may be present," then query the cache or database. This is safe because a Bloom filter can produce false positives but never false negatives, so a "not present" answer is definitive (a combined sketch of both defenses follows this list).
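A combined sketch of both defenses, assuming Guava's BloomFilter, a Jedis client, and a hypothetical loadFromDb helper; NULL_SENTINEL is an illustrative marker for "no such row":

    import java.nio.charset.StandardCharsets;
    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;
    import redis.clients.jedis.Jedis;

    public class PenetrationGuard {
        private static final String NULL_SENTINEL = "__NULL__"; // illustrative marker for "no such row"
        private static final int NULL_TTL_SECONDS = 300;        // short TTL (~5 min) for cached misses
        private static final int DATA_TTL_SECONDS = 3600;

        private final Jedis jedis = new Jedis("localhost", 6379); // assumed local Redis
        // populated at startup with every existing key via filter.put(key)
        private final BloomFilter<String> filter = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8),
                1_000_000,  // expected number of existing keys
                0.01);      // acceptable false-positive rate

        public String getData(String key) {
            if (!filter.mightContain(key)) {
                return null; // definitely absent: reject before touching cache or DB
            }
            String value = jedis.get(key);
            if (NULL_SENTINEL.equals(value)) {
                return null; // an earlier miss was cached as an empty value
            }
            if (value == null) {
                value = loadFromDb(key); // hypothetical database lookup
                if (value == null) {
                    // cache the miss briefly so repeated lookups stop at the cache layer
                    jedis.setex(key, NULL_TTL_SECONDS, NULL_SENTINEL);
                } else {
                    jedis.setex(key, DATA_TTL_SECONDS, value);
                }
            }
            return value;
        }

        private String loadFromDb(String key) {
            return null; // stand-in: return the row if it exists, else null
        }
    }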
3. Cache Breakdown and Solutions
Problem: When a hot key expires, a burst of concurrent requests misses the cache at the same moment and hits the database directly.
Solution Steps:
- Mutex Lock:
- When the cache entry expires, allow only one thread to query the database and rebuild the cache; other threads wait and retry.
- Example code logic (pseudocode):
    public String getData(String key) {
        String data = cache.get(key);
        if (data == null) {
            if (lock.tryLock()) {           // acquire the distributed lock
                try {
                    data = cache.get(key);  // double-check: another thread may have rebuilt it
                    if (data == null) {
                        data = db.get(key); // query the database
                        cache.set(key, data, expireTime);
                    }
                } finally {
                    lock.unlock();
                }
            } else {
                Thread.sleep(100);          // other threads back off briefly
                return getData(key);        // then retry
            }
        }
        return data;
    }
- Logical Expiration Time:
- Store both the actual data and a logical expiration time in the cached value.
- If logical expiration is detected, spawn a separate thread to asynchronously update the cache while the current thread returns the stale data.
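A minimal in-process sketch of logical expiration, assuming a hypothetical loadFromDb helper; a production version would typically use a per-key flag or distributed lock rather than one shared AtomicBoolean:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class LogicalExpiryCache {
        // cached value plus a logical expiry; the cache entry itself never expires physically
        private static class Entry {
            final String data;
            final long expireAtMillis;
            Entry(String data, long expireAtMillis) {
                this.data = data;
                this.expireAtMillis = expireAtMillis;
            }
        }

        private static final long TTL_MILLIS = 60_000;
        private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
        private final AtomicBoolean rebuilding = new AtomicBoolean(false);
        private final ExecutorService rebuilder = Executors.newSingleThreadExecutor();

        public String getData(String key) {
            Entry e = cache.get(key);
            if (e == null) {
                return null; // entries are seeded elsewhere, e.g., at startup
            }
            boolean expired = e.expireAtMillis < System.currentTimeMillis();
            // only one caller wins the flag and schedules the rebuild; everyone returns stale data
            if (expired && rebuilding.compareAndSet(false, true)) {
                rebuilder.submit(() -> {
                    try {
                        String fresh = loadFromDb(key); // hypothetical database lookup
                        cache.put(key, new Entry(fresh, System.currentTimeMillis() + TTL_MILLIS));
                    } finally {
                        rebuilding.set(false);
                    }
                });
            }
            return e.data; // never blocks on the database, so no thundering herd
        }

        private String loadFromDb(String key) {
            return "fresh-value-for-" + key;
        }
    }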
4. Cache Avalanche and Solutions
Problem: A large number of cache keys expire simultaneously, or the caching service crashes, leaving the database to absorb the full request load.
Solution Steps:
- Differentiated Expiration Times:
- Add a random offset to the expiration times of similar keys (e.g., base TTL + random jitter) so they do not all expire at the same moment (see the jitter sketch after this list).
- High Availability Caching:
- Use Redis clusters (e.g., master-slave replication, sharded clusters) for multi-node backup.
- Deploy local caches (e.g., Caffeine) as a second level to reduce reliance on the central cache (see the two-level sketch after this list).
- Service Circuit Breaking and Degradation:
- When database pressure is too high, temporarily reject requests via circuit breaking, returning default values or error pages.
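A minimal sketch of expiration jitter, assuming a Jedis client and a one-hour base TTL:

    import java.util.concurrent.ThreadLocalRandom;
    import redis.clients.jedis.Jedis;

    public class JitteredWriter {
        private static final int BASE_TTL_SECONDS = 3600;  // one-hour base TTL
        private static final int MAX_JITTER_SECONDS = 600; // up to 10 extra minutes

        private final Jedis jedis = new Jedis("localhost", 6379); // assumed local Redis

        public void set(String key, String value) {
            // keys written in the same batch now expire at spread-out times
            int jitter = ThreadLocalRandom.current().nextInt(0, MAX_JITTER_SECONDS);
            jedis.setex(key, BASE_TTL_SECONDS + jitter, value);
        }
    }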
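And a sketch of the two-level lookup, assuming the Caffeine library in front of a Jedis client (the Jedis instance stands in for a Redis cluster client):

    import java.time.Duration;
    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import redis.clients.jedis.Jedis;

    public class TwoLevelCache {
        // small in-process cache absorbs hot reads even if the central cache is down
        private final Cache<String, String> local = Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(Duration.ofSeconds(30)) // short TTL bounds local staleness
                .build();

        private final Jedis jedis = new Jedis("localhost", 6379);

        public String getData(String key) {
            String value = local.getIfPresent(key); // 1. local (L1) lookup
            if (value == null) {
                value = jedis.get(key);             // 2. central (L2) lookup
                if (value != null) {
                    local.put(key, value);          // promote into the local cache
                }
            }
            return value;
        }
    }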
5. Cache Consistency Guarantees
Question: How to ensure data consistency between the cache and the database?
Solution Steps:
- Update Strategy Selection:
- Update Database First, Then Delete Cache (Recommended): this ordering minimizes the window in which concurrent reads can repopulate the cache with stale data.
- If cache deletion fails, compensate via retry mechanisms or message queues.
- Delayed Double Deletion:
- Delete the cache before updating the database and again afterward, delaying the second deletion (e.g., by 1 second) so that any stale value re-cached by an in-flight read is also cleared.
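A minimal sketch of delayed double deletion, assuming a Jedis client and a hypothetical updateDb write:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import redis.clients.jedis.Jedis;

    public class DelayedDoubleDelete {
        private final Jedis jedis = new Jedis("localhost", 6379); // assumed local Redis
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public void updateData(String key, String newValue) {
            jedis.del(key);          // 1. first deletion, before the write
            updateDb(key, newValue); // 2. write the database
            // 3. second deletion after a delay, clearing any stale value that a
            //    concurrent read cached between steps 1 and 2
            scheduler.schedule(() -> jedis.del(key), 1, TimeUnit.SECONDS);
        }

        private void updateDb(String key, String value) {
            // hypothetical stand-in for the real database write
        }
    }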
Summary
Cache optimization requires a combination of strategies tailored to business scenarios:
- Use Bloom Filters to solve penetration; use locks or asynchronous updates to handle breakdown; use random expiration and cluster fault tolerance to prevent avalanches.
- Scenarios with high consistency requirements need reliable update processes and monitoring of cache hit rates and system load.