Application of Bloom Filters in Chrome Safe Browsing
Application of Bloom Filters in Chrome Safe Browsing
I. Background and Requirements
The Chrome browser needs to protect users from malicious websites, but storing all malicious URLs globally on the local device is impractical. Bloom filters provide an efficient solution: storing a compact data structure locally to quickly determine whether a URL might be malicious.
II. Bloom Filter Configuration
- Bit Array Size: Chrome uses a bit array of approximately 20MB, capable of storing hundreds of millions of URL fingerprints.
- Number of Hash Functions: Employs 3-5 independent hash functions to balance false positive rates and performance.
- Update Mechanism: The Bloom filter data is updated from the Safe Browsing service every 30 minutes.
III. Workflow
User visits URL → URL hashing → Query Bloom Filter
↓
[Exists] → Possibly malicious → Server-side verification → Final decision
↓
[Not exists] → Safe → Directly allowed
IV. Implementation Details
-
URL Normalization
- Remove protocol headers (http/https)
- Convert to lowercase uniformly
- Handle special character encoding
- Example: https://example.com/path → example.com/path
-
Multi-layer Filtering Strategy
- First layer: Full URL Bloom filter
- Second layer: Domain-level Bloom filter
- Third layer: IP address Bloom filter
- Layered design improves detection coverage.
-
Hash Function Design
- Use fast hash functions like MurmurHash, CityHash
- Different hash functions generate independent fingerprints for the same URL
- Example: Compute 3 different hash values for "example.com"
V. False Positive Handling Mechanism
- Probabilistic Nature: URLs judged as "possibly existing" locally
- Server-side Verification: Send verification requests to Google Safe Browsing API
- Caching Strategy: Confirmed safe URLs are added to a local whitelist cache
- False Positive Statistics: Actual false positive rate is controlled below 0.1%
VI. Performance Optimization
- Batch Operations: Check all resource links on a page simultaneously
- Asynchronous Verification: Avoid blocking page loading
- Memory Mapping: Use memory-mapped files for the bit array to reduce memory usage
- SIMD Optimization: Use Single Instruction Multiple Data to accelerate bit operations
VII. Actual Results
- Detection latency: <1 ms (local filtering)
- Memory usage: Approximately 20MB
- Update frequency: Every 30 minutes
- Protection scope: Covers millions of malicious URLs
This implementation ensures user security while avoiding the storage and update pressure associated with storing all URL data locally.