Data Encryption and Key Management in Databases

Description
Data encryption is one of the core technologies for database security. When sensitive data (such as ID numbers or bank card information) is encrypted, an attacker who obtains the data illegally still cannot read the plaintext. Key management covers the lifecycle of encryption keys (generation, storage, rotation, and destruction) and is what makes encryption effective in practice: ciphertext is only as safe as the key that protects it. Real systems must balance security against performance, because encrypting more than necessary slows queries and limits what the database can search or index.

Step-by-Step Explanation of Knowledge Points

  1. Basic Principles of Encryption

    • Goal: Convert plaintext (e.g., "123456") into ciphertext (e.g., "aB3x9p") using an encryption algorithm and a key. Only those possessing the key can decrypt it.
    • Core Concepts:
      • Symmetric Encryption (e.g., AES algorithm): Uses the same key for encryption and decryption. It is fast and suitable for encrypting large volumes of data, but key distribution requires a secure channel.
      • Asymmetric Encryption (e.g., RSA algorithm): Uses a public key for encryption and a private key for decryption. No secret has to be shared in advance, but it is far slower than symmetric encryption, so it is mostly used for key exchange and digital signatures rather than bulk data.
    • Database Scenario Selection: Symmetric encryption (e.g., AES-256) is typically used to encrypt data, while asymmetric encryption is used to protect the transmission of symmetric keys.
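The division of labor described above (a symmetric key for the data, an asymmetric key pair to protect that symmetric key) can be sketched as follows. This is a minimal illustration, not a production design: it assumes the third-party Python `cryptography` package, chooses AES-256-GCM and RSA-OAEP as the concrete algorithms, and the function names `hybrid_encrypt` and `hybrid_decrypt` are hypothetical.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Recipient's RSA key pair. Assumption: generated in-process for the demo;
# a real private key would normally stay inside an HSM or KMS.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# OAEP padding parameters shared by the wrap and unwrap steps.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def hybrid_encrypt(plaintext: bytes):
    """Encrypt data with a fresh AES-256 key, then wrap that key with RSA."""
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce; must never repeat for the same key
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(data_key, OAEP)
    return wrapped_key, nonce, ciphertext


def hybrid_decrypt(wrapped_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Unwrap the AES key with the RSA private key, then decrypt the data."""
    data_key = private_key.decrypt(wrapped_key, OAEP)
    return AESGCM(data_key).decrypt(nonce, ciphertext, None)


wrapped_key, nonce, ciphertext = hybrid_encrypt(b"123456")
recovered = hybrid_decrypt(wrapped_key, nonce, ciphertext)
```

Only the short data key goes through the slow RSA operation; the payload itself, however large, is handled by AES.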
  2. Database Encryption Layers and Implementation Methods

    • Application-Level Encryption:
      • Process: Data is encrypted by the application calling encryption libraries (e.g., OpenSSL) before being written to the database. The database only stores ciphertext.
      • Advantages: The database never sees the plaintext, so even database administrators cannot read or leak it in usable form.
      • Disadvantages: The database can no longer search or filter on encrypted fields (e.g., it cannot search by last name); any such lookup has to be implemented in the application, which must decrypt the data first.
    • Database-Level Encryption:
      • Transparent Data Encryption (TDE):
        • Principle: The database engine automatically encrypts disk files (e.g., data files, log files) when writing them and decrypts them into memory when reading them, without any change to application SQL.
        • Examples: Oracle TDE, MySQL Enterprise Edition TDE.
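        • Characteristics: Protects against theft of physical files and disks, but not against attacks that go through the database itself, such as SQL injection: data is plaintext in memory, and the engine returns decrypted results to any session allowed to run the query.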
      • Column-Level Encryption:
        • Principle: Encrypts specified columns (e.g., the credit_card column) using built-in database functions (e.g., MySQL's AES_ENCRYPT()).
        • Operation Example:
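          -- Encrypted insertion. AES_ENCRYPT() returns a binary string,
          -- so card_cipher should be a VARBINARY or BLOB column.
          INSERT INTO users (name, card_cipher)
          VALUES ('Alice', AES_ENCRYPT('1234-5678', 'secret_key'));
          -- Decryption query. AES_DECRYPT() also returns binary; cast it back to text.
          SELECT name, CAST(AES_DECRYPT(card_cipher, 'secret_key') AS CHAR) AS card_plain
          FROM users;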
          
        • Disadvantages: A key hardcoded in the SQL text can leak (for example, through query logs or client command history), and fuzzy (LIKE) queries and ordinary indexes no longer work on the encrypted column.
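One way to keep the key out of the SQL text altogether is the application-level approach described earlier in this section: encrypt in the application and hand the database only ciphertext through a parameterized statement. The sketch below is illustrative only. It assumes the third-party Python `cryptography` package, uses an in-memory SQLite database as a stand-in for the real one, generates the key locally where a real system would fetch it from a KMS or HSM, and the helpers `encrypt_field` and `decrypt_field` are hypothetical names.

```python
import os
import sqlite3

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Assumption: in production this key comes from a KMS/HSM, not from local generation.
key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)


def encrypt_field(plaintext: str) -> bytes:
    """Return nonce || ciphertext so each stored value is self-contained."""
    nonce = os.urandom(12)  # fresh 96-bit nonce for every value
    return nonce + aead.encrypt(nonce, plaintext.encode("utf-8"), None)


def decrypt_field(blob: bytes) -> str:
    """Split off the nonce and decrypt; raises if the value was tampered with."""
    nonce, ciphertext = blob[:12], blob[12:]
    return aead.decrypt(nonce, ciphertext, None).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, card_cipher BLOB)")

# The key never appears in the SQL; only the ciphertext is bound as a parameter.
conn.execute(
    "INSERT INTO users (name, card_cipher) VALUES (?, ?)",
    ("Alice", encrypt_field("1234-5678")),
)

name, blob = conn.execute("SELECT name, card_cipher FROM users").fetchone()
card_plain = decrypt_field(blob)
```

The trade-off is the one already noted for application-level encryption: the database cannot filter or index on `card_cipher`, so any search on the card number has to be designed separately.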
  3. Core Practices of Key Management

    • Key Lifecycle:
      1. Generation: Use cryptographically secure random number generators (e.g., /dev/urandom), avoiding weak keys (e.g., simple strings).
      2. Storage:
        • Principle: Store keys separately from encrypted data; avoid writing them directly in code or configuration files.
        • Solutions:
          • Hardware Security Module (HSM): Dedicated hardware stores the keys and performs encryption/decryption inside the device, so keys never leave the hardware.
          • Cloud Services: Such as AWS KMS, Alibaba Cloud KMS, where keys are accessed via APIs to avoid local storage.
          • File Isolation: If resources are limited, store key files in a restricted-access location on a machine other than the database server.
      3. Rotation:
        • Reason: The longer a key stays in use, the more likely it is to leak and the more data a single leak exposes.
        • Steps:
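          • Generate a new key, decrypt existing data with the old key, and re-encrypt it with the new key. This is simple but expensive for large tables.
          • Alternatively, use key layering: data is encrypted with a data key, and the data key is itself encrypted with a master key. Rotation then only re-encrypts the data key under the new master key. The old master key is marked as retired rather than deleted right away, so anything still protected by it can be decrypted.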
      4. Destruction: Securely delete every copy of the key (including backups) so that data encrypted under it can no longer be recovered.
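The generation and rotation steps above can be sketched together. This is a minimal illustration under stated assumptions: it relies on the third-party Python `cryptography` package, uses AES-256-GCM for both layers, keeps master keys in a plain dictionary where a real deployment would hold them in a KMS or HSM, and every name (`new_key`, `seal`, `unseal`, `rotate_master_key`) is hypothetical.

```python
import os
import secrets

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def new_key() -> bytes:
    """Generation: a 256-bit key drawn from the operating system's CSPRNG."""
    return secrets.token_bytes(32)


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with AES-GCM and return nonce || ciphertext."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def unseal(key: bytes, blob: bytes) -> bytes:
    """Inverse of seal(); raises if the blob was modified or the key is wrong."""
    return AESGCM(key).decrypt(blob[:12], blob[12:], None)


# Master keys by version. Assumption: a dict stands in for a KMS/HSM here.
master_keys = {1: new_key()}

# The data key encrypts the record; the master key encrypts (wraps) the data key.
data_key = new_key()
record = seal(data_key, b"6222-0000-1111-2222")
wrapped = {"version": 1, "blob": seal(master_keys[1], data_key)}


def rotate_master_key(wrapped: dict, master_keys: dict) -> dict:
    """Rotation via key layering: re-wrap the data key, leave the data untouched."""
    new_version = max(master_keys) + 1
    master_keys[new_version] = new_key()
    plain_data_key = unseal(master_keys[wrapped["version"]], wrapped["blob"])
    # The old master key stays in the dict (retired, not deleted).
    return {"version": new_version, "blob": seal(master_keys[new_version], plain_data_key)}


record_before = record
wrapped = rotate_master_key(wrapped, master_keys)

# Reading a record after rotation: unwrap the data key, then decrypt the record.
current_data_key = unseal(master_keys[wrapped["version"]], wrapped["blob"])
plaintext = unseal(current_data_key, record)
```

Rotating the master key touches one small wrapped blob instead of every encrypted row, which is the main reason key layering is attractive for large tables.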
  4. Advanced Solutions for Balancing Security and Query Efficiency

    • Format-Preserving Encryption (FPE):
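      • Ciphertext retains the plaintext format (e.g., a 16-digit number encrypts to another 16-digit number), so existing column types and validation rules keep working. Because the same input always produces the same output under a given key, exact-match lookups remain possible; range queries do not, since the format is preserved but the ordering is not. Its security margin is somewhat lower than that of standard AES modes, especially for short values.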
    • Homomorphic Encryption:
      • Allows direct operations on ciphertext (e.g., summation), with decrypted results matching those of plaintext operations. However, it currently has significant performance overhead and is not widely commercialized.
    • Index-Assisted Queries:
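      • Store a hash of the plaintext (e.g., a SHA-256 hash of the name, preferably a keyed hash such as HMAC-SHA-256 so that common values cannot be guessed by hashing a dictionary) in an indexed column next to the ciphertext. This supports exact-match queries only.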
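The index-assisted approach above can be sketched with the Python standard library alone. This is an illustration under assumptions, not a reference design: it uses a keyed hash (HMAC-SHA-256) rather than a bare SHA-256, an in-memory SQLite table as a stand-in database, a placeholder value where the real ciphertext would be stored, and hypothetical names (`blind_index`, `find_ids_by_name`, `name_hash`).

```python
import hashlib
import hmac
import secrets
import sqlite3

# Assumption: a dedicated index key, kept separate from the encryption key
# and loaded from a KMS/HSM in a real system.
index_key = secrets.token_bytes(32)


def blind_index(value: str) -> str:
    """Keyed hash of the normalized value; equal inputs give equal hashes."""
    normalized = value.strip().lower()
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name_cipher BLOB, name_hash TEXT)"
)
conn.execute("CREATE INDEX idx_users_name_hash ON users (name_hash)")

for name in ("Alice", "Bob"):
    # name_cipher would hold the encrypted name; a placeholder is stored here
    # because encryption itself is outside the scope of this sketch.
    conn.execute(
        "INSERT INTO users (name_cipher, name_hash) VALUES (?, ?)",
        (b"<ciphertext>", blind_index(name)),
    )


def find_ids_by_name(name: str) -> list:
    """Exact-match lookup through the hash column; no decryption needed."""
    rows = conn.execute(
        "SELECT id FROM users WHERE name_hash = ? ORDER BY id", (blind_index(name),)
    )
    return [row[0] for row in rows]
```

A lookup such as `find_ids_by_name("alice")` matches the stored row for "Alice" because both sides are normalized before hashing, while a partial value such as "Ali" matches nothing, which is the exact-match limitation noted above.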
  5. Practical Case: E-commerce Database Encryption Design

    • Requirement: Protect user phone numbers while supporting queries by phone number prefix.
    • Solution:
      1. Split the phone number into a prefix (e.g., "138") and the remaining part (e.g., "00001234").
      2. Store the prefix in plaintext for queries, and encrypt the remaining part using AES for storage.
      3. Obtain the key at runtime through the KMS API, then decrypt the remaining part and concatenate it with the prefix at the application layer to restore the full phone number.
    • Trade-off: Sacrifices some security (prefix exposure) for query convenience.
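The split-and-encrypt design above can be sketched end to end. This is a minimal illustration under assumptions: it uses the third-party Python `cryptography` package with AES-256-GCM, an in-memory SQLite table in place of the e-commerce database, a locally generated key standing in for one fetched through the KMS API, and hypothetical names throughout (`split_and_encrypt`, `decrypt_and_join`, `find_by_prefix`, `phone_prefix`, `phone_rest_cipher`).

```python
import os
import sqlite3

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Assumption: stands in for a key obtained at runtime via the KMS API.
key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

PREFIX_LEN = 3  # e.g. "138" stays in plaintext; the rest is encrypted


def split_and_encrypt(phone: str):
    """Return (plaintext prefix, nonce || ciphertext of the remaining digits)."""
    prefix, rest = phone[:PREFIX_LEN], phone[PREFIX_LEN:]
    nonce = os.urandom(12)
    return prefix, nonce + aead.encrypt(nonce, rest.encode("ascii"), None)


def decrypt_and_join(prefix: str, blob: bytes) -> str:
    """Decrypt the remaining digits and rebuild the full phone number."""
    rest = aead.decrypt(blob[:12], blob[12:], None).decode("ascii")
    return prefix + rest


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (phone_prefix TEXT, phone_rest_cipher BLOB)")
conn.execute("CREATE INDEX idx_users_phone_prefix ON users (phone_prefix)")

for phone in ("13800001234", "13900005678", "13800009999"):
    conn.execute(
        "INSERT INTO users (phone_prefix, phone_rest_cipher) VALUES (?, ?)",
        split_and_encrypt(phone),
    )


def find_by_prefix(prefix: str) -> list:
    """Filter on the plaintext prefix in SQL, then decrypt in the application."""
    rows = conn.execute(
        "SELECT phone_prefix, phone_rest_cipher FROM users "
        "WHERE phone_prefix = ? ORDER BY rowid",
        (prefix,),
    )
    return [decrypt_and_join(p, c) for p, c in rows]
```

The database can narrow results using the indexed prefix column, but it only ever stores the remaining digits as ciphertext; anyone with read access to the table still learns the prefix, which is the exposure accepted in the trade-off above.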

Summary
Data encryption and key management require selecting appropriate layers and algorithms based on business scenarios, with a focus on ensuring key storage security and regular rotation. For sensitive data, prioritize application-level encryption and manage keys via KMS/HSM, while using techniques like data splitting and hash indexing to minimize the impact on query functionality.