Understanding and Application of Database Normal Forms
Problem Description
Database normalization is a core theory in relational database design, aimed at reducing data redundancy and avoiding data anomalies (such as insertion/update/deletion anomalies). Common normal forms include First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF). Interviews often require explaining the definitions and differences between these normal forms, and analyzing how to optimize table structures through normalization.
1. Basic Goals of Normalization
- Minimize Data Redundancy: Avoid storing the same data repeatedly.
- Ensure Data Consistency: Avoid inconsistencies caused by updating only part of the data.
- Maintain Operational Safety: Prevent the destruction of data integrity during insertion or deletion operations.
Key Problem Example:
Assume there is a "Student Course Selection" table containing the following fields:
StudentID, Name, DepartmentName, DepartmentHead, CourseName, Grade.
If designed as a single table directly, the following issues arise:
- Insertion Anomaly: When a new department is established but hasn't enrolled any students, information about the department head cannot be added.
- Deletion Anomaly: After deleting all course selection records for a student, the student's basic information is lost.
- Update Redundancy: When modifying the head of a department, all related student records need to be updated.
2. First Normal Form (1NF)
Definition: Every field in the table must be atomic (indivisible), and each column must contain values of the same type.
Checking Steps:
- Ensure no repeating groups (e.g., "Course1, Course2" should be split into multiple rows).
- Each row has a unique identifier (such as a composite primary key).
Example Optimization:
If the original table combines course names and grades into a single "Course-Grade" string (e.g., "Math-90"), it violates 1NF. It should be split into separate fields: CourseName, Grade.
3. Second Normal Form (2NF)
Prerequisite: Already satisfies 1NF.
Definition: All non-key attributes must be fully functionally dependent on the entire primary key (eliminate partial dependencies).
Judgment Steps:
- Determine the primary key: For example,
(StudentID, CourseName)as a composite primary key. - Analyze dependencies:
Gradedepends on the entire primary key (specific student + specific course).NameandDepartmentNamedepend only onStudentID(irrelevant to the course), representing partial dependencies.
Solution:
- Split the table into:
StudentTable(StudentID, Name, DepartmentName, DepartmentHead)CourseSelectionTable(StudentID, CourseName, Grade)
4. Third Normal Form (3NF)
Prerequisite: Already satisfies 2NF.
Definition: Eliminate transitive dependencies (non-key attributes cannot depend on other non-key attributes).
Judgment Steps:
In the split StudentTable:
DepartmentHeaddepends onDepartmentName, andDepartmentNamedepends onStudentID, forming a transitive dependency.- If modifying the department head, multiple rows of data need to be updated.
Solution:
- Further split:
StudentTable(StudentID, Name, DepartmentName)DepartmentTable(DepartmentName, DepartmentHead)CourseSelectionTableremains unchanged.
5. Boyce-Codd Normal Form (BCNF)
Definition: Stricter than 3NF, requiring that all determinants (left-hand side) must contain a candidate key.
Common Scenario: Resolving partial dependencies of non-key attributes on key attributes.
Example: Assume a table Teaching(Course, Teacher, ReferenceBook), where each teacher teaches only one course, and each course has multiple reference books.
- Candidate keys:
(Course, Teacher, ReferenceBook)or(Teacher, ReferenceBook). - Problem:
Coursedepends only onTeacher, butTeacheris not a superkey.
Solution: Split intoTeachingAssignment(Teacher, Course)andCourseMaterials(Course, ReferenceBook).
6. Trade-offs in Normalization
- Advantages of Normalization: High data consistency, high update efficiency.
- Disadvantages: Queries may require multi-table joins, increasing complexity.
- Denormalization: In data warehouse or high-frequency query scenarios, allowing moderate redundancy to improve performance.
Summary:
Normal forms are the theoretical foundation of database design and should be applied flexibly based on actual business needs. In interviews, one must clearly articulate the problems solved by each normal form level and demonstrate the splitting process through examples.