Detailed Explanation of Class Loading Mechanism in Java
The class loading mechanism is the process by which the Java Virtual Machine (JVM) loads class bytecode files (.class files) into memory, verifies, transforms, parses, and initializes the data, ultimately forming Java types that can be directly used by the JVM. This process is core to Java's "write once, run anywhere" capability and dynamic extensibility.
I. Timing of Class Loading
The JVM specification does not mandate when a class must be loaded but has strict rules for the initialization phase. There are precisely six situations where a class must be immediately "initialized" (with loading, verification, and preparation naturally needing to commence beforehand):
- When encountering the bytecode instructions new, getstatic, putstatic, or invokestatic. The corresponding source-level scenarios are instantiating an object with the new keyword, reading or setting a static field of a class (except a static final field whose value was placed in the constant pool at compile time), and calling a static method of a class.
- When making reflective calls to a class using methods from the java.lang.reflect package.
- When initializing a class, if its parent class has not been initialized, the parent class's initialization must be triggered first.
- When the JVM starts, the user needs to specify a main class to execute (the class containing the main() method), and the JVM will initialize this main class first.
- When using the dynamic language support added in JDK 7, if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of type REF_getStatic, REF_putStatic, or REF_invokeStatic, and the class corresponding to this method handle has not been initialized, its initialization must be triggered first.
- When an interface defines default methods (added in JDK 8), if an implementation class of this interface is initialized, the interface must be initialized before it.
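The difference between references that trigger initialization and references that do not can be observed with a small experiment. The sketch below is illustrative only: InitTimingDemo, Parent, Child, and trace() are hypothetical names, and each static block simply records the moment its class is initialized.

```java
import java.util.ArrayList;
import java.util.List;

class InitTimingDemo {
    // Records initialization events and checkpoints in the order they happen.
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static int counter = 1;                  // ordinary static field
        static final String GREETING = "hello";  // compile-time constant, inlined by javac
        static { LOG.add("Parent initialized"); }
    }

    static class Child extends Parent {
        static { LOG.add("Child initialized"); }
    }

    // Runs the experiment once (a class is initialized at most once) and returns the events.
    static synchronized List<String> trace() {
        if (LOG.isEmpty()) {
            String greeting = Parent.GREETING;   // constant is copied into this class: no getstatic
            LOG.add("read Parent.GREETING");
            Parent[] array = new Parent[1];      // creating an array does not initialize the element type
            LOG.add("created Parent[1]");
            int counter = Child.counter;         // field is declared in Parent: only Parent is initialized
            LOG.add("read Child.counter");
            new Child();                         // the new instruction initializes Child
            LOG.add("new Child()");
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

Reading the constant and creating the array leave Parent untouched; the first event from a static block appears only when a static field declared in Parent is read, and Child is initialized only by the explicit instantiation.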
II. Process of Class Loading
The full lifecycle of class loading includes seven stages: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. The verification, preparation, and resolution stages are collectively called Linking. We focus on the first five core stages.
1. Loading
- Task: Find and load the binary byte stream of the class (.class file) into JVM memory.
- Process:
- Obtain the binary byte stream defining this class using its fully qualified name (e.g., java.lang.String).
- Transform the static storage structure represented by this byte stream into the runtime data structure of the method area.
- Generate a java.lang.Class object representing this class in memory (heap area), which serves as the access point to the various data of this class in the method area.
- Details: The means of obtaining the byte stream are very flexible; it can come from ZIP/JAR/WAR packages, networks, runtime generation (dynamic proxies), generation from other files (JSP), etc.
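To illustrate that the byte stream does not have to come from a file, the following sketch assembles a minimal class file in memory and hands it to a class loader. LoadingDemo, ByteArrayLoader, and the generated class name Hello are hypothetical, and the hand-written class file is deliberately the smallest one that can be defined (no fields, no methods), assuming a JVM that accepts class file version 52 (Java 8) or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class LoadingDemo {
    // Builds the class file of an empty "public class Hello extends Object" at runtime.
    static byte[] helloClassBytes() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0xCAFEBABE);                            // magic number
        out.writeShort(0);                                   // minor version
        out.writeShort(52);                                  // major version (52 = Java 8)
        out.writeShort(5);                                   // constant pool count = entries + 1
        out.writeByte(1); out.writeUTF("Hello");             // #1 Utf8
        out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
        out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
        out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
        out.writeShort(0x0021);                              // access flags: ACC_PUBLIC | ACC_SUPER
        out.writeShort(2);                                   // this_class  -> #2
        out.writeShort(4);                                   // super_class -> #4
        out.writeShort(0);                                   // interfaces count
        out.writeShort(0);                                   // fields count
        out.writeShort(0);                                   // methods count
        out.writeShort(0);                                   // attributes count
        out.flush();
        return buffer.toByteArray();
    }

    // A loader whose only job is to turn a byte array into a Class object.
    static class ByteArrayLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    static Class<?> loadHello() throws IOException {
        return new ByteArrayLoader().define("Hello", helloClassBytes());
    }

    public static void main(String[] args) throws IOException {
        Class<?> hello = loadHello();
        System.out.println(hello.getName() + " extends " + hello.getSuperclass().getName());
        System.out.println("defined by " + hello.getClassLoader().getClass().getName());
    }
}
```

The returned java.lang.Class object is the access point described in the steps above; the JVM never needed a .class file on disk.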
2. Verification
- Purpose: Ensure that the information contained in the byte stream of the Class file complies with all constraints of the "Java Virtual Machine Specification" and will not harm the security of the JVM itself.
- Main Steps:
- File Format Verification: Verify that the byte stream conforms to the Class file format specification (e.g., magic number, version number).
- Metadata Verification: Perform semantic analysis on the information described by the bytecode to ensure it conforms to Java language specifications (e.g., whether the class has a parent class, whether it inherits a class not allowed to be inherited).
- Bytecode Verification: Through data flow and control flow analysis, ensure the program semantics are legal and logical (e.g., ensuring jump instructions do not jump to bytecode instructions outside the method body).
- Symbolic Reference Verification: Occurs during the resolution stage, ensuring the resolution action can proceed normally (e.g., whether the class corresponding to the fully qualified name described by a string can be found).
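The first of these steps, file format verification, starts with the header of the class file. The sketch below is not the JVM's verifier; it only reads the same leading fields (magic number, minor and major version) from the class file of java.lang.String to show what is being checked. HeaderCheckDemo and its method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class HeaderCheckDemo {
    // Returns { magic, minorVersion, majorVersion } read from the class file of the given type.
    static int[] readHeader(Class<?> type) throws IOException {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();            // first 4 bytes
            int minor = in.readUnsignedShort();  // next 2 bytes
            int major = in.readUnsignedShort();  // next 2 bytes
            return new int[] { magic, minor, major };
        }
    }

    // A class file must start with 0xCAFEBABE; major version 45 corresponds to JDK 1.0/1.1.
    static boolean looksLikeClassFile(int[] header) {
        return header[0] == 0xCAFEBABE && header[2] >= 45;
    }

    public static void main(String[] args) throws IOException {
        int[] header = readHeader(String.class);
        System.out.printf("magic=%08X minor=%d major=%d valid=%b%n",
                header[0], header[1], header[2], looksLikeClassFile(header));
    }
}
```

A real JVM additionally rejects major versions newer than it supports and goes on to check the constant pool and the remaining structures.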
3. Preparation
- Task: Allocate memory for class variables (static variables) and set their initial values.
- Key Points:
- Only class variables are allocated memory; instance variables are allocated in the Java heap along with the object during object instantiation.
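- The initial value set is typically the zero value of the data type. For example, given public static int value = 123; the field value is 0, not 123, after the preparation phase. The assignment of 123 happens later, when <clinit>() runs during initialization.
- Special Case: If the field's attribute table contains a ConstantValue attribute (javac emits one for a static final field of primitive or String type that is initialized with a compile-time constant), the field is set to the value specified by that attribute instead of the zero value. For example, given public final static int value = 123; the field value is 123 after the preparation phase. (The JVM specification formally performs this assignment at the start of initialization, before <clinit>() runs; either way, no code ever observes a zero value for such a field.)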
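The zero value left behind by preparation can be made visible by reading a static field before its assignment in <clinit>() has executed. A minimal sketch, with PreparationDemo and its members as hypothetical names:

```java
class PreparationDemo {
    // These two initializers are the first statements of <clinit>(),
    // so they run before the assignment "value = 123" below.
    static final int VALUE_SEEN_EARLY = peekValue();
    static final int CONSTANT_SEEN_EARLY = peekConstant();

    static int value = 123;           // assigned inside <clinit>()
    static final int CONSTANT = 123;  // compile-time constant, stored via ConstantValue

    // Reading through a method sidesteps javac's forward-reference check.
    static int peekValue() {
        return value;                 // still the zero value from preparation
    }

    static int peekConstant() {
        return CONSTANT;              // javac inlines the constant here
    }

    public static void main(String[] args) {
        System.out.println("value seen early:    " + VALUE_SEEN_EARLY);
        System.out.println("constant seen early: " + CONSTANT_SEEN_EARLY);
        System.out.println("value afterwards:    " + value);
    }
}
```

VALUE_SEEN_EARLY ends up as 0 while value itself is 123 once initialization has finished. The constant reads as 123 at every point, partly because javac substitutes the literal at the use site, so this half of the sketch is consistent with the ConstantValue rule rather than a direct proof of it.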
4. Resolution
- Task: The process of replacing symbolic references in the constant pool with direct references.
- Symbolic Reference: A set of symbols describing the referenced target; symbols can be any form of literal, independent of the memory layout implemented by the virtual machine.
- Direct Reference: Can be a direct pointer to the target, a relative offset, or a handle that can indirectly locate the target, related to the memory layout implemented by the virtual machine.
- Resolution Targets: Resolution mainly applies to seven categories of symbolic references: classes or interfaces, fields, class methods, interface methods, method types, method handles, and call site specifiers.
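Resolution itself happens inside the JVM and cannot be called directly, but the idea of turning a purely textual description into something invocable can be imitated with the java.lang.invoke API. The sketch below is an analogy only, not the JVM's resolution algorithm: the three strings play the role of a symbolic reference (owner class, method name, descriptor), and the returned MethodHandle plays the role of the direct reference. ResolutionAnalogyDemo, resolve, and callMax are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ResolutionAnalogyDemo {
    // "Symbolic" input: names and a descriptor string. Output: a handle bound to a concrete method.
    static MethodHandle resolve(String className, String methodName, String descriptor)
            throws ReflectiveOperationException {
        ClassLoader loader = ResolutionAnalogyDemo.class.getClassLoader();
        Class<?> owner = Class.forName(className, false, loader);                    // class lookup by name
        MethodType type = MethodType.fromMethodDescriptorString(descriptor, loader); // "(II)I" -> (int,int)int
        return MethodHandles.publicLookup().findStatic(owner, methodName, type);     // method lookup
    }

    static int callMax(int a, int b) {
        try {
            MethodHandle max = resolve("java.lang.Math", "max", "(II)I");
            return (int) max.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException("lookup or invocation failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callMax(3, 7));
    }
}
```

As with real resolution, a description that matches nothing fails at lookup time: asking for a method that does not exist raises NoSuchMethodException, the reflective counterpart of NoSuchMethodError.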
5. Initialization
- Task: Execute the class initialization method <clinit>() (sometimes called the class constructor). This is the final step of the class loading process.
- What is the <clinit>() method?
- It is automatically generated by the compiler by collecting all assignment actions for class variables and the statements in static blocks (static {} blocks) in the class. The order of collection is determined by the order in which the statements appear in the source file.
- It does not need to be explicitly defined.
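Because the statements are collected in source order, swapping a field declaration and a static block changes the final value. A minimal sketch (ClinitOrderDemo and its fields are hypothetical names):

```java
class ClinitOrderDemo {
    static int first = 1;     // collected first
    static { first = 2; }     // collected second, so this assignment wins

    static { second = 2; }    // a static block may assign to a field declared later ...
    static int second = 1;    // ... but this assignment is collected last and wins

    public static void main(String[] args) {
        System.out.println("first = " + first + ", second = " + second);
    }
}
```

After initialization, first holds 2 and second holds 1, which matches the order in which the compiler placed the four statements into <clinit>().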
- Detailed Explanation of Initialization Steps:
- The JVM ensures that the parent class's <clinit>() method has completed execution before the child class's <clinit>() method is executed.
- Since the parent class's <clinit>() method executes first, static blocks defined in the parent class run before variable assignment operations in the child class.
- The <clinit>() method is not mandatory for a class or interface. If a class has no static blocks and no assignment operations for class variables, the compiler does not need to generate a <clinit>() method for it.
- Interfaces also have a <clinit>() method, but executing an interface's <clinit>() method does not require executing the parent interface's <clinit>() method first. A parent interface is initialized only when variables defined in it are used.
- The JVM ensures that a class's <clinit>() method is correctly locked and synchronized in a multi-threaded environment. If multiple threads attempt to initialize a class at the same time, only one thread executes the class's <clinit>() method, and the other threads block and wait until it finishes.
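Two of these rules lend themselves to a quick check: the parent's <clinit>() finishing before the child's starts, and <clinit>() running only once under contention. The sketch below uses hypothetical names (ClinitRulesDemo, Parent, Child, Holder, raceToInitialize) and counts how often the Holder initializer runs when several threads touch the class at the same moment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class ClinitRulesDemo {
    static final AtomicInteger HOLDER_INIT_COUNT = new AtomicInteger();

    static class Parent {
        static int a = 1;
        static { a = 2; }
    }

    static class Child extends Parent {
        static int b = a;  // Parent.<clinit>() has already finished, so this reads 2
    }

    static class Holder {
        static final Object INSTANCE = create();  // runs inside Holder.<clinit>()

        static Object create() {
            HOLDER_INIT_COUNT.incrementAndGet();
            return new Object();
        }
    }

    // Starts the given number of threads, releases them together, and lets each touch Holder.
    static int raceToInitialize(int threadCount) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    Holder.INSTANCE.hashCode();  // first access triggers initialization
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            worker.join();
        }
        return HOLDER_INIT_COUNT.get();  // number of times Holder's initializer ran
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Child.b = " + Child.b);
        System.out.println("Holder initializations = " + raceToInitialize(8));
    }
}
```

The same guarantee is what makes the initialization-on-demand holder idiom a common way to build lazy singletons without explicit locking. The flip side is that a long-running or blocking static initializer stalls every thread waiting on that class.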
III. Class Loaders
Class loaders are the code modules that implement the action of "obtaining the binary byte stream describing a class through its fully qualified name."
1. Parental Delegation Model
Up to JDK 8, the Java virtual machine uses a three-layer class loader hierarchy (JDK 9 replaced rt.jar and the extension mechanism with modules and a platform class loader, but the delegation idea is unchanged):
- Bootstrap Class Loader: The topmost layer, implemented in C++ in HotSpot, responsible for loading core class libraries in the <JAVA_HOME>/lib directory (e.g., rt.jar).
- Extension Class Loader: Implemented in Java, responsible for loading extension class libraries in the <JAVA_HOME>/lib/ext directory.
- Application Class Loader: Implemented in Java, responsible for loading class libraries specified on the user classpath (ClassPath). Typically, this is the default class loader in a program.
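The hierarchy can be inspected at runtime by walking the parent links from the application class loader upward. In this sketch LoaderChainDemo and chainFrom are hypothetical names; the bootstrap loader has no Java object, so it shows up as null and is labeled explicitly.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {
    // Collects the implementation class names of a loader and all of its ancestors.
    static List<String> chainFrom(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader current = loader; current != null; current = current.getParent()) {
            chain.add(current.getClass().getName());
        }
        chain.add("bootstrap (represented as null)");
        return chain;
    }

    public static void main(String[] args) {
        chainFrom(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes report null because the bootstrap loader defined them.
        System.out.println("String loaded by: " + String.class.getClassLoader());
    }
}
```

The concrete class names printed depend on the JDK: on JDK 8 the middle entry is the extension class loader, while on JDK 9 and later it is the platform class loader.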
Workflow (Parental Delegation):
- When a class loader receives a class loading request, it does not first try to load the class itself but delegates the request to its parent class loader.
- This is true for each level of class loader, so all loading requests should ultimately be passed to the topmost Bootstrap Class Loader.
- Only when the parent loader reports that it cannot complete the loading request (it did not find the required class within its search scope) does the child loader attempt to load the class itself.
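The workflow above is essentially what java.lang.ClassLoader.loadClass does by default. The sketch below re-implements that logic in a simplified form so each step can be traced; it is an illustration rather than the JDK's code. DelegationDemo and TracingLoader are hypothetical names, the loader assumes a non-null parent, and its findClass never finds anything.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    static class TracingLoader extends ClassLoader {
        final List<String> trace = new ArrayList<>();

        TracingLoader(ClassLoader parent) {
            super(parent);  // assumption: parent is not null
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);           // 1. already defined by this loader?
                if (loaded == null) {
                    try {
                        trace.add("delegate to parent: " + name);
                        loaded = getParent().loadClass(name);      // 2. ask the parent first
                    } catch (ClassNotFoundException parentFailed) {
                        trace.add("parent failed, try self: " + name);
                        loaded = findClass(name);                  // 3. only now try it ourselves
                    }
                }
                if (resolve) {
                    resolveClass(loaded);
                }
                return loaded;
            }
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            trace.add("findClass: " + name);
            throw new ClassNotFoundException(name);  // this demo loader has no classes of its own
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        TracingLoader loader = new TracingLoader(ClassLoader.getSystemClassLoader());
        System.out.println(loader.loadClass("java.lang.String") == String.class);
        loader.trace.forEach(System.out::println);
    }
}
```

A custom loader normally leaves loadClass alone and overrides only findClass, which keeps the delegation order intact; overriding loadClass is how the model gets broken on purpose.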
Advantages:
- Security: Ensures type safety for Java core libraries. For example, if a user defines a custom java.lang.Object class, parental delegation ensures the Bootstrap Class Loader loads the Object class from the core library, not the user-defined one, preventing core API tampering.
- Prevents Duplicate Loading: Ensures the global uniqueness of classes.
2. Breaking the Parental Delegation Model
In certain scenarios, breaking this model is necessary, for example:
- SPI (Service Provider Interface) Mechanism: Such as JDBC and JNDI. The core interfaces are in rt.jar and loaded by the Bootstrap Class Loader, but the concrete implementations are provided by vendors and located on the ClassPath, where the Bootstrap Class Loader cannot see them. The core code therefore loads them through the Thread Context ClassLoader (by default the Application Class Loader), so a parent-level loader in effect asks a child loader to do the loading (reverse delegation).
- OSGi and other modular hot-deployment technologies.
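The thread context class loader pattern behind the SPI case can be sketched in a few lines. Everything here is hypothetical and kept in one file for brevity: Greeter stands in for a core interface such as java.sql.Driver, EnglishGreeter for a vendor implementation, and loadImplementation for the lookup code that would live in the core library. The real JDK mechanism, java.util.ServiceLoader, additionally discovers implementation names from provider configuration, which is omitted here.

```java
class ContextLoaderDemo {
    // Hypothetical service interface (the "core" side).
    interface Greeter {
        String greet();
    }

    // Hypothetical vendor implementation (the "classpath" side).
    static class EnglishGreeter implements Greeter {
        public String greet() {
            return "hello";
        }
    }

    // Core-side lookup: it does not use its own defining loader but the
    // loader attached to the calling thread.
    static Greeter loadImplementation(String implClassName) throws ReflectiveOperationException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        Class<?> impl = Class.forName(implClassName, true, context);
        return (Greeter) impl.getDeclaredConstructor().newInstance();
    }

    // Sets a context loader that can see the implementation, performs the lookup, then restores it.
    static String demo() throws ReflectiveOperationException {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        current.setContextClassLoader(ContextLoaderDemo.class.getClassLoader());
        try {
            return loadImplementation(EnglishGreeter.class.getName()).greet();
        } finally {
            current.setContextClassLoader(saved);
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(demo());
    }
}
```

Saving and restoring the context loader around the call mirrors what containers and frameworks typically do, since the value is per thread and leaks into later work if left changed.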