Detailed Explanation of Garbage Collection Mechanism and Algorithms in Java

Detailed Explanation of Garbage Collection Mechanism and Algorithms in Java

Knowledge Point Description:
Java Garbage Collection (GC) is the core mechanism of JVM's automatic memory management, responsible for automatically reclaiming memory space occupied by objects that are no longer in use. Developers do not need to manually release memory, which avoids issues such as memory leaks and dangling pointers. This knowledge point will delve into the working principles of GC, core algorithms, and related concepts.

1. Why is Garbage Collection Needed?

  1. Drawbacks of Manual Memory Management: In C/C++, developers need to manually call malloc/free or new/delete to manage memory, which easily leads to the following problems:
    • Memory Leak: Forgetting to release memory, leading to memory exhaustion.
    • Dangling Pointer: Accessing freed memory by mistake after freeing memory without nullifying the pointer.
  2. Java's Solution: The JVM has a built-in garbage collector that automatically identifies and reclaims unused objects, reducing programming complexity.

2. How to Determine if an Object is Garbage?
The prerequisite for garbage collection is accurately judging whether an object is "alive." There are two commonly used methods:

  1. Reference Counting:

    • Principle: Each object maintains a reference counter. The counter increments by 1 when referenced and decrements by 1 when the reference becomes invalid. The object is considered garbage when the counter is 0.
    • Drawback: Cannot solve circular reference problems (e.g., object A references B, B references A, but when there are no external references, the counters are still not 0).
    • Java does not use this method.
  2. Reachability Analysis:

    • Principle: Starting from a set of root objects called "GC Roots," traverse the reference chains. If there is no path connecting an object to any GC Root, it is determined to be garbage.
    • GC Roots include:
      • Objects referenced by the virtual machine stack (local variable table in stack frames).
      • Objects referenced by static properties in the method area.
      • Objects referenced by constants in the method area.
      • Objects referenced by JNI (i.e., Native methods) in the native method stack.

3. Detailed Explanation of Garbage Collection Algorithms

  1. Mark-Sweep:

    • Steps:
      1. Marking Phase: Starting from GC Roots, traverse and mark all reachable objects.
      2. Sweeping Phase: Reclaim the memory occupied by unmarked objects.
    • Drawbacks:
      • Efficiency Issue: The marking and sweeping processes are not highly efficient.
      • Space Issue: Generates memory fragmentation, making it impossible to allocate large objects.
  2. Copying:

    • Principle: Divide memory into two halves (From space and To space), using only one half at a time. During garbage collection, copy live objects to the other half of memory, then clear the current memory.
    • Advantage: Avoids memory fragmentation.
    • Disadvantage: Memory utilization is only 50%.
    • Application Scenario: The Survivor area in the Java Young Generation.
  3. Mark-Compact:

    • Steps:
      1. Marking Phase: Same as Mark-Sweep, marking all reachable objects.
      2. Compaction Phase: Move live objects towards one end of memory, then clean up the memory beyond the boundary.
    • Advantage: Avoids fragmentation, suitable for the Old Generation.
    • Disadvantage: Higher cost of moving objects.
  4. Generational Collection (Core strategy in practical applications):

    • Core Idea: Divide the heap memory into the Young Generation and the Old Generation based on object life cycles.
    • Young Generation:
      • Characteristics: Low object survival rate, frequent Minor GC occurrences.
      • Partitions: Eden space, Survivor0 (S0), Survivor1 (S1).
      • Process:
        1. New objects are first allocated in the Eden space.
        2. When the Eden space is full, a Minor GC is triggered, copying live objects to S0.
        3. In the next Minor GC, live objects from Eden and S0 are copied to S1 (clearing Eden and S0).
        4. Each time an object survives a Minor GC, its age increases by 1. When the age reaches a threshold (default 15), it is promoted to the Old Generation.
    • Old Generation:
      • Characteristics: High object survival rate, uses Mark-Compact or Mark-Sweep algorithms.
      • Trigger Condition: When the Old Generation space is insufficient, a Full GC (reclaiming the entire heap) is triggered.

4. Common Garbage Collectors

  1. Serial Collector: A single-threaded collector suitable for client programs.
  2. Parallel Scavenge Collector: Multi-threaded parallel collection, focuses on throughput.
  3. CMS (Concurrent Mark Sweep): Aims for the shortest garbage collection pause times, uses the Mark-Sweep algorithm.
  4. G1 (Garbage-First): Divides the heap into multiple Regions, offers predictable pause times, suitable for large heap memory.
  5. ZGC: Introduced in JDK 11, supports TB-level heap memory, with pause times not exceeding 10ms.

Summary:
The Java garbage collection mechanism significantly improves development efficiency through automatic memory management. Understanding generational collection, reachability analysis, and the pros and cons of different algorithms helps optimize application performance and troubleshoot memory issues (such as OOM). In practical development, heap size can be adjusted via JVM parameters (e.g., -Xms, -Xmx), or suitable garbage collectors can be selected.