Memory Management in Python: Object Allocation and Reclamation Mechanisms
1. Description
Python's memory management (described here for CPython, the reference implementation) consists mainly of object allocation and garbage collection. Developers do not need to manage memory manually, but understanding the underlying principles helps avoid memory leaks and performance issues. This topic covers how Python allocates memory, how reference counting works, how the garbage collector operates, and how cyclic references are handled.
2. Object Allocation Mechanism
Step 1: Memory Pool (PyMalloc)
- To avoid frequent calls to the operating system's memory allocation functions (such as malloc), CPython uses a memory pool mechanism known as pymalloc.
- Small objects (at most 512 bytes in current CPython releases) are served from the memory pool, while larger requests are passed directly to the system's malloc.
- The memory pool is divided into blocks of fixed size classes (e.g., 8, 16, 32 bytes, etc.), so a small request is rounded up to its size class and served quickly from a free list.
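The size-class and free-list idea above can be sketched with a toy model. This is only an illustration and not CPython's C implementation: the names ToyPool, size_class, ALIGNMENT, and SMALL_LIMIT are hypothetical, and the 8-byte step and 512-byte limit are assumptions chosen to match the description.

```python
# Toy model only: these names are hypothetical and do not mirror CPython's C code.
ALIGNMENT = 8        # assumed size-class step, matching the 8/16/32 example
SMALL_LIMIT = 512    # assumed small-object threshold in bytes


def size_class(nbytes):
    """Round a request up to the next multiple of ALIGNMENT."""
    return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


class ToyPool:
    def __init__(self):
        self.free_lists = {}   # size class -> list of recycled block ids
        self.next_id = 0

    def alloc(self, nbytes):
        if nbytes > SMALL_LIMIT:
            return ("system", nbytes)          # stand-in for a direct malloc call
        cls = size_class(nbytes)
        free = self.free_lists.setdefault(cls, [])
        if free:
            return ("pool", cls, free.pop())   # reuse a previously freed block
        self.next_id += 1
        return ("pool", cls, self.next_id)     # hand out a fresh block

    def free(self, block):
        if block[0] == "pool":
            self.free_lists[block[1]].append(block[2])


pool = ToyPool()
first = pool.alloc(20)     # rounded up to the 24-byte class
pool.free(first)
second = pool.alloc(22)    # same class, so the freed block is reused
big = pool.alloc(4096)     # above the limit, bypasses the pool
```

Reusing a freed block of the same size class is what makes repeated creation and destruction of small objects cheap in this model.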
Step 2: Object Initialization
- Calling a type first invokes its __new__ method, which allocates and returns the new object, and then its __init__ method, which initializes the object's attributes.
- For example: a = [1, 2, 3] first allocates memory for the list, then initializes the list's contents.
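The two-phase creation described above can be observed by overriding both methods on a small class. The Point class and the calls list are hypothetical names used only for this sketch.

```python
calls = []


class Point:
    def __new__(cls, x, y):
        calls.append("__new__")
        return super().__new__(cls)   # object.__new__ allocates the instance

    def __init__(self, x, y):
        calls.append("__init__")      # runs on the instance returned by __new__
        self.x = x
        self.y = y


p = Point(1, 2)
print(calls)  # ['__new__', '__init__']
```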
3. Reference Counting
Step 1: Basic Principle
- Each Python object internally maintains a reference count (the ob_refcnt field), which records the number of times it is referenced.
- Scenarios where the reference count increases:
    a = [1, 2]  # List reference count = 1
    b = a       # Reference count +1, becomes 2
- Scenarios where the reference count decreases:
  - A variable goes out of scope (e.g., after a function finishes execution).
  - A variable is reassigned (e.g., a = None).
  - A container object is destroyed (e.g., a list is deleted).
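These increases and decreases can be watched from Python with sys.getrefcount. The sketch below is CPython-specific and compares against a baseline instead of printing absolute values, because the call itself temporarily adds a reference and the exact numbers can differ between versions. The variable names are illustrative.

```python
import sys

a = [1, 2]
base = sys.getrefcount(a)      # baseline, measured the same way each time

b = a                          # a second name bound to the same list
after_bind = sys.getrefcount(a) - base

holder = [a]                   # a container also holds a reference
after_container = sys.getrefcount(a) - base

b = None                       # rebinding the name drops one reference
del holder                     # destroying the container drops another
after_release = sys.getrefcount(a) - base

print(after_bind, after_container, after_release)
```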
Step 2: Operations on Reference Counts
- Increase count: the C API macro Py_INCREF(obj).
- Decrease count: the C API macro Py_DECREF(obj); when the count reaches zero, the object's deallocator runs immediately and its memory is returned to the allocator.
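Py_INCREF and Py_DECREF are only available from C, but the immediate release at a count of zero is visible from Python through a finalizer. This is a minimal sketch that relies on CPython's reference counting; the Tracked class and the events list are hypothetical names.

```python
events = []


class Tracked:
    def __del__(self):
        events.append("finalized")


obj = Tracked()
alias = obj
del obj                              # one reference remains, nothing happens yet
events.append("after first del")
del alias                            # last reference gone: __del__ runs right here
events.append("after second del")

print(events)  # ['after first del', 'finalized', 'after second del']
```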
4. Garbage Collection (Garbage Collector)
Step 1: Cyclic Reference Problem
- Reference counting cannot resolve cyclic references:
    class Node:
        def __init__(self):
            self.next = None

    a = Node()
    b = Node()
    a.next = b  # a references b
    b.next = a  # b references a, creating a cyclic reference
    # Even after deleting a and b, each object's reference count remains 1, preventing automatic reclamation
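The effect can be checked by pausing the cycle collector and probing one of the nodes through a weak reference, which does not keep the object alive. This is a sketch for CPython; make_cycle and probe are hypothetical names.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.next = None


def make_cycle():
    a = Node()
    b = Node()
    a.next = b
    b.next = a
    return weakref.ref(a)      # a and b go out of scope when the function returns


was_enabled = gc.isenabled()
gc.disable()                   # make sure only reference counting is at work
try:
    probe = make_cycle()
    alive_before = probe() is not None   # True: the cycle keeps both nodes alive
    gc.collect()                         # explicit cycle collection
    alive_after = probe() is not None    # False: the cycle has been reclaimed
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after)  # True False
```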
Step 2: Generational Collection (Generational GC)
- Python uses a generational garbage collector (the gc module) to detect cyclic references.
- Objects are divided into 3 generations (0, 1, 2), with new objects placed in generation 0.
- Collection process:
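  - Trigger Condition: The collector tracks only container objects (objects that can hold references to other objects). A generation 0 collection starts when the number of tracked allocations minus deallocations since the last collection exceeds threshold0; older generations are collected after a threshold number of collections of the next younger generation.
  - Identify Live Objects: CPython does not trace from root objects such as global variables or the stack. Instead, for every object in the generation being collected, it takes the reference count and subtracts the references that come from other objects in the same set. Objects whose remaining count is greater than zero are referenced from outside, and they and everything reachable from them are kept.
  - Clear Unreachable Objects: The remaining objects are reachable only through cyclic references among themselves, so they are reclaimed.
  - Generational Promotion: Surviving objects are moved to the next generation.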
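The detection step can be sketched as a toy simulation over a small object graph. This is a simplified model of the idea and not the collector's real code: find_unreachable, graph, and external are hypothetical names, with graph mapping each object to the objects it references and external listing references held from outside the set (for example, by a variable).

```python
def find_unreachable(graph, external):
    # 1. Compute each object's full reference count.
    refs = {name: 0 for name in graph}
    for targets in graph.values():
        for target in targets:
            refs[target] += 1
    for name in external:
        refs[name] += 1

    # 2. Subtract the references that come from inside the candidate set.
    remaining = dict(refs)
    for targets in graph.values():
        for target in targets:
            remaining[target] -= 1

    # 3. Objects with a remaining count are referenced from outside;
    #    everything reachable from them is alive as well.
    alive = set()
    stack = [name for name, count in remaining.items() if count > 0]
    while stack:
        name = stack.pop()
        if name not in alive:
            alive.add(name)
            stack.extend(graph[name])

    # 4. Whatever is left is reachable only through cycles: garbage.
    return set(graph) - alive


# a and b reference each other; c is held by a variable and references d.
graph = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
garbage = find_unreachable(graph, external=["c"])
print(sorted(garbage))  # ['a', 'b']
```

Note that d survives even though nothing outside the set references it directly, because it is reachable from c.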
Step 3: GC Triggering and Tuning
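- Manual trigger: gc.collect(generation=2).
- Adjust thresholds: gc.set_threshold(threshold0, threshold1, threshold2).
- Disable GC: gc.disable() (use with caution: reference counting keeps working, but objects caught in reference cycles are no longer reclaimed automatically, which can leak memory).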
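A minimal sketch of these calls, saving and restoring the previous settings around the change. The threshold values 1000, 15, 15 are arbitrary illustrations and not a recommendation; how the second and third values are interpreted can vary between CPython versions.

```python
import gc

original = gc.get_threshold()      # tuple of the current thresholds
was_enabled = gc.isenabled()

gc.set_threshold(1000, 15, 15)     # illustrative values only
tuned = gc.get_threshold()

gc.disable()                       # pause automatic cycle collection
disabled_state = gc.isenabled()    # False while disabled
found = gc.collect()               # an explicit collection still works

# Restore the previous configuration.
gc.set_threshold(*original)
if was_enabled:
    gc.enable()
```

Disabling the collector for a short, allocation-heavy phase and re-enabling it afterwards is a common pattern; leaving it off permanently is only safe when the program creates no reference cycles.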
5. Comprehensive Example
import gc

class Data:
    def __init__(self, name):
        self.name = name
        self.other = None

# Create a cyclic reference
d1 = Data("A")
d2 = Data("B")
d1.other = d2
d2.other = d1

# After deleting the names, the objects still reference each other
del d1, d2

# Manually trigger garbage collection
print(gc.collect())  # Outputs the number of unreachable objects found (at least 2 here; the exact value depends on the Python version)
6. Summary
- Python achieves immediate reclamation through reference counting, but requires generational GC to handle cyclic references.
- The memory pool improves allocation efficiency for small objects, and the garbage collector automatically manages the object lifecycle.
- In development, unnecessary global references or cyclic references should be avoided. Use the weakref module to break cyclic references when necessary.
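As a sketch of the weakref suggestion, a child can hold a weak back-reference to its parent so that the pair never forms a cycle of strong references. Parent and Child are hypothetical classes, and the immediate release shown relies on CPython's reference counting.

```python
import weakref


class Parent:
    def __init__(self):
        self.children = []


class Child:
    def __init__(self, parent):
        self.parent = weakref.ref(parent)   # weak back-reference, no strong cycle
        parent.children.append(self)


parent = Parent()
child = Child(parent)
same = child.parent() is parent     # calling the weak reference yields the parent

del parent                          # last strong reference to the parent is gone
gone = child.parent() is None       # the weak reference now returns None

print(same, gone)  # True True
```

Because no cycle exists, the parent is released by reference counting alone, without waiting for the cycle collector.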