Descriptors and Attribute Access Control in Python
Descriptors are a powerful feature in Python that allow you to customize what happens when accessing an attribute. Essentially, a descriptor is a class that implements a specific protocol (i.e., defines at least one of the __get__, __set__, or __delete__ methods). This protocol can override the default attribute access behavior.
1. Why are descriptors needed?
Imagine a simple Person class with an age attribute. Logically, age should not be a negative number.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age  # If -5 is passed here, it is logically wrong, but the code will not report an error

p = Person("Alice", -5)
print(p.age)  # Outputs -5
Using ordinary assignment and access, we cannot automatically validate the value of age. Descriptors are designed to solve such problems, allowing you to bind attribute access (get, set, delete) to specific methods, thereby inserting custom logic.
2. Detailed Explanation of the Descriptor Protocol
A class becomes a descriptor if it implements one or more of the following methods:
- __get__(self, obj, type=None) -> object: Called when retrieving the descriptor attribute from an instance (obj) or class (type).
- __set__(self, obj, value) -> None: Called when setting the descriptor attribute on an instance (obj).
- __delete__(self, obj) -> None: Called when deleting the descriptor attribute from an instance (obj).
Based on the methods implemented, descriptors are divided into two categories:
- Data Descriptor: Implements the __set__ or __delete__ method. It has the highest priority.
- Non-Data Descriptor: Implements only the __get__ method. It has lower priority.
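To make the protocol concrete, here is a minimal sketch of a data descriptor that only records which protocol method was invoked. The names Traced, Demo, calls, and store are hypothetical and chosen for illustration; values are keyed by id(obj) purely to keep the example short.

```python
class Traced:
    """Hypothetical data descriptor that logs each protocol call."""

    def __init__(self):
        self.calls = []   # names of the protocol methods invoked so far
        self.store = {}   # per-instance values, keyed by id(obj) for brevity

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class (Demo.x): return the descriptor itself
            return self
        self.calls.append("__get__")
        return self.store.get(id(obj))

    def __set__(self, obj, value):
        self.calls.append("__set__")
        self.store[id(obj)] = value

    def __delete__(self, obj):
        self.calls.append("__delete__")
        self.store.pop(id(obj), None)


class Demo:
    x = Traced()  # the descriptor lives on the class, not on the instance


d = Demo()
d.x = 10       # routed to Traced.__set__
value = d.x    # routed to Traced.__get__
del d.x        # routed to Traced.__delete__

calls = Demo.x.calls  # class-level access returns the descriptor itself
print(value)   # 10
print(calls)   # ['__set__', '__get__', '__delete__']
```

Each of the three statements on d.x is dispatched to the matching method on the descriptor object stored in the class.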
3. Step by Step: Building an Age Validation Descriptor
Let's build a descriptor step by step to ensure age is a non-negative number.
Step 1: Create the Descriptor Class
We create a class called NonNegative. It will be used as a data descriptor, so we need to implement __get__ and __set__.
class NonNegative:
    def __init__(self):
        # We temporarily use a simple instance variable to store the value
        self.value = 0

    def __get__(self, obj, objtype=None):
        # When accessing the attribute, return the stored value
        return self.value

    def __set__(self, obj, value):
        # When setting the attribute, validate first
        if value < 0:
            raise ValueError("Value cannot be negative")
        # If validation passes, store the value
        self.value = value
Step 2: Use the Descriptor in the Main Class
Now, we use this descriptor in the Person class. The usage is simple: define a class attribute as an instance of the descriptor class.
class Person:
    # age is now a descriptor instance
    age = NonNegative()

    def __init__(self, name, age):
        self.name = name
        self.age = age  # This assignment triggers the descriptor's __set__ method

# Test
p1 = Person("Bob", 30)
print(p1.age)  # Outputs 30. This triggers __get__

try:
    p2 = Person("Charlie", -5)  # This will trigger __set__ and raise a ValueError
except ValueError as e:
    print(e)  # Outputs: Value cannot be negative
Step 3: Solving the Problem of Shared Values Across Multiple Instances
The code above has a serious flaw! All Person instances share the same age value. This is because the descriptor instance age is a class attribute of Person. When we execute p1.age = 30 and p2.age = 25, we are modifying the same self.value of the single NonNegative instance.
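The flaw is easy to reproduce. The following sketch repeats the Step 1 and Step 2 classes unchanged and creates two people; because both writes land in the same self.value, the second assignment overwrites the first.

```python
class NonNegative:
    def __init__(self):
        self.value = 0  # one value for the whole class: this is the bug

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        self.value = value


class Person:
    age = NonNegative()

    def __init__(self, name, age):
        self.name = name
        self.age = age


p1 = Person("Bob", 30)
p2 = Person("Charlie", 25)  # overwrites the value stored for p1

print(p1.age)  # 25, not 30
print(p2.age)  # 25
```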
To solve this, we need the descriptor to store a separate value for each instance of the class it belongs to. A simple first attempt is a dictionary keyed by id(obj), the identity of the instance.
class NonNegative:
    def __init__(self):
        # Use a dictionary to store data per instance
        self.data = {}

    def __get__(self, obj, objtype=None):
        # obj is the instance calling the descriptor (e.g., p1). If accessed via the class (e.g., Person.age), obj is None
        if obj is None:
            # When accessed via the class, usually return the descriptor itself
            return self
        # Retrieve the value corresponding to this instance from the dictionary
        return self.data.get(id(obj), 0)  # If not found, return default value 0

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        # Use the instance's memory address id(obj) as the key to store the value in the dictionary
        self.data[id(obj)] = value

    def __delete__(self, obj):
        # When deleting the attribute, remove the instance's data from the dictionary
        if id(obj) in self.data:
            del self.data[id(obj)]
Now test again:
p1 = Person("Bob", 30)
p2 = Person("Charlie", 25)
print(p1.age) # Outputs 30
print(p2.age) # Outputs 25
# Success! Each instance has its own independent value.
Step 4: Optimizing Memory with Weak References (Advanced)
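The dictionary solution above has a potential issue: it is keyed by id(obj), which is a plain integer, so the dictionary knows nothing about the lifetime of the instance. When a Person instance is destroyed, its entry stays in the dictionary and the stored value is never released. Worse, Python may reuse the same id for a later object, so a new instance could read the stale value left behind by a destroyed one.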
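The leftover entry is easy to observe. The sketch below repeats the Step 3 descriptor (without __delete__, for brevity) and inspects the dictionary before and after the only reference to an instance is dropped. The variable names store, before, and after are illustrative.

```python
class NonNegative:
    def __init__(self):
        self.data = {}  # keyed by id(obj), as in Step 3

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.data.get(id(obj), 0)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        self.data[id(obj)] = value


class Person:
    age = NonNegative()

    def __init__(self, name, age):
        self.name = name
        self.age = age


p = Person("Bob", 30)
store = Person.age.data  # class access returns the descriptor itself

before = len(store)
del p                    # the instance is gone...
after = len(store)       # ...but its entry is still in the dictionary

print(before, after)     # 1 1
```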
To solve this, Python provides the weakref module, which can create weak references that do not prevent garbage collection.
import weakref

class NonNegative:
    def __init__(self):
        # Use a weak reference dictionary
        self.data = weakref.WeakKeyDictionary()

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.data.get(obj, 0)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        self.data[obj] = value  # Use obj directly as the key, WeakKeyDictionary handles it automatically
# __delete__ is omitted here for brevity. Entries no longer need manual cleanup, because WeakKeyDictionary removes an entry automatically once its instance is garbage collected.
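The keys of a WeakKeyDictionary are weak references to objects. When an instance obj is garbage collected, its entry is removed from the dictionary automatically. This makes it a more robust choice than the id(obj) dictionary, provided the instances are hashable and support weak references. Ordinary classes do by default, but a class that defines __slots__ without "__weakref__", or that defines __eq__ without __hash__, does not.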
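The automatic cleanup can be checked with a short experiment. This is a sketch that assumes CPython, where an object is reclaimed as soon as its last reference disappears; gc.collect() is called anyway so the result does not hinge on that detail. The variable names are illustrative.

```python
import gc
import weakref


class NonNegative:
    def __init__(self):
        self.data = weakref.WeakKeyDictionary()

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.data.get(obj, 0)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        self.data[obj] = value


class Person:
    age = NonNegative()

    def __init__(self, name, age):
        self.name = name
        self.age = age


p = Person("Bob", 30)
store = Person.age.data  # class access returns the descriptor itself

before = len(store)
del p                    # drop the only strong reference to the instance
gc.collect()             # force a collection pass to be safe
after = len(store)       # the entry disappeared together with the instance

print(before, after)     # 1 0
```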
4. Priority of Attribute Lookup
The key to understanding descriptors is knowing Python's attribute lookup order. When you access instance.attr, the interpreter proceeds in the following order:
- Data Descriptor: Look up attr in type(instance) and its parent classes, following the MRO. If the result is a data descriptor (defines __set__ or __delete__), call its __get__ method and return the result.
- Instance Attribute: Otherwise, if instance.__dict__ contains a key named attr, return that value.
- Non-Data Descriptor: Otherwise, if the object found on the class is a non-data descriptor (defines only __get__), call its __get__ method.
- Class Attribute: Otherwise, if attr was found on the class or one of its parent classes but is not a descriptor, return it as is.
- If nothing is found, the class's __getattr__ method is called if one is defined; otherwise AttributeError is raised.
This order can be simplified as: Data Descriptor > Instance Attribute > Non-Data Descriptor/Class Attribute.
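The priority rule can be verified by planting values directly in the instance dictionary. In this sketch the class names DataDesc, NonDataDesc, and Demo are hypothetical; writing through d.__dict__ bypasses __set__, so both attributes end up with a competing instance-level entry.

```python
class DataDesc:
    """Data descriptor: defines both __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only attribute")


class NonDataDesc:
    """Non-data descriptor: defines only __get__."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Demo:
    a = DataDesc()
    b = NonDataDesc()


d = Demo()
# Write straight into the instance dictionary, bypassing any __set__
d.__dict__["a"] = "from instance dict"
d.__dict__["b"] = "from instance dict"

result_a = d.a  # the data descriptor outranks the instance dictionary
result_b = d.b  # the instance dictionary outranks the non-data descriptor

print(result_a)  # from data descriptor
print(result_b)  # from instance dict
```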
5. Practical Applications of Descriptors
Descriptors are widely used in Python. Many familiar features are built on descriptors:
- Methods: Functions defined in a class are non-data descriptors. When you call instance.method(), the __get__ method is triggered, returning a "bound method" tied to the instance.
- @property Decorator: property itself is a class that implements the descriptor protocol. @property allows you to easily create getter, setter, and deleter methods for an attribute, all implemented via descriptors.
- @classmethod and @staticmethod: These decorators are also implemented via descriptors, changing the behavior of how methods are called.
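These built-in cases can be inspected by hand. The sketch below uses a hypothetical Greeter class and calls __get__ on the raw function stored in the class dictionary, which is roughly what the interpreter does for g.hello. It also checks which of the two objects defines __set__.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def hello(self):
        return f"Hello, {self.name}"

    @property
    def upper_name(self):
        return self.name.upper()


g = Greeter("Alice")

# A method is stored in the class dictionary as a plain function
raw = Greeter.__dict__["hello"]
bound = raw.__get__(g, Greeter)   # manual equivalent of g.hello
greeting = bound()

# Functions define only __get__, so they are non-data descriptors
function_has_set = hasattr(raw, "__set__")

# property defines __set__ as well, so it is a data descriptor
prop = Greeter.__dict__["upper_name"]
property_has_set = hasattr(prop, "__set__")
shouted = prop.__get__(g, Greeter)  # manual equivalent of g.upper_name

print(greeting)            # Hello, Alice
print(bound.__self__ is g) # True
print(function_has_set)    # False
print(property_has_set)    # True
print(shouted)             # ALICE
```

Because property is a data descriptor, an instance dictionary entry with the same name cannot shadow it, whereas a method can be shadowed by assigning to the instance.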
Summary
Descriptors are one of Python's advanced features, providing a powerful mechanism for attribute access control. By implementing the __get__, __set__, and __delete__ methods, you can intercept the getting, setting, and deleting of attributes, thereby inserting custom logic such as data validation, type checking, lazy calculation, and logging. Understanding the priority of data descriptors and non-data descriptors in the attribute lookup chain is key to mastering descriptors. Although writing descriptors directly is relatively rare, understanding them helps you gain a deeper insight into how Python itself works.