Descriptors in Python
Descriptors are an advanced yet crucial concept in Python, allowing you to customize the operations that occur during attribute access. Understanding descriptors is key to mastering the deeper mechanisms of Python's object-oriented programming.
1. What is a Descriptor?
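Simply put, a descriptor is an object that "binds behavior" to attribute access. More precisely, it is an instance of a class that implements the descriptor protocol, i.e., a class that defines at least one of the __get__, __set__, or __delete__ methods. A descriptor only takes effect when it is stored as a class attribute.
When a class attribute holds a descriptor instance, accessing that attribute (reading, assigning, deleting) no longer simply reads or writes the instance's attribute dictionary. Instead, Python calls the corresponding method defined in the descriptor class, subject to the lookup priority rules in section 3. For a non-data descriptor (one with only __get__), assignment and deletion still fall back to the instance dictionary.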
2. The Descriptor Protocol
The descriptor protocol consists of three special methods. An object is a descriptor if its class implements any of the following methods:
__get__(self, obj, objtype=None) -> object: Called when the descriptor attribute is read. obj is the instance through which the attribute is accessed. If the attribute is accessed via the class (e.g., MyClass.descriptor_attr), obj is None. objtype is the class through which the attribute is accessed (for access via an instance, this is type(obj)).
__set__(self, obj, value) -> None: Called when a value is assigned to the descriptor attribute through an instance. obj is the instance; value is the value being assigned.
__delete__(self, obj) -> None: Called when the descriptor attribute is deleted through an instance (e.g., del obj.attr). obj is the instance.
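To see exactly when each hook runs, here is a small sketch built around a hypothetical Traced descriptor. The names Traced, Owner, and calls are illustrative only. The descriptor records every call instead of doing real work. The sketch also shows that assigning through the class replaces the descriptor rather than calling __set__.

```python
class Traced:
    """Hypothetical data descriptor that records which protocol hook ran."""

    def __init__(self):
        self.calls = []  # log of (hook name, details...) tuples

    def __get__(self, obj, objtype=None):
        # Record whether obj was None and which class was passed as objtype
        self.calls.append(("__get__", obj is None, objtype.__name__))
        return "value"

    def __set__(self, obj, value):
        self.calls.append(("__set__", value))

    def __delete__(self, obj):
        self.calls.append(("__delete__",))


class Owner:
    attr = Traced()


tracer = Owner.__dict__["attr"]  # fetch the descriptor itself without triggering __get__
o = Owner()

_ = Owner.attr  # read via the class: obj is None, objtype is Owner
_ = o.attr      # read via an instance: obj is o, objtype is Owner
o.attr = 123    # assignment via an instance: calls __set__(o, 123)
del o.attr      # deletion via an instance: calls __delete__(o)

for call in tracer.calls:
    print(call)
# ('__get__', True, 'Owner')
# ('__get__', False, 'Owner')
# ('__set__', 123)
# ('__delete__',)

# Assigning through the class does NOT call __set__: it replaces the descriptor
Owner.attr = "plain"
print(Owner.__dict__["attr"], len(tracer.calls))  # plain 4
```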
Based on the methods implemented, descriptors can be categorized into two types:
- Data Descriptor: A descriptor that implements the __set__ or __delete__ method.
- Non-data Descriptor: A descriptor that only implements the __get__ method.
This distinction is very important because it affects Python's attribute lookup priority.
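The classification can be checked mechanically. The sketch below uses two hypothetical helpers, is_descriptor and is_data_descriptor, that follow the definitions above. It compares their results with the standard library's inspect.isdatadescriptor. Python looks the protocol methods up on the descriptor's type, not on the object itself, so the helpers inspect type(d).

```python
import inspect


class OnlyGet:
    """Non-data descriptor: defines only __get__."""

    def __get__(self, obj, objtype=None):
        return 42


class GetAndSet:
    """Data descriptor: defines __set__ in addition to __get__."""

    def __get__(self, obj, objtype=None):
        return 42

    def __set__(self, obj, value):
        raise AttributeError("read-only")


def plain_function():
    pass


def is_descriptor(d):
    # Protocol methods are looked up on the descriptor's type
    t = type(d)
    return any(hasattr(t, m) for m in ("__get__", "__set__", "__delete__"))


def is_data_descriptor(d):
    t = type(d)
    return hasattr(t, "__set__") or hasattr(t, "__delete__")


for d in (OnlyGet(), GetAndSet(), property(lambda self: 1), plain_function):
    print(f"{type(d).__name__}: descriptor={is_descriptor(d)} "
          f"data={is_data_descriptor(d)} inspect={inspect.isdatadescriptor(d)}")
# OnlyGet: descriptor=True data=False inspect=False
# GetAndSet: descriptor=True data=True inspect=True
# property: descriptor=True data=True inspect=True
# function: descriptor=True data=False inspect=False
```

Two results stand out. A plain function is a non-data descriptor, which is what makes method binding work. A property counts as a data descriptor because the property type always defines __set__ and __delete__, even when no setter was supplied.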
3. Attribute Lookup Priority
When you access an attribute through an instance (e.g., instance.attr), Python looks it up in the following order:
- Data Descriptor: Search the class and its parent classes, in Method Resolution Order (MRO), for an attribute named attr. If the first match is a data descriptor, call its __get__ method.
- Instance Attribute: Otherwise, look for a key named attr in the instance's __dict__ dictionary.
- Non-data Descriptor: Otherwise, if the class-level match from the first step is a non-data descriptor, call its __get__ method.
- Class Attribute: Otherwise, if that class-level match is an ordinary value, return it unchanged. Because the search follows the MRO, this includes attributes inherited from parent classes.
- If none of the above is found, raise an AttributeError (unless the class defines __getattr__, which is called as a last resort).
Key Point: Data descriptors have higher priority than instance attributes! This means that even if there is an attribute with the same name in the instance's __dict__, Python will still prioritize the data descriptor.
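The lookup order above can be modeled in a few lines of pure Python. The sketch below is a simplified, hypothetical model of what object.__getattribute__ does for instances. The functions simplified_getattr and find_in_mro are illustrative names, not standard functions. The model ignores __getattr__, __slots__ edge cases, and metaclasses, and it checks its answers against the built-in getattr.

```python
_MISSING = object()  # sentinel for "not found" (None could be a real attribute value)


def find_in_mro(cls, name):
    """Return the first entry for `name` in the class dicts along cls.__mro__."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def simplified_getattr(obj, name):
    """Hypothetical, simplified model of instance attribute lookup."""
    objtype = type(obj)
    cls_attr = find_in_mro(objtype, name)
    getter = getattr(type(cls_attr), "__get__", None)
    is_data = hasattr(type(cls_attr), "__set__") or hasattr(type(cls_attr), "__delete__")

    # 1. A data descriptor found on the class (or a base class) wins
    if cls_attr is not _MISSING and getter is not None and is_data:
        return getter(cls_attr, obj, objtype)

    # 2. Then the instance's own __dict__
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]

    if cls_attr is not _MISSING:
        # 3. A non-data descriptor is bound via __get__
        if getter is not None:
            return getter(cls_attr, obj, objtype)
        # 4. Otherwise the plain class attribute is returned as-is
        return cls_attr

    raise AttributeError(f"{objtype.__name__!r} object has no attribute {name!r}")


class Data:
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonData:
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Demo:
    d = Data()
    n = NonData()
    plain = "class attribute"

    def greet(self):
        return "hello"


demo = Demo()
# Write same-named entries straight into the instance dict
demo.__dict__.update(d="instance value", n="instance value")

for name in ("d", "n", "plain"):
    print(name, "->", simplified_getattr(demo, name), "|", getattr(demo, name))
# d -> from data descriptor | from data descriptor
# n -> instance value | instance value
# plain -> class attribute | class attribute
print(simplified_getattr(demo, "greet")())  # hello (the function was bound via __get__)
```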
4. Step by Step: From Simple Example to Practical Application
Step 1: Create a Non-data Descriptor
First, let's create the simplest descriptor, which only implements the __get__ method, making it a non-data descriptor.
```python
class SimpleDescriptor:
    """A simple non-data descriptor that logs each access"""

    def __init__(self):
        self._value = "default value"

    def __get__(self, obj, objtype=None):
        print(f"Descriptor's __get__ called: obj={obj}, objtype={objtype}")
        return self._value


class MyClass:
    attr = SimpleDescriptor()  # Class attribute is a descriptor instance


# Access via class
print("1. Access via class:")
print(MyClass.attr)  # Output: Descriptor's __get__ called: obj=None, objtype=<class '__main__.MyClass'>
                     #         default value

# Access via instance
print("\n2. Access via instance:")
instance = MyClass()
print(instance.attr)  # Output: Descriptor's __get__ called: obj=<__main__.MyClass object at ...>, objtype=<class '__main__.MyClass'>
                      #         default value

# Assign to an instance attribute (no __set__ is called, since we didn't define one)
print("\n3. Assigning to instance attribute:")
instance.attr = "instance attribute value"  # Creates an attribute named 'attr' in the instance's __dict__
print(instance.attr)  # Output: instance attribute value (__get__ is not called)
print(instance.__dict__)  # Output: {'attr': 'instance attribute value'}
print(MyClass.attr)  # Output: Descriptor's __get__ called: obj=None, objtype=<class '__main__.MyClass'>
                     #         default value
```
Explanation:
- When accessing MyClass.attr via the class, the obj parameter in __get__ is None.
- When accessing instance.attr via an instance, the obj parameter in __get__ is that instance.
- When we execute instance.attr = "...", SimpleDescriptor has no __set__ method (it is a non-data descriptor). Python therefore creates (or overwrites) a regular attribute named attr in the instance's __dict__. According to the priority rules, later accesses to instance.attr find the instance attribute first and no longer trigger the descriptor's __get__.
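Two consequences of the non-data behavior are easy to check. First, deleting the shadowing instance attribute makes the descriptor visible again. Second, a single SimpleDescriptor object is shared by every MyClass instance, so any state stored on the descriptor itself (here _value) is shared as well. The sketch below reuses the Step 1 classes without the print logging.

```python
class SimpleDescriptor:
    """Step 1's non-data descriptor, minus the logging."""

    def __init__(self):
        self._value = "default value"

    def __get__(self, obj, objtype=None):
        return self._value


class MyClass:
    attr = SimpleDescriptor()


a, b = MyClass(), MyClass()

a.attr = "instance attribute value"  # shadows the descriptor for a only
print(a.attr, "|", b.attr)           # instance attribute value | default value

del a.attr                           # removes 'attr' from a.__dict__
print(a.attr, a.__dict__)            # default value {}

# The descriptor object is shared, so state stored on it is shared by all instances
MyClass.__dict__["attr"]._value = "changed"
print(a.attr, "|", b.attr)           # changed | changed
```

This sharing is why the data descriptor in Step 2 stores its values on each instance rather than on the descriptor.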
Step 2: Create a Data Descriptor
Now, let's create a complete data descriptor to manage an attribute and perform type checking on assignment.
```python
class TypedDescriptor:
    """A data descriptor that performs type checking"""

    def __init__(self, name, expected_type):
        self.name = name  # Attribute name
        self.expected_type = expected_type  # Expected type
        self.private_name = f"_{name}"  # Private attribute name for storing value in the instance

    def __get__(self, obj, objtype=None):
        if obj is None:
            # When accessed via class, return the descriptor itself
            return self
        # Retrieve the stored value from the instance's __dict__
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(f"Expected type {self.expected_type}, but got {type(value)}")
        # Store the value in the instance's __dict__ using a private name to avoid conflicts
        setattr(obj, self.private_name, value)

    def __delete__(self, obj):
        # Delete the value stored in the instance
        delattr(obj, self.private_name)


class Person:
    name = TypedDescriptor("name", str)  # Class attribute, a descriptor instance
    age = TypedDescriptor("age", int)  # Another descriptor instance

    def __init__(self, name, age):
        self.name = name  # This calls TypedDescriptor.__set__
        self.age = age  # This calls TypedDescriptor.__set__


# Normal usage
print("1. Creating object normally:")
person = Person("Alice", 30)
print(person.name)  # Output: Alice (calls __get__)
print(person.age)  # Output: 30 (calls __get__)

# Type checking works
print("\n2. Testing type error:")
try:
    person.age = "thirty"  # Not an integer, raises TypeError
except TypeError as e:
    print(e)  # Output: Expected type <class 'int'>, but got <class 'str'>

# Testing data descriptor priority
print("\n3. Testing data descriptor priority:")
person.__dict__["age"] = "sneakily set string"  # Directly manipulate instance dictionary
print(person.age)  # Output: 30 (Still 30!)
```
Explanation:
- TypedDescriptor is now a data descriptor, because it implements __set__ (and __delete__).
- In __set__, we perform type checking before storing the value.
- The actual data is stored in the instance's __dict__ under a "private" name (such as _age). Storing it under the descriptor's own name with setattr(obj, self.name, value) would call __set__ again and recurse infinitely.
- The most crucial part is the final test. We wrote an age key directly into the instance's __dict__, yet accessing person.age still calls the data descriptor's __get__ method. It returns the value 30 stored in _age. This confirms that data descriptors have higher priority than instance attributes.
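Passing the attribute name twice, as in TypedDescriptor("age", int), is easy to get wrong. Since Python 3.6, a descriptor can define __set_name__(self, owner, name), which Python calls automatically when the owning class is created. This lets the descriptor learn its own name. The sketch below is a hypothetical variant of TypedDescriptor that uses this hook (Typed is an illustrative name). The storage strategy is unchanged.

```python
class Typed:
    """Hypothetical type-checking data descriptor that learns its name via __set_name__."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        # Called automatically when the owner class is created
        self.public_name = name
        self.private_name = f"_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.public_name} must be {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.private_name, value)

    def __delete__(self, obj):
        delattr(obj, self.private_name)


class Person:
    name = Typed(str)  # no need to repeat "name"
    age = Typed(int)

    def __init__(self, name, age):
        self.name = name
        self.age = age


person = Person("Alice", 30)
print(person.age, Person.age.private_name)  # 30 _age
print(person.__dict__)                      # {'_name': 'Alice', '_age': 30}

try:
    person.age = "thirty"
except TypeError as e:
    print(e)  # age must be int, got str
```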
Step 3: Practical Application Scenarios
Descriptors are widely used in Python. Here are some classic examples:
- Properties (@property): Python's built-in @property decorator creates a property object, which is a data descriptor. @property is a concise way to get descriptor behavior without writing a descriptor class by hand.
- Methods: Ordinary functions defined in a class are non-data descriptors. When you access instance.method, the function's __get__ binds it to the instance and returns a bound method, which is then called.
- ORM (Object-Relational Mapping) Frameworks: ORMs use descriptors to control how model attributes are read and written. In Django, for example, each model field installs a descriptor on the model class (such as DeferredAttribute for ordinary fields). That descriptor can load a deferred field's value from the database on first access.
- Lazy Evaluation: Descriptors can be used to implement lazy properties, which compute their value only on first access and then cache the result.
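The first two points can be checked directly in the interpreter. In the sketch below, Square and its members are illustrative names. @property leaves a property instance in the class dictionary, and that instance behaves as a data descriptor. A plain function fetched from the class dictionary produces a bound method when its __get__ is called.

```python
class Square:
    def __init__(self, side):
        self.side = side

    @property
    def area(self):
        return self.side * self.side

    def describe(self):
        return f"square with side {self.side}"


sq = Square(3)

# @property leaves a property object in the class dict; its type defines __get__, __set__, __delete__
prop = Square.__dict__["area"]
print(type(prop).__name__, hasattr(prop, "__set__"))  # property True

# As a data descriptor, it wins over a same-named instance dict entry
sq.__dict__["area"] = "shadow"
print(sq.area)  # 9

# A function in the class dict is a non-data descriptor; __get__ binds it to the instance
func = Square.__dict__["describe"]
bound = func.__get__(sq, Square)
print(bound())               # square with side 3
print(bound == sq.describe)  # True
```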
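The lazy-evaluation idea relies on the priority rules. A non-data descriptor computes the value on first access and stores it in the instance's __dict__ under the same name. Later lookups then find the instance attribute and never reach the descriptor again. The lazy_property class below is a hypothetical minimal sketch. The standard library's functools.cached_property (Python 3.8+) works on the same principle.

```python
class lazy_property:
    """Hypothetical non-data descriptor: compute once, then cache on the instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Cache under the same name: because this is a NON-data descriptor,
        # the instance __dict__ entry now shadows it on every later access.
        obj.__dict__[self.name] = value
        return value


class Report:
    def __init__(self, numbers):
        self.numbers = numbers
        self.computations = 0

    @lazy_property
    def total(self):
        self.computations += 1
        return sum(self.numbers)


r = Report([1, 2, 3])
print(r.total, r.total, r.computations)  # 6 6 1
```

Deleting the cached value (del r.total) forces a recomputation on the next access. The approach requires instances to have a __dict__, so it does not work for classes that use __slots__ without one.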
Summary
- A descriptor is an object whose class implements the __get__, __set__, or __delete__ method.
- Data descriptors (with __set__ or __delete__) have higher priority than instance attributes; non-data descriptors (only __get__) have lower priority than instance attributes of the same name.
- The core purpose of descriptors is to encapsulate the access logic of a class attribute in a separate class, enabling reuse, validation, lazy computation, and other advanced functionality.
- Understanding the descriptor protocol and attribute lookup order is key to mastering descriptors.