Descriptors in Python

Descriptors are an advanced yet crucial concept in Python, allowing you to customize the operations that occur during attribute access. Understanding descriptors is key to mastering the deeper mechanisms of Python's object-oriented programming.

1. What is a Descriptor?

Simply put, a descriptor is an object that "binds behavior" to a class attribute. It is an instance of a class that implements a specific protocol, i.e., a class that defines at least one of the __get__, __set__, or __delete__ methods.

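When a class attribute holds a descriptor instance, reading, assigning, or deleting that attribute through an instance no longer manipulates the instance's attribute dictionary directly. Instead, it triggers the corresponding method defined in the descriptor class, provided the descriptor defines that method. For example, assigning to an attribute whose descriptor has no __set__ still writes to the instance's __dict__.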
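Concretely, reading instance.attr when attr is a descriptor is roughly equivalent to fetching the descriptor from the class and calling its __get__ yourself, passing the instance and its class. A minimal sketch (the names Ten and Holder are illustrative) showing that both spellings give the same result:

```python
class Ten:
    """Non-data descriptor that always returns 10."""
    def __get__(self, obj, objtype=None):
        return 10


class Holder:
    x = Ten()  # class attribute holding a descriptor instance


h = Holder()

# Normal attribute access goes through the descriptor protocol...
normal = h.x
# ...which is roughly what this explicit call does:
explicit = type(h).__dict__["x"].__get__(h, type(h))

print(normal, explicit)  # 10 10
```

Note that nothing was stored in h.__dict__; the value comes entirely from the descriptor.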

2. The Descriptor Protocol

The descriptor protocol consists of three special methods. A class is considered a descriptor if it implements any of the following methods:

  • __get__(self, obj, objtype=None) -> object: Called when the descriptor attribute is read.
    • obj: The instance through which the attribute is accessed. If the attribute is accessed via the class (e.g., MyClass.descriptor_attr), obj is None.
    • objtype: The class through which the attribute is accessed (the instance's class, or the class itself for class access).
  • __set__(self, obj, value) -> None: Called when assigning a value to the descriptor attribute.
    • obj: The instance object.
    • value: The value to be assigned.
  • __delete__(self, obj) -> None: Called when deleting the descriptor attribute.
    • obj: The instance object.
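To see exactly when each method runs and which arguments it receives, here is a minimal tracing descriptor that implements all three. The names Traced, Owner, and the _traced_value storage key are illustrative; for simplicity the sketch supports only one such attribute per instance.

```python
class Traced:
    """Data descriptor that logs every protocol call it receives."""
    def __init__(self):
        self.calls = []  # (method name, *arguments) tuples, in call order

    def __get__(self, obj, objtype=None):
        self.calls.append(("__get__", obj, objtype))
        if obj is None:
            return self  # class access: hand back the descriptor itself
        return obj.__dict__.get("_traced_value")

    def __set__(self, obj, value):
        self.calls.append(("__set__", obj, value))
        obj.__dict__["_traced_value"] = value

    def __delete__(self, obj):
        self.calls.append(("__delete__", obj))
        obj.__dict__.pop("_traced_value", None)


class Owner:
    attr = Traced()


o = Owner()
o.attr = 42       # Traced.__set__(descriptor, o, 42)
value = o.attr    # Traced.__get__(descriptor, o, Owner)
del o.attr        # Traced.__delete__(descriptor, o)
Owner.attr        # Traced.__get__(descriptor, None, Owner)

descriptor = vars(Owner)["attr"]  # fetch the descriptor without triggering __get__
for name, *args in descriptor.calls:
    print(name, args)
```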

Based on the methods implemented, descriptors can be categorized into two types:

  • Data Descriptor: A descriptor that implements the __set__ or __delete__ method.
  • Non-data Descriptor: A descriptor that only implements the __get__ method.

This distinction is very important because it determines whether the descriptor or an instance attribute with the same name takes priority during attribute lookup.
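Whether an object is a data or non-data descriptor can be checked directly. The helper below is a hypothetical sketch (describe is not a standard function); it inspects the object's type, because Python looks up these special methods on the type rather than on the instance. The standard library's inspect.isdatadescriptor performs a similar data-descriptor check.

```python
def describe(attr):
    """Classify an object according to the descriptor methods its type defines."""
    tp = type(attr)  # special methods are looked up on the type, not the instance
    if hasattr(tp, "__set__") or hasattr(tp, "__delete__"):
        return "data descriptor"
    if hasattr(tp, "__get__"):
        return "non-data descriptor"
    return "not a descriptor"


def plain_function(self):
    return "hello"


print(describe(property(plain_function)))      # data descriptor
print(describe(plain_function))                # non-data descriptor
print(describe(staticmethod(plain_function)))  # non-data descriptor
print(describe(42))                            # not a descriptor
```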

3. Attribute Lookup Priority

When you access an attribute through an instance (e.g., instance.attr), Python looks it up in the following order:

  1. Data Descriptor: Search the class and its parent classes, in Method Resolution Order (MRO), for an attribute named attr. If the first match is a data descriptor, call its __get__ method and return the result.
  2. Instance Attribute: Otherwise, look for a key named attr in the instance's __dict__ dictionary.
  3. Non-data Descriptor: Otherwise, if the class-level match from step 1 is a non-data descriptor, call its __get__ method.
  4. Class Attribute: Otherwise, if the class-level match is an ordinary value, return it unchanged. Because step 1 already walked the MRO, this also covers attributes inherited from parent classes.
  5. If nothing is found, call the class's __getattr__ method if it defines one; otherwise raise an AttributeError.

Key Point: Data descriptors have higher priority than instance attributes! This means that even if there is an attribute with the same name in the instance's __dict__, Python will still prioritize the data descriptor.
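The lookup order can be modeled in pure Python. The following is a simplified sketch of what object.__getattribute__ does for instance attribute access; the function names find_in_mro and simplified_getattribute are hypothetical, and the sketch deliberately ignores __getattr__, __slots__, and other details of the real C implementation.

```python
_MISSING = object()  # sentinel meaning "no class-level attribute found"


def find_in_mro(cls, name):
    """Return the first attribute called `name` along cls.__mro__, without invoking descriptors."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return _MISSING


def simplified_getattribute(obj, name):
    """Simplified model of instance attribute lookup (ignores __getattr__ and __slots__)."""
    objtype = type(obj)
    cls_attr = find_in_mro(objtype, name)
    attr_type = type(cls_attr)
    has_get = hasattr(attr_type, "__get__")
    is_data = hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")

    # 1. A data descriptor on the class (or a parent class) wins outright.
    if has_get and is_data:
        return attr_type.__get__(cls_attr, obj, objtype)
    # 2. Otherwise, an entry in the instance's __dict__.
    instance_dict = vars(obj)
    if name in instance_dict:
        return instance_dict[name]
    # 3. Otherwise, a non-data descriptor.
    if has_get:
        return attr_type.__get__(cls_attr, obj, objtype)
    # 4. Otherwise, an ordinary class attribute (possibly inherited).
    if cls_attr is not _MISSING:
        return cls_attr
    # 5. Nothing found; the real lookup would try __getattr__ before giving up.
    raise AttributeError(f"{objtype.__name__!r} object has no attribute {name!r}")


class Base:
    color = "red"  # plain class attribute, inherited by Demo


class Demo(Base):
    @property
    def p(self):  # property: a data descriptor
        return "from property"

    def m(self):  # plain function: a non-data descriptor
        return "from method"


d = Demo()
d.__dict__["p"] = "instance value"  # ignored by lookup: the data descriptor wins
print(simplified_getattribute(d, "p"))      # from property
print(simplified_getattribute(d, "color"))  # red
print(simplified_getattribute(d, "m")())    # from method (bound via the function's __get__)

d.__dict__["m"] = "instance value"  # shadows the non-data descriptor
print(simplified_getattribute(d, "m"))      # instance value
```

Comparing against real attribute access (d.p, d.color, d.m) is a quick way to check that the model agrees with Python.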

4. Step by Step: From Simple Example to Practical Application

Step 1: Create a Non-data Descriptor

First, let's create the simplest descriptor, which only implements the __get__ method, making it a non-data descriptor.

class SimpleDescriptor:
    """A simple non-data descriptor that logs each access"""
    def __init__(self):
        self._value = "default value"

    def __get__(self, obj, objtype=None):
        print(f"Descriptor's __get__ called: obj={obj}, objtype={objtype}")
        return self._value

class MyClass:
    attr = SimpleDescriptor()  # Class attribute is a descriptor instance

# Access via class
print("1. Access via class:")
print(MyClass.attr)  # Output: Descriptor's __get__ called: obj=None, objtype=<class '__main__.MyClass'>
                     #       default value

# Access via instance
print("\n2. Access via instance:")
instance = MyClass()
print(instance.attr) # Output: Descriptor's __get__ called: obj=<__main__.MyClass object at ...>, objtype=<class '__main__.MyClass'>
                     #       default value

# Attempt to assign to instance attribute (does not call descriptor's __set__, as we didn't define it)
print("\n3. Assigning to instance attribute:")
instance.attr = "instance attribute value" # This creates an attribute named 'attr' in the instance's __dict__
print(instance.attr)         # Output: instance attribute value
print(instance.__dict__)     # Output: {'attr': 'instance attribute value'}
print(MyClass.attr)          # Output: Descriptor's __get__ called: obj=None, objtype=<class '__main__.MyClass'>
                             #       default value

Explanation:

  • When accessing MyClass.attr via the class, the obj parameter in __get__ is None.
  • When accessing instance.attr via the instance, the obj parameter in __get__ is that instance.
  • When we execute instance.attr = "...", since SimpleDescriptor has no __set__ method (it's a non-data descriptor), Python creates (or overwrites) a regular attribute named attr in the instance's __dict__. Subsequent accesses to instance.attr, according to the priority rules, will find the instance attribute first and will no longer trigger the descriptor's __get__.
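Because the instance attribute only shadows the non-data descriptor, deleting it makes the descriptor visible again. A self-contained sketch that repeats a trimmed-down SimpleDescriptor (without the print call) for brevity:

```python
class SimpleDescriptor:
    """Non-data descriptor: defines only __get__."""
    def __get__(self, obj, objtype=None):
        return "default value"


class MyClass:
    attr = SimpleDescriptor()


instance = MyClass()
instance.attr = "instance attribute value"  # stored in instance.__dict__, shadows the descriptor
print(instance.attr)  # instance attribute value

del instance.attr     # removes the key from instance.__dict__ (no __delete__ involved)
print(instance.attr)  # default value (the descriptor's __get__ is used again)
```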

Step 2: Create a Data Descriptor

Now, let's create a complete data descriptor to manage an attribute and perform type checking on assignment.

class TypedDescriptor:
    """A data descriptor that performs type checking"""
    def __init__(self, name, expected_type):
        self.name = name           # Attribute name
        self.expected_type = expected_type # Expected type
        self.private_name = f"_{name}"     # Private attribute name for storing value in the instance

    def __get__(self, obj, objtype=None):
        if obj is None:
            # When accessed via class, return the descriptor itself
            return self
        # Retrieve the stored value from the instance's __dict__
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(f"Expected type {self.expected_type}, but got {type(value)}")
        # Store the value in the instance's __dict__ using a private name to avoid conflicts
        setattr(obj, self.private_name, value)

    def __delete__(self, obj):
        # Delete the value stored in the instance
        delattr(obj, self.private_name)

class Person:
    name = TypedDescriptor("name", str)   # Class attribute, a descriptor instance
    age = TypedDescriptor("age", int)     # Another descriptor instance

    def __init__(self, name, age):
        self.name = name  # This calls TypedDescriptor.__set__
        self.age = age    # This calls TypedDescriptor.__set__

# Normal usage
print("1. Creating object normally:")
person = Person("Alice", 30)
print(person.name)  # Output: Alice (calls __get__)
print(person.age)   # Output: 30 (calls __get__)

# Type checking works
print("\n2. Testing type error:")
try:
    person.age = "thirty" # Not an integer, raises TypeError
except TypeError as e:
    print(e) # Output: Expected type <class 'int'>, but got <class 'str'>

# Testing data descriptor priority
print("\n3. Testing data descriptor priority:")
person.__dict__["age"] = "sneakily set string" # Directly manipulate instance dictionary
print(person.age) # Output: 30 (Still 30!)

Explanation:

  • Now TypedDescriptor is a data descriptor (implements __set__).
  • In __set__, we perform type checking.
  • We store the actual data in the instance's __dict__ under a "private" name (like _age). Storing it under the descriptor's own name would not work: setattr(obj, "age", value) inside __set__ would call __set__ again and recurse infinitely.
  • The most crucial part is the final test: Even though we directly wrote an age attribute into the instance's __dict__, when we access person.age, Python still prioritizes and calls the data descriptor's __get__ method, returning the correct value 30 stored in _age. This proves that data descriptors have higher priority than instance attributes.
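The Person example has to repeat each attribute name (TypedDescriptor("name", str)). Since Python 3.6, a descriptor may also define __set_name__(self, owner, name), which Python calls automatically when the owning class is created, passing the attribute name. A minimal sketch of the same type-checking descriptor using it; the class name Typed is illustrative:

```python
class Typed:
    """Type-checking data descriptor that learns its own name via __set_name__."""
    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        # Called once, when the class body containing this descriptor is executed
        self.name = name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.name}: expected {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.private_name, value)


class Person:
    name = Typed(str)  # no need to repeat "name"
    age = Typed(int)

    def __init__(self, name, age):
        self.name = name
        self.age = age


p = Person("Alice", 30)
print(Person.age.private_name)  # _age
try:
    p.age = "thirty"
except TypeError as e:
    message = str(e)
print(message)  # age: expected int, got str
```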

Step 3: Practical Application Scenarios

Descriptors are widely used in Python. Here are some classic examples:

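  1. Properties (@property): property is a built-in data descriptor class, and the @property decorator is concise syntax for creating one. Because property defines __get__, __set__, and __delete__ even when no setter is given, a read-only property still takes priority over an instance attribute of the same name.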
  2. Methods: Ordinary functions defined in a class are non-data descriptors. When you access instance.method, the function's __get__ method binds it to the instance and returns a bound method, which is what instance.method() then calls.
  3. ORM (Object-Relational Mapping) Frameworks: ORMs install descriptors on model classes. For example, Django attaches a descriptor to the model class for each field (such as ForwardManyToOneDescriptor for a ForeignKey), so that reading the attribute can load a deferred column value or fetch the related object from the database.
  4. Lazy Evaluation: Descriptors can be used to implement lazy properties, which compute their value only on first access and then cache the result.
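A lazy property is a natural fit for a non-data descriptor: on first access, __get__ computes the value and stores it in the instance's __dict__ under the same name, and from then on the instance attribute takes priority over the descriptor. A minimal sketch; lazy_property and Report are illustrative names. Since Python 3.8 the standard library provides functools.cached_property, which works in a similar way.

```python
class lazy_property:
    """Non-data descriptor that computes a value once per instance and caches it."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Cache under the same name. This descriptor has no __set__, so the
        # instance attribute now wins and later reads never reach __get__.
        obj.__dict__[self.name] = value
        return value


class Report:
    def __init__(self):
        self.computations = 0

    @lazy_property
    def data(self):
        self.computations += 1  # count how often the expensive step really runs
        return [n * n for n in range(5)]


r = Report()
print(r.data)          # [0, 1, 4, 9, 16]
print(r.data)          # same list, served from r.__dict__
print(r.computations)  # 1
```

The design relies on the lookup order: if lazy_property also defined __set__, it would become a data descriptor and the cached value in the instance's __dict__ would be ignored on every read.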

Summary

  • A Descriptor is an object whose class implements at least one of the __get__, __set__, or __delete__ methods.
  • Data Descriptors (with __set__/__delete__) have higher priority than instance attributes; Non-data Descriptors (only __get__) have lower priority than instance attributes.
  • The core purpose of descriptors is to encapsulate the access logic of a class attribute into a separate class, enabling reuse, validation, lazy computation, and other advanced functionalities.
  • Understanding the descriptor protocol and attribute lookup order is key to mastering descriptors.