Descriptors and Attribute Access Control in Python

Descriptors are a powerful Python feature that lets you customize what happens when an attribute is read, assigned, or deleted. A descriptor is an object whose class implements a specific protocol (i.e., defines at least one of the __get__, __set__, or __delete__ methods) and that is stored as a class attribute. This protocol overrides the default attribute access behavior.

1. Why are descriptors needed?

Imagine a simple Person class with an age attribute. Logically, age should not be a negative number.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age # If -5 is passed here, it is logically wrong, but the code will not report an error

p = Person("Alice", -5)
print(p.age) # Outputs -5

Using ordinary assignment and access, we cannot automatically validate the value of age. Descriptors are designed to solve such problems, allowing you to bind attribute access (get, set, delete) to specific methods, thereby inserting custom logic.

2. Detailed Explanation of the Descriptor Protocol

A class becomes a descriptor if it implements one or more of the following methods:

  • __get__(self, obj, type=None) -> object: Called when retrieving the descriptor attribute from an instance (obj) or class (type).
  • __set__(self, obj, value) -> None: Called when setting the descriptor attribute on an instance (obj).
  • __delete__(self, obj) -> None: Called when deleting the descriptor attribute from an instance (obj).
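
To make the three hooks concrete, here is a minimal sketch that records which method runs for each kind of access. The names Traced, Demo, and calls are hypothetical and exist only for this illustration.

```python
class Traced:
    """Hypothetical descriptor that records which protocol hook ran."""

    def __init__(self):
        # Log of (hook name, instance) pairs, kept only for illustration
        self.calls = []

    def __get__(self, obj, objtype=None):
        self.calls.append(("__get__", obj))
        return 42  # fixed dummy value

    def __set__(self, obj, value):
        self.calls.append(("__set__", obj))

    def __delete__(self, obj):
        self.calls.append(("__delete__", obj))


class Demo:
    attr = Traced()


d = Demo()
d.attr          # instance access -> __get__(d, Demo)
d.attr = 1      # assignment      -> __set__(d, 1)
del d.attr      # deletion        -> __delete__(d)
Demo.attr       # class access    -> __get__(None, Demo)

# Read the descriptor through the class __dict__ so that __get__ is not triggered again
hooks = [name for name, _ in Demo.__dict__["attr"].calls]
print(hooks)  # ['__get__', '__set__', '__delete__', '__get__']
```

Note that the last __get__ call receives obj=None, because the attribute was accessed through the class rather than through an instance.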

Based on the methods implemented, descriptors are divided into two categories:

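  • Data Descriptor: Implements the __set__ or __delete__ method (usually together with __get__). It takes precedence over an entry of the same name in the instance's __dict__.
  • Non-Data Descriptor: Implements only the __get__ method. An entry of the same name in the instance's __dict__ takes precedence over it.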

3. Step by Step: Building an Age Validation Descriptor

Let's build a descriptor step by step to ensure age is a non-negative number.

Step 1: Create the Descriptor Class

We create a class called NonNegative. It will be used as a data descriptor, so we need to implement __get__ and __set__.

class NonNegative:
    def __init__(self):
        # We temporarily use a simple instance variable to store the value
        self.value = 0

    def __get__(self, obj, objtype=None):
        # When accessing the attribute, return the stored value
        return self.value

    def __set__(self, obj, value):
        # When setting the attribute, validate first
        if value < 0:
            raise ValueError("Value cannot be negative")
        # If validation passes, store the value
        self.value = value

Step 2: Use the Descriptor in the Main Class

Now, we use this descriptor in the Person class. The usage is simple: define a class attribute as an instance of the descriptor class.

class Person:
    # age is now a descriptor instance
    age = NonNegative()

    def __init__(self, name, age):
        self.name = name
        self.age = age # This assignment triggers the descriptor's __set__ method

# Test
p1 = Person("Bob", 30)
print(p1.age) # Outputs 30. This triggers __get__

try:
    p2 = Person("Charlie", -5) # This will trigger __set__ and raise a ValueError
except ValueError as e:
    print(e) # Outputs: Value cannot be negative

Step 3: Solving the Problem of Shared Values Across Multiple Instances

The code above has a serious flaw! All Person instances share the same age value. This is because the descriptor instance age is a class attribute of Person. When we execute p1.age = 30 and p2.age = 25, we are modifying the same self.value of the single NonNegative instance.
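
The flaw can be reproduced with a short, self-contained sketch that repeats the Step 1 descriptor and the Step 2 Person class and then creates two instances:

```python
class NonNegative:
    """Step 1 version: the value lives on the descriptor itself."""

    def __init__(self):
        self.value = 0

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        self.value = value


class Person:
    age = NonNegative()  # one descriptor instance for the whole class

    def __init__(self, name, age):
        self.name = name
        self.age = age


p1 = Person("Bob", 30)
p2 = Person("Charlie", 25)  # overwrites the single shared value

print(p1.age)  # 25, not 30
print(p2.age)  # 25
```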

To solve this, we need the descriptor to store different values for each instance of the class it belongs to. A simple first approach is a dictionary keyed by id(obj), the identity of the instance.

class NonNegative:
    def __init__(self):
        # Use a dictionary to store data per instance
        self.data = {}

    def __get__(self, obj, objtype=None):
        # obj is the instance calling the descriptor (e.g., p1). If accessed via the class (e.g., Person.age), obj is None
        if obj is None:
            # When accessed via the class, usually return the descriptor itself
            return self
        # Retrieve the value corresponding to this instance from the dictionary
        return self.data.get(id(obj), 0) # If not found, return default value 0

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        # Use id(obj) (the instance's identity; its memory address in CPython) as the key to store the value in the dictionary
        self.data[id(obj)] = value

    def __delete__(self, obj):
        # When deleting the attribute, remove the instance's data from the dictionary
        if id(obj) in self.data:
            del self.data[id(obj)]

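Now test again (re-run the Person class definition first so that its age attribute is an instance of the new NonNegative):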

p1 = Person("Bob", 30)
p2 = Person("Charlie", 25)

print(p1.age) # Outputs 30
print(p2.age) # Outputs 25
# Success! Each instance has its own independent value.

Step 4: Optimizing Memory with Weak References (Advanced)

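The dictionary solution above has a potential issue: entries are never removed when an instance goes away. id(obj) is only an integer, so the dictionary does not keep the instance itself alive, but the instance's id and value remain in the dictionary after the Person instance is destroyed, and that memory is never freed. Worse, Python may reuse the same id for a new object created later, so a new instance could silently pick up a stale value.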
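
The leftover entry can be observed directly. The following sketch repeats the id-keyed descriptor from Step 3 (without __delete__, for brevity) and counts the entries after the only instance has been destroyed; the variable name leftover is chosen just for this illustration.

```python
import gc


class NonNegative:
    """Step 3 version: values stored in a plain dict keyed by id(obj)."""

    def __init__(self):
        self.data = {}

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.data.get(id(obj), 0)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        self.data[id(obj)] = value


class Person:
    age = NonNegative()

    def __init__(self, name, age):
        self.name = name
        self.age = age


p = Person("Bob", 30)
del p         # drop the only reference to the instance
gc.collect()  # make sure the instance is really gone

# Person.age returns the descriptor itself (obj is None), so .data is reachable
leftover = len(Person.age.data)
print(leftover)  # 1: the entry outlived the instance
```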

To solve this, Python provides the weakref module, which can create weak references that do not prevent garbage collection.

import weakref

class NonNegative:
    def __init__(self):
        # Use a weak reference dictionary
        self.data = weakref.WeakKeyDictionary()

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.data.get(obj, 0)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        self.data[obj] = value # Use obj directly as the key, WeakKeyDictionary handles it automatically

    # __delete__ is no longer strictly necessary because WeakKeyDictionary automatically cleans up entries when the instance is garbage collected.

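The keys of WeakKeyDictionary are weak references to objects. When an instance obj is garbage collected, its entry in the dictionary is automatically removed. This is a common approach when a descriptor keeps per-instance state in its own storage. Note that it requires the instances to be hashable and to support weak references (for example, a class that defines __slots__ must include '__weakref__' in it).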
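
The automatic cleanup can be checked with a small sketch that repeats the weak-reference descriptor and counts the entries before and after the instance is destroyed. The variable names before and after are chosen just for this illustration.

```python
import gc
import weakref


class NonNegative:
    """Step 4 version: values stored in a WeakKeyDictionary keyed by the instance."""

    def __init__(self):
        self.data = weakref.WeakKeyDictionary()

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.data.get(obj, 0)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
        self.data[obj] = value


class Person:
    age = NonNegative()

    def __init__(self, name, age):
        self.name = name
        self.age = age


p = Person("Bob", 30)
before = len(Person.age.data)  # one entry while the instance is alive

del p         # drop the only strong reference
gc.collect()  # needed on implementations without reference counting
after = len(Person.age.data)   # the entry disappeared with the instance

print(before, after)  # 1 0
```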

4. Priority of Attribute Lookup

The key to understanding descriptors is knowing Python's attribute lookup order. When you access an attribute instance.attr on an instance, the interpreter searches in the following order:

  1. Class Lookup: Look up attr in type(instance).__dict__ and then in each parent class's __dict__, following the MRO.
  2. Data Descriptor: If the object found in step 1 is a data descriptor (implements __set__ or __delete__), call its __get__ method and return the result.
  3. Instance Attribute: Otherwise, if there is a key named attr in instance.__dict__, return that value.
  4. Non-Data Descriptor: Otherwise, if the object found in step 1 is a non-data descriptor (implements only __get__), call its __get__ method and return the result.
  5. Class Attribute: Otherwise, if step 1 found an ordinary (non-descriptor) object, return it as is.
  6. If nothing is found, the class's __getattr__ method is called if it defines one; otherwise AttributeError is raised.

This order can be simplified as: Data Descriptor > Instance Attribute > Non-Data Descriptor/Class Attribute.
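
The following sketch illustrates this ordering by placing a value of the same name directly into the instance's __dict__ and checking which source wins. The classes DataDesc, NonDataDesc, and Demo are hypothetical names used only here.

```python
class DataDesc:
    """Data descriptor: defines both __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Non-data descriptor: defines only __get__."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Demo:
    data = DataDesc()
    non_data = NonDataDesc()


d = Demo()

# Write straight into the instance dictionary, bypassing __set__
d.__dict__["data"] = "from instance dict"
d.__dict__["non_data"] = "from instance dict"

print(d.data)      # from data descriptor (data descriptor beats the instance dict)
print(d.non_data)  # from instance dict   (instance dict beats the non-data descriptor)
```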

5. Practical Applications of Descriptors

Descriptors are widely used in Python. Many familiar features are built on descriptors:

  • Methods: Functions defined in a class are non-data descriptors. When you access instance.method, the function's __get__ method is triggered, returning a "bound method" tied to the instance; calling instance.method() then calls that bound method.
  • @property Decorator: property itself is a class that implements the descriptor protocol. @property allows you to easily create getter, setter, and deleter methods for an attribute, all implemented via descriptors.
  • @classmethod and @staticmethod: These decorators are also implemented via descriptors, changing the behavior of how methods are called.
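
These built-in descriptors can be inspected by reading the raw objects from the class __dict__, which bypasses the descriptor protocol. A minimal sketch, with Greeter as a hypothetical example class:

```python
class Greeter:
    def hello(self):
        return "hello from " + type(self).__name__

    @property
    def name(self):
        return "greeter"

    @staticmethod
    def helper():
        return "static"


g = Greeter()

# The class __dict__ holds the plain function, not a bound method
raw_func = Greeter.__dict__["hello"]
bound = raw_func.__get__(g, Greeter)  # the same binding step that g.hello performs
print(bound())              # hello from Greeter
print(bound.__self__ is g)  # True

# Functions define __get__ but not __set__: non-data descriptors
print(hasattr(raw_func, "__set__"))                  # False
# property objects define __set__ as well: data descriptors
print(hasattr(Greeter.__dict__["name"], "__set__"))  # True

# staticmethod's __get__ hands back the underlying function without binding
raw_static = Greeter.__dict__["helper"]
print(raw_static.__get__(None, Greeter)())  # static
```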

Summary

Descriptors are one of Python's advanced features, providing a powerful mechanism for attribute access control. By implementing the __get__, __set__, and __delete__ methods, you can intercept the getting, setting, and deleting of attributes, thereby inserting custom logic such as data validation, type checking, lazy calculation, and logging. Understanding the priority of data descriptors and non-data descriptors in the attribute lookup chain is key to mastering descriptors. Although writing descriptors directly is relatively rare, understanding them helps you gain a deeper insight into how Python itself works.