1. What Is a Descriptor?
Put very simply, a descriptor is a class that can be used to call a method with simple dot notation, also referred to as attribute access, but there's obviously more to it than that. It's difficult to explain further without digging a little into how descriptors are implemented. So, here's a high-level view of the descriptor protocol.
A descriptor implements at least one of these three methods: __get__(), __set__(), or __delete__(). Each of these methods takes a specific list of parameters, which will be discussed later in the book, and each is called by a different kind of access to the attribute the descriptor represents. Simple a.x access will call the __get__() method of x, setting the attribute using a.x = value will call the __set__() method of x, and using del a.x will call, as expected, the __delete__() method of x.
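The three kinds of access described above can be sketched as follows. The class names Logged and A, and the calls list used for bookkeeping, are hypothetical and exist only for illustration; the fixed return value of 42 is likewise an arbitrary choice.

```python
class Logged:
    """Hypothetical descriptor that records which protocol method ran."""

    def __init__(self):
        self.calls = []

    def __get__(self, instance, owner):
        self.calls.append("get")
        return 42  # arbitrary value, just so the access returns something

    def __set__(self, instance, value):
        self.calls.append("set")

    def __delete__(self, instance):
        self.calls.append("delete")


class A:
    x = Logged()  # the descriptor lives on the class, not on instances


a = A()
value = a.x   # attribute read    -> Logged.__get__
a.x = 10      # attribute write   -> Logged.__set__
del a.x       # attribute delete  -> Logged.__delete__

# Reach the descriptor object itself through the class dictionary,
# which does not trigger __get__.
calls = A.__dict__["x"].calls
print(calls)
```

Note that the descriptor is fetched through A.__dict__ at the end; plain A.x would call __get__() again and append another entry to the list.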
As stated earlier, a class needs to implement only one of these methods to be considered a descriptor, but it can implement any number of them. Depending on the descriptor type and which methods are implemented, leaving certain methods out can restrict certain types of attribute access or provide an interesting alternative behavior for them. There are two types of descriptors, based on which sets of these methods are implemented: data and nondata.
Data Descriptors vs. Nondata Descriptors
A data descriptor implements at least one of __set__() or __delete__(), and can include both. Data descriptors also often include __get__(), since it's rare to want to set something without also being able to get it. You can get the value even if the descriptor doesn't include __get__(), but it's either a roundabout process or the descriptor writes the value to the instance. That will be discussed more later in the book.
A nondata descriptor only implements __get__(). If it adds __set__() or __delete__() to its method list, it becomes a data descriptor.
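As a rough illustration of these definitions, the sketch below classifies three small descriptor classes. The helper is_data_descriptor and the class names GetOnly, GetAndSet, and DeleteOnly are hypothetical; the helper simply restates the rule above and is not how the interpreter itself makes the decision.

```python
def is_data_descriptor(obj):
    """Hypothetical helper: True if obj's class defines __set__ or __delete__."""
    tp = type(obj)  # the protocol methods are looked up on the class
    return hasattr(tp, "__set__") or hasattr(tp, "__delete__")


class GetOnly:
    """Nondata descriptor: implements only __get__."""

    def __get__(self, instance, owner):
        return "value"


class GetAndSet:
    """Data descriptor: implements __get__ and __set__."""

    def __get__(self, instance, owner):
        return "value"

    def __set__(self, instance, value):
        pass


class DeleteOnly:
    """Data descriptor: implementing __delete__ alone is enough."""

    def __delete__(self, instance):
        pass


results = {
    "GetOnly": is_data_descriptor(GetOnly()),
    "GetAndSet": is_data_descriptor(GetAndSet()),
    "DeleteOnly": is_data_descriptor(DeleteOnly()),
}
print(results)
```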
Unfortunately, the PyPy interpreter (up to version 2.4.0; the fix is in the next version, which hadn't been released yet at the time of writing) gets this a little bit wrong. It doesn't take __delete__() into consideration until it knows that it's a data descriptor, and PyPy doesn't believe something is a data descriptor unless __set__() is implemented. Luckily, since a majority of data descriptors implement __set__(), this rarely becomes a problem.
It may seem like the distinction is pointless, but it is not. It comes into play upon attribute lookup. This will be discussed more later in the book, but basically the distinction is based on the types of uses each kind provides.
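One visible effect of the distinction during lookup, shown here as a minimal sketch in standard CPython, is that an entry in the instance dictionary hides a nondata descriptor of the same name but does not hide a data descriptor. The names NonData, Data, and A are hypothetical.

```python
class NonData:
    """Nondata descriptor: only __get__."""

    def __get__(self, instance, owner):
        return "from nondata descriptor"


class Data:
    """Data descriptor: __get__ plus __set__."""

    def __get__(self, instance, owner):
        return "from data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only in this sketch")


class A:
    n = NonData()
    d = Data()


a = A()
# Write directly into the instance dictionary so that no __set__ is involved.
a.__dict__["n"] = "from instance"
a.__dict__["d"] = "from instance"

n_result = a.n  # the instance dictionary wins over a nondata descriptor
d_result = a.d  # a data descriptor wins over the instance dictionary
print(n_result)
print(d_result)
```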
The Use of Descriptors by Python
It is worth noting that descriptors are an inherent part of how Python works. Python is known to be a multiparadigm language, and as such it supports paradigms such as functional programming, imperative programming, and object-oriented programming (among others). This book does not attempt to go into depth about the different paradigms; only the object-oriented programming paradigm will be examined. Descriptors are used implicitly in Python for the language's object-oriented mechanisms. As will be explained in the chapters that follow, methods are implemented using descriptors. As you may guess from reading this, it is because of descriptors that object-oriented programming is possible in Python. Descriptors are very powerful and advanced, and this book aims to teach Python programmers how to use them fully.
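The claim that methods are implemented using descriptors can be checked with a small sketch. The Greeter class is hypothetical; the point is only that a plain function stored on a class has a __get__() method, and that calling it by hand yields the same bound method that dot notation produces.

```python
class Greeter:
    def hello(self):
        return "hello from " + type(self).__name__


g = Greeter()

# The class dictionary holds an ordinary function object.
func = Greeter.__dict__["hello"]
has_get = hasattr(type(func), "__get__")  # functions are nondata descriptors

# Calling __get__ by hand binds the function to the instance.
bound = func.__get__(g, Greeter)
result_via_descriptor = bound()
result_via_dot = g.hello()

print(has_get)
print(result_via_descriptor)
print(result_via_dot)
```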
Summary
Descriptors occupy a large part of the Python language, as they can replace attribute access with method calls and even restrict which types of attribute access are allowed. Now that you have a broad idea of how descriptors are implemented as well as how the language uses them, let's dig a little deeper and gain a better understanding of how they work.
2. The Descriptor Protocol
In order to get a better idea of what descriptors are good for, it is best to finish showing the full descriptor protocol. It's time to see the full signatures of the protocol's methods and what the parameters are.
__get__(self, instance, owner)
This is the method for retrieving whatever data or object the descriptor is meant to maintain. Since it's a method, self is a parameter. It also receives instance and owner, although instance may be None.
Let's start with owner, which is the class that the descriptor is accessed from, or the class of the instance it's being accessed from. When you access A.x, where x is a descriptor object with __get__(), it's called with an owner, and instance is set to None. So the lookup is effectively transformed into A.__dict__['x'].__get__(None, A). This lets the descriptor know that __get__() is being called from a class, not an instance. The owner parameter is also often written to have a default value of None, but that's largely an optimization that only built-in descriptors use.
Now, on to the other parameter. The parameter instance is the instance that the descriptor is being accessed from, if it is being accessed from an instance. As mentioned earlier, if None is passed into instance, the descriptor knows that it's being called from the class level. But if instance is not None, then it tells the descriptor which instance it's being called from. So an a.x access will be effectively translated to type(a).__dict__['x'].__get__(a, type(a)). Notice that it still receives the instance's class. Notice also that the call still starts with type(a), not just a, because descriptors are stored on classes. In order to be able to apply per-instance as well as per-class functionality, descriptors are given instance and owner (the class of the instance). How this translation and application happens will be discussed later in the book.
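Both translations can be verified with a short sketch. The Echo descriptor and the class A are hypothetical; Echo simply returns the arguments it receives so that the effect of each kind of access is visible. The manual calls assume, as in the text, that the descriptor is defined directly on the class being accessed.

```python
class Echo:
    """Hypothetical descriptor that returns the arguments __get__ receives."""

    def __get__(self, instance, owner):
        return (instance, owner)


class A:
    x = Echo()


a = A()

# Access from the class: instance is None, owner is the class.
from_class = A.x
manual_from_class = A.__dict__["x"].__get__(None, A)

# Access from an instance: instance is a, owner is still the class.
from_instance = a.x
manual_from_instance = type(a).__dict__["x"].__get__(a, type(a))

print(from_class == manual_from_class)
print(from_instance == manual_from_instance)
```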
Remember that self is an instance of the descriptor itself, and that this applies to __set__() and __delete__() as well. It is not the instance that the descriptor is being accessed from; that is what the instance parameter is for. This may sound confusing at first, but don't worry if you don't understand it yet; everything will be explained further.
The __get__() method is the only one that bothers to receive the class separately. That's because it's the only method on nondata descriptors, which are generally made at a class level. The built-in decorator classmethod is implemented using descriptors and the __get__() method. In that case, it makes use of the owner parameter alone.
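To show what using the owner parameter alone might look like, here is a deliberately simplified stand-in for classmethod. The name SimpleClassMethod is hypothetical, and this is only a sketch of the idea, not the built-in's actual implementation; it uses functools.partial to pre-fill the class as the first argument.

```python
import functools


class SimpleClassMethod:
    """Hypothetical, simplified imitation of the built-in classmethod."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        # instance is ignored entirely; only the class matters here.
        return functools.partial(self.func, owner)


class A:
    @SimpleClassMethod
    def name(cls):
        return cls.__name__


class B(A):
    pass


from_class = A.name()       # owner is A, instance is None
from_instance = A().name()  # owner is still A, instance is ignored
from_subclass = B.name()    # owner is B, the class used for the access

print(from_class, from_instance, from_subclass)
```

Because owner is the class the attribute was accessed through, the subclass B receives itself rather than A, which matches how the real classmethod behaves.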
__set__(self, instance, value)