In one project I had to subclass the Python string type (namely str) in order to get some additional features.
Why did I decide to do that?
Because I needed something:
- supporting almost all the methods of the standard strings
- with some custom attributes, additional methods
- that could be compared and mixed with strings.
I had almost no choice. But subclassing str must be handled with special care, because str is a so-called immutable type.
Example 1: a lowercase string
Let's consider a simple use case that is helpful in many circumstances: the implementation of a "lowercase string" type.
To create such an object, a developer might write something like this:
class BrokenLowerCaseString(str):
    ''' This is going to fail!
    '''
    def __init__(self, value):
        ''' Return a string instance
        '''
        value = str(value).lower()
        str.__init__(self, value)
This code is going to silently fail in Python 2:
>>> BrokenLowerCaseString('Alice')
'Alice'
Although the code runs without errors, the string is not lowercased at all.
In Python 3 it will not even run:
>>> BrokenLowerCaseString('Alice')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 27, in __init__
    str.__init__(self, value)
TypeError: object.__init__() takes no parameters
The right way to implement this type is to override the __new__ method instead of __init__.
This is generally true for all the immutable types [1].
class LowerCaseString(str):
    ''' Provides an object that is like a string
    but that will always be converted to lowercase
    '''
    def __new__(cls, value):
        ''' Return a string instance
        '''
        value = str(value).lower()
        return str.__new__(cls, value)
This time we got the expected result:
>>> LowerCaseString('Alice')
'alice'
This latter class works because the __new__ method returns a new string instance created from an already lowercased value. The __init__ method, instead, runs after the instance has been created, when its immutable content can no longer be changed.
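The difference between the two hooks can be made visible with a small tracing class. This is only a sketch: the TracedString class and the module-level calls list are made up for this illustration.

```python
calls = []  # hypothetical log, used only for this illustration


class TracedString(str):
    """Records the order in which __new__ and __init__ run."""

    def __new__(cls, value):
        calls.append('__new__')
        # The content of the string is decided here, once and for all.
        return str.__new__(cls, str(value).lower())

    def __init__(self, value):
        # The instance already exists and holds its final content;
        # 'value' is still the original, unmodified argument.
        calls.append('__init__ sees %r, argument was %r' % (str(self), value))


traced = TracedString('Alice')
print(calls)
```

The log shows that __new__ runs first and that __init__ receives an instance whose text is already fixed, which is why lowering the value inside __init__ has no effect.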
Once we have got this concept we can give more "superpowers" to our subclassed types.
Example 2: an email string
The next example shows a simple Email type, a string with:
- a constraint
- new attributes
- a property.
class Email(str):
    ''' Provides an object that is like a string
    but with additional attributes
    '''
    @staticmethod
    def _is_valid(value):
        ''' Very simple validation
        '''
        return '@' in str(value)

    def __new__(cls, value, firstname='', lastname=''):
        ''' Return a string instance
        '''
        if not cls._is_valid(value):
            raise ValueError(value)
        return str.__new__(cls, value)

    def __init__(self, value, firstname='', lastname=''):
        ''' Add some attributes to the instance
        '''
        self.firstname = str(firstname)
        self.lastname = str(lastname)

    @property
    def fullname(self):
        ''' This property returns the full name
        (first name and last name separated by a space)
        '''
        return " ".join((self.firstname, self.lastname))
The static method _is_valid only checks that the string representation of the value contains an '@', so any object whose str() form contains that character is accepted by the constructor, not just strings:
>>> Email(['@'])
"['@']"
Of course the validator could be improved, but for this post it is enough that it raises an error on invalid strings:
>>> Email('Sample string')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 114, in __new__
    raise ValueError(value)
ValueError: Sample string
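As a hedged illustration of how the validator might be tightened, the sketch below swaps the '@' check for a regular expression. The StricterEmail class and the pattern are assumptions made for this example, not the validation used in the project, and the pattern is deliberately loose: it does not implement the full e-mail address grammar.

```python
import re

# Hypothetical pattern: exactly one '@', no whitespace,
# and at least one dot in the domain part.
_EMAIL_RE = re.compile(r'[^@\s]+@[^@\s]+\.[^@\s]+')


class StricterEmail(str):
    """Like Email, but with a slightly stricter (still simplistic) check."""

    @staticmethod
    def _is_valid(value):
        # fullmatch requires the whole string to match the pattern.
        return _EMAIL_RE.fullmatch(str(value)) is not None

    def __new__(cls, value):
        if not cls._is_valid(value):
            raise ValueError(value)
        return str.__new__(cls, value)


ok = StricterEmail('alice.burton@example.com')
```

With this check, a value such as ['@'] is rejected, because its string form has no dot after the '@'.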
I will now construct an Email instance:
>>> alice_email = Email('alice.burton@example.com', 'Alice', 'Burton')
This instance is also an instance of a string:
>>> isinstance(alice_email, str)
True
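Being a real str instance also means the object can be mixed freely with plain strings. The sketch below uses a reduced copy of the Email class (the phone_book dictionary is invented for the example) and also shows a caveat: inherited str methods return plain str objects, so the custom attributes do not survive them.

```python
class Email(str):
    """Reduced version of the Email class, for illustration only."""

    def __new__(cls, value, firstname='', lastname=''):
        if '@' not in str(value):
            raise ValueError(value)
        return str.__new__(cls, value)

    def __init__(self, value, firstname='', lastname=''):
        self.firstname = str(firstname)
        self.lastname = str(lastname)


alice_email = Email('alice.burton@example.com', 'Alice', 'Burton')

# The instance can be used wherever a plain string is expected.
greeting = 'Write to ' + alice_email
domain = alice_email.split('@')[1]
phone_book = {'alice.burton@example.com': 1234}  # hypothetical data
number = phone_book[alice_email]

# Inherited str methods return plain str objects, not Email instances.
upper = alice_email.upper()
print(type(upper) is str, hasattr(upper, 'firstname'))
```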
Email instances have a property called fullname:
>>> alice_email.fullname
'Alice Burton'
And the additional attributes can be modified:
>>> alice_email.lastname = 'Cooper'
>>> alice_email.fullname
'Alice Cooper'
There are some things to bear in mind when doing something like this. The subclassed object compares equal to a plain string with the same content:
>>> alice_email == 'alice.burton@example.com'
True
The firstname and lastname attributes, in fact, are not taken into account during comparison:
>>> alice_email_noname = Email(u'alice.burton@example.com')
>>> alice_email_noname.fullname == alice_email.fullname
False
>>> alice_email_noname == alice_email
True
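If you also want to compare the custom attributes, you should implement a custom __cmp__ method [2] in Python 2. In Python 3, where __cmp__ no longer exists, the same goal is reached by overriding the rich comparison methods such as __eq__ and __ne__ (and restoring __hash__). This is generally true when subclassing.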
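For Python 3, an attribute-aware comparison could be sketched as follows. NamedEmail is a hypothetical variant of Email written for this example, and the choice to fall back to plain string comparison for objects that are not NamedEmail instances is an assumption, not the only sensible design.

```python
class NamedEmail(str):
    """Hypothetical variant of Email whose equality also checks the names."""

    def __new__(cls, value, firstname='', lastname=''):
        return str.__new__(cls, value)

    def __init__(self, value, firstname='', lastname=''):
        self.firstname = str(firstname)
        self.lastname = str(lastname)

    def __eq__(self, other):
        if isinstance(other, NamedEmail):
            # Two NamedEmail objects: compare the text and the names.
            return (str(self), self.firstname, self.lastname) == (
                str(other), other.firstname, other.lastname)
        # Anything else (e.g. a plain str): ordinary string comparison.
        return str.__eq__(self, other)

    def __ne__(self, other):
        # str provides its own __ne__, so it has to be overridden too,
        # otherwise != would keep comparing the text only.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ disables the inherited hash in Python 3: restore it.
    __hash__ = str.__hash__


alice = NamedEmail('alice.burton@example.com', 'Alice', 'Burton')
noname = NamedEmail('alice.burton@example.com')
print(alice == noname, alice == 'alice.burton@example.com')
```

Note the trade-off in this design: both instances compare equal to the same plain string while being different from each other, so equality is no longer transitive.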
Conclusions and prospects
Subclassing strings (and other immutable types) has to be done in a particular way, but when you need it, it gives you a lot of power and functionality with very little code. In the next post I will show production code, released on GitHub and PyPI, in which the same technique applied to the int type leads to a very elegant and simple solution to a complex problem.
Footnotes
[1] See this document for further details: http://python-history.blogspot.it/2010/06/inside-story-on-new-style-classes.html
[2] See this in the Python data model documentation: https://docs.python.org/2/reference/datamodel.html#object.__cmp__
Credits
The picture of David Gilmour playing strings is taken from Wikimedia.
The post title is inspired by the song "What Do You Want from Me" from the album The Division Bell.