PEP: | 372 |
---|---|
Title: | Adding an ordered dictionary to collections |
Version: | 64410 |
Last-Modified: | 2008-06-19 13:51:39 -0700 (Thu, 19 Jun 2008) |
Author: | Armin Ronacher <armin.ronacher at active-4.com> |
Status: | Draft |
Type: | Standards Track |
Content-Type: | text/x-rst |
Created: | 15-Jun-2008 |
Python-Version: | 2.6, 3.0 |
Post-History: |
Contents
This PEP proposes an ordered dictionary as a new data structure for the collections module, called "odict" in this PEP for short. The proposed API incorporates the experiences gained from working with similar implementations that exist in various real-world applications and other programming languages.
In current Python versions, the widely used built-in dict type does not specify an order for the key/value pairs stored. This makes it hard to use dictionaries as data storage for some specific use cases.
Some dynamic programming languages like PHP and Ruby 1.9 guarantee a certain order on iteration. In those languages, and existing Python ordered-dict implementations, the ordering of items is defined by the time of insertion of the key. New keys are appended at the end, but keys that are overwritten are not moved to the end.
The following example shows the behavior for simple assignments:
>>> d = odict() >>> d['parrot'] = 'dead' >>> d['penguin'] = 'exploded' >>> d.items() [('parrot', 'dead'), ('penguin', 'exploded')]
That the ordering is preserved makes an odict useful for a couple of situations:
XML/HTML processing libraries currently drop the ordering of attributes, use a list instead of a dict which makes filtering cumbersome, or implement their own ordered dictionary. This affects ElementTree, html5lib, Genshi and many more libraries.
There are many ordered dict implementations in various libraries and applications, most of them subtly incompatible with each other. Furthermore, subclassing dict is a non-trivial task and many implementations don't override all the methods properly which can lead to unexpected results.
Additionally, many ordered dicts are implemented in an inefficient way, making many operations more complex then they have to be.
PEP 3115 allows metaclasses to change the mapping object used for the class body. An ordered dict could be used to create ordered member declarations similar to C structs. This could be useful, for example, for future ctypes releases as well as ORMs that define database tables as classes, like the one the Django framework ships. Django currently uses an ugly hack to restore the ordering of members in database models.
The RawConfigParser class accepts a dict_type argument that allows an application to set the type of dictionary used internally. The motivation for this addition was expressly to allow users to provide an ordered dictionary. [1]
Code ported from other programming languages such as PHP often depends on a ordered dict. Having an implementation of an ordering-preserving dictionary in the standard library could ease the transition and improve the compatibility of different libraries.
The ordered dict API would be mostly compatible with dict and existing ordered dicts. (Note: this PEP refers to the Python 2.x dictionary API; the transfer to the 3.x API is trivial.)
The constructor and update() both accept iterables of tuples as well as mappings like a dict does. The ordering however is preserved for the first case:
>>> d = odict([('a', 'b'), ('c', 'd')]) >>> d.update({'foo': 'bar'}) >>> d collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])
If ordered dicts are updated from regular dicts, the ordering of new keys is of course undefined again unless sort() is called.
All iteration methods as well as keys(), values() and items() return the values ordered by the time the key-value pair was inserted:
>>> d['spam'] = 'eggs' >>> d.keys() ['a', 'c', 'foo', 'spam'] >>> d.values() ['b', 'd', 'bar', 'eggs'] >>> d.items() [('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]
New methods not available on dict:
Returns the key/value pair for an index, that is, the "position" of a key in the ordered dict. 0 is the first key/value pair, -1 the last.
>>> d.byindex(2) ('foo', 'bar')
If there is no key for index an IndexError is raised. Slices are not supported.
Sorts the odict in place by cmp or key. This works exactly like list.sort(), but the comparison functions are passed a key/value tuple, not only the value.
>>> d = odict([(42, 1), (1, 4), (23, 7)]) d.sort() d collections.odict([(1, 4), (23, 7), (42, 1)])
The Python 3 version of the odict returns dictionary views rather than lists for odict.keys(), odict.values() and odict.items(). The keys view is equivalent to a regular keys view but supports the following extra or changed operations:
odict_keys.__getitem__(index)
Returns the key for an index. This is equivalent to odict.byindex(index).
odict_keys.index(key)
Returns the index for a key. This exists for compatibility with the Sequence abstract base class and is equivalent to odict.index(key).
odict_keys.__iter__()
Has the same semantics as odict.__iter__().
odict_keys.__reversed__()
Has the same semantics as odict.__reversed__().
odict_keys.__cmp__() / odict_keys.__eq__() / odict_keys.__ne__()
Same semantics as the equivalent odict operation. E.g.: when compared to another odict keys view the ordering is taken into account.
What happens if an existing key is reassigned?
The key is not moved but assigned a new value in place. This is consistent with existing implementations and allows subclasses to change the behavior easily:
class moving_odict(collections.odict): def __setitem__(self, key, value): self.pop(key, None) collections.odict.__setitem__(self, key, value)
What happens if keys appear multiple times in the list passed to the constructor?
The same as for regular dicts: The latter item overrides the former. This has the side-effect that the position of the first key is used because the key is actually overwritten:
>>> odict([('a', 1), ('b', 2), ('a', 3)]) collections.odict([('a', 3), ('b', 2)])This behavior is consistent with existing implementations in Python, the PHP array and the hashmap in Ruby 1.9.
Why is there no odict.insert()?
There are few situations where you really want to insert a key at an specified index. To avoid API complication, the proposed solution for this situation is creating a list of items, manipulating that and converting it back into an odict:
>>> d = odict([('a', 42), ('b', 23), ('c', 19)]) >>> l = d.items() >>> l.insert(1, ('x', 0)) >>> odict(l) collections.odict([('a', 42), ('x', 0), ('b', 23), ('c', 19)])
Is the ordered dict a dict subclass?
Yes. Like defaultdict, odict subclasses dict.
Does odict.pop() support list-like popping of items?
No. Neither odict.__getitem__() nor odict.pop() support retrieving or deleting items by index. Slices are not supported either. This would introduce ambiguities if integers or slice objects are used as dict keys.
As a matter of fact, odict does not implement the Sequence interface.
A poorly performing example implementation of the odict written in Python is available:
odict.py
The version for collections should be implemented in C and use a linked list internally.
Other implementations of ordered dicts in various Python projects or standalone libraries, that inspired the API proposed here, are:
With the availability of an ordered dict in the standard library, other libraries may take advantage of that. For example, ElementTree could return odicts in the future that retain the attribute ordering of the source file.
[1] | http://bugs.python.org/issue1371075 |
[2] | http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178 |
[3] | http://code.djangoproject.com/browser/django/trunk/django/utils/datastructures.py?rev=7140#L53 |
[4] | http://www.voidspace.org.uk/python/odict.html |
[5] | http://www.xs4all.nl/~anthon/Python/ordereddict/ |
[6] | http://pypi.python.org/pypi/StableDict/0.2 |
[7] | http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py |
This document has been placed in the public domain.