PEP: | 3103 |
---|---|
Title: | A Switch/Case Statement |
Version: | 56033 |
Last-Modified: | 2007-06-18 21:20:07 -0700 (Mon, 18 Jun 2007) |
Author: | guido at python.org (Guido van Rossum) |
Status: | Rejected |
Type: | Standards Track |
Content-Type: | text/x-rst |
Created: | 25-Jun-2006 |
Python-Version: | 3.0 |
Post-History: | 26-Jun-2006 |
Contents
A quick poll during my keynote presentation at PyCon 2007 shows this proposal has no popular support. I therefore reject it.
Python-dev has recently seen a flurry of discussion on adding a switch statement. In this PEP I'm trying to extract my own preferences from the smorgasboard of proposals, discussing alternatives and explaining my choices where I can. I'll also indicate how strongly I feel about alternatives I discuss.
This PEP should be seen as an alternative to PEP 275. My views are somewhat different from that PEP's author, but I'm grateful for the work done in that PEP.
This PEP introduces canonical names for the many variants that have been discussed for different aspects of the syntax and semantics, such as "alternative 1", "school II", "option 3" and so on. Hopefully these names will help the discussion.
A common programming idiom is to consider an expression and do different things depending on its value. This is usually done with a chain of if/elif tests; I'll refer to this form as the "if/elif chain". There are two main motivations to want to introduce new syntax for this idiom:
Both of these complaints are relatively mild; there isn't a lot of readability or performance to be gained by writing this differently. Yet, some kind of switch statement is found in many languages and it is not unreasonable to expect that its addition to Python will allow us to write up certain code more cleanly and efficiently than before.
There are forms of dispatch that are not suitable for the proposed switch statement; for example, when the number of cases is not statically known, or when it is desirable to place the code for different cases in different classes or files.
I'm considering several variants of the syntax first proposed in PEP 275 here. There are lots of other possibilities, but I don't see that they add anything.
I've recently been converted to alternative 1.
I should note that all alternatives here have the "implicit break" property: at the end of the suite for a particular case, the control flow jumps to the end of the whole switch statement. There is no way to pass control from one case to another. This in contrast to C, where an explicit 'break' statement is required to prevent falling through to the next case.
In all alternatives, the else-suite is optional. It is more Pythonic to use 'else' here rather than introducing a new reserved word, 'default', as in C.
Semantics are discussed in the next top-level section.
This is the preferred form in PEP 275:
switch EXPR: case EXPR: SUITE case EXPR: SUITE ... else: SUITE
The main downside is that the suites where all the action is are indented two levels deep; this can be remedied by indenting the cases "half a level" (e.g. 2 spaces if the general indentation level is 4).
This is Fredrik Lundh's preferred form; it differs by not indenting the cases:
switch EXPR: case EXPR: SUITE case EXPR: SUITE .... else: SUITE
Some reasons not to choose this include expected difficulties for auto-indenting editors, folding editors, and the like; and confused users. There are no situations currently in Python where a line ending in a colon is followed by an unindented line.
This is the same as alternative 2 but leaves out the colon after the switch:
switch EXPR case EXPR: SUITE case EXPR: SUITE .... else: SUITE
The hope of this alternative is that it will not upset the auto-indent logic of the average Python-aware text editor less. But it looks strange to me.
This leaves out the 'case' keyword on the basis that it is redundant:
switch EXPR: EXPR: SUITE EXPR: SUITE ... else: SUITE
Unfortunately now we are forced to indent the case expressions, because otherwise (at least in the absence of an 'else' keyword) the parser would have a hard time distinguishing between an unindented case expression (which continues the switch statement) or an unrelated statement that starts like an expression (such as an assignment or a procedure call). The parser is not smart enough to backtrack once it sees the colon. This is my least favorite alternative.
There is one additional concern that needs to be addressed syntactically. Often two or more values need to be treated the same. In C, this done by writing multiple case labels together without any code between them. The "fall through" semantics then mean that these are all handled by the same code. Since the Python switch will not have fall-through semantics (which have yet to find a champion) we need another solution. Here are some alternatives.
Use:
case EXPR:
to match on a single expression; use:
case EXPR, EXPR, ...:
to match on mulltiple expressions. The is interpreted so that if EXPR is a parenthesized tuple or another expression whose value is a tuple, the switch expression must equal that tuple, not one of its elements. This means that we cannot use a variable to indicate multiple cases. While this is also true in C's switch statement, it is a relatively common occurrence in Python (see for example sre_compile.py).
Use:
case EXPR:
to match on a single expression; use:
case in EXPR_LIST:
to match on multiple expressions. If EXPR_LIST is a single expression, the 'in' forces its interpretation as an iterable (or something supporting __contains__, in a minority semantics alternative). If it is multiple expressions, each of those is considered for a match.
Use:
case EXPR:
to match on a single expression; use:
case EXPR, EXPR, ...:
to match on multiple expressions (as in alternative A); and use:
case *EXPR:
to match on the elements of an expression whose value is an iterable. The latter two cases can be combined, so that the true syntax is more like this:
case [*]EXPR, [*]EXPR, ...:
The * notation is similar to the use of prefix * already in use for variable-length parameter lists and for passing computed argument lists, and often proposed for value-unpacking (e.g. a, b, *c = X as an alternative to (a, b), c = X[:2], X[2:]).
This is a mixture of alternatives B and C; the syntax is like alternative B but instead of the 'in' keyword it uses '*'. This is more limited, but still allows the same flexibility. It uses:
case EXPR:
to match on a single expression and:
case *EXPR:
to match on the elements of an iterable. If one wants to specify multiple matches in one case, one can write this:
case *(EXPR, EXPR, ...):
or perhaps this (although it's a bit strange because the relative priority of '*' and ',' is different than elsewhere):
case * EXPR, EXPR, ...:
Alternatives B, C and D are motivated by the desire to specify multiple cases with the same treatment using a variable representing a set (usually a tuple) rather than spelling them out. The motivation for this is usually that if one has several switches over the same set of cases it's a shame to have to spell out all the alternatives each time. An additional motivation is to be able to specify ranges to be matched easily and efficiently, similar to Pascal's "1..1000:" notation. At the same time we want to prevent the kind of mistake that is common in exception handling (and which will be addressed in Python 3000 by changing the syntax of the except clause): writing "case 1, 2:" where "case (1, 2):" was meant, or vice versa.
The case could be made that the need is insufficient for the added complexity; C doesn't have a way to express ranges either, and it's used a lot more than Pascal these days. Also, if a dispatch method based on dict lookup is chosen as the semantics, large ranges could be inefficient (consider range(1, sys.maxint)).
All in all my preferences are (from most to least favorite) B, A, D', C, where D' is D without the third possibility.
There are several issues to review before we can choose the right semantics.
There are several main schools of thought about the switch statement's semantics:
We need to further separate school I into school Ia and school Ib:
School II grew out of the realization that optimization of commonly found cases isn't so easy, and that it's better to face this head on. This will become clear below.
The differences between school I (mostly school Ib) and school II are threefold:
School I sees trouble in school II's approach of pre-freezing a dispatch dict because it places a new and unusual burden on programmers to understand exactly what kinds of case values are allowed to be frozen and when the case values will be frozen, or they might be surprised by the switch statement's behavior.
School II doesn't believe that school Ia's unoptimized switch is worth the effort, and it sees trouble in school Ib's proposal for optimization, which can cause optimized and unoptimized code to behave differently.
In addition, school II sees little value in allowing cases involving unhashable values; after all if the user expects such values, they can just as easily write an if/elif chain. School II also doesn't believe that it's right to allow dead code due to overlapping cases to occur unflagged, when the dict-based dispatch implementation makes it so easy to trap this.
However, there are some use cases for overlapping/duplicate cases. Suppose you're switching on some OS-specific constants (e.g. exported by the os module or some module like that). You have a case for each. But on some OS, two different constants have the same value (since on that OS they are implemented the same way -- like O_TEXT and O_BINARY on Unix). If duplicate cases are flagged as errors, your switch wouldn't work at all on that OS. It would be much better if you could arrange the cases so that one case has preference over another.
There's also the (more likely) use case where you have a set of cases to be treated the same, but one member of the set must be treated differently. It would be convenient to put the exception in an earlier case and be done with it.
(Yes, it seems a shame not to be able to diagnose dead code due to accidental case duplication. Maybe that's less important, and pychecker can deal with it? After all we don't diagnose duplicate method definitions either.)
This suggests school IIb: like school II but redundant cases must be resolved by choosing the first match. This is trivial to implement when building the dispatch dict (skip keys already present).
(An alternative would be to introduce new syntax to indicate "okay to have overlapping cases" or "ok if this case is dead code" but I find that overkill.)
Personally, I'm in school II: I believe that the dict-based dispatch is the one true implementation for switch statements and that we should face the limitiations up front, so that we can reap maximal benefits. I'm leaning towards school IIb -- duplicate cases should be resolved by the ordering of the cases instead of flagged as errors.
For the supporters of school II (dict-based dispatch), the next big dividing issue is when to create the dict used for switching. I call this "freezing the dict".
The main problem that makes this interesting is the observation that Python doesn't have named compile-time constants. What is conceptually a constant, such as re.IGNORECASE, is a variable to the compiler, and there's nothing to stop crooked code from modifying its value.
The most limiting option is to freeze the dict in the compiler. This would require that the case expressions are all literals or compile-time expressions involving only literals and operators whose semantics are known to the compiler, since with the current state of Python's dynamic semantics and single-module compilation, there is no hope for the compiler to know with sufficient certainty the values of any variables occurring in such expressions. This is widely though not universally considered too restrictive.
Raymond Hettinger is the main advocate of this approach. He proposes a syntax where only a single literal of certain types is allowed as the case expression. It has the advantage of being unambiguous and easy to implement.
My main complaint about this is that by disallowing "named constants" we force programmers to give up good habits. Named constants are introduced in most languages to solve the problem of "magic numbers" occurring in the source code. For example, sys.maxint is a lot more readable than 2147483647. Raymond proposes to use string literals instead of named "enums", observing that the string literal's content can be the name that the constant would otherwise have. Thus, we could write "case 'IGNORECASE':" instead of "case re.IGNORECASE:". However, if there is a spelling error in the string literal, the case will silently be ignored, and who knows when the bug is detected. If there is a spelling error in a NAME, however, the error will be caught as soon as it is evaluated. Also, sometimes the constants are externally defined (e.g. when parsing a file format like JPEG) and we can't easily choose appropriate string values. Using an explicit mappping dict sounds like a poor hack.
The oldest proposal to deal with this is to freeze the dispatch dict the first time the switch is executed. At this point we can assume that all the named "constants" (constant in the programmer's mind, though not to the compiler) used as case expressions are defined -- otherwise an if/elif chain would have little chance of success either. Assuming the switch will be executed many times, doing some extra work the first time pays back quickly by very quick dispatch times later.
An objection to this option is that there is no obvious object where the dispatch dict can be stored. It can't be stored on the code object, which is supposed to be immutable; it can't be stored on the function object, since many function objects may be created for the same function (e.g. for nested functions). In practice, I'm sure that something can be found; it could be stored in a section of the code object that's not considered when comparing two code objects or when pickling or marshalling a code object; or all switches could be stored in a dict indexed by weak references to code objects. The solution should also be careful not to leak switch dicts between multiple interpreters.
Another objection is that the first-use rule allows obfuscated code like this:
def foo(x, y): switch x: case y: print 42
To the untrained eye (not familiar with Python) this code would be equivalent to this:
def foo(x, y): if x == y: print 42
but that's not what it does (unless it is always called with the same value as the second argument). This has been addressed by suggesting that the case expressions should not be allowed to reference local variables, but this is somewhat arbitrary.
A final objection is that in a multi-threaded application, the first-use rule requires intricate locking in order to guarantee the correct semantics. (The first-use rule suggests a promise that side effects of case expressions are incurred exactly once.) This may be as tricky as the import lock has proved to be, since the lock has to be held while all the case expressions are being evaluated.
A proposal that has been winning support (including mine) is to freeze a switch's dict when the innermost function containing it is defined. The switch dict is stored on the function object, just as parameter defaults are, and in fact the case expressions are evaluated at the same time and in the same scope as the parameter defaults (i.e. in the scope containing the function definition).
This option has the advantage of avoiding many of the finesses needed to make option 2 work: there's no need for locking, no worry about immutable code objects or multiple interpreters. It also provides a clear explanation for why locals can't be referenced in case expressions.
This option works just as well for situations where one would typically use a switch; case expressions involving imported or global named constants work exactly the same way as in option 2, as long as they are imported or defined before the function definition is encountered.
A downside however is that the dispatch dict for a switch inside a nested function must be recomputed each time the nested function is defined. For certain "functional" styles of programming this may make switch unattractive in nested functions. (Unless all case expressions are compile-time constants; then the compiler is of course free to optimize away the swich freezing code and make the dispatch table part of the code object.)
Another downside is that under this option, there's no clear moment when the dispatch dict is frozen for a switch that doesn't occur inside a function. There are a few pragmatic choices for how to treat a switch outside a function:
- Disallow it.
- Translate it into an if/elif chain.
- Allow only compile-time constant expressions.
- Compute the dispatch dict each time the switch is reached.
- Like (b) but tests that all expressions evaluated are hashable.
Of these, (a) seems too restrictive: it's uniformly worse than (c); and (d) has poor performance for little or no benefits compared to (b). It doesn't make sense to have a performance-critical inner loop at the module level, as all local variable references are slow there; hence (b) is my (weak) favorite. Perhaps I should favor (e), which attempts to prevent atypical use of a switch; examples that work interactively but not in a function are annoying. In the end I don't think this issue is all that important (except it must be resolved somehow) and am willing to leave it up to whoever ends up implementing it.
When a switch occurs in a class but not in a function, we can freeze the dispatch dict at the same time the temporary function object representing the class body is created. This means the case expressions can reference module globals but not class variables. Alternatively, if we choose (b) above, we could choose this implementation inside a class definition as well.
There are a number of proposals to add a construct to the language that makes the concept of a value pre-computed at function definition time generally available, without tying it either to parameter default values or case expressions. Some keywords proposed include 'const', 'static', 'only' or 'cached'. The associated syntax and semantics vary.
These proposals are out of scope for this PEP, except to suggest that if such a proposal is accepted, there are two ways for the switch to benefit: we could require case expressions to be either compile-time constants or pre-computed values; or we could make pre-computed values the default (and only) evaluation mode for case expressions. The latter would be my preference, since I don't see a use for more dynamic case expressions that isn't addressed adequately by writing an explicit if/elif chain.
It is too early to decide. I'd like to see at least one completed proposal for pre-computed values before deciding. In the mean time, Python is fine without a switch statement, and perhaps those who claim it would be a mistake to add one are right.
This document has been placed in the public domain.