Classic Coroutines

Coroutine Flavors

There are many implementations of coroutines; even in Python there are several. […] Starting in Python 3.5, coroutines are a native feature of the language itself; however, understanding coroutines as they were first implemented in Python 3.4, using pre-existing language facilities, is the foundation to tackle Python 3.5’s native coroutines.[1]

A Web Crawler With asyncio Coroutines
We find two main senses for the verb "to yield" in dictionaries: to produce or to give way. Both senses apply in Python when we use the yield keyword in a generator. A line such as yield item produces a value that is received by the caller of next(…), and it also gives way, suspending the execution of the generator so that the caller may proceed until it’s ready to consume another value by invoking next() again. The caller pulls values from the generator.
A Python coroutine is essentially a generator driven by calls to its .send(…) method.
In a coroutine, the essential meaning of "to yield" is to give way—to hand control to some other part of the program, and wait until notified to resume.
The caller invokes my_coroutine.send(datum) to push data into the coroutine.
The coroutine then resumes and gets datum as the value of the yield expression where it was suspended.
In normal usage, a caller repeatedly pushes data into the coroutine in that way.
In contrast with generators, coroutines are usually data consumers, not producers.
Regardless of the flow of data, yield is a control flow device that can be used to implement cooperative multitasking:
each coroutine yields control to a central scheduler so that other coroutines can be activated.
Since version 3.5, Python has three kinds of coroutines:
- classic coroutines: A generator function that consumes data sent to it via my_coro.send(data) calls, and reads that data by using yield in an expression. Classic coroutines can delegate to other classic coroutines using yield from.

- generator-based coroutines: A generator function decorated with @types.coroutine, which makes it compatible with the new await keyword, introduced in Python 3.5.

- native coroutines: A coroutine defined with async def. You can delegate from a native coroutine to another native coroutine or to a generator-based coroutine using the await keyword, similar to how classic coroutines use yield from.
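To make the taxonomy concrete, here is a minimal sketch of the three flavors side by side (the function names are invented for illustration):

import types

# classic coroutine: a plain generator function, driven by .send(...)
def classic_coro():
    received = yield
    print('got:', received)

# generator-based coroutine: @types.coroutine makes the generator
# compatible with await
@types.coroutine
def generator_based():
    yield

# native coroutine: defined with async def; delegates with await
async def native():
    await generator_based()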
Native coroutines and generator-based coroutines are intended specifically for asynchronous I/O programming. This post is about classic coroutines.
Although native coroutines evolved from classic coroutines, they don’t replace them completely. Classic coroutines have some useful behaviors that native coroutines can’t emulate—and vice-versa, native coroutines have features that are missing in classic coroutines.
Classic coroutines are the product of a series of enhancements to generator functions. Following the evolution of coroutines in Python helps understand their features in stages of increasing functionality and complexity.
After a brief overview of how generators were enhanced to act as coroutines, we jump to the core of the post. Then we’ll see:
- The behavior and states of a generator operating as a coroutine.

- Priming a coroutine automatically with a decorator.

- How the caller can control a coroutine through the .close() and .throw(…) methods of the generator object.

- How coroutines can return values upon termination.

- Usage and semantics of the new yield from syntax.

- A use case: coroutines for managing concurrent activities in a simulation.
Note: In this post I often use the word "coroutine" to refer to classic coroutines, except when I am contrasting them with native coroutines.
Story of this Post
This is an updated version of chapter 16 of Fluent Python, First Edition.
With the arrival of native coroutines I decided to drop that chapter from the Second Edition, because it is a long and challenging study of a subject that is not as important as it used to be. Coroutines are rarely used except for asynchronous programming, and in that context classic coroutines are no longer supported—only native coroutines are.
However, I still believe that a deep understanding of native coroutines and await depends
on understanding the behavior and API of classic coroutine objects and the precise meaning
of yield from. PEP 492 introduced async/await syntax,
and it states in section Await expression:
It (await) uses the yield from implementation with an extra step of validating its argument.
That’s why this text still deserves the attention of motivated and curious Pythonistas.
How Coroutines Evolved from Generators
A classic coroutine is syntactically like a generator: just a function with the yield keyword in its body.
However, in a coroutine, yield usually appears on the right side of an assignment (e.g., datum = yield),
and it may or may not produce a value—if there is no expression after the yield keyword,
the generator yields None.
The coroutine may receive data from the caller, which uses coro.send(datum) instead of next(coro) to drive the coroutine.
Usually, the caller pushes values into the coroutine.
It is even possible that no data goes in or out through the yield keyword.
When you start thinking of yield primarily in terms of control flow, you have the mindset to understand why coroutines are useful for concurrent programming.
The infrastructure for coroutines appeared in PEP 342 — Coroutines via Enhanced Generators,
implemented in Python 2.5 (2006): since then, the yield keyword can be used in an expression, and the .send(value) method was added to the generator API.
This allows a generator to be used as a coroutine: a procedure that collaborates with the caller, yielding and receiving values from the caller.
In addition to .send(…), PEP 342 also added .throw(…) and .close() methods that respectively allow the caller to throw an exception to be handled inside the generator, and to terminate it. These features are covered in the next section and in Coroutine Termination and Exception Handling.
The latest evolutionary step for classic coroutines came with PEP 380 - Syntax for Delegating to a Subgenerator, implemented in Python 3.3 (2012). PEP 380 made two syntax changes to generator functions, to make them more useful as coroutines:
- A generator can now return a value; previously, providing a value to the return statement inside a generator raised a SyntaxError.

- The new yield from syntax enables complex generators to be refactored into smaller, nested generators while avoiding a lot of boilerplate code previously required for a generator to delegate to subgenerators.
These changes will be addressed in Returning a Value from a Coroutine and Using yield from.
After PEP 380 there have been no major changes to classic coroutines. PEP 492 introduced native coroutines, but that’s a story for [async_ch].
Let’s follow the established tradition of Fluent Python and start with some very basic facts and examples, then move into increasingly mind-bending features.
Basic Behavior of a Generator Used as a Coroutine
Example 1 illustrates the behavior of a coroutine.

>>> def simple_coroutine(): # (1)
...     print('-> coroutine started')
...     x = yield # (2)
...     print('-> coroutine received:', x)
...
>>> my_coro = simple_coroutine()
>>> my_coro # (3)
<generator object simple_coroutine at 0x100c2be10>
>>> next(my_coro) # (4)
-> coroutine started
>>> my_coro.send(42) # (5)
-> coroutine received: 42
Traceback (most recent call last): # (6)
...
StopIteration
- A coroutine is defined as a generator function: with yield in its body.

- yield is used in an expression; when the coroutine is designed just to receive data from the client it yields None—this is implicit because there is no expression to the right of the yield keyword.

- As usual with generators, you call the function to get a generator object back.

- The first call is next(…) because the generator hasn’t started so it’s not waiting in a yield and we can’t send it any data initially.

- This call makes the yield in the coroutine body evaluate to 42; now the coroutine resumes and runs until the next yield or termination.

- In this case, control flows off the end of the coroutine body, which prompts the generator machinery to raise StopIteration, as usual.
A coroutine can be in one of four states. You can determine the current state using the inspect.getgeneratorstate(…) function, which returns one of these strings:
- 'GEN_CREATED': Waiting to start execution.

- 'GEN_RUNNING': Currently being executed by the interpreter.[2]

- 'GEN_SUSPENDED': Currently suspended at a yield expression.

- 'GEN_CLOSED': Execution has completed.
Because the argument to the send method will become the value of the pending yield expression, it follows that you can only make a call like my_coro.send(42) if the coroutine is currently suspended. But that’s not the case if the coroutine has never been activated—when its state is 'GEN_CREATED'. That’s why the first activation of a coroutine is always done with next(my_coro)—you can also call my_coro.send(None), and the effect is the same.
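For instance, priming simple_coroutine from Example 1 with .send(None) has exactly the same effect as next(my_coro):

>>> my_coro = simple_coroutine()
>>> my_coro.send(None)  # equivalent to next(my_coro)
-> coroutine started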
If you create a coroutine object and immediately try to send it a value that is not None, this is what happens:
>>> my_coro = simple_coroutine()
>>> my_coro.send(1729)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't send non-None value to a just-started generator
Note the error message: it’s quite clear.
The initial call next(my_coro) is often described as "priming" the coroutine (i.e., advancing it to the first yield to make it ready for use as a live coroutine).
To get a better feel for the behavior of a coroutine, an example that yields more than once is useful. See Example 2.
>>> def simple_coro2(a):
...     print('-> Started: a =', a)
...     b = yield a
...     print('-> Received: b =', b)
...     c = yield a + b
...     print('-> Received: c =', c)
...
>>> my_coro2 = simple_coro2(14)
>>> from inspect import getgeneratorstate
>>> getgeneratorstate(my_coro2) (1)
'GEN_CREATED'
>>> next(my_coro2) (2)
-> Started: a = 14
14
>>> getgeneratorstate(my_coro2) (3)
'GEN_SUSPENDED'
>>> my_coro2.send(28) (4)
-> Received: b = 28
42
>>> my_coro2.send(99) (5)
-> Received: c = 99
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> getgeneratorstate(my_coro2) (6)
'GEN_CLOSED'
- inspect.getgeneratorstate reports GEN_CREATED (i.e., the coroutine has not started).

- Advance the coroutine to the first yield, printing the -> Started: a = 14 message, then yielding the value of a and suspending to wait for a value to be assigned to b.

- getgeneratorstate reports GEN_SUSPENDED (i.e., the coroutine is paused at a yield expression).

- Send number 28 to the suspended coroutine; the yield expression evaluates to 28 and that number is bound to b. The -> Received: b = 28 message is displayed, the value of a + b is yielded (42), and the coroutine is suspended waiting for the value to be assigned to c.

- Send number 99 to the suspended coroutine; the yield expression evaluates to 99 and that number is bound to c. The -> Received: c = 99 message is displayed, then the coroutine terminates, causing the generator object to raise StopIteration.

- getgeneratorstate reports GEN_CLOSED (i.e., the coroutine execution has completed).
It’s crucial to understand that the execution of the coroutine is suspended exactly at the yield keyword. As mentioned before, in an assignment statement, the code to the right of the = is evaluated before the actual assignment happens. This means that in a line like b = yield a, the value of b will only be set when the coroutine is activated later by the client code. It takes some effort to get used to this fact, but understanding it is essential to make sense of the use of yield in asynchronous programming, as we’ll see later.
Execution of the simple_coro2 coroutine can be split in three phases, as shown in Figure 1:
- next(my_coro2) prints the first message and runs to yield a, yielding number 14.

- my_coro2.send(28) assigns 28 to b, prints the second message, and runs to yield a + b, yielding number 42.

- my_coro2.send(99) assigns 99 to c, prints the third message, and the coroutine terminates.
Now let’s consider a slightly more involved coroutine example.
Example: Coroutine to Compute a Running Average
While discussing closures in [closures_and_decorators], we studied objects to compute a running average: [ex_average_oo] shows a plain class and [ex_average_fixed] presents a higher-order function producing a closure to keep the total and count variables across invocations. Example 3 shows how to do the same with a coroutine.[3]
def averager():
    total = 0.0
    count = 0
    average = None
    while True:  # (1)
        term = yield average  # (2)
        total += term
        count += 1
        average = total/count
- This infinite loop means this coroutine will keep on accepting values and producing results as long as the caller sends them. This coroutine will only terminate when the caller calls .close() on it, or when it’s garbage collected because there are no more references to it.

- The yield expression here suspends the coroutine, produces a result to the caller, and—later—gets a value sent by the caller to the coroutine, which resumes its infinite loop.

The advantage of using a coroutine is that total and count can be simple local variables: no instance attributes or closures are needed to keep the context between calls. Example 4 shows doctests of the averager coroutine in operation.
>>> coro_avg = averager() # (1)
>>> next(coro_avg) # (2)
>>> coro_avg.send(10) # (3)
10.0
>>> coro_avg.send(30)
20.0
>>> coro_avg.send(5)
15.0
- Create the coroutine object.

- Prime it by calling next.

- Now we are in business: each call to .send(…) yields the current average.
In the doctest (Example 4), the call next(coro_avg) makes the coroutine advance to the yield, yielding the initial value for average, which is None, so it does not appear on the console. At this point, the coroutine is suspended at the yield, waiting for a value to be sent. The line coro_avg.send(10) provides that value, causing the coroutine to activate, assigning it to term, updating the total, count, and average variables, and then starting another iteration in the while loop, which yields the average and waits for another term.
The attentive reader may be anxious to know how the execution of an averager instance (e.g., coro_avg) may be terminated, because its body is an infinite loop. We’ll cover that in Coroutine Termination and Exception Handling.
But before discussing coroutine termination, let’s talk about getting them started. Priming a coroutine before use is a necessary but easy-to-forget chore. To avoid it, a special decorator can be applied to the coroutine. One such decorator is presented next.
Decorators for Coroutine Priming
You can’t do much with a coroutine without priming it: we must always remember to call next(my_coro) before my_coro.send(x). To make coroutine usage more convenient, a priming decorator is sometimes used. The coroutine decorator in Example 5 is an example.[4]
from functools import wraps

def coroutine(func):
    """Decorator: primes `func` by advancing to first `yield`"""
    @wraps(func)
    def primer(*args, **kwargs):  # (1)
        gen = func(*args, **kwargs)  # (2)
        next(gen)  # (3)
        return gen  # (4)
    return primer
- The decorated generator function is replaced by this primer function which, when invoked, returns the primed generator.

- Call the decorated function to get a generator object.

- Prime the generator.

- Return it.
"""
A coroutine to compute a running average
>>> coro_avg = averager() # (1)
>>> from inspect import getgeneratorstate
>>> getgeneratorstate(coro_avg) # (2)
'GEN_SUSPENDED'
>>> coro_avg.send(10) # (3)
10.0
>>> coro_avg.send(30)
20.0
>>> coro_avg.send(5)
15.0
"""
from coroutil import coroutine # (4)
@coroutine # (5)
def averager(): # (6)
total = 0.0
count = 0
average = None
while True:
term = yield average
total += term
count += 1
average = total/count
- Call averager(), creating a generator object that is primed inside the primer function of the coroutine decorator.

- getgeneratorstate reports GEN_SUSPENDED, meaning that the coroutine is ready to receive a value.

- You can immediately start sending values to coro_avg: that’s the point of the decorator.

- Import the coroutine decorator.

- Apply it to the averager function.

- The body of the function is exactly the same as Example 3.
Several frameworks provide special decorators designed to work with coroutines. Not all of them actually prime the coroutine—some provide other services, such as hooking it to an event loop. One example from the Tornado asynchronous networking library is the tornado.gen decorator.
The yield from syntax we’ll see in Using yield from automatically primes the coroutine called by it, making it incompatible with decorators such as @coroutine from Example 5. The asyncio.coroutine decorator from the Python 3.4 standard library is designed to work with yield from so it does not prime the coroutine. We’ll cover it in [async_ch].
We’ll now focus on essential features of coroutines: the methods used to terminate and throw exceptions into them.
Coroutine Termination and Exception Handling
An unhandled exception within a coroutine propagates to the caller of the next or send that triggered it. Example 7 is an example using the decorated averager coroutine from Example 6.
>>> from coroaverager1 import averager
>>> coro_avg = averager()
>>> coro_avg.send(40) # (1)
40.0
>>> coro_avg.send(50)
45.0
>>> coro_avg.send('spam') # (2)
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +=: 'float' and 'str'
>>> coro_avg.send(60) # (3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
- Using the @coroutine decorated averager we can immediately start sending values.

- Sending a nonnumeric value causes an exception inside the coroutine.

- Because the exception was not handled in the coroutine, it terminated. Any attempt to reactivate it will raise StopIteration.
The cause of the error was sending the value 'spam', which could not be added to the total variable in the coroutine.
Example 7 suggests one way of terminating coroutines: you can use send with some sentinel value that tells the coroutine to exit. Constant built-in singletons like None and Ellipsis are convenient sentinel values. Ellipsis has the advantage of being quite unusual in data streams. Another sentinel value I’ve seen used is StopIteration—the class itself, not an instance of it (and not raising it). In other words, using it like: my_coro.send(StopIteration).
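Here is a minimal sketch of that idiom, using Ellipsis as the sentinel (consumer is an invented name):

>>> def consumer():
...     while True:
...         datum = yield
...         if datum is Ellipsis:  # sentinel: end the coroutine
...             break
...         print('consumed:', datum)
...
>>> coro = consumer()
>>> next(coro)
>>> coro.send('spam')
consumed: spam
>>> coro.send(Ellipsis)
Traceback (most recent call last):
  ...
StopIteration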
Since Python 2.5, generator objects have two methods that allow the client to explicitly send exceptions into the coroutine—throw and close:
generator.throw(exc_type[, exc_value[, traceback]])
Causes the yield expression where the generator was paused to raise the exception given. If the exception is handled by the generator, flow advances to the next yield, and the value yielded becomes the value of the generator.throw call. If the exception is not handled by the generator, it propagates to the context of the caller.

generator.close()
Causes the yield expression where the generator was paused to raise a GeneratorExit exception. No error is reported to the caller if the generator does not handle that exception or raises StopIteration—usually by running to completion. When receiving a GeneratorExit, the generator must not yield a value, otherwise a RuntimeError is raised. If any other exception is raised by the generator, it propagates to the caller.
Tip: The official documentation of the generator object methods is buried deep in The Python Language Reference (see Generator-iterator methods).
Let’s see how close and throw control a coroutine. Example 8 lists the demo_exc_handling function used in the following examples.
class DemoException(Exception):
    """An exception type for the demonstration."""

def demo_exc_handling():
    print('-> coroutine started')
    while True:
        try:
            x = yield
        except DemoException:  # (1)
            print('*** DemoException handled. Continuing...')
        else:  # (2)
            print(f'-> coroutine received: {x!r}')
    raise RuntimeError('This line should never run.')  # (3)
- Special handling for DemoException.

- If no exception, display received value.

- This line will never be executed.
The last line in Example 8 is unreachable because the infinite loop can only be aborted with an unhandled exception, and that terminates the coroutine immediately.
Normal operation of demo_exc_handling is shown in Example 9.
>>> exc_coro = demo_exc_handling()
>>> next(exc_coro)
-> coroutine started
>>> exc_coro.send(11)
-> coroutine received: 11
>>> exc_coro.send(22)
-> coroutine received: 22
>>> exc_coro.close()
>>> from inspect import getgeneratorstate
>>> getgeneratorstate(exc_coro)
'GEN_CLOSED'
If the DemoException is thrown into the coroutine, it’s handled and the demo_exc_handling coroutine continues, as in Example 10.
>>> exc_coro = demo_exc_handling()
>>> next(exc_coro)
-> coroutine started
>>> exc_coro.send(11)
-> coroutine received: 11
>>> exc_coro.throw(DemoException)
*** DemoException handled. Continuing...
>>> getgeneratorstate(exc_coro)
'GEN_SUSPENDED'
On the other hand, if an unhandled exception is thrown into the coroutine, it stops—its state becomes 'GEN_CLOSED'. Example 11 demonstrates it.
>>> exc_coro = demo_exc_handling()
>>> next(exc_coro)
-> coroutine started
>>> exc_coro.send(11)
-> coroutine received: 11
>>> exc_coro.throw(ZeroDivisionError)
Traceback (most recent call last):
...
ZeroDivisionError
>>> getgeneratorstate(exc_coro)
'GEN_CLOSED'
If some cleanup code must run no matter how the coroutine ends, you need to wrap the relevant part of the coroutine body in a try/finally block, as in Example 12.
class DemoException(Exception):
    """An exception type for the demonstration."""

def demo_finally():
    print('-> coroutine started')
    try:
        while True:
            try:
                x = yield
            except DemoException:
                print('*** DemoException handled. Continuing...')
            else:
                print(f'-> coroutine received: {x!r}')
    finally:
        print('-> coroutine ending')
One of the main reasons why the yield from construct was added to Python 3.3 has to do with throwing exceptions into nested coroutines. The other reason was to enable coroutines to return values more conveniently. Read on to see how.
Returning a Value from a Coroutine
Example 13 shows a variation of the averager coroutine that returns a result. For didactic reasons, it does not yield the running average with each activation. This is to emphasize that some coroutines do not yield anything interesting, but are designed to return a value at the end, often the result of some accumulation.
The result returned by averager in Example 13 is a namedtuple with the number of terms averaged (count) and the average. I could have returned just the average value, but returning a tuple exposes another interesting piece of data that was accumulated: the count of terms.
from collections import namedtuple

Result = namedtuple('Result', 'count average')

def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield
        if term is None:
            break  # (1)
        total += term
        count += 1
        average = total/count
    return Result(count, average)  # (2)
- In order to return a value, a coroutine must terminate normally; this is why this version of averager has a condition to break out of its accumulating loop.

- Return a namedtuple with the count and average. Before Python 3.3, it was a syntax error to return a value in a generator function.
To see how this new averager works, we can drive it from the console, as in Example 14.
>>> coro_avg = averager()
>>> next(coro_avg)
>>> coro_avg.send(10) # (1)
>>> coro_avg.send(30)
>>> coro_avg.send(6.5)
>>> coro_avg.send(None) # (2)
Traceback (most recent call last):
...
StopIteration: Result(count=3, average=15.5)
- This version does not yield values.

- Sending None terminates the loop, causing the coroutine to end by returning the result. As usual, the generator object raises StopIteration. The value attribute of the exception carries the value returned.
Note that the value of the return expression is smuggled to the caller as an attribute of the StopIteration exception. This is a bit of a hack, but it preserves the existing behavior of generator objects: raising StopIteration when exhausted.
Example 15 shows how to retrieve the value returned by the coroutine.
>>> coro_avg = averager()
>>> next(coro_avg)
>>> coro_avg.send(10)
>>> coro_avg.send(30)
>>> coro_avg.send(6.5)
>>> try:
...     coro_avg.send(None)
... except StopIteration as exc:
...     result = exc.value
...
>>> result
Result(count=3, average=15.5)
This roundabout way of getting the return value from a coroutine makes more sense when we realize it was defined as part of PEP 380, and the yield from construct handles it automatically by catching StopIteration internally. This is analogous to the use of StopIteration in for loops: the exception is handled by the loop machinery in a way that is transparent to the user. In the case of yield from, the interpreter not only consumes the StopIteration, but its value attribute becomes the value of the yield from expression itself. Unfortunately we can’t test this interactively in the console, because it’s a syntax error to use yield from—or yield, for that matter—outside of a function.[5]
The next section has an example where the averager coroutine is used with yield from to produce a result, as intended in PEP 380. So let’s tackle yield from.
Using yield from
The first thing to know about yield from is that it is a completely new language construct. It does a lot more than yield.
The newer await keyword is very similar to yield from, and its name conveys a crucial point:
when a generator gen calls yield from subgen(), the subgen takes over and will yield values to the caller of gen;
the caller will in effect drive subgen directly.
Meanwhile gen will be blocked, waiting until subgen terminates.
However, await does not completely replace yield from.
Each of them has its own use cases, and await is more strict about its context and target:
await can only be used inside a native coroutine, and its target must be an awaitable object, which we will cover in [async_ch].
In contrast, yield from can be used in any function—which then becomes a generator—and its target can be any iterable.
This super simple yield from example cannot be written with async/await syntax:
>>> def gen123():
...     yield from [1, 2, 3]
...
>>> tuple(gen123())
(1, 2, 3)
We’ve seen in [iterables2generators] that yield from can be used as a shortcut to yield in a for loop. For example, this:
>>> def gen():
...     for c in 'AB':
...         yield c
...     for i in range(1, 3):
...         yield i
...
>>> list(gen())
['A', 'B', 1, 2]
Can be written as:
>>> def gen():
...     yield from 'AB'
...     yield from range(1, 3)
...
>>> list(gen())
['A', 'B', 1, 2]
We first mentioned yield from in [yield_from_sec0]. The code in Example 16 demonstrates a practical use for it—although the itertools module already provides an optimized chain generator written in C.
>>> def chain(*iterables):
...     for it in iterables:
...         yield from it
...
>>> s = 'ABC'
>>> t = tuple(range(3))
>>> list(chain(s, t))
['A', 'B', 'C', 0, 1, 2]
Two slightly more complicated—but more useful—examples of yield from are the code in [traversing_tree_sec],
and "Recipe 4.14. Flattening a Nested Sequence" in Beazley and Jones’s Python Cookbook, 3E (source code available on GitHub).
The first thing the yield from x expression does with the x object is to call iter(x) to obtain an iterator from it. This means that x can be any iterable.
However, if replacing nested for loops yielding values was the only contribution of yield from, this language addition wouldn’t have had a good chance of being accepted. The real nature of yield from cannot be demonstrated with simple iterables; it requires the mind-expanding use of nested generators. That’s why PEP 380, which introduced yield from, is titled Syntax for Delegating to a Subgenerator.
The main feature of yield from is to open a bidirectional channel from the outermost caller to the innermost subgenerator, so that values can be sent and yielded back and forth directly from them, and exceptions can be thrown all the way in without adding a lot of exception handling boilerplate code in the intermediate coroutines. This is what enables coroutine delegation in a way that was not possible before.
The use of yield from requires a nontrivial arrangement of code. To talk about the required moving parts, PEP 380 uses some terms in a very specific way:
- delegating generator: The generator function that contains the yield from <iterable> expression.

- subgenerator: The generator obtained from the <iterable> part of the yield from expression. This is the "subgenerator" mentioned in the title of PEP 380: "Syntax for Delegating to a Subgenerator."

- caller: PEP 380 uses the term "caller" to refer to the client code that calls the delegating generator. Depending on context, I use "client" instead of "caller," to distinguish from the delegating generator, which is also a "caller" (it calls the subgenerator).
Tip: PEP 380 often uses the word "iterator" to refer to the subgenerator. That’s confusing because the delegating generator is also an iterator. So I prefer to use the term subgenerator, in line with the title of the PEP—“Syntax for Delegating to a Subgenerator.” However, the subgenerator can be a simple iterator implementing only __next__.
Example 17 provides more context to see yield from at work, and Figure 2 identifies the relevant parts of the example.[6]
The coroaverager3.py script reads a dict with weights and heights of girls and boys in an imaginary seventh grade class. For example, the key 'boys;m' maps to the heights of 9 boys, in meters; 'girls;kg' maps to the weights of 10 girls, in kilograms. The script feeds the data for each group into the averager coroutine we’ve seen before, and produces a report like this one:
$ python3 coroaverager3.py
9 boys averaging 40.42kg
9 boys averaging 1.39m
10 girls averaging 42.04kg
10 girls averaging 1.43m
The code in Example 17 is certainly not the most straightforward solution to the problem, but it serves to show yield from in action. This example is inspired by the one given in What’s New in Python 3.3.
from collections import namedtuple

Result = namedtuple('Result', 'count average')

# the subgenerator
def averager():  # (1)
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield  # (2)
        if term is None:  # (3)
            break
        total += term
        count += 1
        average = total/count
    return Result(count, average)  # (4)

# the delegating generator
def grouper(results, key):  # (5)
    while True:  # (6)
        results[key] = yield from averager()  # (7)

# the client code, a.k.a. the caller
def main(data):  # (8)
    results = {}
    for key, values in data.items():
        group = grouper(results, key)  # (9)
        next(group)  # (10)
        for value in values:
            group.send(value)  # (11)
        group.send(None)  # important! (12)
    # print(results)  # uncomment to debug
    report(results)

# output report
def report(results):
    for key, result in sorted(results.items()):
        group, unit = key.split(';')
        print(f'{result.count:2} {group:5}',
              f'averaging {result.average:.2f}{unit}')

data = {
    'girls;kg':
        [40.9, 38.5, 44.3, 42.2, 45.2, 41.7, 44.5, 38.0, 40.6, 44.5],
    'girls;m':
        [1.6, 1.51, 1.4, 1.3, 1.41, 1.39, 1.33, 1.46, 1.45, 1.43],
    'boys;kg':
        [39.0, 40.8, 43.2, 40.8, 43.1, 38.6, 41.4, 40.6, 36.3],
    'boys;m':
        [1.38, 1.5, 1.32, 1.25, 1.37, 1.48, 1.25, 1.49, 1.46],
}

if __name__ == '__main__':
    main(data)
- Same averager coroutine from Example 13. Here it is the subgenerator.

- Each value sent by the client code in main will be bound to term here.

- The crucial terminating condition. Without it, a yield from calling this coroutine will block forever.

- The returned Result will be the value of the yield from expression in grouper.

- grouper is the delegating generator.

- Each iteration in this loop creates a new instance of averager; each is a generator object operating as a coroutine.

- Whenever grouper is sent a value, it’s piped into the averager instance by the yield from. grouper will be suspended here as long as the averager instance is consuming values sent by the client. When an averager instance runs to the end, the value it returns is bound to results[key]. The while loop then proceeds to create another averager instance to consume more values.

- main is the client code, or "caller" in PEP 380 parlance. This is the function that drives everything.

- group is a generator object resulting from calling grouper with the results dict to collect the results, and a particular key. It will operate as a coroutine.

- Prime the coroutine.

- Send each value into the grouper. That value ends up in the term = yield line of averager; grouper never has a chance to see it.

- Sending None into grouper causes the current averager instance to terminate, and allows grouper to run again, which creates another averager for the next group of values.
The last callout in Example 17 with the comment “important!” highlights a crucial line of code: group.send(None), which terminates one averager and starts the next. If you comment out that line, the script produces no output. Uncommenting the print(results) line near the end of main reveals that the results dict ends up empty.
Note: If you want to figure out for yourself why no results are collected, it will be a great way to exercise your understanding of how yield from works.
Here is an overview of how Example 17 works, explaining what would happen if we omitted the call group.send(None) marked "important!" in main:
- Each iteration of the outer for loop creates a new grouper instance named group; this is the delegating generator.

- The call next(group) primes the grouper delegating generator, which enters its while True loop and suspends at the yield from, after calling the subgenerator averager.

- The inner for loop calls group.send(value); this feeds the subgenerator averager directly. Meanwhile, the current group instance of grouper is suspended at the yield from.

- When the inner for loop ends, the group instance is still suspended at the yield from, so the assignment to results[key] in the body of grouper has not happened yet.

- Without the last group.send(None) in the outer for loop, the averager subgenerator never terminates, the delegating generator group is never reactivated, and the assignment to results[key] never happens.

- When execution loops back to the top of the outer for loop, a new grouper instance is created and bound to group. The previous grouper instance is garbage collected (together with its own unfinished averager subgenerator instance).
Warning: The key takeaway from this experiment is: if a subgenerator never terminates, the delegating generator will be suspended forever at the yield from expression.
Pipelines of coroutines
Example 17 demonstrates the simplest arrangement of yield from, with only one delegating generator and one subgenerator. Because the delegating generator works as a pipe, you can connect any number of them in a pipeline: one delegating generator uses yield from to call a subgenerator, which itself is a delegating generator calling another subgenerator with yield from, and so on. Eventually this chain must end in a simple generator that uses just yield, but it may also end in any iterable object, as in Example 16.
Every yield from chain must be driven by a client that calls next(…) or .send(…) on the outermost delegating generator. This call may be implicit, such as a for loop.
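Here is a minimal sketch of such a chain (the function names are invented): two delegating generators ending in a plain iterable, driven by an implicit for loop inside list():

>>> def inner():
...     yield from (1, 2, 3)  # the chain ends in a plain iterable
...
>>> def middle():
...     yield from inner()  # delegating generator
...
>>> def outer():
...     yield from middle()  # another delegating generator
...
>>> list(outer())  # the client; list() calls next() implicitly
[1, 2, 3]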
Now let’s review the formal description of the yield from construct, as presented in PEP 380.
The Meaning of yield from
Note: This used to be one of the most challenging sections in Fluent Python. You may wonder whether it is worth your attention, given that most uses of yield from in asynchronous code have since been replaced by await. But recall that PEP 492 describes await as using "the yield from implementation with an extra step of validating its argument," so the semantics explained here still apply.
While developing PEP 380, Greg Ewing—the PEP author—was questioned about the complexity of the proposed semantics. One of his answers was "For humans, almost all the important information is contained in one paragraph near the top." He then quoted part of the draft of PEP 380 which at the time read as follows:
"When the iterator is another generator, the effect is the same as if the body of the subgenerator were inlined at the point of the
yield fromexpression. Furthermore, the subgenerator is allowed to execute areturnstatement with a value, and that value becomes the value of theyield fromexpression."[8]
Those soothing words are no longer part of the PEP—because they don’t cover all the corner cases. But they are OK as a first approximation.
We will tackle our study of yield from in two steps: first, its basic behavior, which covers many use cases.
After that, we’ll see what happens when the subgenerator is terminated before it runs to completion, as well as other exceptional execution paths.
Basic behavior of yield from
The approved version of PEP 380 explains the behavior of yield from in six points in the Proposal section. Here I reproduce them almost exactly, except that I replaced every occurrence of the ambiguous word "iterator" with "subgenerator" and added a few clarifications. Let’s start with the four points that are illustrated by coroaverager3.py in Example 17:
- Any values that the subgenerator yields are passed directly to the caller of the delegating generator (i.e., the client code).

- Any values sent to the delegating generator using send() are passed directly to the subgenerator. If the sent value is None, the subgenerator’s next() method is called. If the sent value is not None, the subgenerator’s send() method is called. If the call raises StopIteration, the delegating generator is resumed. Any other exception is propagated to the delegating generator.

- return expr in a generator (or subgenerator) causes StopIteration(expr) to be raised upon exit from the generator.

- The value of the yield from expression is the first argument to the StopIteration exception raised by the subgenerator when it terminates.
The other two points about yield from in PEP 380 have to do with exceptions and termination.
We’ll see them in Exception handling in yield from.
For now, let’s focus on the behavior of yield from under "normal" operating conditions.
The detailed semantics of yield from are subtle. Greg Ewing did a great job putting them to words in English in PEP 380.
Ewing also documented the behavior of yield from using pseudocode (with Python syntax). I personally found it useful to spend some time studying the pseudocode in PEP 380. However, the pseudocode is 40 lines long and not easy to grasp at first.
A good way to approach that pseudocode is to simplify it to handle only the most basic and common uses of yield from.
Consider that yield from appears in a delegating generator. The client code drives the delegating generator, which drives the subgenerator. So, to simplify the logic involved, let’s assume the client doesn’t ever call .throw(…) or .close() on the delegating generator. Let’s also assume the subgenerator never raises an exception until it terminates, when StopIteration is raised by the interpreter.
The coroaverager3.py script in Example 17 is an example where those simplifying assumptions hold. In fact, often the delegating generator is expected to run to completion. So let’s see how yield from works in this happier, simpler world.
Take a look at Example 18, which is an expansion of this single statement, in the body of the delegating generator:
RESULT = yield from EXPR
Try to follow the logic in Example 18.
_i = iter(EXPR)  # (1)
try:
    _y = next(_i)  # (2)
except StopIteration as _e:
    _r = _e.value  # (3)
else:
    while 1:  # (4)
        _s = yield _y  # (5)
        try:
            _y = _i.send(_s)  # (6)
        except StopIteration as _e:  # (7)
            _r = _e.value
            break
RESULT = _r  # (8)
- The EXPR can be any iterable, because iter() is applied to get an iterator _i (this is the subgenerator).

- The subgenerator is primed; the result is stored to be the first yielded value _y.

- If StopIteration was raised, extract the value attribute from the exception and assign it to _r: this is the RESULT in the simplest case.

- While this loop is running, the delegating generator is blocked, operating just as a channel between the caller and the subgenerator.

- Yield the current item yielded from the subgenerator; wait for a value _s sent by the caller. Note that this is the only yield in this listing.

- Try to advance the subgenerator, forwarding the _s sent by the caller.

- If the subgenerator raised StopIteration, get the value, assign it to _r, and exit the loop, resuming the delegating generator.

- _r is the RESULT: the value of the whole yield from expression.
In this simplified pseudocode, I preserved the variable names used in the pseudocode published in PEP 380. The variables are:
- _i (iterator): The subgenerator.

- _y (yielded): A value yielded from the subgenerator.

- _r (result): The eventual result (i.e., the value of the yield from expression when the subgenerator ends).

- _s (sent): A value sent by the caller to the delegating generator, which is forwarded to the subgenerator.

- _e (exception): An exception (always an instance of StopIteration in this simplified pseudocode).
Besides not handling .throw(…) and .close(), the simplified pseudocode always uses .send(…) to forward next() or .send(…) calls by the client to the subgenerator. Don’t worry about these fine distinctions on a first reading. As mentioned, coroaverager3.py in Example 17 would run perfectly well if the yield from did only what is shown in the simplified pseudocode in Example 18.
The next section covers the behavior of yield from when the subgenerator ends prematurely, either because the client cancels it, or an unhandled exception is raised.
Exception handling in yield from
In Basic behavior of yield from we saw the first four points about yield from behavior from PEP 380, and pseudo-code describing that behavior. But the reality is more complicated, because of the need to handle .throw(…) and .close() calls from the client, which must be passed into the subgenerator. Here are the other points of the PEP 380 Proposal section, slightly edited:
- Exceptions other than GeneratorExit thrown into the delegating generator are passed to the throw() method of the subgenerator. If the call raises StopIteration, the delegating generator is resumed. Any other exception is propagated to the delegating generator.

- If a GeneratorExit exception is thrown into the delegating generator, or the close() method of the delegating generator is called, then the close() method of the subgenerator is called if it has one. If this call results in an exception, it is propagated to the delegating generator. Otherwise, GeneratorExit is raised in the delegating generator.
Also, the subgenerator may be a plain iterator that does not support .throw(…) or .close(), so this must be handled by the yield from logic. If the subgenerator does implement those methods, inside the subgenerator both methods cause exceptions to be raised, which must be handled by the yield from machinery as well. The subgenerator may also throw exceptions of its own, unprovoked by the caller, and this must also be dealt with in the yield from implementation. Finally, as an optimization, if the caller calls next(…) or .send(None), both are forwarded as a next(…) call on the subgenerator; only if the caller sends a non-None value, the .send(…) method of the subgenerator is used.
For your convenience, I present here the complete pseudocode of the yield from expansion from PEP 380, with numbered annotations.
Example 19 was copied verbatim; I only added the callout numbers.
Most of the logic of the yield from pseudocode is implemented in six try/except blocks nested up to four levels deep, so it’s a bit hard to read.
The only other control flow keywords used are one while, one if, and one yield.
Find the while, the yield, the next(…), and the .send(…) calls:
they will help you get an idea of how the whole structure works.
Remember that all the code shown in Example 19 is an expansion of this single statement, in the body of a delegating generator:
RESULT = yield from EXPR
_i = iter(EXPR)  # (1)
try:
    _y = next(_i)  # (2)
except StopIteration as _e:
    _r = _e.value  # (3)
else:
    while 1:  # (4)
        try:
            _s = yield _y  # (5)
        except GeneratorExit as _e:  # (6)
            try:
                _m = _i.close
            except AttributeError:
                pass
            else:
                _m()
            raise _e
        except BaseException as _e:  # (7)
            _x = sys.exc_info()
            try:
                _m = _i.throw
            except AttributeError:
                raise _e
            else:  # (8)
                try:
                    _y = _m(*_x)
                except StopIteration as _e:
                    _r = _e.value
                    break
        else:  # (9)
            try:  # (10)
                if _s is None:  # (11)
                    _y = next(_i)
                else:
                    _y = _i.send(_s)
            except StopIteration as _e:  # (12)
                _r = _e.value
                break
RESULT = _r  # (13)
- The EXPR can be any iterable, because iter() is applied to get an iterator _i (this is the subgenerator).

- The subgenerator is primed; the result is stored to be the first yielded value _y.

- If StopIteration was raised, extract the value attribute from the exception and assign it to _r: this is the RESULT in the simplest case.

- While this loop is running, the delegating generator is blocked, operating just as a channel between the caller and the subgenerator.

- Yield the current item yielded from the subgenerator; wait for a value _s sent by the caller. This is the only yield in this listing.

- This deals with closing the delegating generator and the subgenerator. Because the subgenerator can be any iterator, it may not have a close method.

- This deals with exceptions thrown in by the caller using .throw(…). Again, the subgenerator may be an iterator with no throw method to be called—in which case the exception is raised in the delegating generator.

- If the subgenerator has a throw method, call it with the exception passed from the caller. The subgenerator may handle the exception (and the loop continues); it may raise StopIteration (the _r result is extracted from it, and the loop ends); or it may raise the same or another exception, which is not handled here and propagates to the delegating generator.

- If no exception was received when yielding…

- Try to advance the subgenerator…

- Call next on the subgenerator if the last value received from the caller was None, otherwise call send.

- If the subgenerator raised StopIteration, get the value, assign it to _r, and exit the loop, resuming the delegating generator.

- _r is the RESULT: the value of the whole yield from expression.
Right at the top of Example 19, one important detail revealed by the pseudocode is that the subgenerator is primed (second callout in Example 19).[9] This means that auto-priming decorators such as that in Decorators for Coroutine Priming are incompatible with yield from.
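To see why, consider this console sketch (echo and delegator are invented names). The next(…) call that the yield from machinery performs lands on a coroutine that was already primed, so the value a client would normally provide with .send(…) arrives as None:

>>> def echo():
...     received = yield
...     print('received:', received)
...     yield 'done'
...
>>> coro = echo()
>>> next(coro)  # manual priming, as an auto-priming decorator would do
>>> def delegator():
...     yield from coro  # yield from starts by calling next(coro)
...
>>> d = delegator()
>>> next(d)  # the extra next() resumes echo with None
received: None
'done'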
In the same message I quoted in the opening of this section, Greg Ewing has this to say about the pseudocode expansion of yield from:
You’re not meant to learn about it by reading the expansion—that’s only there to pin down all the details for language lawyers.
Focusing on the details of the pseudocode expansion may not be helpful—depending on your learning style. Studying real code that uses yield from is certainly more profitable than poring over the pseudocode of its implementation. However, almost all the yield from examples I’ve seen are tied to asynchronous programming with the asyncio module, so they depend on an active event loop to work—and most such code now uses await instead of yield from. There are a few links in Further Reading to interesting code using yield from without an event loop.
We’ll now move on to a classic example of coroutine usage: programming simulations. This example does not showcase yield from, but it does reveal how coroutines are used to manage concurrent activities on a single thread.
Use Case: Coroutines for Discrete Event Simulation
Coroutines are a natural way of expressing many algorithms, such as simulations, games, asynchronous I/O, and other forms of event-driven programming or co-operative multitasking.[10]
PEP 342—Coroutines via Enhanced Generators
In this section, I will describe a very simple simulation implemented using just coroutines and standard library objects. Simulation is a classic application of coroutines in the computer science literature. Simula, the first OO language, introduced the concept of coroutines precisely to support simulations.
Note: The motivation for the following simulation example is not academic. Coroutines are the fundamental building block of the asyncio package: a simulation shows how to implement concurrent activities using coroutines instead of threads.
Before going into the example, a word about simulations.
About Discrete Event Simulations
A discrete event simulation (DES) is a type of simulation where a system is modeled as a sequence of events. In a DES, the simulation "clock" does not advance by fixed increments, but advances directly to the simulated time of the next modeled event. For example, if we are simulating the operation of a taxi cab from a high-level perspective, one event is picking up a passenger, the next is dropping the passenger off. It doesn’t matter if a trip takes 5 or 50 minutes: when the drop off event happens, the clock is updated to the end time of the trip in a single operation. In a DES, we can simulate a year of cab trips in less than a second. This is in contrast to a continuous simulation where the clock advances continuously by a fixed—and usually small—increment.
Intuitively, turn-based games are examples of discrete event simulations: the state of the game only changes when a player moves, and while a player is deciding the next move, the simulation clock is frozen. Real-time games, on the other hand, are continuous simulations where the simulation clock is running all the time, the state of the game is updated many times per second, and slow players are at a real disadvantage.
Both types of simulations can be written with multiple threads or a single thread using event-oriented programming techniques such as callbacks or coroutines driven by an event loop. It’s arguably more natural to implement a continuous simulation using threads to account for actions happening in parallel in real time. On the other hand, coroutines offer exactly the right abstraction for writing a DES. SimPy[11] is a DES package for Python that uses one coroutine to represent each process in the simulation.
Tip: In the field of simulation, the term process refers to the activities of an entity in the model, and not to an OS process. A simulation process may be implemented as an OS process, but usually a thread or a coroutine is used for that purpose.
If you are interested in simulations, SimPy is well worth studying. However, in this section, I will describe a very simple DES implemented using only standard library features. My goal is to help you develop an intuition about programming concurrent actions with coroutines. Understanding the next section will require careful study, but the reward will come as insights on how libraries such as asyncio, Twisted, and Tornado can manage many concurrent activities using a single thread of execution.
The Taxi Fleet Simulation
In our simulation program, taxi_sim.py, a number of taxi cabs are created. Each will make a fixed number of trips and then go home. A taxi leaves the garage and starts "prowling"—looking for a passenger. This lasts until a passenger is picked up, and a trip starts. When the passenger is dropped off, the taxi goes back to prowling.
The time elapsed during prowls and trips is generated using an exponential distribution. For a cleaner display, times are in whole minutes, but the simulation would work as well using float intervals.[12] Each change of state in each cab is reported as an event. Figure 3 shows a sample run of the program.
The most important thing to note in Figure 3 is the interleaving of the trips by the three taxis. I manually added the arrows to make it easier to see the taxi trips: each arrow starts when a passenger is picked up and ends when the passenger is dropped off. Intuitively, this demonstrates how coroutines can be used for managing concurrent activities.
Other things to note about Figure 3:
- Each taxi leaves the garage 5 minutes after the other.

- It took 2 minutes for taxi 0 to pick up the first passenger at time=2; 3 minutes for taxi 1 (time=8), and 5 minutes for taxi 2 (time=15).

- The cabbie in taxi 0 only makes two trips (purple arrows): the first starts at time=2 and ends at time=18; the second starts at time=28 and ends at time=65—the longest trip in this simulation run.

- Taxi 1 makes four trips (green arrows) then goes home at time=110.

- Taxi 2 makes six trips (red arrows) then goes home at time=109. His last trip lasts only one minute, starting at time=97.[13]

- While taxi 1 is making her first trip, starting at time=8, taxi 2 leaves the garage at time=10 and completes two trips (short red arrows).

- In this sample run, all scheduled events completed in the default simulation time of 180 minutes; the last event was at time=110.
The simulation may also end with pending events. When that happens, the final message reads like this:
*** end of simulation time: 3 events pending ***
The full listing of taxi_sim.py is at [support_taxi_sim]. In this post, we’ll show only the parts that are relevant to our study of coroutines. The really important functions are only two: taxi_process (a coroutine), and the Simulator.run method where the main loop of the simulation is executed.
Example 20 shows the code for taxi_process. This coroutine uses two objects defined elsewhere: the compute_delay function, which returns a time interval in minutes, and the Event class, a named tuple defined like this:
Event = collections.namedtuple('Event', 'time proc action')
In an Event instance, time is the simulation time when the event will occur, proc is the identifier of the taxi process instance, and action is a string describing the activity.
Let’s review taxi_process play by play in Example 20.
def taxi_process(ident, trips, start_time=0):  # (1)
    """Yield to simulator issuing event at each state change"""
    time = yield Event(start_time, ident, 'leave garage')  # (2)
    for i in range(trips):  # (3)
        time = yield Event(time, ident, 'pick up passenger')  # (4)
        time = yield Event(time, ident, 'drop off passenger')  # (5)
    yield Event(time, ident, 'going home')  # (6)
    # end of taxi process  # (7)
- taxi_process will be called once per taxi, creating a generator object to represent its operations. ident is the number of the taxi (e.g., 0, 1, 2 in the sample run); trips is the number of trips this taxi will make before going home; start_time is when the taxi leaves the garage.

- The first Event yielded is 'leave garage'. This suspends the coroutine, and lets the simulation main loop proceed to the next scheduled event. When it’s time to reactivate this process, the main loop will send the current simulation time, which is assigned to time.

- This block will be repeated once for each trip.

- An Event signaling passenger pick up is yielded. The coroutine pauses here. When the time comes to reactivate this coroutine, the main loop will again send the current time.

- An Event signaling passenger drop off is yielded. The coroutine is suspended again, waiting for the main loop to send it the time of when it’s reactivated.

- The for loop ends after the given number of trips, and a final 'going home' event is yielded. The coroutine will suspend for the last time. When reactivated, it will be sent the time from the simulation main loop, but here I don’t assign it to any variable because it will not be used.

- When the coroutine falls off the end, the generator object raises StopIteration.
You can "drive" a taxi yourself by calling taxi_process in the Python console.[14] Example 21 shows how.
>>> from taxi_sim import taxi_process
>>> taxi = taxi_process(ident=13, trips=2, start_time=0) (1)
>>> next(taxi) (2)
Event(time=0, proc=13, action='leave garage')
>>> taxi.send(_.time + 7) (3)
Event(time=7, proc=13, action='pick up passenger') (4)
>>> taxi.send(_.time + 23) (5)
Event(time=30, proc=13, action='drop off passenger')
>>> taxi.send(_.time + 5) (6)
Event(time=35, proc=13, action='pick up passenger')
>>> taxi.send(_.time + 48) (7)
Event(time=83, proc=13, action='drop off passenger')
>>> taxi.send(_.time + 1)
Event(time=84, proc=13, action='going home') (8)
>>> taxi.send(_.time + 10) (9)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
- Create a generator object to represent a taxi with ident=13 that will make two trips and start working at t=0.

- Prime the coroutine; it yields the initial event.

- We can now send it the current time. In the console, the _ variable is bound to the last result; here I add 7 to the time, which means the taxi will spend 7 minutes searching for the first passenger.

- This is yielded by the for loop at the start of the first trip.

- Sending _.time + 23 means the trip with the first passenger will last 23 minutes.

- Then the taxi will prowl for 5 minutes.

- The last trip will take 48 minutes.

- After two complete trips, the loop ends and the 'going home' event is yielded.

- The next attempt to send to the coroutine causes it to fall through the end. When it returns, the interpreter raises StopIteration.
Note that in Example 21 I am using the console to emulate the simulation main loop. I get the .time attribute of an Event yielded by the taxi coroutine, add an arbitrary number, and use the sum in the next taxi.send call to reactivate it. In the simulation, the taxi coroutines are driven by the main loop in the Simulator.run method. The simulation "clock" is held in the sim_time variable, and is updated by the time of each event yielded.
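If you’d rather not type each send call, the same hand-driving can be done with a short loop. This is a sketch of my own (not part of taxi_sim.py) that pretends every activity takes exactly 5 minutes:

from taxi_sim import taxi_process

taxi = taxi_process(ident=13, trips=2, start_time=0)
event = next(taxi)  # prime the coroutine; get the 'leave garage' event
print(event)
while True:
    try:
        event = taxi.send(event.time + 5)  # reactivate 5 "minutes" later
    except StopIteration:  # raised when the coroutine falls off the end
        break
    print(event)

The try/except around send is exactly what the real main loop must do, as we’ll see in Simulator.run.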
To instantiate the Simulator class, the main function of taxi_sim.py builds a taxis dictionary like this:
taxis = {i: taxi_process(i, (i + 1) * 2, i * DEPARTURE_INTERVAL)
         for i in range(num_taxis)}
sim = Simulator(taxis)
DEPARTURE_INTERVAL is 5; if num_taxis is 3 as in the sample run, the preceding lines will do the same as:
taxis = {0: taxi_process(ident=0, trips=2, start_time=0),
         1: taxi_process(ident=1, trips=4, start_time=5),
         2: taxi_process(ident=2, trips=6, start_time=10)}
sim = Simulator(taxis)
Therefore, the values of the taxis dictionary will be three distinct generator objects with different parameters. For instance, taxi 1 will make 4 trips and begin looking for passengers at start_time=5. This dict is the only argument required to build a Simulator instance.
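At this point, none of those generators has started running: each taxi coroutine is waiting for the next(proc) call that Simulator.run will make to prime it. We can verify that with inspect.getgeneratorstate (a quick check I added; not in taxi_sim.py):

>>> from inspect import getgeneratorstate
>>> getgeneratorstate(taxis[0])
'GEN_CREATED'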
The Simulator.__init__ method is shown in Example 22. The main data structures of Simulator are:

self.events
    A PriorityQueue to hold Event instances. A PriorityQueue lets you put items, then get them ordered by item[0]; i.e., the time attribute in the case of our Event namedtuple objects.

self.procs
    A dict mapping each process number to an active process in the simulation—a generator object representing one taxi. This will be bound to a copy of the taxis dict shown earlier.
import queue

class Simulator:

    def __init__(self, procs_map):
        self.events = queue.PriorityQueue()  # (1)
        self.procs = dict(procs_map)  # (2)
1. The PriorityQueue to hold the scheduled events, ordered by increasing time.
2. We get the procs_map argument as a dict (or any mapping), but build a dict from it to have a local copy: when the simulation runs, each taxi that goes home is removed from self.procs, and we don’t want to change the object passed by the user.
Priority queues are a fundamental building block of discrete event simulations: events are created in any order, placed in the queue, and later retrieved in order according to the scheduled time of each one. For example, the first two events placed in the queue may be:
Event(time=14, proc=0, action='pick up passenger')
Event(time=11, proc=1, action='pick up passenger')
This means that taxi 0 will take 14 minutes to pick up the first passenger, while taxi 1—starting at time=5—will take 6 minutes and pick up a passenger at time=11. If those two events are in the queue, the first event the main loop gets from the priority queue will be Event(time=11, proc=1, action='pick up passenger').
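Because an Event compares like a tuple, a queue.PriorityQueue retrieves events in time order regardless of insertion order. A quick illustration, assuming Event is defined as shown earlier:

>>> import queue
>>> q = queue.PriorityQueue()
>>> q.put(Event(time=14, proc=0, action='pick up passenger'))
>>> q.put(Event(time=11, proc=1, action='pick up passenger'))
>>> q.get()
Event(time=11, proc=1, action='pick up passenger')
>>> q.get()
Event(time=14, proc=0, action='pick up passenger')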
Now let’s study the main algorithm of the simulation, the Simulator.run method. It’s invoked by the main function right after the Simulator is instantiated, like this:
sim = Simulator(taxis)
sim.run(end_time)
The listing with callouts for the Simulator class is in Example 23, but here is a high-level view of the algorithm implemented in Simulator.run:
- Loop over the processes representing taxis:
  - Prime the coroutine for each taxi by calling next() on it. This will yield the first Event for each taxi.
  - Put each event in the self.events queue of the Simulator.
- Run the main loop of the simulation while sim_time < end_time:
  - Check if self.events is empty; if so, break from the loop.
  - Get the current_event from self.events. This will be the Event object with the lowest time in the PriorityQueue.
  - Display the Event.
  - Update the simulation time with the time attribute of the current_event.
  - Send the time to the coroutine identified by the proc attribute of the current_event. The coroutine will yield the next_event.
  - Schedule next_event by adding it to the self.events queue.
The complete Simulator class is in Example 23.
class Simulator:

    def __init__(self, procs_map):
        self.events = queue.PriorityQueue()
        self.procs = dict(procs_map)

    def run(self, end_time):  # (1)
        """Schedule and display events until time is up"""
        # schedule the first event for each cab
        for _, proc in sorted(self.procs.items()):  # (2)
            first_event = next(proc)  # (3)
            self.events.put(first_event)  # (4)

        # main loop of the simulation
        sim_time = 0  # (5)
        while sim_time < end_time:  # (6)
            if self.events.empty():  # (7)
                print('*** end of events ***')
                break

            current_event = self.events.get()  # (8)
            sim_time, proc_id, previous_action = current_event  # (9)
            print('taxi:', proc_id, proc_id * '   ', current_event)  # (10)
            active_proc = self.procs[proc_id]  # (11)
            next_time = sim_time + compute_duration(previous_action)  # (12)
            try:
                next_event = active_proc.send(next_time)  # (13)
            except StopIteration:
                del self.procs[proc_id]  # (14)
            else:
                self.events.put(next_event)  # (15)
        else:  # (16)
            msg = '*** end of simulation time: {} events pending ***'
            print(msg.format(self.events.qsize()))
1. The simulation end_time is the only required argument for run.
2. Use sorted to retrieve the self.procs items ordered by the key; we don’t care about the key, so assign it to _.
3. next(proc) primes each coroutine by advancing it to the first yield, so it’s ready to be sent data. An Event is yielded.
4. Add each event to the self.events PriorityQueue. The first event for each taxi is 'leave garage', as seen in the sample run (Example 20).
5. Zero sim_time, the simulation clock.
6. Main loop of the simulation: run while sim_time is less than the end_time.
7. The main loop may also exit if there are no pending events in the queue.
8. Get the Event with the smallest time in the priority queue; this is the current_event.
9. Unpack the Event data. This line updates the simulation clock, sim_time, to reflect the time when the event happened.[15]
10. Display the Event, identifying the taxi and adding indentation according to the taxi ID.
11. Retrieve the coroutine for the active taxi from the self.procs dictionary.
12. Compute the next activation time by adding sim_time and the result of calling compute_duration(…) with the previous action (e.g., 'pick up passenger', 'drop off passenger', etc.); a sketch of compute_duration appears after this list.
13. Send the time to the taxi coroutine. The coroutine will yield the next_event or raise StopIteration when it’s finished.
14. If StopIteration is raised, delete the coroutine from the self.procs dictionary.
15. Otherwise, put the next_event in the queue.
16. If the loop exits because the simulation time passed, display the number of events pending (which may happen to be zero).
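Example 23 calls compute_duration, which is defined in taxi_sim.py but not listed here. The idea: given the previous action, pick a mean interval for the next activity and draw a pseudo-random duration from an exponential distribution. Here is a sketch along those lines; the constant names and values are my assumptions, not the verbatim source:

import random

SEARCH_DURATION = 5  # mean minutes prowling for a passenger (assumed value)
TRIP_DURATION = 20   # mean minutes per trip (assumed value)

def compute_duration(previous_action):
    """Compute action duration using an exponential distribution."""
    if previous_action in ('leave garage', 'drop off passenger'):
        # new state is prowling for a passenger
        interval = SEARCH_DURATION
    elif previous_action == 'pick up passenger':
        # new state is a trip with a passenger
        interval = TRIP_DURATION
    elif previous_action == 'going home':
        interval = 1
    else:
        raise ValueError('Unknown previous_action: %s' % previous_action)
    # random.expovariate(lambd) has mean 1/lambd; add 1 so durations are >= 1
    return int(random.expovariate(1 / interval)) + 1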
Linking back to [with_match_ch], note that the Simulator.run method in Example 23 uses else blocks in two places that are not if statements:
- The main while loop has an else block to report that the simulation ended because the end_time was reached—and not because there were no more events to process.
- The try statement at the bottom of the while loop tries to get a next_event by sending the next_time to the current taxi process; if that is successful, the else block puts the next_event into the self.events queue.
I believe the code in Simulator.run would be a bit harder to read without those else blocks.
The point of this example was to show a main loop processing events and driving coroutines by sending data to them. This is the basic idea behind asyncio, which we’ll study in [async_ch].
To wrap up, let’s discuss generic coroutine types.
Note: Feel free to skip the next section if coroutines, generic types, and variance are too much for you right now. I personally found the combination a bit hard to digest.
Generic Type Hints for Classic Coroutines
Back in [contravariant_types_sec],
I mentioned typing.Generator as one of the few standard library
types with a contravariant type parameter.
Now that we’ve studied classic coroutines, we are ready to
make sense of this generic type.
For generators that only yield values and are never sent any value other than None,
the recommended type for annotations is Iterator[T_co].
Despite its name, typing.Generator is really used to annotate classic coroutines,
which not only yield values, but also accept values via .send() and return
values through the StopIteration(value) hack.
T_co = TypeVar('T_co', covariant=True)
V_co = TypeVar('V_co', covariant=True)
T_contra = TypeVar('T_contra', contravariant=True)

# many lines omitted

class Generator(Iterator[T_co], Generic[T_co, T_contra, V_co],
                extra=_G_base):
That generic type declaration means that a Generator
type hint requires three type parameters, as in this example:
my_coro: Generator[YieldType, SendType, ReturnType]
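For example, here is a small coroutine of my own devising annotated with all three parameters: it yields running averages (floats), accepts float terms via .send(), and returns a summary when it finishes. The names averager and Report are mine, for illustration only:

from typing import Generator, NamedTuple

class Report(NamedTuple):
    count: int
    average: float

def averager(nterms: int) -> Generator[float, float, Report]:
    """Yield the running average of the terms sent in; return a Report."""
    total = 0.0
    average = 0.0
    for count in range(1, nterms + 1):
        term = yield average
        total += term
        average = total / count
    return Report(nterms, average)

coro = averager(2)
next(coro)            # prime: yields 0.0, the initial average
coro.send(10)         # yields 10.0
try:
    coro.send(20)     # the for loop ends; return raises StopIteration
except StopIteration as exc:
    print(exc.value)  # Report(count=2, average=15.0)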
From the type variables in the formal parameters,
we see that YieldType and ReturnType are covariant,
but SendType is contravariant.
To understand why, consider that YieldType and ReturnType are "output" types.
Both describe data that comes out of the coroutine object—i.e.
the generator object when used as a coroutine object.
It makes sense that these are covariant,
because any code expecting a coroutine that yields floats
can accept a coroutine that yields integers.
That’s why Generator is covariant on its
YieldType parameter.
The same reasoning applies to the ReturnType
parameter—also covariant.
Using the notation introduced in [covariant_types_sec],
the covariance of the first and third parameters
is expressed by the parallel :> symbols:
float :> int
Generator[float, Any, float] :> Generator[int, Any, int]
YieldType and ReturnType are examples of the
first rule of [variance_rules_sec].
On the other hand, SendType is an "input" parameter:
it is the type of the argument for the send method of
the coroutine object.
Code that wants to send floats to a coroutine
cannot use a coroutine with int as the SendType
because int is not a supertype of float.
In other words, float is not consistent-with int.
But it can use a coroutine with complex as the SendType,
because complex is a supertype of float,
therefore float is consistent-with complex.
The :> notation makes the contravariance of the second parameter visible:
float :> int
Generator[Any, float, Any] <: Generator[Any, int, Any]
This is an example of the second Variance Rule of Thumb.
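To make both rules concrete, here is a sketch you can feed to a static type checker such as mypy; all the function names are mine. The comments note which calls a checker accepts and why:

from typing import Any, Generator

def consume_floats(g: Generator[float, None, None]) -> float:
    # Covariant YieldType: any generator yielding a subtype of float is fine.
    return sum(g)

def gen_ints() -> Generator[int, None, None]:
    yield from (1, 2, 3)

consume_floats(gen_ints())  # accepted: int is consistent-with float

def send_floats(coro: Generator[Any, float, Any]) -> None:
    # Contravariant SendType: the coroutine must accept at least floats.
    coro.send(0.75)

def echo() -> Generator[complex, complex, None]:
    value: complex = 0j
    while True:
        value = yield value

coro = echo()
next(coro)         # prime the coroutine
send_floats(coro)  # accepted: float is consistent-with complex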
With this merry discussion of variance out of the way, let’s summarize what we saw in this post.
Summary
Guido van Rossum wrote there were three different styles of code you could write using generators:
There’s the traditional "pull" style (iterators), "push" style (like the averaging example), and then there are "tasks" (Have you read Dave Beazley’s coroutines tutorial yet?…).[17]
This post introduced classic coroutines used in "push style" and also as very simple "tasks"—the taxi processes in the simulation example. Native coroutines evolved from the classic coroutines described here.
The running average example demonstrated a common use for a classic coroutine:
as an accumulator processing items sent to it.
We saw how a decorator can be applied to prime a coroutine,
making it more convenient to use in some cases.
But keep in mind that priming decorators are not compatible with some uses of coroutines.
In particular, yield from subgenerator() assumes the subgenerator is not primed,
and primes it automatically.
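For the record, a priming decorator takes only a few lines. This sketch captures the general idea; it may differ in detail from the decorator shown earlier:

import functools

def coroutine(func):
    """Decorator: primes `func` by advancing it to the first yield."""
    @functools.wraps(func)
    def primer(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield so .send() works right away
        return gen
    return primer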
Accumulator coroutines can yield back partial results with each send method call, but they become more useful when they can return values, a feature that was added in Python 3.3 with PEP 380. We saw how the statement return the_result in a generator now raises StopIteration(the_result), allowing the caller to retrieve the_result from the value attribute of the exception. This is a rather cumbersome way to retrieve coroutine results, but it’s handled automatically by the yield from syntax introduced in PEP 380.
The coverage of yield from started with trivial examples using simple iterables, then moved to an example highlighting the three main components of any significant use of yield from: the delegating generator (defined by the use of yield from in its body), the subgenerator activated by yield from, and the client code that actually drives the whole setup by sending values to the subgenerator through the pass-through channel established by yield from in the delegating generator. This section was wrapped up with a look at the formal definition of yield from behavior as described in PEP 380 using English and Python-like pseudocode.
The discrete event simulation example showed how generators can be used as an alternative to threads and callbacks to support concurrency. Although simple, the taxi simulation gives a first glimpse at how event-driven frameworks like Tornado and asyncio use a main loop to drive coroutines executing concurrent activities with a single thread of execution. In event-oriented programming with coroutines, each concurrent activity is carried out by a coroutine that repeatedly yields control back to the main loop, allowing other coroutines to be activated and move forward. This is a form of cooperative multitasking: coroutines voluntarily and explicitly yield control to the central scheduler. In contrast, threads implement preemptive multitasking. The scheduler can suspend threads at any time—even halfway through a statement—to give way to other threads.
Further Reading
David Beazley is the ultimate authority on Python generators and coroutines. The Python Cookbook, 3E (O’Reilly) he coauthored with Brian Jones has numerous recipes with coroutines. Beazley’s PyCon tutorials on the subject are famous for their depth and breadth. The first was at PyCon US 2008: "Generator Tricks for Systems Programmers". PyCon US 2009 saw the legendary "A Curious Course on Coroutines and Concurrency" (hard-to-find video links for all three parts: part 1, part 2, part 3). His tutorial from PyCon 2014 in Montréal was "Generators: The Final Frontier," in which he tackles more concurrency examples—so it’s really more about topics in [async_ch] of Fluent Python. Dave can’t resist making brains explode in his classes, so in the last part of "The Final Frontier," coroutines replace the classic Visitor pattern in an arithmetic expression evaluator.
Coroutines allow new ways of organizing code, and just as with recursion or polymorphism (dynamic dispatch), it takes some time to get used to their possibilities. An interesting example of a classic algorithm rewritten with coroutines is in the post "Greedy algorithm with coroutines," by James Powell. You may also want to browse "Popular recipes tagged coroutine" in the ActiveState Code recipes database.
Paul Sokolovsky implemented yield from in Damien George’s super lean MicroPython interpreter designed to run on microcontrollers. As he studied the feature, he created a great, detailed diagram to explain how yield from works, and shared it in the python-tulip mailing list. Sokolovsky was kind enough to allow me to copy the PDF to this book’s site, where it has a more permanent URL.
As I write this, the vast majority of uses of yield from to be found are in asyncio itself or code that uses it. I spent a lot of time looking for examples of yield from that did not depend on asyncio. Greg Ewing—who penned PEP 380 and implemented yield from in CPython—published a few examples of its use: a BinaryTree class, a simple XML parser, and a task scheduler.
Brett Slatkin’s Effective Python, First Edition (Addison-Wesley) has an excellent short chapter titled "Consider Coroutines to Run Many Functions Concurrently".
That chapter is not in Effective Python, Second Edition, but fortunately it is still available online as a sample chapter.
Slatkin presents the best example of driving coroutines with yield from I’ve seen:
an implementation of John Conway’s Game of Life in which coroutines are used to manage the state of each cell as the game runs.
I refactored the code for the Game of Life example—separating the functions and classes that implement the game from the testing snippets used in Slatkin’s original code.
I also rewrote the tests as doctests, so you can see the output of the various coroutines and classes without running the script. The refactored example is posted as a GitHub gist.
Other interesting examples of yield from without asyncio appear in a message to the Python Tutor list, "Comparing two CSV files using Python" by Peter Otten, and a Rock-Paper-Scissors game in Ian Ward’s "Iterables, Iterators, and Generators" tutorial, published as an IPython notebook.
Guido van Rossum sent a long message to the python-tulip Google Group titled "The difference between yield and yield-from" that is worth reading. Nick Coghlan posted a heavily commented version of the yield from expansion to Python-Dev on March 21, 2009; in the same message, he wrote:
Whether or not different people will find code using yield from difficult to understand or not will have more to do with their grasp of the concepts of cooperative multitasking in general more so than the underlying trickery involved in allowing truly nested generators.
Experimenting with discrete event simulations is a great way to become comfortable with cooperative multitasking. Wikipedia’s "Discrete event simulation" article is a good place to start.[18] A short tutorial about writing discrete event simulations by hand (no special libraries) is Ashish Gupta’s "Writing a Discrete Event Simulation: Ten Easy Lessons." The code is in Java so it’s class-based and uses no coroutines, but can easily be ported to Python. Regardless of the code, the tutorial is a good short introduction to the terminology and components of a discrete event simulation. Converting Gupta’s examples to Python classes and then to classes leveraging coroutines is a good exercise.
For a ready-to-use library in Python, using coroutines, there is SimPy. Its online documentation explains:
SimPy is a process-based discrete-event simulation framework based on standard Python. Its event dispatcher is based on Python’s generators and can also be used for asynchronous networking or to implement multi-agent systems (with both simulated and real communication).
Coroutines are not so new in Python, but they were pretty much tied to niche application domains until asynchronous networking frameworks started to support them, beginning with Tornado.
The addition of yield from in Python 3.3 and asyncio in Python 3.4 raised awareness of classic coroutines, until their main use case—asynchronous programming—was taken over by native coroutines in Python 3.5.
Classic coroutines are not obsolete, but they are now back to niche applications.
The differences between classic coroutines and native coroutines are the subject of the StackOverflow question "Python native coroutines and send()".
I still believe classic coroutines and yield from are worth studying if you want to understand how native coroutines and await actually support concurrency under the hood.
Also, if you want to develop asynchronous libraries—as opposed to applications—you’ll discover that the functions that do the actual work of I/O are generators and classic coroutines, even in asyncio.
Unfortunately, once you watch David Beazley’s tutorials and read his cookbook examples on the subject, there isn’t a whole lot of content out there that goes deep into programming with classic coroutines.
getgeneratorstate on itself, which is not useful.
yield from directly in the IPython console. It’s used to test asynchronous code and works with asyncio. It was submitted as a patch to Python 3.5 but was not accepted. See Issue #22412: Towards an asyncio-enabled command line in the Python bug tracker.
yield from was a good idea.
typing.Generator and other types that correspond to ABCs in collections.abc were refactored with a wrapper around the corresponding ABC, so their generic parameters aren’t visible in the typing.py source file. That’s why I refer to Python 3.6 source code here.