This document addresses Twisted's implementation of Deferred objects, twisted.internet.defer.Deferred. It assumes familiarity with the basics of event loops and asynchronous programming.
Deferreds standardize callbacks
Callbacks are the lingua franca of asynchronous programming: any time you need to process the result of a non-blocking operation, you give that operation a callback for it to call when it has finished processing and has a result for you.
If you were implementing an asynchronous function from scratch, you might be tempted to define it like this:
    def nonblocking_call(input, on_success, on_error):
        pass
The caller would then pass in the functions to be called, like this:
    def success_handler(result):
        print("Success: %s" % result)
    def error_handler(error):
        print("Failure: %s" % str(error))
    nonblocking_call("input", success_handler, error_handler)
This works quite well for many simple cases, where you only need one success handler and one error handler, and the nonblocking call is a relatively one-off function.
But what if you are Twisted and you have many nonblocking functions: do you force every one of these functions to take an on_success and an on_error parameter? What if you want to perform a calculation on the result of the success_handler: do you write all of the code into a bigger success_handler and increase the indentation?
Twisted's elegant solution to this problem is Deferreds. Since the nonblocking call doesn't have a meaningful return value anyway (remember, it's asynchronous; it can return before it has a result), we return a Deferred which you can attach callbacks to.
    d = nonblocking_call("input")
    d.addCallback(success_handler)
    d.addErrback(error_handler)
The Deferred object doesn't do anything that you couldn't have done with the two callback parameters. This point is worth repeating: Deferred is an abstraction over callback parameters. It does nothing magical and is not, itself, asynchronous. It is merely a standard: if a function returns a Deferred, you know that you are dealing with an asynchronous function, and you know exactly what its API for adding callbacks is.
Deferred
Callbacks
At its very simplest, the Deferred has a single callback attached to it, which gets invoked with the result as an argument when it becomes available:
Synchronous | Asynchronous |
---|---|
|
|
Errbacks
Error handling is an ever-present concern in synchronous code. Deferred implements a system of errbacks in order to simulate Python try/except blocks. Just like in synchronous code, you should always register an errback in order to deal with an error gracefully.
Synchronous | Asynchronous |
---|---|
|
|
There are plenty of things going on here:
- Instead of being passed an exception object, which is roughly analogous to the result in the no-error case, you are passed a twisted.python.failure.Failure object. This is roughly a wrapper around the standard Exception with a few crucial enhancements to make it useful in an asynchronous context.
- Consequently, we test for the real exception type by using failure.trap(UserError); the exception itself is available as failure.value. This is the userland implementation of except; if the exception is not trapped, it gets re-thrown and our errback is bypassed.
- You can trap multiple types of exceptions by simply calling trap with multiple arguments, e.g. failure.trap(UserError, OtherUserError)
Omitting the trap declaration is equivalent to a catch-all except block:
Synchronous | Asynchronous |
---|---|
|
|
Notice that in order to re-raise the exception, we simply return it from our errback handler. Deferred will notice that the return value is a Failure object, and act accordingly. You can also raise an exception and Deferred will handle it properly:
    d = asynchronous_operation()
    def handle_twisted_error(failure):
        status = handle_error(failure.value)
        if not status:
            raise UserError
    d.addErrback(handle_twisted_error)
If you would like to re-raise the original error, it is preferred to use failure.raiseException(), which preserves traceback information if available.
Failure has another convenience function, check(), which makes it easier to simulate multiple except blocks:
Synchronous | Asynchronous |
---|---|
|
|
Callbacks and errbacks
In most cases, you'll want to perform some processing on the deferred result as well as have error handling. As you may have guessed, this simply means you should define both a callback and an errback.
Synchronous | Asynchronous |
---|---|
|
|
Notice that in the synchronous version, process is inside the try/except block. This translates over to the asynchronous code: if process throws an exception, handle_twisted_error will get a Failure object corresponding to that exception. The errback could handle either an error from the asynchronous operation or from our callback. Why does this happen? Because Deferreds chain callbacks.
Chaining callbacks
A common pattern in programs is the notion of one function returning an intermediate result, which gets passed to another function to calculate a further result, and so forth. Such a chain of data processing entities is called a pipeline, and Deferreds are ideally suited for modeling them.
Synchronous | Asynchronous |
---|---|
|
|
This behavior makes the name addCallback slightly misleading, since each of these callbacks will get a different result. If you would like to multiplex (have multiple callbacks handle the same result), you should code this directly into your callback:
Synchronous | Asynchronous |
---|---|
|
|
Errbacks work similarly, but instead of pipelining values through multiple functions, they create nested try..except blocks:
Synchronous | Asynchronous |
---|---|
|
|
Now, we can do tricky things with chaining callbacks and errbacks. The following code makes it possible for the errback function to gracefully provide the result of the computation (perhaps from a cache), even though the operation failed.
Synchronous | Asynchronous |
---|---|
|
|
This code introduces a new function: addCallbacks, which adds both a callback and an errback. Unlike adding them individually, if the callback errors, the errback will not receive the error, and if the errback returns a valid result, the callback will not receive it. They are completely isolated from each other.
Synchronous | Asynchronous |
---|---|
|
|
Let's stick our hand inside the black box and see what actually is happening. The order in which we add callbacks and errbacks is obviously influencing the end behavior. Here's why:
Internally, Deferred stores callbacks and errbacks in a list of callback/errback tuples. When you call addCallback or addErrback, you are not adding a callback/errback to separate stacks; instead, Deferred wraps your callback into a tuple (substituting a "pass through" function for the missing callback/errback) and sticks this on the callback/errback tuple list.
The result from the asynchronous function will either be a Failure object, or some other Python value. If it is the former, Deferred will call the errback function in the tuple with the result; the latter will result in a call to the callback function in the tuple. The function call itself can have two outcomes: another failure (either by returning a Failure object or by raising an Exception) or a regular Python value. Deferred will then move to the next tuple and repeat until there are no more tuples left.
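The mechanism just described can be modelled in a few lines of plain Python. This is a toy, not Twisted's implementation: `ToyDeferred`, `Failed` and `fire` are made-up names, and the toy only supports firing after all callbacks have been added.

```python
def passthru(value):
    return value

class Failed:
    """Toy stand-in for twisted.python.failure.Failure."""
    def __init__(self, exc):
        self.value = exc

class ToyDeferred:
    def __init__(self):
        self.pairs = []  # list of (callback, errback) tuples

    def addCallbacks(self, callback, errback=passthru):
        self.pairs.append((callback, errback))
        return self

    def addCallback(self, callback):
        return self.addCallbacks(callback, passthru)

    def addErrback(self, errback):
        return self.addCallbacks(passthru, errback)

    def fire(self, result):
        for callback, errback in self.pairs:
            # Pick one side of the tuple based on the current result.
            handler = errback if isinstance(result, Failed) else callback
            try:
                result = handler(result)
            except Exception as exc:
                result = Failed(exc)
        return result

d = ToyDeferred()
d.addCallback(lambda r: r + 1)
d.addCallback(lambda r: 1 // 0)  # raises ZeroDivisionError
d.addErrback(lambda f: "recovered from %s" % type(f.value).__name__)
final = d.fire(1)
```

The failure raised in the second tuple is picked up by the errback in the third, which returns an ordinary value and so switches the chain back to the callback side.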
Take the following code as an example:
    d = asynchronous_operation()
    d.addCallback(callback1)             # tuple 1
    d.addCallback(callback2)             # tuple 2
    d.addErrback(errback3)               # tuple 3
    d.addCallbacks(callback4, errback4)  # tuple 4
Consider two possible scenarios. First, success:
- The asynchronous operation succeeds with a result of "Foo".
- No failure. We give "Foo" to the callback of tuple 1, callback1. It returns ("Foo", 123).
- No failure. We give ("Foo", 123) to the callback of tuple 2, callback2. It returns "Foo123".
- No failure. We give "Foo123" to the callback of tuple 3, which happens to be a pass through function. It returns "Foo123".
- No failure. We give "Foo123" to the callback of tuple 4, callback4. It does something, but the return value is not given to anyone.
What about failure?
- The asynchronous operation fails, and a Failure object is constructed.
- Failure. We give the failure object to the errback of tuple 1, which happens to be a pass through function. It returns the failure object.
- Failure. We give the failure object to the errback of tuple 2, which is also a pass through function. It returns the failure object.
- Failure. We give the failure object to the errback of tuple 3, errback3. It acknowledges and logs the error. It doesn't return anything.
- No failure (remember, None is a valid result value!). We give None to the callback of tuple 4, callback4.
Think of your callback/errback chains as parallel pipes of execution, which could transfer to one another at any point. As a parting word, here is a use of one convenience function, addBoth.
Synchronous | Asynchronous |
---|---|
|
|
The lambda is simply a convenient way to avoid passing x to clean() (lest Python raise a TypeError).
Fluent interface
Deferred implements a fluent interface for adding callbacks, where the return value of addCallback, addErrback or any other similar method is the object itself (return self). This means you can write this:
    d = asynchronous_operation()
    d.addCallback(f1).addCallback(f2).addCallback(f3)
which is equivalent to:
    d = asynchronous_operation()
    d.addCallback(f1)
    d.addCallback(f2)
    d.addCallback(f3)
Use of this style is a matter of taste and consistency.
Chaining Deferreds
All of the examples, to this point, have been focused around a single asynchronous operation, and the synchronous post-processing of that operation. However, in the real world, you will often need to run multiple asynchronous operations, one after the other. For example, if you make an HTTP request, and find out that the request is a redirect, you need to make another (asynchronous) HTTP request.
Our code, then, is fatally hobbled if we can't easily chain deferreds together. With the framework we set up previously, we could have the callback call the next asynchronous function, and then set up callbacks on the deferred that function returns.
Synchronous | Asynchronous |
---|---|
|
|
But we just spent the first section explaining our wonderful system of multiple callbacks and errbacks and, as you might notice, there isn't actually a way to get chain to return the value of process in this example without making it synchronous.
To make this work, Twisted does something special: it lets callbacks return a Deferred, and treats that to mean, "this callback doesn't have the answer yet, but when this Deferred fires it will!"
Synchronous | Asynchronous |
---|---|
|
|
Written a little more explicitly (in case you're still squeamish about higher order functions), the asynchronous code is equivalent to this:
    d = asynchronous_operation_a()
    def chain(result):
        return asynchronous_operation_b(result)
    d.addCallback(chain)
    d.addCallback(process)
Here is the mantra: Callbacks and errbacks can return deferreds.
Chaining in Pictures
We're now going to introduce some visual aids to see how you can use deferred chaining to modify program flow. We'll represent Deferred objects as "pipes," that is, a series of callbacks that take some input, process it in turn, and then return some output.
This is a Deferred that we instantiated from scratch; it doesn't do anything, and unless we explicitly call it, it will never run (in the next section, Composing Deferreds, we will see why Deferred objects like this can be useful). In many other cases, the function we called to get this deferred object promised to call back at some point: we'll represent this as the red text "Asynchronous Code". This code provides the input A that gets the ball rolling.
Under normal circumstances, C simply falls off into oblivion; no other code cares about it!
Now, suppose that the asynchronous code finishes its job and calls the Deferred. While processing this value, Callback 1 returns a Deferred B instead of an actual B, indicating, "No wait, the value isn't ready yet!"
We can't just pass Deferred B to Callback 2, since it's expecting a B, not a deferred. How do we get B out of Deferred B? Well, recall what Deferred B looks like:
There are a few comments to be made about this deferred: first off, it's a fully formed deferred object that some other asynchronous code, Asynchronous Code for B, has promised to call back with a result. However, the result in this example isn't actually B; it's B''. You can imagine this as some precursor value for B that needs to go through Callback 1' and Callback 2' before it becomes B. We've used the prime symbol (') in order to distinguish Callback 1 from Callback 1'; they are distinct and may be completely different functions.
By now, the words "chain" and the arrow labeled B probably have given you some idea how to reincorporate Deferred B into the original deferred. Sure enough, we simply plug it in.
(We've omitted Callback 1 from the diagram in the interest of brevity; it is now inaccessible and non-existent for the purposes of finishing processing.) The evaluation proceeds as normal. Note that any of the callbacks in our new chained Deferred can return a deferred and repeat this process.
One last comment: something interesting has happened to the value that comes out of the last callback: for Deferred B, it was actually used! Chaining deferreds means that we care about the ultimate end result of our callback chain.
Dependencies
Well written, maintainable callbacks maintain "contracts" with respect to their behavior. Any given callback should have a well-defined value it takes and a well-defined value that it returns. This is good sense that applies not only to callbacks but also to functions.
We've now added a slight twist to this, in that any callback can return the value that it is contractually obligated to supply, or it can promise to return the value in the form of a Deferred. (Imagine if you could get away with doing this in real life!) And, in the process of fulfilling that promise, you may discover you need to do another asynchronous request. Something has just happened: you're resolving a dependency chain.
[ here goes an example with actual running Twisted code in three steps. Pictures of how the "callback" chain looks like as we discover more and more dependencies should be supplied ]
Looping
A common form of dependency is needing to perform the asynchronous operation all over again. The canonical example of this is an HTTP redirect: when the callback for a deferred from a page request is called, the response could be the result, or it could be an empty body with the Location HTTP header set, in which case you simply perform the operation over again.
[ here is the HTTP redirect example. It also should have pictures. ]
Lambdas
We now take this opportunity to remind you that chaining deferreds often results in the creation of lots of little functions to shuffle the result of one operation to the next asynchronous function. Sometimes you can be clever and pass the asynchronous function itself as a callback, but this only works if the next asynchronous function takes a single parameter, and that parameter is the result of the previous computation.
In simple cases, you may want to use a lambda to move a parameter around, or partially apply a function. Suppose we have an asynchronous function send_message(value, type), and we know that in our code type should equal 2, then:
Without lambdas | With lambdas |
---|---|
|
|
Composing Deferreds
Chaining deferreds dealt with sequential computation: each successive asynchronous operation required the result of the previous computation in order to run. But we could have done this very easily synchronously: asynchronous execution shines when we want to perform computations in parallel. Parallelizing computations raises some questions, though: when is a parallel computation complete? How do I treat these parallel computations as a single unit?
The answer is composition, that is, we can combine deferreds together into a single deferred. As it turns out, Twisted has some built-in facilities for doing this.
The implementation of a DeferredList
Consider a Deferred that would only fire after some other number of Deferreds fired.
    from twisted.internet.defer import Deferred

    class FireWhenAllFinish(Deferred):
        def __init__(self, deferreds):
            super(FireWhenAllFinish, self).__init__()
            self.deferreds = deferreds
We start off with a logical constructor for our class: a simple list of the Deferred objects we want to finish before this Deferred fires. Recall that we need to set up callbacks on each Deferred in deferreds to tell us when they've finished. Thus:
    class FireWhenAllFinish(Deferred):
        def __init__(self, deferreds):
            super(FireWhenAllFinish, self).__init__()
            self.deferreds = deferreds
            for d in self.deferreds:
                # register on each child deferred, not on self
                d.addCallbacks(self._cbDeferred, self._ebDeferred)
        def _cbDeferred(self, result):
            raise NotImplementedError
        def _ebDeferred(self, failure):
            raise NotImplementedError
Now, for the definition of _cbDeferred, after a little thought, and the knowledge that callback() and errback() are the methods you can use to fire a deferred (it's what asynchronous_operation() would have called behind the veil), a relatively simple implementation comes to mind:
    class FireWhenAllFinish(Deferred):
        def __init__(self, deferreds):
            super(FireWhenAllFinish, self).__init__()
            self.deferreds = deferreds
            self.finishedCount = 0
            if not self.deferreds:
                self.callback(None)  # nothing to wait for; callback() needs a result
            for d in self.deferreds:
                # register on each child deferred, not on self
                d.addCallbacks(self._cbDeferred, self._ebDeferred)
        def _cbDeferred(self, result):
            self.finishedCount += 1
            if self.finishedCount == len(self.deferreds):
                self.callback(None)
        def _ebDeferred(self, failure):
            # self.called is True if callback()/errback() has already been called
            if not self.called:
                self.failed = True
                self.errback(failure)  # pass the child's failure along
There are two gotchas: the first is that if there were no deferreds passed into this deferred, we should automatically fire our callback; after all, we're not waiting on anything. The second is that callback() and errback() must only be called (between the two of them) once, so we manually guard for this by checking if self.called is False before making the errback call (why such a check is unnecessary for the callback call is left as an exercise for the reader).