Async Batching with Twisted: A Walkthrough

Posted on June 20, 2008 by oubiwann


Blog post image

While drafting a Divmod announcement last week, I had a quick chat with a dot-bomb-era colleague of mine. Turns out, his team wants to do some cool asynchronous batching jobs, so he's taking a look at Twisted. Because he's a good guy and I like Twisted, I drew up some examples for him that should get him jump-started. Each example covered something in more depth that it's predecessor, so is probably generally useful. Thus this blog post :-)

I didn't get a chance to show him a DeferredSemaphore example nor one for the Cooperator, so I will take this opportunity to do so. For each of the examples below, you can save the code as a text file and call it with "python filname.py", and the output will be displayed.

These examples don't attempt to give any sort of introduction to the complexities of asynchronous programming nor the problem domain of highly concurrent applications. Deferreds are covered in more depth here and h ere. However, hopefully this mini-howto will inspire curiosity about those :-)


Example 1: Just a DefferedList
This is one of the simplest examples you'll ever see for a deferred list in action. Get two deferreds (the getPage function returns a deferred) and use them to created a deferred list. Add callbacks to the list, garnish with a lemon.


Example 2: Simple Result Manipulation
< br />Here, we mix things up a little bit. Instead of doing processing on all the results at once (in the deferred list callback), we're processing them when the page callbacks fire. Our processing here is just a simple example of getting the length of the getPage deferred result: the HTML content of the page at the given URL.


Example 4: Results with More Structure
< br />After all this playing, we start asking ourselves more serious questions, like: "I want to decide which values show up in my callbacks" or "Some information that is available here, isn't available there. How do I get it there?" This is how :-) Just pass the parameters you want to your callback. They'll be tacked on after the result (as you can see from the function signatures).

In this example, we needed to create our own deferred-returning function, one that wraps the getPage function so that we can also pass the URL on to the callback.


Example 6: Adding Some Error Checking
< br />As we get closer to building real applications, we start getting concerned about things like catching/anticipating errors. We haven't added any errbacks to the deferred list, but we have added one to our page callback. We've added more URLs and put them in a list to ease the pains of duplicate code. As you can see, two of the URLs should return errors: one a 404, and the other should be a domain not resolving (we'll see this as a timeout).


Example 7: Batching with DeferredSemaphore

This is the last example for this post, and it's is probably the most arcane :-) This example is taken from JP's blog post from a couple years ago. Our observation in the previous example about the way that the deferreds were created in the for loop and how they were run is now our counter example. What if we want to limit when the deferreds are created? What if we're using deferred semaphore to create 1000 deferreds (but only running them 50 at a time), but running out of file descriptors? Cooperator to the rescue.

This one is going to require a little more explanation :-) Let's see if we can move through the justifications for the strangeness clearly:
  1. We need the deferreds to be yielded so that the callback is not created until it's actually needed (as opposed to the situation in the deferred semaphore example where all the deferreds were created at once).
  2. We need to call doWork before the for loop so that the generator is created outside the loop. thus making our way through the URLs (calling it inside the loop would give us all four URLs every iteration).
  3. We removed the result-processing callback on the deferred list because coop.coiterate swallows our results; if we need to process, we have to do it with pageCallback.
  4. We still use a deferred list as the means to determine when all the batches have finished.
This example could have been written much more concisely: the doWork function could have been left in test as a generator expression and test's for loop could have been a list comprehension. However, the point is to show very clearly what is going on.

I hope these examples were informative and provide some practical insight on working with deferreds in your Twisted projects :-)

Author oubiwann
Date June 20, 2008
Time 02:08:08
Category
Tags asynchronous concurrency deferreds development howtos programming methodologies twisted
Line Count 1
Word Count 1302
Character Count 9673

Comments?
This blog doesn't use standard (embedded) comments; however, since the site is hosted on Github, if there is something you'd like to share, please do so by opening a "comment" ticket!