Back to blogging in 2020!

Make Computer Go

We can start off by telling our fine audience that Adam is typing this while Kathleen does the dishes. It might be a little bit unfair, gender-wise, but Adam did all the cooking.

Additionally, the goal of this blog post is to expose the pattern I use under the hood of a lot of my research projects. What is that, you ask?

I like to make web pages that trigger crazy computations behind the scenes. The gist: how do you make a web page trigger math, in the form of some C++ code, behind the scenes?

So, I don't know what you're writing, and not seeing it is pretty weird. I don't intend to write a stream-of-consciousness blog post, so this may be pretty silly.

Step 1: Make a web server out of pythons using Twisted (this is step 1?). Try it out and hit up http://localhost:8080/random

Why Twisted? Four years ago, Adam heard that @progrium was using Twisted for cool stuff, and now Kathleen uses it for basically everything. If you, dear reader, have a suggestion of something better, I/we would like to hear it (we don't get to talk to web people often!).

Oh, a kitty! (+Mpeg & Jpeg The Cats).

Step 2: You need the thing that does the computation. For me, in my research, this means a new photo gets processed and added to a 3D model, or a face gets detected in the photo, its features are extracted, and it is added to a predictive model. This is the sort of thing I don't want to write in some scripting language (e.g. PHP); I want to use my mathy libraries far away from my web serving logic. I want to be able to launch these jobs on our cluster and not deal with web requests in the same process.

For the purposes of this post (and me making a template for using later), the mathy task is computing a random number and returning it (OpenCV comes later). So, let's look at the code here (again):

Step 3: I have my web server (the python script) running. It listens for normal HTTP requests, but, because we're in Twisted, we can also listen for requests on other wild sockets. Because I know how to do socket programming in C++ (and bash, yo, /dev/tcp), we're gunna go with it. The actual step of step 3 is making this happen: have the C++ dial up the web server.

Our mathy worker is going to function like a client; it'll initiate a connection to the main server. Once connected, the server will know that it can shunt requests for work down to the worker through the connection it made.
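To make the "worker dials the server" idea concrete, here is a Python stand-in for the C++ worker, using only the standard library. It is a sketch under assumptions: the wire format (one newline-terminated job name per request, one line back per reply), port 9000, and the function names are all invented for illustration and are not taken from the real worker.

```python
import random
import socket


def handle_job(name):
    """Run one named job and return its result as text."""
    if name == "random":
        return str(random.randint(0, 99))
    return "error: unknown job"


def serve_jobs(sock):
    """Read job names line by line from sock and send one reply line per job."""
    # Read through a file wrapper, but write with sendall so that buffered,
    # not-yet-read job lines are never discarded by a write.
    with sock.makefile("r", encoding="utf-8", newline="\n") as lines:
        for line in lines:
            reply = handle_job(line.strip()) + "\n"
            sock.sendall(reply.encode("utf-8"))


def run_worker(host="localhost", port=9000):
    """Dial the server, then wait for work on that one connection."""
    with socket.create_connection((host, port)) as sock:
        serve_jobs(sock)
```

The point of the design is that the worker is the client: it can live on any machine that can reach the server, and the server only has to remember the connection it was handed.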

The challenge here is speaking sockets in both languages (Python/C++).

Some of the fun trickiness that I discovered, and that Adam helped sort out, involved Deferreds in Twisted (event callback thingies, to be precise). Earlier, I had the worker talking to the server, but this was totally separated from the web page serving logic.

In the WorkerManager (a piece in Python), I create a "job" (instance of Deferred) and put it into a queue. When a response comes in from the worker over the socket, I dequeue the oldest job and post a callback on it. This might be different than what you normally think of as an (air quotes) job queue. It's actually just a Python list of things that I enqueued that happened to use a variable name called "job".

If you look at the main server, the way that we use this WorkerManager thing is with "inlineCallbacks". If you haven't seen it before, inlineCallbacks ... Well, without it, writing a lot of chained callback logic in Twisted quickly leads to spaghetti code (Adam notes: node.js folks know what this is like). With inlineCallbacks, the flow of a single function gets logically paused at any "yield" point, waiting to be resumed when a callback is made. This really makes my page serving logic easy to read and write.

If you were writing Twisted without IC, you'd have all these deferreds floating around, getting triggered, getting handled in various ways, and this is the crap that I deal with in ActionScript (oh, sick burn), so it's really easy for things to happen out of order. If you want to force things to happen in order, and wait for the event before them to finish, you can use IC to write code in the order you expect it to run with these "yield" statements.

The way I put this all together is... First, I make a Resource (part of Twisted's web framework) to handle requests to certain pages. In the logic for the render method, I use this sort of trick

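def render_GET(self, request):
    @defer.inlineCallbacks  # from twisted.internet import defer
    def _process():
        numbar = yield workerManager.doWork("random")
        request.write(str(numbar).encode("utf-8"))  # send the worker's answer
        request.finish()  # close out the response we promised below
    _process()
    return server.NOT_DONE_YET  # from twisted.web import server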

(I feel weird about putting the decorator on the render method itself, so I apply it to a little internal helper function.)

Some of the logic I want to launch finishes quickly (fast enough that I might want to keep the HTTP connection open until the end), but some other logic is best simply launched and checked via polling other pages. In the case of the little random number generator, we have the page rendering method effectively block until the worker has finished its measly task.

If it didn't matter that I got a response from the worker right away, I wouldn't have used the callback trick in the Resource at all. This is a totally new concept: the tasks I want my worker to do come in different lengths. Some of those are fast (say, under a second), and others take minutes to hours. The only thing I want to do with most web requests is to start a job. So maybe the random number example is misleading.
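One way the "start a job now, poll for it later" case could look is sketched below. JobBoard and its methods are invented for this illustration and are not part of the code described in this post; the only assumption about the job is that it is represented by a Twisted Deferred, such as the one doWork returns.

```python
import itertools


class JobBoard:
    """Remembers the outcome of long jobs so a later request can poll for it."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.status = {}  # job id -> (state, detail)

    def start(self, deferred):
        """Track a Deferred-backed job and return its id immediately."""
        job_id = next(self._ids)
        self.status[job_id] = ("running", None)
        deferred.addCallbacks(self._done, self._failed,
                              callbackArgs=(job_id,), errbackArgs=(job_id,))
        return job_id

    def _done(self, result, job_id):
        self.status[job_id] = ("done", result)

    def _failed(self, failure, job_id):
        self.status[job_id] = ("failed", failure.getErrorMessage())
```

A "start" page would call start() with the worker's Deferred and return the id straight away; a separate "status" page would just look the id up in status, with no yield involved.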

(I'm doing some dishes right now, with pink gloves.)

First, I think this post needs some more pictures, or else nobody is going to read it.

Second, to wrap up, if you ever trigger complicated computation from your web request handlers... rather, if you have previously solved this problem in another way, tell me about it. Tell me how this works for you.

I have a little bit of the Impostor Syndrome where I don't know if I'm doing this Right. So... I'm just going to do it my way until I hear of something better.


(And, with that, Kathleen has cleaned all the dishes except the cast-iron pan.)

dishes and dictation

bloggin' and typin'

