The root of most of the problems is that Google App Engine is stateless by nature. Server instances can be spun up or spun down without notice, so we can't store the kind of complex state that Pushy really requires. So a couple of weeks ago I set about investigating a server-initiated RPC mechanism that is asynchronous and (mostly) stateless.
How would it work? Well, earlier this year I read that ProtoRPC had been released, bringing RPC services to Google App Engine. In our case the roles are reversed - Google App Engine is the client, calling the agent - but we can at least reuse the API to minimise dependencies and hopefully simplify the mechanism. Okay, so we have a ProtoRPC service running on a remote machine, consumed by our Google App Engine application. How do they talk?
One thing I wanted to avoid was the need for polling, as that's both slow and expensive. Slow in that there will necessarily be delays between polls, and expensive in that unnecessary polls will burn CPU cycles in Google App Engine, which aren't free. Long-polling isn't possible, either, since HTTP requests are limited to 30 seconds of processing time. If you read my last post, you probably already know what I'm going to say: we'll use XMPP.
What's XMPP? That's the Extensible Messaging and Presence Protocol, which is the protocol underlying Jabber. It is also the primary protocol that Google Talk is built on. It's an XML-based, client-server protocol, so peers do not talk directly to each other. It's also asynchronous. So let's look at the picture so far...
- The client (agent) and server (GAE application) talk to each other via XMPP.
- The agent serves a ProtoRPC service, and the GAE application will consume it.
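On the wire, each invocation only really needs to carry a method name, the encoded request, and an opaque context that the agent echoes back with its response. A minimal sketch of such an envelope - the actual format gaea uses may differ - might look like this:

```python
import json


def make_rpc_message(method, params, context):
    # Build the body carried inside an XMPP message: the method to
    # invoke, its (already-encoded) parameters, and an opaque context
    # that the agent must echo back with its response.
    return json.dumps({'method': method, 'params': params, 'context': context})


def parse_rpc_message(body):
    # Agent-side decode of the envelope.
    message = json.loads(body)
    return message['method'], message['params'], message['context']
```

For example, `make_rpc_message('ExecService.run', {'code': '...'}, 'channel-123')` produces a body the agent can decode, act on, and answer - echoing `'channel-123'` back unmodified.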
Because our RPC mechanism will be server-initiated, we'll need something else: agent availability discovery. Google App Engine provides XMPP handlers for agent availability (and unavailability) notifications. When an agent starts up, it will register its presence with the application. When an agent is discovered, the application will request the agent's service descriptor. The agent will respond, and the application will store the descriptor away in Memcache.
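App Engine delivers these notifications to the `/_ah/xmpp/presence/available/` and `.../unavailable/` handlers, with the sender's address in the `from` field. One small detail: that address is a full JID including a resource, so in my sketch here (keying by full JID would also work) it's worth normalising it before using it as the key for the descriptor store:

```python
def bare_jid(full_jid):
    # App Engine's presence handlers receive a full JID such as
    # 'agent@example.com/gaea'; stripping the resource (everything
    # after the '/') gives a stable key for the descriptor store.
    return full_jid.split('/', 1)[0]
```
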
We (ab)use Memcache to share data between instances of the application. When you make enough requests to the application, Google App Engine may dynamically spin up a new instance to handle them. By storing the service descriptor in Memcache, it can be accessed by any instance. I say "abuse" because Memcache is not guaranteed to keep the data you put in it - entries may be evicted when memory is constrained. Really we should use the Datastore, but I was too lazy to deal with cleaning it up. "Left as an exercise for the reader." One thing I did make a point of doing was using the new Python Memcache CAS (compare-and-set) API, which allows safe concurrent updates to Memcache.
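The CAS pattern is a read-modify-retry loop built on `Client.gets()` and `Client.cas()`. Here's a sketch of how the descriptor store might use it; since the real `google.appengine.api.memcache.Client` needs the App Engine runtime, I've stubbed a minimal in-memory client (my own stand-in, not part of any SDK) just to make the loop runnable:

```python
class FakeMemcacheClient(object):
    """In-memory stand-in for memcache.Client, implementing only the
    gets/add/cas subset used below."""

    def __init__(self):
        self._data, self._casid, self._seen = {}, {}, {}

    def gets(self, key):
        if key not in self._data:
            return None
        self._seen[key] = self._casid[key]  # remember the version we saw
        return self._data[key]

    def add(self, key, value):
        if key in self._data:
            return False  # someone else created it first
        self._data[key], self._casid[key] = value, 1
        return True

    def cas(self, key, value):
        # Fails if the key changed since our last gets(); the caller
        # should re-read and retry.
        if key not in self._data or self._seen.get(key) != self._casid[key]:
            return False
        self._data[key] = value
        self._casid[key] += 1
        return True


def store_descriptor(client, jid, descriptor, retries=10):
    # Add one agent's descriptor to the shared mapping without
    # clobbering concurrent updates made by other instances.
    for _ in range(retries):
        descriptors = client.gets('descriptors')
        if descriptors is None:
            # No mapping yet: add() is atomic, so losing the race is safe.
            if client.add('descriptors', {jid: descriptor}):
                return True
            continue
        updated = dict(descriptors)
        updated[jid] = descriptor
        if client.cas('descriptors', updated):
            return True
    return False
```

The important part is that a `cas()` whose `gets()` snapshot has gone stale simply fails, and the loop re-reads and tries again - no instance ever overwrites another's freshly-stored descriptor.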
Orrite. So now we have an agent and application which talk to each other via XMPP, using ProtoRPC. The application discovers the agent, and, upon request, the agent describes its service to the application. How can we use it? Well the answer is really "however you like", but I have created a toy web UI for invoking the remote service methods.
Wot 'ave we 'ere then? The drop-down selection has all of the available agent JIDs (XMPP IDs). The textbox has some Python code, which will be executed by the Google App Engine application. Yes, security alert! This is just a demonstration of how we can use the RPC mechanism - not a best practice. When you hit "Go!", the code will be run by the application. But before doing so, the application will set a local variable "agent", which is an instance of the ProtoRPC service stub bound to the agent selected in the drop-down.
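The "set a local variable" trick amounts to exec'ing the submitted code with `agent` pre-bound in the namespace. A sketch, with a hypothetical dummy stub standing in for the real ProtoRPC stub (and the same security caveat as above):

```python
def run_user_code(code, agent_stub):
    # Execute the user's code with 'agent' bound to the service stub.
    # Demonstration only: exec'ing untrusted input is a security hole,
    # just as it is in the web UI above.
    namespace = {'agent': agent_stub}
    exec(code, namespace)
    return namespace


class DummyStub(object):
    """Hypothetical stand-in for a ProtoRPC service stub."""

    def ping(self):
        return 'pong'
```

So `run_user_code("result = agent.ping()", DummyStub())` returns a namespace whose `result` is whatever the stub's method produced.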
ProtoRPC is intended to be synchronous (from the looks of the comments in the code, anyway), but there is an asynchronous API for clients. Given that the application has at most 30 seconds to service a request, though, it can't actively wait for a response. What to do? Instead, we need to complete the request asynchronously when the client responds, and convey some context to the response handler so it knows what to do with the response.
In the demo, I've done something fairly straightforward with regard to response handling. When the UI is rendered, we create an asynchronous channel using the Channel API, which we use to send the response back to the user. So when the code is executed, the service stub is invoked, and the channel ID is passed as context to the client. When the client responds, it includes the context. Once again, security alert: we could fix the security concerns by encrypting the context to ensure the client doesn't tamper with it. Let's just assume the client is friendly though, okay? Just this once!
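A lighter-weight hardening than full encryption - sketched here with Python's standard hmac module, and not something the demo actually does - would be to sign the context with a secret known only to the application, so a forged or modified channel ID gets rejected on the way back in:

```python
import hashlib
import hmac


def sign_context(secret, context):
    # Attach an HMAC so a tampered context is detected on return.
    mac = hmac.new(secret, context.encode('utf-8'), hashlib.sha256).hexdigest()
    return mac + ':' + context


def verify_context(secret, signed_context):
    # Recompute the HMAC and compare in constant time.
    mac, _, context = signed_context.partition(':')
    expected = hmac.new(secret, context.encode('utf-8'), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError('context was tampered with')
    return context
```

Note that signing only detects tampering; the agent can still read the channel ID. To hide it entirely you'd encrypt as well, as suggested above.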
So we finally have an application flow that goes something like this:
- Agent registers service.
- Server detects the agent's availability, and requests the agent's service descriptor.
- Agent sends its service descriptor; the server receives it and stores it in Memcache.
- User hits web UI, which server renders with a new channel.
- User selects an agent and clicks "Go!".
- Server instantiates a service stub, and invokes it with the channel ID as context. The invocation sends an XMPP message to the agent.
- Agent receives XMPP message, decodes and executes the request. The response is sent back to the server as an XMPP message, including the context set by the server.
- The server receives the XMPP message, and extracts the response and channel ID (context) from it. The response is formatted and sent down the channel.
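That last server-side step can be sketched as a small handler. The message format here is my assumption (a JSON body carrying `response` and `context`), and the Channel API call is injected as a function so the sketch runs outside App Engine - in the real application it would be `channel.send_message(client_id, message)`:

```python
import json


def handle_agent_response(body, send_to_channel):
    # body is the XMPP message body sent back by the agent.
    # send_to_channel stands in for channel.send_message, which
    # needs the App Engine runtime.
    message = json.loads(body)
    channel_id = message['context']  # the context we set at invocation time
    formatted = json.dumps(message['response'])
    send_to_channel(channel_id, formatted)
```
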
I've put my code up on GitHub, here: http://github.com/axw/gaea. Feel free to fork and/or have a play. I hope this can be of use to someone. If nothing else, I've learnt a few new tricks!