Monday, September 5, 2011

Hello, Mr. Hooker.

I've been procrastinating on cmonster. I have some nasty architectural decisions to make, and I keep putting them off. In the meantime I've been working on a new little tool called "Mr. Hooker" (or just mrhooker).

Introducing Mr. Hooker

The idea behind mrhooker is very simple: I wanted to be able to write LD_PRELOAD hooks in Python. If you're not familiar with LD_PRELOAD, it's a mechanism provided by various UNIX and UNIX-like operating systems for loading a specified shared library into a process before any of its other libraries. Because the preloaded library is searched first, you can use it to provide your own versions of native functions, including those in standard libraries such as libc.

Anyway, I occasionally find the need for an LD_PRELOAD library to change the behaviour of a program that I can't easily recompile. Often these libraries are throwaway, so writing one can end up taking as long as the job it was meant to help with. So I wrote mrhooker to simplify this.

It turns out there's very little to do, since Cython (and friends) do most of the hard work. Cython is a programming language that extends Python to simplify building Python extensions. It also has an interface for building these extensions on the fly. So mrhooker doesn't need to do much - it takes a .pyx file (Pyrex/Cython source) and compiles it to a shared library using Cython. It then loads that library, along with some common code, into a child process using LD_PRELOAD.
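To make the last step concrete, here is a minimal sketch of launching a child process with a library preloaded. It is not mrhooker's actual code: the helper names preload_env and run_with_preload, and the paths used with them, are made up for illustration.

```python
import os
import subprocess


def preload_env(library_path, base_env=None):
    """Return a copy of base_env with library_path placed first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon- or space-separated list. Our library
    # goes first so its symbols are found before any already-preloaded ones.
    env["LD_PRELOAD"] = library_path + (":" + existing if existing else "")
    return env


def run_with_preload(library_path, command):
    """Run command (a list of arguments) with library_path preloaded."""
    return subprocess.call(command, env=preload_env(library_path))
```

Something like run_with_preload("/tmp/hooks.so", ["curl", "http://example.com"]) would then run the command with the hooks in place, assuming /tmp/hooks.so is a shared library you have already built.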

Example - Hooking BSD Sockets


Let's look at an example of how to use mrhooker. Hooks are defined as external functions in a Cython script. Say we want to hook the BSD sockets "send" function. First we'd find the signature of send (man 2 send), which is:

ssize_t send(int sockfd, const void *buf, size_t len, int flags);

Given this, we can produce a wrapper in Cython, like so:

cdef extern ssize_t send(int sockfd, char *buf, size_t len, int flags) with gil:
    ...

There are a couple of important things to note here. First, the parameter type for "buf" drops const, since Cython (at the time of writing) doesn't know about const-ness. Second, and crucially, the function must be defined "with gil". This ensures that the function acquires the Python Global Interpreter Lock before calling any Python functions, which matters because the hooked program will call send from threads that know nothing about Python. Okay, with that out of the way, let's go on...

We'll want to do something vaguely useful with this wrapper. Let's make it print out the argument values, and then call the original "send" function. To do that we'll use dlsym/RTLD_NEXT to find the next definition of "send" in the library search order after our own.

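cdef extern ssize_t send(int sockfd, char *buf, size_t len, int flags) with gil:
    # Declare C-typed locals so they can be used inside the nogil block.
    cdef void* real_send
    cdef ssize_t res
    print "====> send(%r, %r, %r, %r)" % (sockfd, buf[:len], len, flags)
    # Look up the next "send" in the search order, i.e. the real one.
    real_send = dlsym(RTLD_NEXT, "send")
    if real_send:
        # Release the GIL while the real call runs, since it may block.
        with nogil:
            res = (<ssize_t(*)(int, void*, size_t, int) nogil>real_send)(
                sockfd, buf, len, flags)
        return res
    else:
        return -1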

We'll also need to declare dlsym and RTLD_NEXT. Let's do that.

# Import stuff from <dlfcn.h>
cdef extern from "dlfcn.h":
    void* dlsym(void*, char*)
    void* RTLD_NEXT
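
If you want to poke at symbol lookup without building anything, Python's ctypes follows the same lookup-then-call pattern: on POSIX systems it resolves names with dlsym, and you have to tell it the function's signature before calling, much like the cast in the wrapper. This is only a rough sketch for experimenting. ctypes has no equivalent of RTLD_NEXT, so it can't replace the Cython code, and strlen is just a convenient function to look up.

```python
import ctypes

# CDLL(None) opens the running process itself (dlopen(NULL)), so attribute
# lookups search the libraries already loaded into it, including libc.
process = ctypes.CDLL(None)

# The attribute access performs the symbol lookup. As with the raw pointer
# returned by dlsym, the signature has to be supplied by hand.
strlen = process.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"hello")
```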

Now you just run:

mrhooker <script.pyx> <command>


And there we go. This is trivial - it would also be fairly trivial to write a C program to do this. But if we wanted to do anything more complex, or if we were frequently changing the wrapper function, I'd much rather write it in Python - or Cython, as it were.

Enjoy!


Edit: I just noticed that it's broken if you don't have a certain config file. I always had one while testing... until I got to work.
You'll get the error "ConfigParser.NoSectionError: No section: 'default'". I'll fix the code at home, but in the meantime you can do this:

$ mkdir ~/.mrhooker
$ echo "[default]" > ~/.mrhooker/mrhooker.config

P.S. If you add "build_dir = <path>" to that section, or to a per-module section, mrhooker/Cython will keep the shared library that it builds in that directory. Then, as long as you don't change the source, the stored library will be reused without rebuilding.
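
For reference, here is a rough sketch of how a config file like that could be read. It uses Python 3's configparser (the module is called ConfigParser in Python 2), and it is not mrhooker's actual code: the "hooks" section name, the paths, the build_dir_for helper and the fallback from a per-module section to [default] are all assumptions made for illustration.

```python
import configparser

# Hypothetical file contents; "hooks" stands in for a per-module section name.
SAMPLE = """
[default]
build_dir = /tmp/mrhooker-build

[hooks]
build_dir = /tmp/hooks-build
"""


def build_dir_for(parser, module):
    """Prefer the module's own section, falling back to [default]."""
    section = module if parser.has_section(module) else "default"
    return parser.get(section, "build_dir")


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

default_dir = build_dir_for(parser, "other")
hooks_dir = build_dir_for(parser, "hooks")

# With no [default] section at all, the lookup fails with NoSectionError,
# which is the error quoted in the edit above.
empty = configparser.ConfigParser()
try:
    build_dir_for(empty, "other")
    missing_section = False
except configparser.NoSectionError:
    missing_section = True
```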