C Bindings – Python extensions

This is my third post on C extensions for other languages. I started with Lua, and my second post was about creating a C library for Java. This time I’m going to display how this works for Python.

Why would I want to do this? I already explained it in the previous posts, but to sum it up: You get all the advantages of C – Better performance, smaller footprint and direct access to platform resources. But with the cost of losing the platform independency that these higher-level languages (/scripts) have.

Python has an extensive documentation,and also specifically on its C API. You can find it all here.

As with the previous examples, I will export my “Unix Domain Sockets” client library to Python. And of course, the code for this post is available on github as well. The library I wrote is compatible with Python 2.7, instructions on how to port python C extensions from v2.7 to v3 (and above) are found here.

Building a dynamic library for Python

You don’t use a makefile in order to build your library (although it can be done, probably not recommended). Instead, you create a Python script named “setup.py” where you specify, using Python classes, the information you usually write in your makefile. For example – what are the source files, which libraries you use, libraries locations, etc.
You can also define a different compiler (the default is gcc), if your library is for a different platform then the build machine.

This is my setup.py:

from distutils.core import setup, Extension

UnixSockets = Extension('UnixSockets',
                        sources = ['src/python/unixsockets.c'],
                        library_dirs = ['unix_sockets/'],
                        libraries = ['client'],
                        include_dirs = ['unix_sockets/src/include/'])

setup(name = 'UnixSockets',
      version = '1.0',
      description = 'Unix sockets library for python',
      ext_modules = [UnixSockets])

As you can see, the parameters for “Extension” really do remind a makefile content. This class defines what will be compiled (and linked) and how. The “setup” class defines the build output.
This is just a simple example, more information on creating the setup script can be found here. And these are distutils.core API specifications.

Some other definitions go into setup.cfg, I used this file in order to define the build location. More information about this file – Here.

Using setup.py, building our extension becomes very easy:

python setup.py build

And installing it is not harder than that:

sudo python setup.py install

Implementation

Just exposing the C API (open/close/read/write) to Python is too easy, and lack any architectonic sophistication – So what’s the fun about that?
Instead, I wanted to create a new Python class, which opens a socket on creation, and closes it once it is disposed by the interpreter’s garbage collector.

Defining a new class

This is how our Python object looks like:

typedef struct {
    PyObject_HEAD
    int fd;
} socket;

All Python classes must “inherit” (a C-style inheritance) from PyObject, hence the _HEAD macro. But this is not the whole class, we need some more definition in order to make this a new Python type. I see this struct just as an object prototype.

This is how we define a new type in Python:

static PyTypeObject socketType = {
    PyObject_HEAD_INIT(NULL)
    0,                         /*ob_size*/
    "UnixSockets.Socket",      /*tp_name*/
    sizeof(socket),            /*tp_basicsize*/
    0,                         /*tp_itemsize*/
    (destructor)socketDealloc, /*tp_dealloc*/
    0,                         /*tp_print*/
    0,                         /*tp_getattr*/
    0,                         /*tp_setattr*/
    0,                         /*tp_compare*/
    0,                         /*tp_repr*/
    0,                         /*tp_as_number*/
    0,                         /*tp_as_sequence*/
    0,                         /*tp_as_mapping*/
    0,                         /*tp_hash */
    0,                         /*tp_call*/
    0,                         /*tp_str*/
    0,                         /*tp_getattro*/
    0,                         /*tp_setattro*/
    0,                         /*tp_as_buffer*/
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /*tp_flags*/
    "Unix socket",             /* tp_doc */
    0,                         /* tp_traverse */
    0,                         /* tp_clear */
    0,                         /* tp_richcompare */
    0,                         /* tp_weaklistoffset */
    0,                         /* tp_iter */
    0,                         /* tp_iternext */
    unixSocketsMethods,        /* tp_methods */
    0,                         /* tp_members */
    0,                         /* tp_getset */
    0,                         /* tp_base */
    0,                         /* tp_dict */
    0,                         /* tp_descr_get */
    0,                         /* tp_descr_set */
    0,                         /* tp_dictoffset */
    (initproc)socketInit,      /* tp_init */
    0,                         /* tp_alloc */
    socketNew,                 /* tp_new */
};

There’s a lot going on here. we define here the class name (“UnixSockets.socket”), constructor, destructor, initializer, and the class methods (unixSocketMethods).
This struct is registered as a Python type in the main function of the module (I will show this later).

Allocator and initializer

These are too different functions. Basically, in normal usage, both will be called when we create a new object, but while the allocator is called only once, upon creation, the initializer is actually the class’s __init__ function, which can be called independently (in some circumstances), and this needs to be taken into consideration – For example, in my code I prevent calling __init__ while the socket is already open (Another way to handle this is to close the socket and re-open it with the new file name).

The allocator

static PyObject *socketNew(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    socket *self;

    self = (socket *)type->tp_alloc(type, 0);
    if (self == NULL)
        return NULL;

    self->fd = 0;

    return (PyObject *)self;
}

This is a simple example of an allocator. We don’t care about the input arguments. We only allocate a new “socket” struct (using an allocator which is implemented by Python), and if the allocation goes well, we initialize the fd and return the allocated struct. This allocated struct will be passed to us later, on each method, as “self”.

The initializer

static int socketInit(socket *self, PyObject *args, PyObject *kwds)
{
    const char *socket_name;

    if (self->fd != 0) {
        PyErr_SetString(PyExc_BaseException,
                        "Cannot initialize an open socket");
        return -1;
    }

    if (!PyArg_ParseTuple(args, "s", &socket_name))
        return -1;

    self->fd = socket_open(socket_name);
    if (self->fd <= 0) {
        PyErr_SetFromErrno(PyExc_IOError);
        return -1;
    }

    return 0;
}

As I explained before, in order to prevent the user from trying to re-open an opened socket, I check if the file descriptor is non-zero. If it is, I raise an exception. There is an extensive API for exceptions handling,  and there’s a list of exception types you can throw.

Our initializer takes one parameter – The socket’s file name, if the parsing of the input tuple fails, the we return -1. Usually, when returning a error value (-1), we must first set an exception, as we do when the fd is not zero, or when socket_open fails. But PyArg_ParseTuple already takes care of this for us.

If socket_open fails, we set the exception message directly from errno, so the user can know exactly what did go wrong.

Deallocator and clear

“tp_clear” is not required in this type, since we don’t have any inner objects which can be visited – I preferred not to expose “fd”, because its value has no meaning at the Python level, but I could have made it a PyObject too, and so allow access to it. In such case, I must implement “tp_traverse” and “tp_clear”, and also add “Py_TPFLAGS_HAVE_GC” to tp_flags (more on this issue – here).

Deallocating is pretty simple:

static void socketDealloc(socket* self)
{
    if (self->fd > 0)
        socket_close(self->fd);

    self->ob_type->tp_free((PyObject*)self);
}

If the file descriptor is valid, then we close the socket, and then we use Python-provided (default) free method.

Class methods

Our class has two methods – “write” and “read” (“open” and “close” are already integrated into the constructor/destructor). If you look at our type’s definition, the tp_methods field is set to “unixSocketsMethods”, and here is how it is defined:

static PyMethodDef unixSocketsMethods[] = {
    { "write", (PyCFunction)socketWrite, METH_VARARGS, "Write to a Unix domain socket" },
    { "read", (PyCFunction)socketRead, METH_NOARGS, "Read from a Unix domain socket" },
    { NULL, NULL, 0, NULL }
};

We declare the method name, and define its callback. Then we define how the function is called, and give a description for the doc.

Write

static PyObject *socketWrite(socket *self, PyObject *args)
{
    const char *str;
    int res, len;

    if (!PyArg_ParseTuple(args, "s#", &str, &len))
        return NULL;

    res = socket_write(self->fd, str, len);
    if (res <= 0) {
        PyErr_SetFromErrno(PyExc_IOError);
        return NULL;
    }

    return Py_BuildValue("i", res);
}

As we declared this method, write takes one parameter – a string, from a tuple. We parse it into a string and its length, and write it to the socket. Upon failure, we set an exception and return NULL – This is how Python knows that the function fails and look for the exception. When we return NULL, we must also set an exception.
If succeeded, we return the number of bytes sent (as a PyObject).

Read

static PyObject *socketRead(socket *self)
{
    char str[MAX_BUFFER_LEN];
    int res;

    res = socket_read(self->fd, str, MAX_BUFFER_LEN);
    if (res <= 0) {
        PyErr_SetFromErrno(PyExc_IOError);
        return NULL;
    }
    str[res] = '\0';

    return Py_BuildValue("s", str);
}

Read takes no arguments. It tries to read a bufer from the socket. On fail, it set an exception and return NULL. When succeeds it terminates the string and returns a copy of it as a PyObject.

Initializing the module

Every Python C module must have a unique PyMODINIT_FUNC function, this function is called when the module is imported (by Python’s “import” directive).

This is how our init function looks like:

PyMODINIT_FUNC initUnixSockets(void)
{
    PyObject* m;

    if (PyType_Ready(&socketType) < 0)
        return;

    m = Py_InitModule3("UnixSockets", unixSocketsModuleMethods,
                       "Unix socket modules");
    if (m == NULL)
        return;

    Py_INCREF(&socketType);
    PyModule_AddObject(m, "Socket", (PyObject *)&socketType);
}

PyType_Ready “finalizes” our object type – Adds some necessary defaults to our new type. Then we register our module – This module doesn’t have any method of it own, so unixSocketsModuleMethods is just a “dummy” array of methods.
PyModule_AddObject registers our new type. We need socket type to have at least one reference for this, so we increment its reference before adding it.

Using the new module

This is how we use our new module (after building and installing it, of courses):

import sys, UnixSockets

if len(sys.argv) != 2:
    print('Usage: python {0} '.format(sys.argv[0]))
    quit()

socketName = sys.argv[1]

try:
    mySocket = UnixSockets.Socket(socketName)
except Exception as e:
    print('Error opening socket {0}: "{1}"'.format(socketName, e))
    quit()

words = [ 'Hi there', 'Just testing you from python', 'Third message', 'Another message, just before leaving', 'So long']

for word in words:
    try:
        writeRes = mySocket.write(word)
        print('* Sent "{0}", {1} bytes sent'.format(word, writeRes))
        resp = mySocket.read()
        print('  Listener response: {0}'.format(resp))
    except IOError as e:
        print('Error communicating with listener: "{0}"'.format(e))
        break

First, we import our library using the “import” directive. This script receives the socket name as a command line argument, and passes it as an argument to the object initializer. We do this inside a try-catch section, and if something goes wrong, we’ll catch the exception and print its error message, and here’s an example:

$ python tests/python/UnixSocket.py /tmp/sock
Error opening socket /tmp/sock: "[Errno 2] No such file or directory"

For the purpose of testing, we go over a list of strings, write each one to the socket, and read the listener’s answer. Again, if we get an exception, we print an error message.
Here’s the output of a successful execution of this script:

$ python tests/python/UnixSocket.py /tmp/sock1
* Sent "Hi there", 8 bytes sent
  Listener response: Recieved 8 Bytes
* Sent "Just testing you from python", 28 bytes sent
  Listener response: Recieved 28 Bytes
* Sent "Third message", 13 bytes sent
  Listener response: Recieved 13 Bytes
* Sent "Another message, just before leaving", 36 bytes sent
  Listener response: Recieved 36 Bytes
* Sent "So long", 7 bytes sent
  Listener response: Recieved 7 Bytes

Exactly what we wanted.

This is it for now,

Amnon.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s