Behold, My Stuff

[Home] [Writing] [CV] [Contact]

Optimizing Python Code with ctypes

Note: the code in this post is licensed under GNU AGPLv3.

I wrote this guide when I couldn’t find one for using ctypes all in one place. Hopefully this makes someone else’s life much easier.

Table of Contents:

  1. Basic optimizations
  2. ctypes
  3. Compiling for Python
  4. Structs in Python
  5. Calling Your C Code
  6. PyPy

Prelude: Basic Optimizations

Before rewriting Python source code in C, consider these standard Python optimizations:

1. Built-in Data Structures

The built-in data structures in Python like set and dict are written in C. They are much faster than writing your own data structures as Python classes. Other data structures besides the standard set, dict, list, and tuple are documented in the collections module.

2. List Comprehensions

Rather than appending to a list, use list comprehensions.

# Slow
mapped = []
for value in originallist:

# Faster
mapped = [myfunc(value) in originallist]

The Main Event: ctypes

ctypes is a module that allows you to communicate with C code from your Python code without using subprocess or similar modules to run another process from the CLI.

There are two parts: compiling your C code to be loaded as a shared object, and setting up the data structures in your Python code to map to C-types.

In this post, I’ll be connecting my Python code to lcs.c, which finds the longest common subsequence of two lists of strings. In the Python code, I want this to happen:

list1 = ['My', 'name', 'is', 'Sam', 'Stevens', '!']
list2 = ['My', 'name', 'is', 'Alex', 'Stevens', '.']

common = lcs(list1, list2)

# ['My', 'name', 'is', 'Stevens']

Some challenges with this particular C function is function signature having lists of strings as the argument types, and the return type not having a fixed length. I solve this with a Sequence struct containing the pointers and the length.

Compiling C code for Python

First, the C source code (lcs.c) is compiled to that can be loaded by Python.

gcc -c -fpic lcs.c -o lcs.o
gcc -shared lcs.o -o 

Next, we begin to write the Python code to use this shared object file.

Structs in Python

Below, I show the two structs that are used in my C source code.

struct Sequence
    char **items;
    int length;

struct Cell
    int index;
    int length;
    struct Cell *prev;

Here, you see the Python translation of the structs.

import ctypes
class SEQUENCE(ctypes.Structure):
    _fields_ = [('items', ctypes.POINTER(ctypes.c_char_p)),
                ('length', ctypes.c_int)]

class CELL(ctypes.Structure):

CELL._fields_ = [('index', ctypes.c_int), ('length', ctypes.c_int),
                 ('prev', ctypes.POINTER(CELL))]

Some notes:

Calling Your C Code

Additionally, I needed some code to convert your Python types to C structs.

First, set the argument types and the return types of the DLL:

lcsmodule = ctypes.cdll.LoadLibrary('lcsmodule/')
lcsmodule.lcs.argtypes = [
lcsmodule.lcs.restype = ctypes.POINTER(SEQUENCE)

Finally, you can use your new C function to speed up your Python code.

def list_to_SEQUENCE(strlist: List[str]) -> SEQUENCE:
    bytelist = [bytes(s, 'utf-8') for s in strlist]
    arr = (ctypes.c_char_p * len(bytelist))()
    arr[:] = bytelist
    return SEQUENCE(arr, len(bytelist))

def lcs(s1: List[str], s2: List[str]) -> List[str]:
    seq1 = list_to_SEQUENCE(s1)
    seq2 = list_to_SEQUENCE(s2)

    # struct Sequence *lcs(struct Sequence *s1, struct Sequence *s2)
    common = lcsmodule.lcs(ctypes.byref(seq1), ctypes.byref(seq2))[0]

    ret = []

    for i in range(common.length):

    # manually free memory from c now that 
    # it's referenced by the Python VM.

    return ret

list1 = ['My', 'name', 'is', 'Sam', 'Stevens', '!']
list2 = ['My', 'name', 'is', 'Alex', 'Stevens', '.']

common = lcs(list1, list2)

# ['My', 'name', 'is', 'Stevens']

More notes:

Optimized Python code: code that you wrote in C and wrote a wrapper for in Python.

Extra Credit: PyPy, Numba and Cython

NOTE: I’ve never used PyPy personally.

One simple optimization is simply to run your programs in the PyPy runtime, which includes a just-in-time (JIT) compiler which will speed your loops by compiling them into native code when they run many times.

Other options are Numba and Cython, both of which apply compilation to pseudo-Python code to produce much more optimized code, especially with NumPy-related code.

Please email me if you have any comments or want to discuss further.

[Relevant link] [Source]

Sam Stevens, 2024