Optimizing Python Code with
ctypes
Note: the code in this post is licensed under GNU AGPLv3.
I wrote this guide when I couldn’t find one for using
ctypes all in one place. Hopefully this makes someone
else’s life much easier.
Table of Contents:
Prelude: Basic Optimizations
Before rewriting Python source code in C, consider these standard Python optimizations:
1. Built-in Data Structures
The built-in data structures in Python like set and
dict are written in C. They are much faster than writing
your own data structures as Python classes. Other data structures
besides the standard set, dict,
list, and tuple are documented in the collections
module.
2. List Comprehensions
Rather than appending to a list, use list comprehensions.
# Slow
mapped = []
for value in originallist:
mapped.append(myfunc(value))
# Faster
mapped = [myfunc(value) in originallist]The Main Event: ctypes
ctypes
is a module that allows you to communicate with C code from your Python
code without using subprocess or similar modules to run
another process from the CLI.
There are two parts: compiling your C code to be loaded as a shared object, and setting up the data structures in your Python code to map to C-types.
In this post, I’ll be connecting my Python code to lcs.c, which finds the longest common subsequence of two lists of strings. In the Python code, I want this to happen:
list1 = ['My', 'name', 'is', 'Sam', 'Stevens', '!']
list2 = ['My', 'name', 'is', 'Alex', 'Stevens', '.']
common = lcs(list1, list2)
print(common)
# ['My', 'name', 'is', 'Stevens']Some challenges with this particular C function is function signature
having lists of strings as the argument types, and the return type not
having a fixed length. I solve this with a Sequence struct
containing the pointers and the length.
Compiling C code for Python
First, the C source code (lcs.c) is compiled to
lcs.so that can be loaded by Python.
gcc -c -fpic lcs.c -o lcs.o
gcc -shared lcs.o -o lcs.so -fpicgenerates “position independent instructions”, which is necessary if you want to use this library with Python.-cprevents the linker from running, because we want to manually invokeldwithgcc -shared.
Next, we begin to write the Python code to use this shared object file.
Structs in Python
Below, I show the two structs that are used in my C source code.
struct Sequence
{
char **items;
int length;
};
struct Cell
{
int index;
int length;
struct Cell *prev;
};Here, you see the Python translation of the structs.
import ctypes
class SEQUENCE(ctypes.Structure):
_fields_ = [('items', ctypes.POINTER(ctypes.c_char_p)),
('length', ctypes.c_int)]
class CELL(ctypes.Structure):
pass
CELL._fields_ = [('index', ctypes.c_int), ('length', ctypes.c_int),
('prev', ctypes.POINTER(CELL))]Some notes:
- All structs are
classes that inherit fromctypes.Structure. - The only field is
_fields_, which is a list of tuples. Each tuple is(<variable-name>, <ctypes.TYPE>). ctypeshas types likec_char(char), andc_char_p(*char).ctypesalso includesPOINTER()which creates a pointer type from any type passed to it.- If you have a recursive definition like in
CELL, you mustpassthe initial declaration and then add the_fields_fields to reference itself later. - Since I didn’t use
CELLin my Python code, I didn’t need to write this struct out, but it has a an interesting feature in the recursive field.
Calling Your C Code
Additionally, I needed some code to convert your Python types to C structs.
First, set the argument types and the return types of the DLL:
lcsmodule = ctypes.cdll.LoadLibrary('lcsmodule/lcs.so')
lcsmodule.lcs.argtypes = [
ctypes.POINTER(SEQUENCE),
ctypes.POINTER(SEQUENCE),
]
lcsmodule.lcs.restype = ctypes.POINTER(SEQUENCE)Finally, you can use your new C function to speed up your Python code.
def list_to_SEQUENCE(strlist: List[str]) -> SEQUENCE:
bytelist = [bytes(s, 'utf-8') for s in strlist]
arr = (ctypes.c_char_p * len(bytelist))()
arr[:] = bytelist
return SEQUENCE(arr, len(bytelist))
def lcs(s1: List[str], s2: List[str]) -> List[str]:
seq1 = list_to_SEQUENCE(s1)
seq2 = list_to_SEQUENCE(s2)
# struct Sequence *lcs(struct Sequence *s1, struct Sequence *s2)
common = lcsmodule.lcs(ctypes.byref(seq1), ctypes.byref(seq2))[0]
ret = []
for i in range(common.length):
ret.append(common.items[i].decode('utf-8'))
# manually free memory from c now that
# it's referenced by the Python VM.
lcsmodule.freeSequence(common)
return ret
list1 = ['My', 'name', 'is', 'Sam', 'Stevens', '!']
list2 = ['My', 'name', 'is', 'Alex', 'Stevens', '.']
common = lcs(list1, list2)
print(common)
# ['My', 'name', 'is', 'Stevens']More notes:
**char(a list of strings) maps directly to a list of bytes in Python.lcs.chas a functionlcs()with the signature:struct Sequence *lcs(struct Sequence *s1, struct Sequence *s2). To get the return type set up, I uselcsmodule.lcs.restype = ctypes.POINTER(SEQUENCE).- To make a call with the reference to the
struct Sequence, I usectypes.byref()which returns a “light-weight pointer” to your object (faster thanctypes.POINTER()). common.itemsis a list of bytes, so they are decoded to getretto be a list ofstr.lcsmodule.freeSequence(common)simply frees the memory associated withcommon. This is critical, because it will not be automatically collected by the garbage collector (AFAIK).
Optimized Python code: code that you wrote in C and wrote a wrapper for in Python.
Extra Credit: PyPy, Numba and Cython
NOTE: I’ve never used PyPy personally.
One simple optimization is simply to run your programs in the PyPy runtime, which includes a just-in-time (JIT) compiler which will speed your loops by compiling them into native code when they run many times.
Other options are Numba and Cython, both of which apply compilation to pseudo-Python code to produce much more optimized code, especially with NumPy-related code.
Please email me if you have any comments or want to discuss further.
Sam Stevens, 2024