Optimizing Python Code with
ctypes
Note: the code in this post is licensed under GNU AGPLv3.
I wrote this guide when I couldn’t find one for using
ctypes
all in one place. Hopefully this makes someone
else’s life much easier.
Table of Contents:
Prelude: Basic Optimizations
Before rewriting Python source code in C, consider these standard Python optimizations:
1. Built-in Data Structures
The built-in data structures in Python like set
and
dict
are written in C. They are much faster than writing
your own data structures as Python classes. Other data structures
besides the standard set
, dict
,
list
, and tuple
are documented in the collections
module.
2. List Comprehensions
Rather than appending to a list, use list comprehensions.
# Slow
= []
mapped for value in originallist:
mapped.append(myfunc(value))
# Faster
= [myfunc(value) in originallist] mapped
The Main Event: ctypes
ctypes
is a module that allows you to communicate with C code from your Python
code without using subprocess
or similar modules to run
another process from the CLI.
There are two parts: compiling your C code to be loaded as a shared object, and setting up the data structures in your Python code to map to C-types.
In this post, I’ll be connecting my Python code to lcs.c, which finds the longest common subsequence of two lists of strings. In the Python code, I want this to happen:
= ['My', 'name', 'is', 'Sam', 'Stevens', '!']
list1 = ['My', 'name', 'is', 'Alex', 'Stevens', '.']
list2
= lcs(list1, list2)
common
print(common)
# ['My', 'name', 'is', 'Stevens']
Some challenges with this particular C function is function signature
having lists of strings as the argument types, and the return type not
having a fixed length. I solve this with a Sequence
struct
containing the pointers and the length.
Compiling C code for Python
First, the C source code (lcs.c
) is compiled to
lcs.so
that can be loaded by Python.
gcc -c -fpic lcs.c -o lcs.o
gcc -shared lcs.o -o lcs.so
-fpic
generates “position independent instructions”, which is necessary if you want to use this library with Python.-c
prevents the linker from running, because we want to manually invokeld
withgcc -shared
.
Next, we begin to write the Python code to use this shared object file.
Structs in Python
Below, I show the two structs that are used in my C source code.
struct Sequence
{
char **items;
int length;
};
struct Cell
{
int index;
int length;
struct Cell *prev;
};
Here, you see the Python translation of the structs.
import ctypes
class SEQUENCE(ctypes.Structure):
= [('items', ctypes.POINTER(ctypes.c_char_p)),
_fields_ 'length', ctypes.c_int)]
(
class CELL(ctypes.Structure):
pass
= [('index', ctypes.c_int), ('length', ctypes.c_int),
CELL._fields_ 'prev', ctypes.POINTER(CELL))] (
Some notes:
- All structs are
class
es that inherit fromctypes.Structure
. - The only field is
_fields_
, which is a list of tuples. Each tuple is(<variable-name>, <ctypes.TYPE>)
. ctypes
has types likec_char
(char
), andc_char_p
(*char
).ctypes
also includesPOINTER()
which creates a pointer type from any type passed to it.- If you have a recursive definition like in
CELL
, you mustpass
the initial declaration and then add the_fields_
fields to reference itself later. - Since I didn’t use
CELL
in my Python code, I didn’t need to write this struct out, but it has a an interesting feature in the recursive field.
Calling Your C Code
Additionally, I needed some code to convert your Python types to C structs.
First, set the argument types and the return types of the DLL:
= ctypes.cdll.LoadLibrary('lcsmodule/lcs.so')
lcsmodule = [
lcsmodule.lcs.argtypes
ctypes.POINTER(SEQUENCE),
ctypes.POINTER(SEQUENCE),
]= ctypes.POINTER(SEQUENCE) lcsmodule.lcs.restype
Finally, you can use your new C function to speed up your Python code.
def list_to_SEQUENCE(strlist: List[str]) -> SEQUENCE:
= [bytes(s, 'utf-8') for s in strlist]
bytelist = (ctypes.c_char_p * len(bytelist))()
arr = bytelist
arr[:] return SEQUENCE(arr, len(bytelist))
def lcs(s1: List[str], s2: List[str]) -> List[str]:
= list_to_SEQUENCE(s1)
seq1 = list_to_SEQUENCE(s2)
seq2
# struct Sequence *lcs(struct Sequence *s1, struct Sequence *s2)
= lcsmodule.lcs(ctypes.byref(seq1), ctypes.byref(seq2))[0]
common
= []
ret
for i in range(common.length):
'utf-8'))
ret.append(common.items[i].decode(
# manually free memory from c now that
# it's referenced by the Python VM.
lcsmodule.freeSequence(common)
return ret
= ['My', 'name', 'is', 'Sam', 'Stevens', '!']
list1 = ['My', 'name', 'is', 'Alex', 'Stevens', '.']
list2
= lcs(list1, list2)
common
print(common)
# ['My', 'name', 'is', 'Stevens']
More notes:
**char
(a list of strings) maps directly to a list of bytes in Python.lcs.c
has a functionlcs()
with the signature:struct Sequence *lcs(struct Sequence *s1, struct Sequence *s2)
. To get the return type set up, I uselcsmodule.lcs.restype = ctypes.POINTER(SEQUENCE)
.- To make a call with the reference to the
struct Sequence
, I usectypes.byref()
which returns a “light-weight pointer” to your object (faster thanctypes.POINTER()
). common.items
is a list of bytes, so they are decoded to getret
to be a list ofstr
.lcsmodule.freeSequence(common)
simply frees the memory associated withcommon
. This is critical, because it will not be automatically collected by the garbage collector (AFAIK).
Optimized Python code: code that you wrote in C and wrote a wrapper for in Python.
Extra Credit: PyPy, Numba and Cython
NOTE: I’ve never used PyPy personally.
One simple optimization is simply to run your programs in the PyPy runtime, which includes a just-in-time (JIT) compiler which will speed your loops by compiling them into native code when they run many times.
Other options are Numba and Cython, both of which apply compilation to pseudo-Python code to produce much more optimized code, especially with NumPy-related code.
Please email me if you have any comments or want to discuss further.
[Relevant link] [Source]
Sam Stevens, 2024