Following this answer to “Can I force a numpy ndarray to take ownership of its memory?” I attempted to use the Python C API function PyArray_ENABLEFLAGS through Cython’s NumPy wrapper and found it is not exposed.
The following attempt to expose it manually (this is just a minimum example reproducing the failure)
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np
np.import_array()
ctypedef np.int32_t DTYPE_t
cdef extern from "numpy/ndarraytypes.h":
void PyArray_ENABLEFLAGS(np.PyArrayObject *arr, int flags)
def test():
cdef int N = 1000
cdef DTYPE_t *data = <DTYPE_t *>malloc(N * sizeof(DTYPE_t))
cdef np.ndarray[DTYPE_t, ndim=1] arr = np.PyArray_SimpleNewFromData(1, &N, np.NPY_INT32, data)
PyArray_ENABLEFLAGS(arr, np.NPY_ARRAY_OWNDATA)
fails with a compile error:
Error compiling Cython file:
------------------------------------------------------------
...
def test():
cdef int N = 1000
cdef DTYPE_t *data = <DTYPE_t *>malloc(N * sizeof(DTYPE_t))
cdef np.ndarray[DTYPE_t, ndim=1] arr = np.PyArray_SimpleNewFromData(1, &N, np.NPY_INT32, data)
PyArray_ENABLEFLAGS(arr, np.NPY_ARRAY_OWNDATA)
^
------------------------------------------------------------
/tmp/test.pyx:19:27: Cannot convert Python object to 'PyArrayObject *'
My question: Is this the right approach to take in this case? If so, what am I doing wrong? If not, how do I force NumPy to take ownership in Cython, without going down to a C extension module?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
You just have some minor errors in the interface definition. The following worked for me:
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np
np.import_array()
ctypedef np.int32_t DTYPE_t
cdef extern from "numpy/arrayobject.h":
void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)
cdef data_to_numpy_array_with_spec(void * ptr, np.npy_intp N, int t):
cdef np.ndarray[DTYPE_t, ndim=1] arr = np.PyArray_SimpleNewFromData(1, &N, t, ptr)
PyArray_ENABLEFLAGS(arr, np.NPY_OWNDATA)
return arr
def test():
N = 1000
cdef DTYPE_t *data = <DTYPE_t *>malloc(N * sizeof(DTYPE_t))
arr = data_to_numpy_array_with_spec(data, N, np.NPY_INT32)
return arr
This is my setup.py file:
from distutils.core import setup, Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("_owndata", ["owndata.pyx"])]
setup(cmdclass={'build_ext': build_ext}, ext_modules=ext_modules)
Build with python setup.py build_ext --inplace. Then verify that the data is actually owned:
import _owndata arr = _owndata.test() print arr.flags
Among others, you should see OWNDATA : True.
And yes, this is definitely the right way to deal with this, since numpy.pxd does exactly the same thing to export all the other functions to Cython.
Method 2
@Stefan’s solution works for most scenarios, but is somewhat fragile. Numpy uses PyDataMem_NEW/PyDataMem_FREE for memory-management and it is an implementation detail, that these calls are mapped to the usual malloc/free + some memory tracing (I don’t know which effect Stefan’s solution has on the memory tracing, at least it seems not to crash).
There are also more esoteric cases possible, in which free from numpy-library doesn’t use the same memory-allocator as malloc in the cython code (linked against different run-times for example as in this github-issue or this SO-post).
The right tool to pass/manage the ownership of the data is PyArray_SetBaseObject.
First we need a python-object, which is responsible for freeing the memory. I’m using a self-made cdef-class here (mostly because of logging/demostration), but there are obviously other possiblities as well:
%%cython
from libc.stdlib cimport free
cdef class MemoryNanny:
cdef void* ptr # set to NULL by "constructor"
def __dealloc__(self):
print("freeing ptr=", <unsigned long long>(self.ptr)) #just for debugging
free(self.ptr)
@staticmethod
cdef create(void* ptr):
cdef MemoryNanny result = MemoryNanny()
result.ptr = ptr
print("nanny for ptr=", <unsigned long long>(result.ptr)) #just for debugging
return result
...
Now, we use a MemoryNanny-object as sentinel for the memory, which gets freed as soon as the parent-numpy-array gets destroyed. The code is a little bit awkward, because PyArray_SetBaseObject steals the reference, which is not handled by Cython automatically:
%%cython
...
from cpython.object cimport PyObject
from cpython.ref cimport Py_INCREF
cimport numpy as np
#needed to initialize PyArray_API in order to be able to use it
np.import_array()
cdef extern from "numpy/arrayobject.h":
# a little bit awkward: the reference to obj will be stolen
# using PyObject* to signal that Cython cannot handle it automatically
int PyArray_SetBaseObject(np.ndarray arr, PyObject *obj) except -1 # -1 means there was an error
cdef array_from_ptr(void * ptr, np.npy_intp N, int np_type):
cdef np.ndarray arr = np.PyArray_SimpleNewFromData(1, &N, np_type, ptr)
nanny = MemoryNanny.create(ptr)
Py_INCREF(nanny) # a reference will get stolen, so prepare nanny
PyArray_SetBaseObject(arr, <PyObject*>nanny)
return arr
...
And here is an example, how this functionality can be called:
%%cython
...
from libc.stdlib cimport malloc
def create():
cdef double *ptr=<double*>malloc(sizeof(double)*8);
ptr[0]=42.0
return array_from_ptr(ptr, 8, np.NPY_FLOAT64)
which can be used as follows:
>>> m = create() nanny for ptr= 94339864945184 >>> m.flags ... OWNDATA : False ... >>> m[0] 42.0 >>> del m freeing ptr= 94339864945184
with results/output as expected.
Note: the resulting arrays doesn’t really own the data (i.e. flags return OWNDATA : False), because the memory is owned be the memory-nanny, but the result is the same: the memory gets freed as soon as array is deleted (because nobody holds a reference to the nanny anymore).
MemoryNanny doesn’t have to guard a raw C-pointer. It can be anything else, for example also a std::vector:
%%cython -+
from libcpp.vector cimport vector
cdef class VectorNanny:
#automatically default initialized/destructed by Cython:
cdef vector[double] vec
@staticmethod
cdef create(vector[double]& vec):
cdef VectorNanny result = VectorNanny()
result.vec.swap(vec) # swap and not copy
return result
# for testing:
def create_vector(int N):
cdef vector[double] vec;
vec.resize(N, 2.0)
return VectorNanny.create(vec)
The following test shows, that the nanny works:
nanny=create_vector(10**8) # top shows additional 800MB memory are used del nanny # top shows, this additional memory is no longer used.
Method 3
The latest Cython version allows you to do with with minimal syntax, albeit slightly more overhead than the lower-level solutions suggested.
numpy_array = np.asarray(<np.int32_t[:10, :10]> my_pointer)
https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#coercion-to-numpy
This alone does not pass ownership.
Notably, a Cython array is generated with this call, via array_cwrapper.
This generates a cython.array, without allocating memory. The cython.array uses the stdlib.h malloc and free by default, so it would be expected that you use the default malloc, as well, instead of any special CPython/Numpy allocators.
free is only called if ownership is set for this cython.array, which it is by default only if it allocates data. For our case, we can manually set it via:
my_cyarr.free_data = True
So to return a 1D array, it would be as simple as:
from cython.view cimport array as cvarray
# ...
cdef cvarray cvarr = <np.int32_t[:N]> data
cvarr.free_data = True
return np.asarray(cvarr)
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0