I was curious how much the overhead of np.sum affects Cython code. That is, should you write your own looping sum function or just call np.sum? To find out, I wrote a small test with both approaches, shown below:

import numpy as np
cimport numpy as np

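# Delegates to NumPy: every call goes through the Python-level np.sum.
cpdef double sum_np(np.ndarray[double] a):
    return np.sum(a)

# Sums a typed memoryview with a plain C loop; the loop needs no Python objects, so it can be nogil.
cpdef double sum_loop(double[:] a) nogil:
    cdef size_t i, I
    cdef double total = 0.0
    I = a.shape[0]  # number of elements
    for i in range(I):
        total += a[i]
    return total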

sum_np simply calls np.sum, while sum_loop accumulates the total in an explicit for loop over a typed memoryview.

I then ran each function many times to measure its runtime on a length-20,000 NumPy array of doubles. On my computer, sum_loop is 1.8 times faster than sum_np. For a smaller length-20 array, sum_loop is 6 times faster than sum_np. This makes sense: calling a NumPy function carries a roughly fixed overhead, and for a small array that overhead is a larger fraction of the total operation time.
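A timing like the one described above could be set up with Python's timeit module. The following is a minimal sketch, not necessarily the script behind the numbers quoted here. The helper names best_time_per_call and compare are hypothetical, as is the module name sum_test mentioned in the comment. As written it only times np.sum, so that it runs without a compiled Cython module.

```python
import timeit

import numpy as np


def best_time_per_call(func, arr, number=10_000, repeat=5):
    """Return the best observed time per call of func(arr), in seconds."""
    timer = timeit.Timer(lambda: func(arr))
    # The minimum over several repeats is less sensitive to noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def compare(funcs, sizes, number=10_000, repeat=5):
    """Time every function in funcs (name -> callable) on a random array of each size."""
    rng = np.random.default_rng(0)
    results = {}
    for n in sizes:
        arr = rng.random(n)  # float64 and contiguous, so it fits a double[:] argument
        results[n] = {
            name: best_time_per_call(func, arr, number, repeat)
            for name, func in funcs.items()
        }
    return results


if __name__ == "__main__":
    # Stand-in: only np.sum is timed here. After compiling the Cython code,
    # import the two functions (for example `from sum_test import sum_np, sum_loop`,
    # where sum_test is a hypothetical module name) and add them to this dict.
    funcs = {"np.sum": np.sum}
    for n, timings in compare(funcs, sizes=(20, 20_000), number=1_000).items():
        for name, seconds in timings.items():
            print(f"n={n:>6}  {name:<8} {seconds * 1e6:.2f} us per call")
```

Note that the lambda wrapper adds a small constant cost to every call, so ratios measured this way for very small arrays may come out somewhat lower than with a leaner harness.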

Thus, if you’re summing inside a loop, it can make sense to write the sum yourself. If you’re summing only once, it is probably easier to call np.sum directly (unless you need the most performant code possible).