Revision history [back]

How to efficiently calculate a sum of arrays with numpy and @parallel decorator?

Hello!

I have an algorithm to process a huge array by chunks. Each processing operation results in a matrix of size N*N, I need to calculate a sum of these matrices. For simplicity assume processing function does almost nothing and requires no input - just returns zeros. In that case working example looks like this:

import datetime
import numpy as np
import time

N = 1024 * 2
K = 256

def f():
    return np.ones((N, N),  dtype=np.complex128)

buffer = np.zeros((N, N),  dtype=np.complex128)

start_time = datetime.datetime.now()

for i in range(K):
    buffer += f()

print 'Elapsed time:', (datetime.datetime.now() - start_time)

Execution takes about 5 seconds on my PC. Now, as function f becomes more complex, I would like to run in parallel, so I modify code as follows:

import datetime
import numpy as np

N = 1024 * 2
K = 256

@parallel
def f(_):
    return np.ones((N, N),  dtype=np.complex128)

start_time = datetime.datetime.now()

for o in f(range(K)):
    buffer += o[1]

print 'Elapsed time:', (datetime.datetime.now() - start_time)

And now it takes about 26 seconds to calculate! What am I doing wrong? Or what causes such a huge overhead? (it looks silly for if the cost of collecting the result of f() across parallel processes is more than calculating one iteration of f() itself, I better run f() without parallelism at all)