How to efficiently calculate a sum of arrays with NumPy and the @parallel decorator?
I have an algorithm that processes a huge array in chunks. Each processing step produces an N*N matrix, and I need to calculate the sum of these matrices. For simplicity, assume the processing function does almost nothing and requires no input: it just returns a matrix of ones. In that case a working example looks like this:

    import datetime
    import numpy as np

    N = 1024 * 2
    K = 256

    def f():
        return np.ones((N, N), dtype=np.complex128)

    buffer = np.zeros((N, N), dtype=np.complex128)
    start_time = datetime.datetime.now()
    for i in range(K):
        buffer += f()
    print 'Elapsed time:', (datetime.datetime.now() - start_time)

Execution takes about 5 seconds on my PC. Now, as the function f becomes more complex, I would like to run it in parallel, so I modified the code as follows:

    import datetime
    import numpy as np

    N = 1024 * 2
    K = 256

    @parallel
    def f(_):
        return np.ones((N, N), dtype=np.complex128)

    # The accumulator must exist before the loop, as in the serial version.
    buffer = np.zeros((N, N), dtype=np.complex128)
    start_time = datetime.datetime.now()
    for o in f(range(K)):
        buffer += o
    print 'Elapsed time:', (datetime.datetime.now() - start_time)

And now it takes about 26 seconds to calculate! What am I doing wrong? What causes such a huge overhead? It seems silly: if collecting the result of f() from the parallel processes costs more than calculating one iteration of f() itself, I would be better off running f() without parallelism at all.
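
One way to check whether result transfer could explain the gap is to compare the time to build one matrix with the time to serialize and deserialize it. The sketch below is only a rough probe: it assumes the parallel machinery moves each result between processes with something like pickle, which the code above does not confirm, and `time_call` is a hypothetical helper introduced for illustration. Each result is 2048 * 2048 * 16 bytes = 64 MiB, so K = 256 results add up to 16 GiB that would have to cross process boundaries.

```python
import pickle
import time

import numpy as np

N = 1024 * 2  # same matrix size as in the question


def f():
    return np.ones((N, N), dtype=np.complex128)


def time_call(func):
    """Hypothetical helper: return (result, elapsed seconds) for one call."""
    start = time.time()
    result = func()
    return result, time.time() - start


# Cost of producing one result in the current process.
matrix, compute_seconds = time_call(f)

# Cost of a pickle round trip, used here as a stand-in (an assumption)
# for whatever the parallel framework does to ship a result back.
payload, dump_seconds = time_call(
    lambda: pickle.dumps(matrix, protocol=pickle.HIGHEST_PROTOCOL))
restored, load_seconds = time_call(lambda: pickle.loads(payload))

print("one result occupies %.0f MiB" % (matrix.nbytes / 2.0 ** 20))
print("compute: %.3f s, pickle round trip: %.3f s"
      % (compute_seconds, dump_seconds + load_seconds))
```

If the round trip takes noticeably longer than the computation, the per-result transfer is a plausible source of the overhead, and returning fewer, already-summed matrices from each worker would be worth trying.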