Hi! I need to compute the product of several matrices with entries in a binary field. Since products do not depend on each other, computations can be easily distributed among the processors. However results of sequencial code and in processes are almost the same!

Here some code of a simple test that represents what I want to achieve:

```
from sage.all import *
import multiprocessing as mp
from multiprocessing.pool import ThreadPool, cpu_count
import time
def matrix_mult(A, B):
start = time.time()
A*B
return time.time() - start
def matrix_mult_proc(A, B, n):
avg = 0.0
for i in range(n):
start = time.time()
A*B
avg += (time.time() - start)
# print('time: {}'.format(time.time() - start))
print('Average per call proc {}: {}'.format(os.getpid(), avg / n))
@parallel
def matrix_mult_parallel(A, B):
start = time.time()
A*B
return time.time() - start
# print('time: {}'.format(time.time() - start))
iters = 40
n = 100
Fq = GF(2**16, 'X')
MS = MatrixSpace(Fq, n)
A = MS.random_element()
start = time.time()
tms = map(lambda i: matrix_mult(A, A), range(iters))
print('\n*** sequencial:\nAverage per call: {}\nTotal: {}'.format(sum(tms) / iters,time.time() - start))
nthreads = cpu_count()
pool = ThreadPool(nthreads)
start = time.time()
tms = pool.map(lambda i: matrix_mult(A, A), range(iters))
print('\n*** multithread {} threads:\nAverage per call: {}\nTotal: {}'.format(nthreads, sum(tms) / iters,time.time() - start))
nprocs = cpu_count()
procs = []
start = time.time()
print('\n*** multiproc {} procs:'.format(nprocs))
for i in range(nprocs):
p = mp.Process(target=matrix_mult_proc, args=(A, A, iters//nprocs))#, out_q))
procs.append(p)
p.start()
for p in procs:
p.join()
print('Total: {}'.format(time.time() - start))
args = [(A, A)]*iters
start = time.time()
tms = map(lambda e: e[1], list(matrix_mult_parallel(args)))
print('\n*** sage_parallel\nAverage per call: {}\nTotal {}'.format(sum(tms) / iters, time.time() - start))
```

Results are as follow:

```
For sequencial
Average time of a matrix multiplication: 0.279646992683
Total time for 40 multiplications: 11.1862668991
For Threadpool with 4 threads maximum
Average time of a matrix multiplication: 0.280531394482
Total time for 40 multiplications: 11.2248089314
For 4 processes in a 4 core computer (2 physical):
Average time of a matrix multiplication: 1.13726825714
Total time for 40 multiplications: 11.7641329765
With sage's @parallel decorator:
Average time of a matrix multiplication: 1.1256641984
Total time for 40 multiplications: 11.7641329765
```

I don't understand why multiplications seem to take a proportional amount of time to the number of processes. Same behavior on a 8 core machine.

Hope someone can explain. Thanks in advance.