# Matrix dot multiplication slowness and BLAS versions

Hello everyone!

Is there a way to improve the performance of matrix multiplication in Sage? Right now I am relying on numpy's dot function, like this:

import numpy as np
N = 768
P = 1024
A = np.random.random((P, N))
A.T.dot(A)


Timing the dot product in Sage currently gives me about a second and a half:

>>> import timeit
>>> setup = """
...
... import numpy as np
...
... N = 768
... P = 1024
...
... A = np.random.random((P, N))
... """
>>> timeit.repeat('A.T.dot(A)', setup=setup, number=10, repeat=3)
[18.736198902130127, 18.66787099838257, 17.36500310897827]


Yet the same multiplication in Matlab takes less than 100 ms. I have heard that numpy relies internally on BLAS, and that it can be replaced with OpenBLAS, ATLAS, Intel MKL or something similar for better performance.

So I am looking for a manual or some information about what is going on with performance with regard to numpy's underlying components: when should one consider replacing one with another, and is there a simple way to do that?


Which version of Sage are you using? Which binaries did you use? Which hardware? Which distribution?

( 2015-04-13 11:35:07 -0600 )


For what it's worth, I cannot reproduce your problem within Sage, neither with IPython's %timeit nor with Sage's timeit:

sage: import numpy as np
sage: N = 768
sage: P = 1024
sage: A = np.random.random((P, N))
sage: %timeit A.T.dot(A)
10 loops, best of 3: 90.8 ms per loop
sage: timeit('A.T.dot(A)')
5 loops, best of 3: 90.2 ms per loop


I do not know the details of Python's timeit.repeat, but it seems that number=10 accumulates the time of 10 runs. If I try with number=1, I also get about 100 ms, as expected:

sage: import timeit
sage: setup = """
...
... import numpy as np
...
... N = 768
... P = 1024
...
... A = np.random.random((P, N))
... """
sage: timeit.repeat('A.T.dot(A)', setup=setup, number=1, repeat=3)
[0.0931999683380127, 0.08932089805603027, 0.09101414680480957]
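
To make the two measurements comparable: each value returned by timeit.repeat is the total time for `number` executions of the statement, so dividing by `number` gives the time per call. Here is a minimal sketch of that; the helper name per_call_times is made up for illustration and is not part of timeit or Sage.

```python
import timeit


def per_call_times(stmt, setup="pass", number=10, repeat=3):
    """Return the average seconds per execution for each of `repeat` batches.

    timeit.repeat reports the TOTAL time of `number` executions per batch,
    so each total is divided by `number` here.
    """
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return [total / number for total in totals]


if __name__ == "__main__":
    # Same setup as in the question; requires numpy.
    setup = """
import numpy as np
N = 768
P = 1024
A = np.random.random((P, N))
"""
    print(per_call_times('A.T.dot(A)', setup=setup, number=10, repeat=3))
```

Applied to the figures in the question, about 18 s for number=10 works out to roughly 1.8 s per multiplication.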


Note that I did not compile ATLAS specifically for my hardware, since I am using the SAGE_ATLAS_ARCH='fast' preselected configuration. Which version of Sage are you using? Which binaries did you use? Which hardware? Which distribution?

EDIT: I tried on my laptop with a version of Sage that was compiled on a Pentium 3 (in particular, without the SSE2 instruction set), and the timing is about 380 ms, which is still below your timings.


I am using Arch Linux, and I have now found that other people have exactly the same problem with dot product performance on Arch. The CPU is a Core i5.

I tried the dot function on Sage 6.5, both from the Arch repo and built from source (just by running make). Performance is low in all cases: with the system's python2 and numpy, with Sage from the Arch repo, and with the compiled Sage.

I assume I now need to find a way to switch from numpy's reference BLAS to a faster one on the system, which raises two questions: 1. Does Sage use numpy from my system or bring its own? 2. What did you mean by "SAGE_ATLAS_ARCH='fast' preselected configuration"? I tried running Sage with that environment variable set, but it had no effect.

( 2015-04-14 01:19:44 -0600 )

Sage uses its own version of numpy, not the one provided by your system.

SAGE_ATLAS_ARCH is an environment variable that must be set at compilation time. By default, ATLAS compiles many variants, benchmarks each of them, and selects the best one. By setting this variable to a specific architecture (or to a generic choice such as 'base' or 'fast'), you skip this tuning step, so that ATLAS is compiled only once.

Sage uses your system's version of ATLAS if SAGE_ATLAS_LIB is set at compilation time.

See this page for a list of environment variables that could be used to tune Sage's compilation.

( 2015-04-14 12:25:41 -0600 )
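
To check which numpy a given interpreter actually loads (Sage's own copy or the system one), something along these lines may help. It is only a sketch: the helper name numpy_origin is hypothetical, and it assumes the SAGE_ROOT environment variable is set when run from within Sage. The path comparison is a plain prefix test and does not resolve symlinks.

```python
import os
import sys

import numpy as np


def numpy_origin(sage_root=None):
    """Report where the imported numpy lives and whether it is under sage_root."""
    np_dir = os.path.dirname(os.path.abspath(np.__file__))
    if sage_root is None:
        # Assumption: SAGE_ROOT is exported by the Sage launcher; may be unset.
        sage_root = os.environ.get("SAGE_ROOT")
    inside = None  # None means "unknown" (no Sage root to compare against)
    if sage_root:
        root = os.path.abspath(sage_root)
        inside = np_dir == root or np_dir.startswith(root + os.sep)
    return {
        "interpreter": sys.executable,
        "version": np.__version__,
        "path": np_dir,
        "inside_sage_root": inside,
    }


print(numpy_origin())
```

Running it once with the system python and once inside Sage should show two different paths if the two installations are indeed separate.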

Could you paste somewhere the contents of $SAGE_ROOT/logs/pkgs/atlas-*.log, where $SAGE_ROOT is the directory in which you compiled Sage yourself?

( 2015-04-14 12:30:23 -0600 )

( 2015-04-14 15:32:59 -0600 )

Which variables did you export before typing make? Could you try recompiling, typing the following before running make:

export SAGE_INSTALL_GCC='yes'
export SAGE_ATLAS_ARCH='fast'

( 2015-04-15 05:47:51 -0600 )

numpy in Sage uses ATLAS for matrix operations; if you use a binary Sage installation, ATLAS might not be optimised for your hardware. To get good ATLAS performance, it is best to build Sage from source.
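
Before and after rebuilding, it can be useful to see which BLAS/LAPACK libraries numpy was linked against and to time the multiplication from the question. The sketch below uses np.show_config() for the first part; the helper time_dot is hypothetical and the matrix sizes are simply those from the question.

```python
import time

import numpy as np


def time_dot(p=1024, n=768, runs=3):
    """Return the best wall-clock time in seconds of A.T.dot(A) over `runs` calls."""
    a = np.random.random((p, n))
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        a.T.dot(a)
        best = min(best, time.perf_counter() - start)
    return best


# Print the build configuration, including the BLAS/LAPACK libraries
# this numpy build was linked against.
np.show_config()
print("best A.T.dot(A) time: %.3f s" % time_dot())
```

If the output mentions no optimised library (ATLAS, OpenBLAS, MKL) and the time is well above 100 ms, numpy is probably falling back to a slow BLAS, though the exact wording of the report varies between numpy versions.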
