
Matrix dot multiplication slowness and BLAS versions

asked 2015-04-12 15:52:31 +0100

Eugene

updated 2018-10-08 00:42:41 +0100

tmonteil

Hello everyone!

Is there a way to improve the performance of matrix multiplication in Sage? Right now I am relying on NumPy's dot function, like this:

 import numpy as np
 N = 768
 P = 1024
 A = np.random.random((P, N))
 A.T.dot(A)

Timing the dot product in Sage gives me roughly 1.7 to 1.9 seconds per multiplication (each figure below is the total time of 10 runs):

>>> import timeit
>>> setup = """
... 
... import numpy as np
... 
... N = 768
... P = 1024
... 
... A = np.random.random((P, N))
... """
>>> timeit.repeat('A.T.dot(A)', setup=setup, number=10, repeat=3)
[18.736198902130127, 18.66787099838257, 17.36500310897827]

Yet the same multiplication in MATLAB takes less than 100 ms. I have heard that NumPy relies internally on BLAS, and that the BLAS library can be replaced with OpenBLAS, ATLAS, Intel MKL, or something similar for better performance.
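A rough way to quantify the gap, sketched under the assumption of Python 3 and the same sizes as above (the helper name matmul_gflops is hypothetical): A^T A for a P x N matrix A has N^2 entries, each a dot product of length P, so it costs about 2*P*N^2 floating-point operations, about 1.2 GFLOP here. Under 100 ms therefore means more than 12 GFLOP/s, while a second and a half is below 1 GFLOP/s. At this size the Python overhead of a single call is negligible, so a gap like that points at the BLAS implementation doing the actual work.

```python
import time

import numpy as np


def matmul_gflops(P=1024, N=768, repeat=3):
    """Time A.T.dot(A) and return (best_seconds, gflops).

    A is P x N, so A.T.dot(A) is an N x N result whose entries each
    need P multiplications and about P additions: ~2*P*N*N flops.
    """
    A = np.random.random((P, N))
    flops = 2.0 * P * N * N
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        A.T.dot(A)
        best = min(best, time.perf_counter() - start)  # keep the fastest run
    return best, flops / best / 1e9


if __name__ == "__main__":
    seconds, gflops = matmul_gflops()
    print("best: %.3f s, about %.1f GFLOP/s" % (seconds, gflops))
```

Taking the best of several runs, rather than the mean, reduces the influence of other processes on the machine.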

So I am looking for a manual or some information about what is going on with performance with respect to NumPy's underlying components: when should one consider replacing one component with another, and is there a simple way to do that?
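As a first step, NumPy can report which BLAS/LAPACK libraries it was built against through numpy.show_config(). A minimal sketch, assuming Python 3 (the wrapper name blas_config_text is hypothetical); since the layout of this report differs between NumPy versions, the text is captured and printed as-is rather than parsed:

```python
import contextlib
import io

import numpy as np


def blas_config_text():
    """Capture what numpy.show_config() prints about NumPy's build.

    The report format changes between NumPy versions, so the raw
    text is returned instead of being parsed.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


if __name__ == "__main__":
    print(blas_config_text())
```

If the BLAS section names only a generic blas library rather than OpenBLAS, ATLAS, or MKL, that is a hint (not proof) that a slow reference implementation is being used.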


Comments

Which version of Sage are you using? Which binaries did you use? Which hardware? Which distribution?

tmonteil (2015-04-13 18:35:07 +0100)

2 Answers


answered 2015-04-13 13:14:30 +0100

tmonteil

updated 2015-04-14 19:17:31 +0100

For what it's worth, I cannot reproduce your problem within Sage, with either IPython's %timeit or Sage's timeit:

sage: import numpy as np
sage: N = 768
sage: P = 1024
sage: A = np.random.random((P, N))
sage: %timeit A.T.dot(A)
10 loops, best of 3: 90.8 ms per loop
sage: timeit('A.T.dot(A)')
5 loops, best of 3: 90.2 ms per loop

Note that Python's timeit.repeat returns, for each repetition, the total time of number runs, so number=10 accumulates the time of 10 runs. If I try with number=1, I also get about 100 ms, as expected:
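Dividing the best total by number makes measurements taken with different number values comparable. A minimal sketch, assuming Python 3 (the helper per_call_seconds is hypothetical):

```python
import timeit


def per_call_seconds(stmt, setup="pass", number=10, repeat=3):
    """Best time for a single execution of stmt, in seconds.

    timeit.repeat returns one entry per repetition; each entry is the
    *total* time of `number` executions, so the smallest total is
    divided by `number`.
    """
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number


SETUP = """
import numpy as np
N = 768
P = 1024
A = np.random.random((P, N))
"""

if __name__ == "__main__":
    t = per_call_seconds("A.T.dot(A)", setup=SETUP, number=10, repeat=3)
    print("%.1f ms per call" % (t * 1e3))
```

Applied to the totals in the question (about 17.4 to 18.7 s for 10 calls), this gives roughly 1.7 to 1.9 s per call, still about twenty times the ~90 ms measured here.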

sage: import timeit
sage: setup = """
... 
... import numpy as np
... 
... N = 768
... P = 1024
... 
... A = np.random.random((P, N))
... """
sage: timeit.repeat('A.T.dot(A)', setup=setup, number=1, repeat=3)
[0.0931999683380127, 0.08932089805603027, 0.09101414680480957]

Note that I did not compile ATLAS specifically for my hardware, since I am using the preselected SAGE_ATLAS_ARCH='fast' configuration. Which version of Sage are you using? Which binaries did you use? Which hardware? Which distribution?

EDIT: I tried on my laptop with a version of Sage that was compiled on a Pentium 3 (in particular, without the SSE2 instruction set), and the timing is about 380 ms, which is still well below your timings.


Comments

Thanks for the reply!

I am using Arch Linux, and I have now found that other people have exactly the same dot product performance problem on Arch. The CPU is a Core i5.

I tried the dot function on Sage 6.5, both the package from the Arch repository and a build from source (just by issuing make). The performance is low in all cases: with the system's Python 2 and NumPy, with Sage from the Arch repository, and with the Sage I compiled.

I assume I now need to find a way to switch NumPy from the reference BLAS to a faster BLAS available on my system, which raises two questions:

1. Does Sage use the NumPy from my system, or does it bring its own?
2. What did you mean by "SAGE_ATLAS_ARCH='fast' preselected configuration"? I tried running Sage with that environment variable set, but it had no effect.

Eugene (2015-04-14 08:19:44 +0100)

Sage uses its own version of numpy, not the one provided by your system.
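One way to confirm this is to ask each interpreter which NumPy it actually imports, for instance once from the sage: prompt and once from the system's Python. A minimal sketch; numpy_origin is a hypothetical helper name, and each print call takes a single formatted string so the snippet should behave the same under Python 2 and Python 3:

```python
import sys

import numpy as np


def numpy_origin():
    """Return (interpreter path, NumPy version, NumPy install path).

    Running this under Sage and under the system Python shows whether
    the two import different NumPy installations.
    """
    return sys.executable, np.__version__, np.__file__


if __name__ == "__main__":
    exe, version, path = numpy_origin()
    print("interpreter: %s" % exe)
    print("numpy %s from %s" % (version, path))
```

If the two install paths differ (for example one under $SAGE_ROOT and one under the system's site-packages), the two interpreters are using separate NumPy builds, possibly linked against different BLAS libraries.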

SAGE_ATLAS_ARCH is an environment variable to be set at compilation time. By default, ATLAS tries many compilations, benchmarks each of them, and selects the best one. By setting this variable to a specific architecture (or to a generic choice such as 'base' or 'fast'), you skip this tuning step, so that ATLAS is compiled only once.

Sage uses your system's version of ATLAS if SAGE_ATLAS_LIB is set (to the directory containing your ATLAS libraries) at compilation time.

See this page for a list of environment variables that could be used to tune Sage's compilation.

tmonteil (2015-04-14 19:25:41 +0100)

Could you paste somewhere the contents of $SAGE_ROOT/logs/pkgs/atlas-*.log, where $SAGE_ROOT is the directory in which you compiled Sage?

tmonteil (2015-04-14 19:30:23 +0100)

I have uploaded the 7-Zip archive atlas-3.10.2.7z containing the log to a few hosting services (please choose whichever you find convenient):

https://yadi.sk/d/oKB84a21fzBCV
http://www.datafilehost.com/d/678b5f66
http://s000.tinyupload.com/index.php?...

Eugene (2015-04-14 22:32:59 +0100)

Which variables did you export before typing make? Could you try recompiling, typing the following before running make:

export SAGE_INSTALL_GCC='yes'
export SAGE_ATLAS_ARCH='fast'

tmonteil (2015-04-15 12:47:51 +0100)

answered 2015-04-13 12:48:54 +0100

NumPy in Sage uses ATLAS for matrix operations; if you use a binary Sage installation, ATLAS might not be optimised for your hardware. To get good ATLAS performance, it is best to build Sage from source.



Stats

Asked: 2015-04-12 15:52:31 +0100

Seen: 1,343 times

Last updated: Apr 20 '15