ASKSAGE: Sage Q&A Forum - Individual question feed
http://ask.sagemath.org/questions/
Q&A Forum for Sage
Copyright Sage, 2010. Some rights reserved under creative commons license.
Wed, 15 Apr 2015 17:34:29 -0500

Matrix dot multiplication slowness and BLAS versions
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/
Hello everyone!
Is there a way to improve the performance of matrix multiplication in Sage? Right now I am relying on numpy's `dot` function, like this:
import numpy as np
N = 768
P = 1024
A = np.random.random((P, N))
A.T.dot(A)
Timing the dot product in Sage gives me between one and a half and two seconds per multiplication (each figure below is the total for 10 runs):
>>> import timeit
>>> setup = """
...
... import numpy as np
...
... N = 768
... P = 1024
...
... A = np.random.random((P, N))
... """
>>> timeit.repeat('A.T.dot(A)', setup=setup, number=10, repeat=3)
[18.736198902130127, 18.66787099838257, 17.36500310897827]
Yet the same multiplication in Matlab takes less than 100 ms. I have heard that numpy relies on BLAS internally, and that the BLAS implementation can be replaced with OpenBLAS, ATLAS, Intel MKL or something similar for better performance.
So I am looking for a manual or other information that explains how numpy's underlying components affect performance, when one should consider replacing one of them with another, and whether there is a simple way to do that.
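
One way to see which BLAS/LAPACK a given numpy was built against is `numpy.show_config()`. The following is a minimal sketch, assuming Python 3 and that it is run with the interpreter whose numpy you want to inspect; the helper name `numpy_build_info` is made up for illustration, and the exact layout of the report differs between numpy versions.

```python
import contextlib
import io


def numpy_build_info():
    """Return the build report that numpy prints about its BLAS/LAPACK."""
    try:
        import numpy as np
    except ImportError:
        return "numpy is not installed in this interpreter"
    buffer = io.StringIO()
    # show_config() prints its report; capture it so it can be searched or saved.
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    report = numpy_build_info()
    print(report)
    # Crude hint only: look for well-known library names in the report.
    for name in ("openblas", "atlas", "mkl"):
        if name in report.lower():
            print("mentions:", name)
```

If none of the optimized library names appear in the report, numpy may be falling back to a slower reference BLAS, which would be consistent with timings like the ones above.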
Sun, 12 Apr 2015 08:52:31 -0500

Comment by tmonteil:
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/?comment=26515#post-id-26515
Which version of Sage are you using? Which binaries did you use? Which hardware? Which distribution?
Mon, 13 Apr 2015 11:35:07 -0500

Answer by tmonteil:
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/?answer=26512#post-id-26512
For what it's worth, I cannot reproduce your problem within Sage, either with IPython's `%timeit` or with Sage's `timeit`:
sage: import numpy as np
sage: N = 768
sage: P = 1024
sage: A = np.random.random((P, N))
sage: %timeit A.T.dot(A)
10 loops, best of 3: 90.8 ms per loop
sage: timeit('A.T.dot(A)')
5 loops, best of 3: 90.2 ms per loop
I do not know the details of Python's `timeit.repeat`, but it seems that with `number=10` each result is the cumulative time of 10 runs. If I try with `number=1`, I also get about 100 ms, as expected:
sage: import timeit
sage: setup = """
...
... import numpy as np
...
... N = 768
... P = 1024
...
... A = np.random.random((P, N))
... """
sage: timeit.repeat('A.T.dot(A)', setup=setup, number=1, repeat=3)
[0.0931999683380127, 0.08932089805603027, 0.09101414680480957]
Note that I did not compile ATLAS specifically for my hardware, since I am using the preselected `SAGE_ATLAS_ARCH='fast'` configuration. Which version of Sage are you using? Which binaries did you use? Which hardware? Which distribution?
**EDIT**: I tried on my laptop with a version of Sage that was compiled on a Pentium 3 (in particular, without the SSE2 instruction set), and the timing is about 380 ms, which is still below your timings.
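
The point about `number=10` can be checked directly: each value returned by `timeit.repeat` is the total time for `number` executions of the statement, so it has to be divided by `number` to get a per-run figure. A minimal sketch, using a pure-Python stand-in workload (an assumption for illustration) in place of `A.T.dot(A)`:

```python
import timeit

NUMBER = 10  # executions per measurement, as in the question
REPEAT = 3   # number of measurements

# Stand-in statement; swap in 'A.T.dot(A)' with the numpy setup string to time the real case.
totals = timeit.repeat('sum(range(10000))', number=NUMBER, repeat=REPEAT)

# Each entry of `totals` covers NUMBER executions, so divide to get seconds per run.
per_run = [total / NUMBER for total in totals]

print("total per measurement (s):", totals)
print("per execution (s):", per_run)
print("best per execution (s):", min(per_run))
```

Applied to the figures in the question, about 18.7 s for 10 runs is roughly 1.87 s per multiplication, which is still far above 90 ms.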
Mon, 13 Apr 2015 06:14:30 -0500

Comment by Eugene:
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/?comment=26549#post-id-26549
Thanks a lot! The performance is finally right! Rebuilding with these options helped.
Wed, 15 Apr 2015 17:34:29 -0500

Comment by tmonteil:
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/?comment=26536#post-id-26536
Which variables did you export before typing `make`? Could you try recompiling by typing the following before running `make`:
export SAGE_INSTALL_GCC='yes'
export SAGE_ATLAS_ARCH='fast'
Wed, 15 Apr 2015 05:47:51 -0500

Comment by Eugene:
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/?comment=26535#post-id-26535
I have uploaded a 7-zipped log, atlas-3.10.2.7z, to a few hosting services (please choose whichever you find convenient):
https://yadi.sk/d/oKB84a21fzBCV
http://www.datafilehost.com/d/678b5f66
http://s000.tinyupload.com/index.php?file_id=24295558793419887747
Tue, 14 Apr 2015 15:32:59 -0500

Comment by tmonteil:
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/?comment=26532#post-id-26532
Sage uses its own version of `numpy`, not the one provided by your system.
`SAGE_ATLAS_ARCH` is an environment variable to be set at compilation time. By default, Atlas compiles many variants, benchmarks each of them and selects the best one. By setting this variable to some architecture (or to a generic choice such as 'base' or 'fast'), you skip this optimization so that Atlas is compiled only once.
Sage uses your system's version of Atlas if `SAGE_ATLAS_LIB` is set at compilation time.
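
Since these variables are read when Sage is compiled, they need to be in the environment of the `make` process itself; setting them when launching an already built Sage changes nothing. A minimal sketch of driving such a rebuild from Python, assuming a Sage source tree whose location you supply; the helper names `sage_build_env` and `rebuild_sage` and the path in the final comment are hypothetical:

```python
import os
import subprocess


def sage_build_env(atlas_arch="fast", install_gcc="yes", base_env=None):
    """Return a copy of the environment with the build-time variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SAGE_ATLAS_ARCH"] = atlas_arch
    env["SAGE_INSTALL_GCC"] = install_gcc
    return env


def rebuild_sage(sage_root):
    """Run `make` inside sage_root with the variables above (a full build is slow)."""
    return subprocess.run(["make"], cwd=sage_root, env=sage_build_env(), check=True)


if __name__ == "__main__":
    env = sage_build_env()
    print("SAGE_ATLAS_ARCH =", env["SAGE_ATLAS_ARCH"])
    print("SAGE_INSTALL_GCC =", env["SAGE_INSTALL_GCC"])
    # rebuild_sage("/path/to/sage")  # hypothetical path; uncomment to start a real build
```

This is equivalent to exporting the variables in the shell before typing `make`.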
See [this page](http://sagemath.org/doc/installation/source.html#environment-variables) for a list of environment variables that can be used to tune Sage's compilation.
Tue, 14 Apr 2015 12:25:41 -0500

Comment by tmonteil:
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/?comment=26533#post-id-26533
Could you paste somewhere the contents of `$SAGE_ROOT/logs/pkgs/atlas-*.log`, where `$SAGE_ROOT` is the directory in which you compiled Sage yourself?
Tue, 14 Apr 2015 12:30:23 -0500

Comment by Eugene:
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/?comment=26523#post-id-26523
Thanks for the reply!
I am using Arch Linux, and I have now found that other people have exactly the same [problem with the dot product performance](https://bugs.archlinux.org/task/21313) on Arch. The CPU is a Core i5.
I tried the dot function on Sage 6.5, both from the Arch repo and built from source (just by issuing `make`); the performance is low in all cases: with the system's python2 and numpy, with Sage from the Arch repo, and with the compiled Sage.
I assume I now need to find a way to switch from numpy's reference BLAS to a faster one on the system, which raises two questions:
1. Does Sage use numpy from my system or bring its own?
2. What did you mean by "SAGE_ATLAS_ARCH='fast' preselected configuration"? I tried running Sage with that environment variable set, but it had no effect.
Tue, 14 Apr 2015 01:19:44 -0500

Answer by Dima:
http://ask.sagemath.org/question/26507/matrix-dot-multiplication-slowness-and-blas-versions/?answer=26511#post-id-26511
numpy in Sage uses Atlas for matrix operations; if you use a binary Sage installation, Atlas might not be optimised for your hardware. To get good Atlas performance, it is best to build Sage from source.
Mon, 13 Apr 2015 05:48:54 -0500