Matrix dot multiplication slowness and BLAS versions    
   Hello everyone!
Is there a way to improve the performance of matrix multiplication in Sage? Right now I am relying on numpy's dot function, like this:
 import numpy as np
 N = 768
 P = 1024
 A = np.random.random((P, N))
 A.T.dot(A)
Timing the dot product in Sage gives me roughly 1.7 to 1.9 seconds per call (each figure below is the total for 10 calls):
>>> import timeit
>>> setup = """
... 
... import numpy as np
... 
... N = 768
... P = 1024
... 
... A = np.random.random((P, N))
... """
>>> timeit.repeat('A.T.dot(A)', setup=setup, number=10, repeat=3)
[18.736198902130127, 18.66787099838257, 17.36500310897827]
Yet the same multiplication in Matlab takes less than 100 ms. I have heard that numpy relies on BLAS internally, and that the BLAS implementation can be swapped for OpenBLAS, ATLAS, Intel MKL, or something similar to get better performance.
So I am looking for a manual or some information about what is going on with performance with regard to numpy's underlying components. When should one consider replacing one with another, and is there a simple way to do that?
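In case it helps anyone reproduce this, here is a minimal sketch of how the installed numpy's BLAS/LAPACK build information can be inspected and the product timed per call. `np.show_config()` is numpy's own function; the helper names `blas_config_text` and `seconds_per_call` are made up for illustration, and the exact text printed depends on the numpy version and how it was built.

```python
import contextlib
import io
import timeit

import numpy as np


def blas_config_text():
    """Capture what np.show_config() prints (build-time BLAS/LAPACK info)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


def seconds_per_call(n_rows=1024, n_cols=768, number=3, repeat=3):
    """Best-of-`repeat` time for a single A.T.dot(A), in seconds."""
    a = np.random.random((n_rows, n_cols))
    # timeit.repeat returns the total time for `number` calls, once per repeat.
    totals = timeit.repeat(lambda: a.T.dot(a), number=number, repeat=repeat)
    return min(totals) / number
```

Running `print(blas_config_text())` and `print(seconds_per_call())` inside Sage's Python should show which libraries numpy was built against and how long one product takes. If no optimized BLAS is listed, that would be consistent with the slow timings above, though this is only a diagnostic and not a fix.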
 
 
Which version of Sage are you using? Which binaries did you use? Which hardware? Which distribution?