ASKSAGE: Sage Q&A Forum - RSS feedhttps://ask.sagemath.org/questions/Q&A Forum for SageenCopyright Sage, 2010. Some rights reserved under creative commons license.Wed, 30 Oct 2013 09:46:03 +0100Speed comparison with numerical arrays: sage, R, matlabhttps://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/Dear all
I am a sage newbie coming from matlab, R, and Maple, and although I really like many features, I find that my sage code for handling numerical arrays is much more complex and much slower than the corresponding code in R and Matlab.
Here's an example:
# Simuating a sample path of standard Brownian motion
I want to simulate a random path of standard Brownian motion. The input is a vector of time points, and I want returned a (random) trajectory, sampled at these time points.
To do this I need: 1) Fast random number generation 2) Fast array operations: Diff, multiplication, and cumsum. I found that in sage I need to go through NumPy, so I wrote:
sage: import numpy
sage: # General routine to sample a Brownian motion on a time grid
sage: def rB(tvec):
sage: dt = numpy.diff(numpy.array(tvec))
sage: sdt = sqrt(dt)
sage: RNG = RealDistribution('gaussian',1)
sage: dB = sdt * numpy.array([RNG.get_random_element() for j in range(dt.__len__())],dtype=float)
sage: B1 = numpy.array([normalvariate(0,sqrt(tvec[0]))],dtype=float)
sage: B = numpy.cumsum(numpy.concatenate((B1,dB)))
sage: return(B)
I then time it, testing with 1 million time points:
sage: tvec = srange(0,1e3,1e-3)
sage: time B = rB(tvec)
I get a time of 3.5 seconds on my machine, a standard Laptop running Ubuntu, running sage from the shell. Using prun, I see that half of the time, roughly, is spent generating random numbers and the other half mainly constructing numerical arrays.
For comparison, the R code would be (here I assume that tvec[1]=0)
> tvec <- seq(0,1000,1e-3)
> system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
which executes in 0.24 seconds. The matlab code would be
>> tvec = linspace(0,1e3,1e6+1);
>> tic;B = [0,cumsum(randn(1,length(tvec)-1) .* sqrt(diff(tvec)))];toc
which runs in 0.06 seconds.
You see that my sage code is not only clumsier, it is also between 10 and 100 times slower, so I guess I am doing something wrong.
I would really appreciate if you could point me in the direction of cleaner and faster code.
Cheers,
Uffe
Mon, 28 Oct 2013 18:52:08 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/Answer by ppurka for <p>Dear all</p>
<p>I am a sage newbie coming from matlab, R, and Maple, and although I really like many features, I find that my sage code for handling numerical arrays is much more complex and much slower than the corresponding code in R and Matlab. </p>
<p>Here's an example: </p>
<h1>Simuating a sample path of standard Brownian motion</h1>
<p>I want to simulate a random path of standard Brownian motion. The input is a vector of time points, and I want returned a (random) trajectory, sampled at these time points.</p>
<p>To do this I need: 1) Fast random number generation 2) Fast array operations: Diff, multiplication, and cumsum. I found that in sage I need to go through NumPy, so I wrote:</p>
<pre><code>sage: import numpy
sage: # General routine to sample a Brownian motion on a time grid
sage: def rB(tvec):
sage: dt = numpy.diff(numpy.array(tvec))
sage: sdt = sqrt(dt)
sage: RNG = RealDistribution('gaussian',1)
sage: dB = sdt * numpy.array([RNG.get_random_element() for j in range(dt.__len__())],dtype=float)
sage: B1 = numpy.array([normalvariate(0,sqrt(tvec[0]))],dtype=float)
sage: B = numpy.cumsum(numpy.concatenate((B1,dB)))
sage: return(B)
</code></pre>
<p>I then time it, testing with 1 million time points:</p>
<pre><code>sage: tvec = srange(0,1e3,1e-3)
sage: time B = rB(tvec)
</code></pre>
<p>I get a time of 3.5 seconds on my machine, a standard Laptop running Ubuntu, running sage from the shell. Using prun, I see that half of the time, roughly, is spent generating random numbers and the other half mainly constructing numerical arrays. </p>
<p>For comparison, the R code would be (here I assume that tvec[1]=0)</p>
<pre><code>> tvec <- seq(0,1000,1e-3)
> system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
</code></pre>
<p>which executes in 0.24 seconds. The matlab code would be</p>
<pre><code>>> tvec = linspace(0,1e3,1e6+1);
>> tic;B = [0,cumsum(randn(1,length(tvec)-1) .* sqrt(diff(tvec)))];toc
</code></pre>
<p>which runs in 0.06 seconds.</p>
<p>You see that my sage code is not only clumsier, it is also between 10 and 100 times slower, so I guess I am doing something wrong.</p>
<p>I would really appreciate if you could point me in the direction of cleaner and faster code. </p>
<p>Cheers,</p>
<p>Uffe</p>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?answer=15626#post-id-15626I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.
Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.
import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
Here, we must ensure that the input `tvec` is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).
tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
Output of r code:
%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
Mon, 28 Oct 2013 20:38:19 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?answer=15626#post-id-15626Comment by Uffe_H_Thygesen for <p>I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.</p>
<p>Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.</p>
<pre><code>import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
</code></pre>
<p>Here, we must ensure that the input <code>tvec</code> is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).</p>
<pre><code>tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
</code></pre>
<p>Output of r code:</p>
<pre><code>%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
</code></pre>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16837#post-id-16837Thanks! I guess the take-home-message is to stay in numpy, whereas I was trying to do as much as possible in sage proper, and only shifting to numpy for bottlenecks. Tue, 29 Oct 2013 03:43:18 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16837#post-id-16837Comment by Uffe_H_Thygesen for <p>I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.</p>
<p>Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.</p>
<pre><code>import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
</code></pre>
<p>Here, we must ensure that the input <code>tvec</code> is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).</p>
<pre><code>tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
</code></pre>
<p>Output of r code:</p>
<pre><code>%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
</code></pre>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16830#post-id-16830@kcrisman : That sounds like the right way to do it. Would you by chance be able to point to an example in one of the tutorials that does this in a best-practice way? I tend to confuse myself when converting objects from one class to another, which leads to a lot of overhead, as in my original example. (If you don't have an example at hand, then don't worry, I just had the idea that perhaps you could save me a lot of time)Wed, 30 Oct 2013 08:44:23 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16830#post-id-16830Comment by kcrisman for <p>I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.</p>
<p>Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.</p>
<pre><code>import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
</code></pre>
<p>Here, we must ensure that the input <code>tvec</code> is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).</p>
<pre><code>tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
</code></pre>
<p>Output of r code:</p>
<pre><code>%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
</code></pre>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16839#post-id-16839@Uffe_H_Thygesen - you may enjoy running Sage in Python mode, with `sage -ipython`. You can still import Sage if you want (`from sage.all import *`, I think), but can also just do pure Numpy and friends.Mon, 28 Oct 2013 21:14:19 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16839#post-id-16839Comment by kcrisman for <p>I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.</p>
<p>Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.</p>
<pre><code>import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
</code></pre>
<p>Here, we must ensure that the input <code>tvec</code> is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).</p>
<pre><code>tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
</code></pre>
<p>Output of r code:</p>
<pre><code>%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
</code></pre>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16828#post-id-16828I think @ppurka has basically said this well.Wed, 30 Oct 2013 08:49:54 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16828#post-id-16828Comment by Uffe_H_Thygesen for <p>I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.</p>
<p>Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.</p>
<pre><code>import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
</code></pre>
<p>Here, we must ensure that the input <code>tvec</code> is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).</p>
<pre><code>tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
</code></pre>
<p>Output of r code:</p>
<pre><code>%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
</code></pre>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16836#post-id-16836@kcrisman - thanks for the pointer. Now, most of my code contains much symbolic manipulation before going to numerics, so that's why I tend to start in sage. I will have to do some experimentation ... Tue, 29 Oct 2013 03:45:29 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16836#post-id-16836Comment by ppurka for <p>I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.</p>
<p>Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.</p>
<pre><code>import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
</code></pre>
<p>Here, we must ensure that the input <code>tvec</code> is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).</p>
<pre><code>tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
</code></pre>
<p>Output of r code:</p>
<pre><code>%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
</code></pre>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16826#post-id-16826@Uffe_H_Thygesen I had to figure out some of these interactions to do my work. Some of the ways I separated my code can be found [here](https://github.com/ppurka/papers/tree/master/matrix_code).Wed, 30 Oct 2013 09:46:03 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16826#post-id-16826Comment by kcrisman for <p>I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.</p>
<p>Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.</p>
<pre><code>import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
</code></pre>
<p>Here, we must ensure that the input <code>tvec</code> is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).</p>
<pre><code>tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
</code></pre>
<p>Output of r code:</p>
<pre><code>%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
</code></pre>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16835#post-id-16835No problem. What I would say is to try to abstract out the numerics stuff in your code in a separate function that you can call from your symbolic code. Especially if you use it a lot in a few places, this would be more efficient and better design.Tue, 29 Oct 2013 08:36:36 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16835#post-id-16835Comment by kcrisman for <p>I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.</p>
<p>Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.</p>
<pre><code>import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
</code></pre>
<p>Here, we must ensure that the input <code>tvec</code> is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).</p>
<pre><code>tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
</code></pre>
<p>Output of r code:</p>
<pre><code>%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
</code></pre>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16840#post-id-16840I think that this is a really nice post - it illustrates exactly when the overhead of all the Sage stuff is not appropriate. Which is why it's so nice that Sage includes these for free.Mon, 28 Oct 2013 21:13:16 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16840#post-id-16840Comment by ppurka for <p>I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.</p>
<p>Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.</p>
<pre><code>import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)
</code></pre>
<p>Here, we must ensure that the input <code>tvec</code> is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).</p>
<pre><code>tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)
Time: CPU 0.16 s, Wall: 0.17 s
</code></pre>
<p>Output of r code:</p>
<pre><code>%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))
user system elapsed
0.440 0.003 0.451
</code></pre>
https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16833#post-id-16833You can even make all the numerical stuff into functions and put them into a separate python file. Later you can simply import that python file to do the computations you want. This will nicely separate out the sage part from the numpy part.
The only thing you need to be careful of here is whether your computations produce numbers that exceed the maximum in numpy. For example in sage, the integers can be arbitrarily large, whereas in numpy they are limited.Tue, 29 Oct 2013 10:55:15 +0100https://ask.sagemath.org/question/10671/speed-comparison-with-numerical-arrays-sage-r-matlab/?comment=16833#post-id-16833