Speed comparison with numerical arrays: sage, R, matlab

Dear all

I am a sage newbie coming from matlab, R, and Maple, and although I really like many features, I find that my sage code for handling numerical arrays is much more complex and much slower than the corresponding code in R and Matlab.

Here's an example:

Simuating a sample path of standard Brownian motion

I want to simulate a random path of standard Brownian motion. The input is a vector of time points, and I want returned a (random) trajectory, sampled at these time points.

To do this I need: 1) Fast random number generation 2) Fast array operations: Diff, multiplication, and cumsum. I found that in sage I need to go through NumPy, so I wrote:

sage: import numpy
sage: # General routine to sample a Brownian motion on a time grid
sage: def rB(tvec):
sage:    dt = numpy.diff(numpy.array(tvec))
sage:    sdt = sqrt(dt)
sage:    RNG = RealDistribution('gaussian',1)
sage:    dB = sdt * numpy.array([RNG.get_random_element() for j in range(dt.__len__())],dtype=float)
sage:    B1 = numpy.array([normalvariate(0,sqrt(tvec))],dtype=float)
sage:    B = numpy.cumsum(numpy.concatenate((B1,dB)))
sage:    return(B)

I then time it, testing with 1 million time points:

sage: tvec = srange(0,1e3,1e-3)
sage: time B = rB(tvec)

I get a time of 3.5 seconds on my machine, a standard Laptop running Ubuntu, running sage from the shell. Using prun, I see that half of the time, roughly, is spent generating random numbers and the other half mainly constructing numerical arrays.

For comparison, the R code would be (here I assume that tvec=0)

> tvec <- seq(0,1000,1e-3)
> system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))

which executes in 0.24 seconds. The matlab code would be

>> tvec = linspace(0,1e3,1e6+1);
>> tic;B = [0,cumsum(randn(1,length(tvec)-1) .* sqrt(diff(tvec)))];toc

which runs in 0.06 seconds.

You see that my sage code is not only clumsier, it is also between 10 and 100 times slower, so I guess I am doing something wrong.

I would really appreciate if you could point me in the direction of cleaner and faster code.

Cheers,

Uffe

edit retag close merge delete

Sort by » oldest newest most voted

I don't know what you are doing in creating B1. Other than that, most of your time is being spent in converting between Sage numbers or arrays and numpy numbers or arrays. When writing your code, make sure to minimize these kinds of conversions.

Following is the code which is almost the same as your code, except everything is handled by numpy and there are hardly any conversions being done, except for a few floating values. Also, I removed the line containing B1 since it gives errors with zero variance.

import numpy
# General routine to sample a Brownian motion on a time grid
def rB(tvec):
dt = numpy.diff(tvec)
sdt = numpy.sqrt(dt)
dB = sdt * numpy.random.normal(0, 1, len(dt))
B = numpy.cumsum(dB)
return(B)

Here, we must ensure that the input tvec is also a numpy array, otherwise there will be a lot of time wasted in converting it. After we make this change, the time taken by the code is 0.17s (as compared to over 4s with your code).

tvec = numpy.arange(0, 1e3, 1e-3) # srange(0,1e3,1e-3)
time B = rB(tvec)

Time: CPU 0.16 s, Wall: 0.17 s

Output of r code:

%r
tvec <- seq(0,1000,1e-3)
system.time(B <- c(0,cumsum(rnorm(length(tvec)-1,sd=sqrt(diff(tvec))))))

user  system elapsed
0.440   0.003   0.451
more

I think that this is a really nice post - it illustrates exactly when the overhead of all the Sage stuff is not appropriate. Which is why it's so nice that Sage includes these for free.

@Uffe_H_Thygesen - you may enjoy running Sage in Python mode, with sage -ipython. You can still import Sage if you want (from sage.all import *, I think), but can also just do pure Numpy and friends.

Thanks! I guess the take-home-message is to stay in numpy, whereas I was trying to do as much as possible in sage proper, and only shifting to numpy for bottlenecks.

@kcrisman - thanks for the pointer. Now, most of my code contains much symbolic manipulation before going to numerics, so that's why I tend to start in sage. I will have to do some experimentation ...

1

No problem. What I would say is to try to abstract out the numerics stuff in your code in a separate function that you can call from your symbolic code. Especially if you use it a lot in a few places, this would be more efficient and better design.