Ask Your Question

Time-consuming of compiled files

asked 2011-01-11 08:00:50 +0200

Mustafa gravatar image

updated 2011-04-28 17:19:31 +0200

Kelvin Li gravatar image

I have made two files, first named "speedtest.sage", second "speedtest_compiled.spyx"


def speedtest(n):
 f = 1
 for i in range(0,n):
    f=17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321


def speedtest_compiled(n):
 f = 1
 for i in range(0,n):
    f=17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321

They are equal (modulo names of the speedtest-function). I have this output on my notebook:

sage: load speedtest.sage
sage: load speedtest.spyx
Compiling speedtest.spyx...

sage: time speedtest(10^4)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s

sage: time speedtest(10^5)
CPU times: user 0.14 s, sys: 0.02 s, total: 0.16 s
Wall time: 0.16 s

sage: time speedtest(10^6)
CPU times: user 1.39 s, sys: 0.10 s, total: 1.49 s
Wall time: 1.50 s

sage: time speedtest_compiled(10^4)
CPU times: user 0.08 s, sys: 0.00 s, total: 0.08 s
Wall time: 0.08 s

sage: time speedtest_compiled(10^5)
CPU times: user 7.87 s, sys: 0.00 s, total: 7.87 s
Wall time: 7.87 s

The interpretated Python-Code of speedtest() behaves absolutely normal, as you can see - BUT the compiled Cython-Code behaves strange, not only that it is so extremely slow, its consumed time is not even close to linear in n (notice that I could not evaluate speedtest_compiled(10^6), it took too long. What is the explanation for this - and how to compile correctly to gain the 10-1000x speedup I've read about?

edit retag flag offensive close merge delete

1 Answer

Sort by ยป oldest newest most voted

answered 2011-01-11 09:10:59 +0200

DSM gravatar image

updated 2011-01-11 09:16:57 +0200

First, while compiling something will often cause a slight increase in speed when the loop itself is a problem, mostly the real speedup comes not just from compiling functions but by marking them up with type information ("cdef int a") and so on.

The reason interpreted languages like Python are slower than compiled languages isn't because of the interpreter, that's a common misconception. (If that were so then removing the interpreter loop and executing the bytecode directly would be fast, and it's not: people have tried.) The reason they're slow is more because of "multiple dispatch": when you give Python "a+b", it has to check the types of a and b, see whether there's an add method, etc., all of which takes lots and lots of cycles while C just compiles to (say) one flop and a copy. So the real gain comes from specifying types, but of course this comes at a cost: C doesn't have a type corresponding to an arbitrary-length integer, for example.

As for your particular functions, you can understand the behaviour if you actually look at what the functions return (I added a "return f" to the end of each function, which is something I think you probably should have done first):

sage: time speedtest(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
sage: time speedtest_compiled(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s

Two things are going on: the first is that the operations are binding differently, so it's parsing f=17*f^2 % 123.. as f=17*f^(2 %123..), or f=17*f^2, so no mod is being taken. The second is that the "^" is being interpreted as an XOR operation (like in C), not as an exponentiation. To see what code is actually being generated, I looked in .sage/temp, and it showed that it was interpreting it as

 *     f = 1
 *     for i in range(0,n):
 *         f= 17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321             # <<<<<<<<<<<<<<
 *     return f
    __pyx_t_3 = PyNumber_Multiply(__pyx_int_17, __pyx_v_f); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __pyx_t_4 = PyNumber_Xor(__pyx_t_3, __pyx_int_2); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __pyx_v_f = __pyx_t_4;
    __pyx_t_4 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
which isn't what we wanted. The following code
def speedtest_compiled_v2(n):
    f = 1   
    for i in range(0,n):
        f=(17*f**2) % 1234565432134567898765456787654323456787654345678765456787654321
    return f
should return the right values, but it still won't be faster. Where the real speed gain comes in is when you can do this:

#pure python
def speedtest_v3(n):
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

# cython
def speedtest_compiled_v3(long n):
    cdef long i,f
    f = 1    
    for i in range(n):
        f= (17*f ...
edit flag offensive delete link more


Thank you very much - I did lots of mistakes (Im beginner in Sage/Python and not much experience in programming in general, so that comment helps!). I experiment with my implementations of modern integer-factoring algorithm, so the integers have usually >>20 digits - so theres no straight-forward way to speed up with Cython I guess?

Mustafa gravatar imageMustafa ( 2011-01-11 10:36:10 +0200 )edit

Depends on your definition of straightforward -- you can use gmp/mpir integers with some work, which are faster than using Python ints -- and on where most of the time is being spent. You don't have to mark up everything, so subproblems which can be done in longs can be sped up and Python overhead removed. But I'd recommend against spending much time worrying about optimizations in an experimental phase.

DSM gravatar imageDSM ( 2011-01-11 21:38:38 +0200 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools


Asked: 2011-01-11 08:00:50 +0200

Seen: 521 times

Last updated: Jan 11 '11