Time-consuming of compiled files

Mustafa
45 ●2 ●5 ●8

Kelvin Li
503 ●5 ●12 ●17

I have made two files, first named "speedtest.sage", second "speedtest_compiled.spyx"

speedtest.sage:

def speedtest(n):
 f = 1
 for i in range(0,n):
    f=17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321

speedtest.spyx:

def speedtest_compiled(n):
 f = 1
 for i in range(0,n):
    f=17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321

They are equal (modulo names of the speedtest-function). I have this output on my notebook:

sage: load speedtest.sage
sage: load speedtest.spyx
Compiling speedtest.spyx...

sage: time speedtest(10^4)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s

sage: time speedtest(10^5)
CPU times: user 0.14 s, sys: 0.02 s, total: 0.16 s
Wall time: 0.16 s

sage: time speedtest(10^6)
CPU times: user 1.39 s, sys: 0.10 s, total: 1.49 s
Wall time: 1.50 s

sage: time speedtest_compiled(10^4)
CPU times: user 0.08 s, sys: 0.00 s, total: 0.08 s
Wall time: 0.08 s

sage: time speedtest_compiled(10^5)
CPU times: user 7.87 s, sys: 0.00 s, total: 7.87 s
Wall time: 7.87 s

The interpretated Python-Code of speedtest() behaves absolutely normal, as you can see - BUT the compiled Cython-Code behaves strange, not only that it is so extremely slow, its consumed time is not even close to linear in n (notice that I could not evaluate speedtest_compiled(10^6), it took too long. What is the explanation for this - and how to compile correctly to gain the 10-1000x speedup I've read about?

add a comment

answered 14 years ago

DSM

5242 ●21 ●68 ●112

updated 14 years ago

First, while compiling something will often cause a slight increase in speed when the loop itself is a problem, mostly the real speedup comes not just from compiling functions but by marking them up with type information ("cdef int a") and so on.

The reason interpreted languages like Python are slower than compiled languages isn't because of the interpreter, that's a common misconception. (If that were so then removing the interpreter loop and executing the bytecode directly would be fast, and it's not: people have tried.) The reason they're slow is more because of "multiple dispatch": when you give Python "a+b", it has to check the types of a and b, see whether there's an add method, etc., all of which takes lots and lots of cycles while C just compiles to (say) one flop and a copy. So the real gain comes from specifying types, but of course this comes at a cost: C doesn't have a type corresponding to an arbitrary-length integer, for example.

As for your particular functions, you can understand the behaviour if you actually look at what the functions return (I added a "return f" to the end of each function, which is something I think you probably should have done first):


sage: time speedtest(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
648037159632282329931500412681871807401802252780713670001839379
sage: time speedtest_compiled(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
1232110414200870712562346350972327740484463519610977249336410934754319533890407441902694104728894548432498279447853775120001L

Two things are going on: the first is that the operations are binding differently, so it's parsing f=17*f^2 % 123.. as f=17*f^(2 %123..), or f=17*f^2, so no mod is being taken. The second is that the "^" is being interpreted as an XOR operation (like in C), not as an exponentiation. To see what code is actually being generated, I looked in .sage/temp, and it showed that it was interpreting it as


 *     f = 1
 *     for i in range(0,n):
 *         f= 17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321             # <<<<<<<<<<<<<<
 *     return f
 * 
 */
    __pyx_t_3 = PyNumber_Multiply(__pyx_int_17, __pyx_v_f); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = PyNumber_Xor(__pyx_t_3, __pyx_int_2); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __Pyx_DECREF(__pyx_v_f);
    __pyx_v_f = __pyx_t_4;
    __pyx_t_4 = 0;
  }
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;

which isn't what we wanted. The following code

def speedtest_compiled_v2(n):
    f = 1   
    for i in range(0,n):
        f=(17*f**2) % 1234565432134567898765456787654323456787654345678765456787654321
    return f


should return the right values, but it still won't be faster.  Where the real speed gain comes in is when you can do this:


#pure python
def speedtest_v3(n):
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

# cython
def speedtest_compiled_v3(long n):
    cdef long i,f
    f = 1    
    for i in range(n):
        f= (17*f ...

(more)

link

Comments

Thank you very much - I did lots of mistakes (Im beginner in Sage/Python and not much experience in programming in general, so that comment helps!). I experiment with my implementations of modern integer-factoring algorithm, so the integers have usually >>20 digits - so theres no straight-forward way to speed up with Cython I guess?

Mustafa ( 14 years ago )

Depends on your definition of straightforward -- you can use gmp/mpir integers with some work, which are faster than using Python ints -- and on where most of the time is being spent. You don't have to mark up everything, so subproblems which can be done in longs can be sped up and Python overhead removed. But I'd recommend against spending much time worrying about optimizations in an experimental phase.

DSM ( 14 years ago )

add a comment

Time-consuming of compiled files

1 Answer

Comments

Your Answer

Question Tools

Stats

Related questions

Time-consuming of compiled files savecancel

1 Answer

Comments

Your Answer

Question Tools

Stats

Related questions

Time-consuming of compiled files