Ask Your Question

Revision history [back]

click to hide/show revision 1
initial version

answered 2011-01-11 02:10:59 -0500

DSM gravatar image

First, while compiling something will often cause a slight increase in speed when the loop itself is a problem, mostly the real speedup comes not just from compiling functions but by marking them up with type information ("cdef int a") and so on.

The reason interpreted languages like Python are slower than compiled languages isn't because of the interpreter, that's a common misconception. (If that were so then removing the interpreter loop and executing the bytecode directly would be fast, and it's not: people have tried.) The reason they're slow is more because of "multiple dispatch": when you give Python "a+b", it has to check the types of a and b, see whether there's an add method, etc., all of which takes lots and lots of cycles while C just compiles to (say) one flop and a copy. So the real gain comes from marking things up with type information ("cdef int a"), but of course this comes at a cost: C doesn't have a type corresponding to an arbitrary-length integer, for example.

As for your particular functions, you can understand the behaviour if you actually look at what the functions return (I added a "return f" to the end of each function, which is something I think you probably should have done first):


sage: time speedtest(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
648037159632282329931500412681871807401802252780713670001839379
sage: time speedtest_compiled(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
1232110414200870712562346350972327740484463519610977249336410934754319533890407441902694104728894548432498279447853775120001L

Two things are going on: the first is that the operations are binding differently, so it's parsing f=17f^2 % 123.. as f=17f^(2 %123..) , or f=17*f^2, so no mod is being taken. The second is that the "^" is being interpreted as an XOR operation (like in C), not as an exponentiation. To see what code is actually being generated, I looked in .sage/temp, and it showed that it was interpreting it as


 *     f = 1
 *     for i in range(0,n):
 *         f= 17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321             # <<<<<<<<<<<<<<
 *     return f
 * 
 */
    __pyx_t_3 = PyNumber_Multiply(__pyx_int_17, __pyx_v_f); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = PyNumber_Xor(__pyx_t_3, __pyx_int_2); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __Pyx_DECREF(__pyx_v_f);
    __pyx_v_f = __pyx_t_4;
    __pyx_t_4 = 0;
  }
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
which isn't what we wanted. The following code
def speedtest_compiled_v2(n):
 f = 1  
 for i in range(0,n):
    f=(17*f**2) % 1234565432134567898765456787654323456787654345678765456787654321
 return f
should return the right values, but it still won't be faster. Where the real speed gain comes in is when you can do this:

#pure python
def speedtest_v3(n):
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

# cython
def speedtest_compiled_v3(long n):
    cdef long i,f
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

sage: time speedtest_v3(10**6)
CPU times: user 0.72 s, sys: 0.00 s, total: 0.73 s
Wall time: 0.77 s
599323
sage: time speedtest_compiled_v3(10**6)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s
599323

Anything where you have operations in the innermost loop which you can't easily imagine writing in C simply aren't going to get much faster, I'm afraid.

First, while compiling something will often cause a slight increase in speed when the loop itself is a problem, mostly the real speedup comes not just from compiling functions but by marking them up with type information ("cdef int a") and so on.

The reason interpreted languages like Python are slower than compiled languages isn't because of the interpreter, that's a common misconception. (If that were so then removing the interpreter loop and executing the bytecode directly would be fast, and it's not: people have tried.) The reason they're slow is more because of "multiple dispatch": when you give Python "a+b", it has to check the types of a and b, see whether there's an add method, etc., all of which takes lots and lots of cycles while C just compiles to (say) one flop and a copy. So the real gain comes from marking things up with type information ("cdef int a"), but of course this comes at a cost: C doesn't have a type corresponding to an arbitrary-length integer, for example.

As for your particular functions, you can understand the behaviour if you actually look at what the functions return (I added a "return f" to the end of each function, which is something I think you probably should have done first):


sage: time speedtest(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
648037159632282329931500412681871807401802252780713670001839379
sage: time speedtest_compiled(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
1232110414200870712562346350972327740484463519610977249336410934754319533890407441902694104728894548432498279447853775120001L

Two things are going on: the first is that the operations are binding differently, so it's parsing f=17f^2 f=17*f^2 % 123.. as f=17f^(2 f=17*f^(2 %123..) , or f=17*f^2, so no mod is being taken. The second is that the "^" is being interpreted as an XOR operation (like in C), not as an exponentiation. To see what code is actually being generated, I looked in .sage/temp, and it showed that it was interpreting it as


 *     f = 1
 *     for i in range(0,n):
 *         f= 17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321             # <<<<<<<<<<<<<<
 *     return f
 * 
 */
    __pyx_t_3 = PyNumber_Multiply(__pyx_int_17, __pyx_v_f); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = PyNumber_Xor(__pyx_t_3, __pyx_int_2); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __Pyx_DECREF(__pyx_v_f);
    __pyx_v_f = __pyx_t_4;
    __pyx_t_4 = 0;
  }
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
which isn't what we wanted. The following code
def speedtest_compiled_v2(n):
 f = 1  
 for i in range(0,n):
    f=(17*f**2) % 1234565432134567898765456787654323456787654345678765456787654321
 return f
should return the right values, but it still won't be faster. Where the real speed gain comes in is when you can do this:

#pure python
def speedtest_v3(n):
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

# cython
def speedtest_compiled_v3(long n):
    cdef long i,f
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

sage: time speedtest_v3(10**6)
CPU times: user 0.72 s, sys: 0.00 s, total: 0.73 s
Wall time: 0.77 s
599323
sage: time speedtest_compiled_v3(10**6)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s
599323

Anything where you have operations in the innermost loop which you can't easily imagine writing in C simply aren't isn't going to get much faster, I'm afraid.

First, while compiling something will often cause a slight increase in speed when the loop itself is a problem, mostly the real speedup comes not just from compiling functions but by marking them up with type information ("cdef int a") and so on.

The reason interpreted languages like Python are slower than compiled languages isn't because of the interpreter, that's a common misconception. (If that were so then removing the interpreter loop and executing the bytecode directly would be fast, and it's not: people have tried.) The reason they're slow is more because of "multiple dispatch": when you give Python "a+b", it has to check the types of a and b, see whether there's an add method, etc., all of which takes lots and lots of cycles while C just compiles to (say) one flop and a copy. So the real gain comes from marking things up with type information ("cdef int a"), specifying types, but of course this comes at a cost: C doesn't have a type corresponding to an arbitrary-length integer, for example.

As for your particular functions, you can understand the behaviour if you actually look at what the functions return (I added a "return f" to the end of each function, which is something I think you probably should have done first):


sage: time speedtest(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
648037159632282329931500412681871807401802252780713670001839379
sage: time speedtest_compiled(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
1232110414200870712562346350972327740484463519610977249336410934754319533890407441902694104728894548432498279447853775120001L

Two things are going on: the first is that the operations are binding differently, so it's parsing f=17*f^2 % 123.. as f=17*f^(2 %123..), or f=17*f^2, so no mod is being taken. The second is that the "^" is being interpreted as an XOR operation (like in C), not as an exponentiation. To see what code is actually being generated, I looked in .sage/temp, and it showed that it was interpreting it as


 *     f = 1
 *     for i in range(0,n):
 *         f= 17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321             # <<<<<<<<<<<<<<
 *     return f
 * 
 */
    __pyx_t_3 = PyNumber_Multiply(__pyx_int_17, __pyx_v_f); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = PyNumber_Xor(__pyx_t_3, __pyx_int_2); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __Pyx_DECREF(__pyx_v_f);
    __pyx_v_f = __pyx_t_4;
    __pyx_t_4 = 0;
  }
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
which isn't what we wanted. The following code
def speedtest_compiled_v2(n):
 f = 1  
 for i in range(0,n):
    f=(17*f**2) % 1234565432134567898765456787654323456787654345678765456787654321
 return f
should return the right values, but it still won't be faster. Where the real speed gain comes in is when you can do this:

#pure python
def speedtest_v3(n):
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

# cython
def speedtest_compiled_v3(long n):
    cdef long i,f
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

sage: time speedtest_v3(10**6)
CPU times: user 0.72 s, sys: 0.00 s, total: 0.73 s
Wall time: 0.77 s
599323
sage: time speedtest_compiled_v3(10**6)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s
599323

Anything where you have operations in the innermost loop which you can't easily imagine writing in C simply isn't going to get much faster, I'm afraid.

First, while compiling something will often cause a slight increase in speed when the loop itself is a problem, mostly the real speedup comes not just from compiling functions but by marking them up with type information ("cdef int a") and so on.

The reason interpreted languages like Python are slower than compiled languages isn't because of the interpreter, that's a common misconception. (If that were so then removing the interpreter loop and executing the bytecode directly would be fast, and it's not: people have tried.) The reason they're slow is more because of "multiple dispatch": when you give Python "a+b", it has to check the types of a and b, see whether there's an add method, etc., all of which takes lots and lots of cycles while C just compiles to (say) one flop and a copy. So the real gain comes from specifying types, but of course this comes at a cost: C doesn't have a type corresponding to an arbitrary-length integer, for example.

As for your particular functions, you can understand the behaviour if you actually look at what the functions return (I added a "return f" to the end of each function, which is something I think you probably should have done first):


sage: time speedtest(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
648037159632282329931500412681871807401802252780713670001839379
sage: time speedtest_compiled(10**2)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
1232110414200870712562346350972327740484463519610977249336410934754319533890407441902694104728894548432498279447853775120001L

Two things are going on: the first is that the operations are binding differently, so it's parsing f=17*f^2 % 123.. as f=17*f^(2 %123..), or f=17*f^2, so no mod is being taken. The second is that the "^" is being interpreted as an XOR operation (like in C), not as an exponentiation. To see what code is actually being generated, I looked in .sage/temp, and it showed that it was interpreting it as


 *     f = 1
 *     for i in range(0,n):
 *         f= 17*f^2 % 1234565432134567898765456787654323456787654345678765456787654321             # <<<<<<<<<<<<<<
 *     return f
 * 
 */
    __pyx_t_3 = PyNumber_Multiply(__pyx_int_17, __pyx_v_f); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = PyNumber_Xor(__pyx_t_3, __pyx_int_2); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __Pyx_DECREF(__pyx_v_f);
    __pyx_v_f = __pyx_t_4;
    __pyx_t_4 = 0;
  }
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
which isn't what we wanted. The following code
def speedtest_compiled_v2(n):
  f = 1     for i in range(0,n):
     f=(17*f**2) % 1234565432134567898765456787654323456787654345678765456787654321
  return f
should return the right values, but it still won't be faster. Where the real speed gain comes in is when you can do this:

#pure python
def speedtest_v3(n):
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

# cython
def speedtest_compiled_v3(long n):
    cdef long i,f
    f = 1    
    for i in range(n):
        f= (17*f*f) % 1234565
    return f

sage: time speedtest_v3(10**6)
CPU times: user 0.72 s, sys: 0.00 s, total: 0.73 s
Wall time: 0.77 s
599323
sage: time speedtest_compiled_v3(10**6)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s
599323

Anything where you have operations in the innermost loop which you can't easily imagine writing in C simply isn't going to get much faster, I'm afraid.