Ask Your Question
3

perfomance: GAP code vs SAGE code

asked 2011-10-24 10:07:05 +0100

petRUShka gravatar image

updated 2011-10-24 10:07:25 +0100

I have a heavy procedure in GAP. I want to speed it up.

Is it a good idea to rewrite as much as I can on SAGE and use GAP only where It is needed?

For example rewrite parts that works with lists, with iterators and with files.

Or may be GAP realization of such object is faster (because of language C) than SAGE version.

P.S. Of course I hate GAP language and I want to write on SAGE instead. But performance of calculations is very important for me.

edit retag flag offensive close merge delete

6 Answers

Sort by ยป oldest newest most voted
2

answered 2011-10-25 21:26:06 +0100

Volker Braun gravatar image

The GAP<->Sage interface is rather slow. LibGAP will make it much faster at one point, but is not finished yet. So as a general strategy, you should do as much as you can just in Sage or just in GAP.

GAP is very fast for group theoretic computations. But working with lists, iterators, and files will always be beaten by Sage/Cython simply because its compiled code. So, well, it depends on what exactly your problem is.

edit flag offensive delete link more

Comments

Thanks for answer. Please, tell me how to use cython? About pure python SAGE I got not good benchmarks (see my answer on this question).

petRUShka gravatar imagepetRUShka ( 2011-10-26 10:37:16 +0100 )edit
G-Sage gravatar imageG-Sage ( 2011-10-27 13:31:10 +0100 )edit
2

answered 2011-10-26 12:35:13 +0100

Volker Braun gravatar image

Your code is mostly a div/mod benchmark if you run it in the profiler. For example, the following is ~200 times faster. Apart from the obvious use of machine ints, note that appending to a list is much faster than prepending:

%cython
cimport cython
cdef generate_subgroup_list_by_list_of_subgroups_and_number(subgroups, int number):
    cdef int number_subgroups = len(subgroups)
    subgroups_list = []
    cdef int i, exp2, factor
    for i in range(number_subgroups-1, -1, -1): 
        exp2 = 2<<i
        factor = cython.cdiv(number, exp2)
        number = cython.cmod(number, exp2)
        if factor == 1:
            # subgroups_list = [subgroups[i]] + subgroups_list
            subgroups_list.append(subgroups[i])
    return subgroups_list

def dostuff():
    array = range(1,11)
    for i in range(1,101):
       generate_subgroup_list_by_list_of_subgroups_and_number(array, i)
edit flag offensive delete link more
2

answered 2011-10-24 10:24:29 +0100

parzan gravatar image

In my humble experience GAP is rather fast, and I doubt migrating code to sage will speed it up (unless sage has better algorithms for the tasks).

Also take into account that calling to GAP from within sage is slow: http://ask.sagemath.org/question/477/...

edit flag offensive delete link more
1

answered 2011-10-27 04:27:03 +0100

Simon King gravatar image

Hi there!

Concerning "slowness of the interface": That ist a general problem for the pexpect-interfaces. The idea of these interfaces is that Sage pretends to be a user who is interactively working with a GAP/Singular/Magma/... session. So, Sage sends strings that GAP/Singular/Magma interpretes as commands; then Sage waits for the result; when they appear, the strings printed by GAP/Singular/Magma are transferred back to Sage and interpreted.

Of course, this has one big disadvantage: Communication with a sub-process by ascii strings and parsing of printed results has a huge overhead. Therefore, when using such pexpect interfaces, it is sometimes needed to do, for example, a loop in Singular, even though a loop in Sage (using Cython, at least) is much faster.

I give you some examples using the Singular pexpect interface:

sage: P.<x,y,z> = QQ[]
sage: sP = singular(P)
sage: M = singular.matrix(100,100,0)
sage: def test1(M):
....:     for i in range(1,101):
....:         for j in range(1,101):
....:             M[i,j] = singular(i)*singular(j)
....:
sage: def test2(M):
....:     for i in range(1,101):
....:         for j in range(1,101):
....:             M[i,j] = i*j
....:
sage: def test3(M):
....:     singular.eval('int i')
....:     singular.eval('int j')
....:     singular.eval('for (i=1;i<=100;i++){for (j=1;j<=100;j++){%s[i,j]=i*j;}}'%M.name())
....:

The three functions are equivalent:

sage: M = singular.matrix(100,100,0)
sage: test1(M)
sage: N = singular.matrix(100,100,0)
sage: test2(N)
sage: O = singular.matrix(100,100,0)
sage: test3(O)
sage: list(M)==list(N)==list(O)
True

However the running times are vastly different, since the first two functions use the interface very inefficiently:

sage: M = singular.matrix(100,100,0)
sage: %time test1(M)
CPU times: user 8.50 s, sys: 1.96 s, total: 10.46 s
Wall time: 10.57 s
sage: M = singular.matrix(100,100,0)
sage: %time test2(M)
CPU times: user 4.44 s, sys: 1.06 s, total: 5.50 s
Wall time: 5.65 s
sage: M = singular.matrix(100,100,0)
sage: %time test3(M)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.14 s

The overhead of a pexpect interface can be completely avoided by using a C library instead. In the case of Singular, such library is available: libSingular. But, as Volker said, LibGAP is not totally ready, yet.

After stating so many negative things about pexpect, you might wonder why it exists. In fact, the pexpect approach has at least two advantages:

  • The same interface (after modifying certain details: "What is the prompt?", "How is equality tested?" and so on) can be used for interacting with any command line program. In particular, we can offer a pexpect interface to Magma and Maple, but it would be illegal to include them in Sage as C libraries
  • It gives you full access to all ...
(more)
edit flag offensive delete link more
0

answered 2011-10-27 14:52:18 +0100

Simon King gravatar image

My previous example exposed a case in which it is better to do low-level stuff inside Singular/GAP/Magma/... sub-processes.

I guess I also owe you an example that shows how much faster Python and Cython are. Consider an almost empty loop:

sage: singular.eval('int i')
'int i;'
sage: singular.eval('int j')
'int j;'
sage: def test1():
....:     singular.eval('for (i=1;i<=100000;i++){j=i;}')
....:     
sage: def test2():
....:     for i in range(1,100001):
....:         j = i
....:         
sage: def test3():
....:     for i in xrange(1,100001):
....:         j = i
....:         
sage: cython("""
....: def test4():
....:     cdef int i
....:     cdef int j
....:     for i from 1<=i<=100000:
....:         j = i
....: """)
sage: %timeit test1()
5 loops, best of 3: 445 ms per loop
sage: %timeit test2()
125 loops, best of 3: 4.74 ms per loop
sage: %timeit test3()
125 loops, best of 3: 3.27 ms per loop
sage: %timeit test4()
625 loops, best of 3: 126 ns per loop

You see, if such basic things are concerned, Python is much faster than Singular (I don't know how fast GAP is), and Cython much faster than Python.

edit flag offensive delete link more

Comments

You can use the normal range() syntax for cython loops:"for i in range(1,100001):" will produce the same code as "for i from 1<=i<=100000". This was'nt the case in older Cython releases, but nowadays there is no difference. Also, the C compiler will just optimize out the empty loop. So your cython example actually does'nt loop over anything, it only measures the overhead of calling a C function.

Volker Braun gravatar imageVolker Braun ( 2011-10-27 18:21:11 +0100 )edit

Ouch, I sometimes forget how clever Cython is. I knew that Cython doesn't loop if one has `for i in range(N): pass`, but I thought that the assignement `j=i` would prevent it!

Simon King gravatar imageSimon King ( 2011-10-28 10:24:59 +0100 )edit
0

answered 2011-10-26 10:35:27 +0100

petRUShka gravatar image

updated 2011-10-26 11:30:36 +0100

For the test I wrote such code in GAP

GenerateSubgroupListByListOfSubgroupsAndNumber := function (subgroups, number)
  local number_subgroups, subgroups_list, factor, i;

  number_subgroups := Size(subgroups);
  subgroups_list := [];

  for i in Reversed([0..number_subgroups-1]) do
    factor := Int(number / 2^i);
    number := RemInt(number, 2^i);

    if factor = 1 then
      Add(subgroups_list, subgroups[i+1], 1);
    fi;
  od;

  return subgroups_list;
end;

And SAGE equivalent

 def generate_subgroup_list_by_list_of_subgroups_and_number(subgroups, number):
  number_subgroups = len(subgroups);
  subgroups_list = []

  for i in reversed(range(0, number_subgroups)):
    factor, number = int( number / 2^i ), number % 2^i

    if factor == 1:
      subgroups_list = [subgroups[i]] + subgroups_list

  return subgroups_list

I think that code above is a good example of working with lists, numbers and etc.

And I ran following benchmarks:

GAP:

array := [1..10];
for n in [1..10000] do
  for i in [1..100] do
    GenerateSubgroupListByListOfSubgroupsAndNumber(array, i);
  od;
od;

Result:

$ time echo "Read(\"gap_benchmark.g\"); quit;" | gap -b -q -T

real    0m7.651s
user    0m7.520s
sys 0m0.056s

SAGE:

array = [1..10]
for n in [1..10000]:
  for i in [1..100]:
    generate_subgroup_list_by_list_of_subgroups_and_number(array, i)

Result:

$ time sage sage_benchmark.sage 

real    0m50.127s
user    0m49.707s
sys 0m0.268s

So, 7 seconds of GAP work vs 50 second of SAGE.

I will try to use cython with SAGE and will update this answer.

edit flag offensive delete link more

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

Stats

Asked: 2011-10-24 10:07:05 +0100

Seen: 1,881 times

Last updated: Oct 27 '11