Ask Your Question
0

Sage stalls during computation

asked 2019-10-11 09:27:37 +0200

Szabolcs gravatar image

updated 2020-06-01 11:41:02 +0200

FrédéricC gravatar image

I was using SageMath 8.8's feedback_edge_set function on a large dataset. The calculation would take a couple of days to finish.

Unfortunately, Sage would keep stalling every few hours. The CPU usage of the process would simply drop to zero, and it would stop doing anything. I had to kill the process (a simple kill worked, no kill -9 necessary).

Has anyone else experienced this? Is it a known problem? Is there a workaround? It is very frustrating to have to keep manually killing and restarting the process. This problem also prevents me from running a calculation overnight: in the morning I might find that it stopped merely an hour after I started it (if I'm unlucky).


More information:

  • I was using Sage on Linux
  • This function uses MixedIntegerLinearProgram. I used the default GLPK (didn't change settings)
  • I used alarm() to set a time limit on each computation.
  • I always believed GLPK not to be interruptible, so I am not even sure which interruption (either manual or alarm) work on Sage with this function. This may be relevant.
  • The memory usage of the process stayed low. It did not stall because the memory got filled and the machine started swapping.

Minimal example:

Put this in a source file:

import datetime
i=1
while True:
    alarm(60)
    try:    
        dg=digraphs.RandomDirectedGNM(300,750);
        fes = dg.feedback_edge_set()
        print(len(fes))
    except AlarmInterrupt:
        print("timeout")
    cancel_alarm()
    print(i)
    print(str(datetime.datetime.now()))
    print("\n")
    i = i+1

Run it as sage sourcefile.sage.

In my last test, it took 291 iterations and ~4-5 hours before it stalled.

It appears to stall only when GLPK is used as the MIP solver (i.e. feedback_edge_set(solver='GLPK'), which is also the default). If I use solver='Coin' (which needs to be installed first using sage -i cbc) then it does not stall. However, the calculation is several times slower with Coin than with GLPK.

edit retag flag offensive close merge delete

Comments

Could you please isolate the graph that causes the issue and see if it is reproducible when the method is applied on that graph ?

tmonteil gravatar imagetmonteil ( 2019-10-12 21:58:16 +0200 )edit

You can provide a short string representation of your graph with:

sage: G.dig6_string()
tmonteil gravatar imagetmonteil ( 2019-10-12 23:29:20 +0200 )edit

@tmonteil It's not because of a specific graph. The stalling appears to be random. I know this because I normally run this on a set of pre-computed graphs, so I could easily re-try on the graph where it stalled. Also, running it twice on the same data won't stall at the same step.

Szabolcs gravatar imageSzabolcs ( 2019-10-13 11:59:34 +0200 )edit

aha, i was trying to idendify the culprit graphs.

tmonteil gravatar imagetmonteil ( 2019-10-13 18:00:05 +0200 )edit

2 Answers

Sort by » oldest newest most voted
0

answered 2019-10-21 13:37:50 +0200

Szabolcs gravatar image

updated 2019-10-21 13:38:33 +0200

This looks like a bug in Sage or Sage's GLPK interface.

The practical solution that worked for me was to run the calculation in a separate process using Sage's parallelization features. It can be done by wrapping the feedback arc set calculation with a few function that has the @fork or @parallel decorators.

I can confirm that even after a several-day running time, the calculation has not stalled.

edit flag offensive delete link more
1

answered 2019-10-11 10:55:26 +0200

tmonteil gravatar image

updated 2019-10-11 10:57:45 +0200

I do not know the issue, but as a workaround, you can try:

sage -i cbc

and restart the computation.

Also, could you please provide the whole code so that someone can try to reproduce the issue ?

edit flag offensive delete link more

Comments

It is not that trivial to boil down the code to a minimal example. If I put in the work to do that, would you be willing to try to reproduce the problem? Trying to reproduce would also be some trouble: It would involve leaving Sage running for up to a day (or running multiple Sage processes for several hours to increase the likelihood that one of them would stall).

Szabolcs gravatar imageSzabolcs ( 2019-10-11 11:00:38 +0200 )edit

Could you reproduce the problem with a large random digraph ?

tmonteil gravatar imagetmonteil ( 2019-10-11 11:57:42 +0200 )edit

You can choose among different models:

sage: digraphs.Random<TAB_COMPLETION>
tmonteil gravatar imagetmonteil ( 2019-10-11 11:59:38 +0200 )edit

Sure—I'll get back to you when I have a small example that reliably fails. It'll take time as it usually takes hours for it to fail.

Szabolcs gravatar imageSzabolcs ( 2019-10-11 20:07:27 +0200 )edit

Some comments on sage -i cbc: this appears to set the default_mip_solver() to "Coin", but feedback_edge_set still uses GLPK by default. I can set it to use Coin manually, but GLPK is far faster. GLPK is also much faster than Gurobi when using this specific function. Perhaps this function uses some feature which is GLPK-specific and that's why it'll have better performance with GLPK. I don't know.

Szabolcs gravatar imageSzabolcs ( 2019-10-11 20:18:18 +0200 )edit

@tmonteil I added a minimal example. Can you try to reproduce the problem?

Szabolcs gravatar imageSzabolcs ( 2019-10-12 20:04:00 +0200 )edit

I am running it for 2 hours now.

tmonteil gravatar imagetmonteil ( 2019-10-12 22:27:39 +0200 )edit

I confirm that it stalled after 264 iterations and 3 hours 25.

tmonteil gravatar imagetmonteil ( 2019-10-13 00:58:29 +0200 )edit

Are you sure about your statement on Glpk vs Coin ? With a single test, i got:

sage: dg=digraphs.RandomDirectedGNM(300,750)
sage: %time fes = dg.feedback_edge_set(solver='Coin')
CPU times: user 19min 35s, sys: 9.29 s, total: 19min 45s
Wall time: 19min 39s
sage: %time fes = dg.feedback_edge_set(solver='Glpk')
CPU times: user 49min 23s, sys: 46.1 ms, total: 49min 23s
Wall time: 49min 23s

Perhaps could you try with Coin solver, and tell us if the problem can be reproduced ? It would tell us if the problem is specific to Glpk or not.

tmonteil gravatar imagetmonteil ( 2019-10-14 00:13:55 +0200 )edit

are you sure feedback_edge_set uses GLPK by default even after installing cbc? to check, set verbose=1, the displayed information are different from a solver to another.

David Coudert gravatar imageDavid Coudert ( 2019-10-14 08:51:51 +0200 )edit

I looked to the code, and there is nothing like this.

tmonteil gravatar imagetmonteil ( 2019-10-14 12:41:04 +0200 )edit

@david Yes, I am sure. If I run %time g.feedback_edge_set(solver='GLPK') then the timing is the same as with %time g.feedback_edge_set(). If I run %time g.feedback_edge_set(solver='Coin') then the timing is several times longer. Therefore, I want to use GLPK for this application.

Szabolcs gravatar imageSzabolcs ( 2019-10-14 12:52:29 +0200 )edit

@tmonteil

sage: set_random_seed(1234)
sage: g=digraphs.RandomDirectedGNM(200,500)
sage: %time g.feedback_edge_set(solver='GLPK')
CPU times: user 27.6 s, sys: 28 ms, total: 27.7 s
Wall time: 27.7 s
sage: %time g.feedback_edge_set(solver='Coin')
CPU times: user 1min 19s, sys: 1.25 s, total: 1min 21s
Wall time: 1min 19s
Szabolcs gravatar imageSzabolcs ( 2019-10-14 12:56:52 +0200 )edit

@tmonteil You are right that the code of feedback_edge_set does not seem to select (or mention) GLPK. Maybe MixedIntegerLinearProgram(solver=None) initializes to use GLPK by default despite default_mip_solver() returning 'Coin'. I don't know how to check what MixedIntegerLinearProgram() has actually initialized to. All I can see is that the timing with solver=None is the same as the timing with solver='GLPK'.

Szabolcs gravatar imageSzabolcs ( 2019-10-14 13:05:00 +0200 )edit

OK. Could you reproduce the problem with explicitely set solver='Coin' ?

tmonteil gravatar imagetmonteil ( 2019-10-14 13:47:53 +0200 )edit

@tomteil It's running, I'll let you know. In the meantime, could you please try if 'Coin' is faster for you on smaller graphs too? I have not found a graph on which 'Coin would be faster than 'GLPK', but I only tried easy ones on which even Coin finishes within 5 min. Your example took much longer than that.

Szabolcs gravatar imageSzabolcs ( 2019-10-14 17:53:59 +0200 )edit

With your own example:

sage: set_random_seed(1234)
sage: g=digraphs.RandomDirectedGNM(200,500)
sage: %time g.feedback_edge_set(solver='GLPK')
CPU times: user 1min 15s, sys: 16.1 ms, total: 1min 15s
Wall time: 1min 15s
[(6, 100),
 (13, 15),
 (13, 79),
 (16, 79),
  ...
sage: %time g.feedback_edge_set(solver='Coin')
CPU times: user 1min 4s, sys: 1.07 s, total: 1min 5s
Wall time: 1min 6s
[(0, 13),
 (6, 100),
 (15, 13),
 (16, 79),
 (24, 9),
  ...
tmonteil gravatar imagetmonteil ( 2019-10-14 20:31:49 +0200 )edit

@tmonteil Wow, that's weird! Do you have any idea what could explain this difference? I verified that only a single CPU core is in use with both. I am using Sage 8.9, built from source without changing any options. The solutions that both GLPK and Coin return seem to be the same ones that you quote.

I also tried the calculation with GLPK on a Mac laptop (where I have not yet managed to install CBC), and it took 20 seconds, which is consistent with the 27 s timing I got on the Linux machine. Why would it be taking a full minute on your computer?

P.S. It does not look like it's going to stall with Coin, but let me keep it running for 24 hrs to be sure.

Szabolcs gravatar imageSzabolcs ( 2019-10-14 21:36:23 +0200 )edit

@tomteil 2692 iterations, well over a day, and still going. We can safely say that with Coin it does not stall. However, using Coin would be a significant step-down because its worse performance. I am still puzzled about why there is no performance difference on your computer.

Szabolcs gravatar imageSzabolcs ( 2019-10-16 09:05:55 +0200 )edit

@tmonteil Are you aware of any feasible workarounds to this stalling? I am trying to understand how parallelization works in Sage. It appears that it will spawn subprocesses which can be killed after a time limit. I am trying to find out if I can run each computation in a new process this way, and kill the process after the time limit (instead of using alarm).

Szabolcs gravatar imageSzabolcs ( 2019-10-17 10:19:31 +0200 )edit

@tmonteil I think I figured it out how to get this done with @parallel

Szabolcs gravatar imageSzabolcs ( 2019-10-17 11:04:21 +0200 )edit

@tmonteil I finally managed to compile Sage on a Mac, and compared Coin and GLPK again. Based on the timings, it still looks like solver=None defaults to GLPK in feedback_edge_set but _not_ in MixedIntergerLinearProgram. Coin is still slower than GLPK in the total time used, however, now it will sometimes use more than one CPU core, thus the wall time becomes competitive with GLPK (though typically still not faster). I do not know why it does not use multiple cores on Linux.

Since you seem to be involved with Sage, can you file an issue for this (solver=None should not be GLPK) if you think it is appropriate to do so?

Szabolcs gravatar imageSzabolcs ( 2019-10-17 13:38:22 +0200 )edit

Yes i will, but i have to find some time to confirm your hypothesis.

tmonteil gravatar imagetmonteil ( 2019-10-21 13:45:35 +0200 )edit

Anyway, thanks a lot for your exploration of the problem.

tmonteil gravatar imagetmonteil ( 2019-10-21 13:47:06 +0200 )edit

@tmonteil Thanks for all the help! I do keep checking back here and I am still interested in understanding what is happening.

Szabolcs gravatar imageSzabolcs ( 2019-10-21 15:53:24 +0200 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower

Stats

Asked: 2019-10-11 09:27:37 +0200

Seen: 460 times

Last updated: Oct 21 '19