Computation gets killed every couple of hours

asked 2014-07-10 11:55:19 -0500

yeohaikal

updated 2014-07-11 18:17:34 -0500

I'm running the following computation (in parallel, over different ranges of curves), but Sage keeps killing it every couple of hours. For the first few curves it does fine, but once a computation takes more than about 3000 s (~1 hr), it is suddenly killed without any warning. This has led to a lot of frustration, because I have no way to tell what is actually going wrong.
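One way to at least see *how* the computation dies is to run each curve in its own child process, so that the parent survives and can report the cause. The sketch below uses Python 3's standard `multiprocessing` module with the POSIX-only `fork` start method; `compute_curve` is a hypothetical stand-in for the per-curve Sage loop and is not part of the original code. If the operating system kills the child (for example because it ran out of memory, which is only a guess here, not something the thread confirms), `Process.exitcode` is the negative signal number, e.g. `-9` for SIGKILL.

```python
import multiprocessing as mp
import os
import signal

def compute_curve(label):
    """Hypothetical stand-in for the per-curve Sage loop (Steps 4 to 6)."""
    if label == "simulate-kill":
        # Simulate the child being killed from outside with SIGKILL.
        os.kill(os.getpid(), signal.SIGKILL)

def run_isolated(label):
    """Run compute_curve(label) in a forked child and describe how it ended."""
    ctx = mp.get_context("fork")  # POSIX only; the child does not re-import this module
    proc = ctx.Process(target=compute_curve, args=(label,))
    proc.start()
    proc.join()
    code = proc.exitcode
    if code == 0:
        return "ok"
    if code < 0:
        # A negative exit code means the child was terminated by that signal.
        return "killed by " + signal.Signals(-code).name
    return "exited with status %d" % code

if __name__ == "__main__":
    for label in ["30502b1", "simulate-kill"]:
        print(label, run_isolated(label))
```

A side effect of this design is that any memory the child accumulated is returned to the operating system when it exits, whatever `gc` could or could not reclaim.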

We initially thought it was a memory leak, so we added gc.enable(); this only helped for the simpler computations. It made no difference for the ones that had already been failing before gc.enable() was added.
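It may help to know what the `gc` module does in CPython, the interpreter Sage runs on: automatic collection is already enabled by default, so `gc.enable()` changes nothing unless something disabled it, and most objects are freed immediately by reference counting anyway. `gc.collect()` only reclaims unreachable reference cycles; it cannot free objects that are still referenced (for example by a cache) or memory allocated inside C libraries. A small Python 3 demonstration, where `Node` and `make_cycle` are illustrative names only:

```python
import gc

class Node:
    """Minimal object used to build a reference cycle."""
    def __init__(self):
        self.other = None

def make_cycle():
    """Create two objects that point at each other, then drop every outside reference."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    # When this function returns, reference counting alone cannot free a and b.

def demo():
    """Return (was automatic gc enabled?, objects found by an explicit collect)."""
    enabled_by_default = gc.isenabled()
    gc.collect()                 # clear out unrelated garbage first
    gc.disable()                 # keep the automatic collector from running mid-demo
    try:
        make_cycle()
        found = gc.collect()     # counts the unreachable objects in the cycle
    finally:
        if enabled_by_default:
            gc.enable()          # restore the previous state
    return enabled_by_default, found

if __name__ == "__main__":
    print(demo())
```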

I can understand that the computations may take a really long time, since computing the L-series is (I think) exponential, but that doesn't seem like a reason for the computation to be killed. Also, when I restart from the last prime p at which the computation was killed, sometimes it works and carries on until it is killed later, and sometimes it doesn't work at all.
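Restarting by hand from the last prime is error-prone. A sketch of an alternative, assuming each result is appended to a plain text log as soon as it is computed: every line is flushed and `os.fsync`-ed, so a kill loses at most the prime in progress, and the restart point can be read back from the file instead of being typed in. The file name, the one-line-per-result format, and the helper names `record` and `resume_point` are all hypothetical choices.

```python
import os

def record(path, label, p, bound):
    """Append one finished (curve, prime, bound) result and force it to disk."""
    with open(path, "a") as f:
        f.write("%s %d %s\n" % (label, p, bound))
        f.flush()
        os.fsync(f.fileno())

def resume_point(path, curves):
    """Return (index of the curve to restart, last finished prime or None)."""
    last_label, last_p = None, None
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                parts = line.split()
                # Ignore a torn last line left by a write that was interrupted.
                if line.endswith("\n") and len(parts) == 3:
                    last_label, last_p = parts[0], int(parts[1])
    if last_label is None:
        return 0, None
    return curves.index(last_label), last_p
```

On restart, `i, last_p = resume_point(path, curves_list)` would take the place of the hard-coded curve label and starting prime: curve `curves_list[i]` continues from the first prime after `last_p`, and the remaining curves start from the beginning.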

Could anybody help me remedy this, or point me in the direction of how to make the code better? Thanks.

curves_list= ['30502b1', '30503b1', '30518c1', '30518d1', '30519f1', '30520a1', '30525w1', '30525x1', '30525bb1', '30530c1', '30530f1', '30534a1', '30535a1', '30535c1', '30537i1', '30537l1', '30544f1', '30544k1', '30550b1', '30550e1', '30550n1', '30550q1', '30550v1', '30550y1', '30558a1', '30558d1', '30564b1', '30564c1', '30564l1', '30565b1', '30566b1', '30573d1', '30575c1', '30576b1', '30576u1', '30576bf1', '30576bz1', '30576cb1', '30576cr1', '30576cs1', '30585c1', '30589b1', '30589d1', '30594a1', '30594b1', '30594d1']
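Since the list is split into ranges that run in parallel, here is a sketch of doing the split programmatically so the ranges are contiguous and differ in size by at most one curve. The helper name `split_into_ranges` and the number of workers are assumptions; note that equal counts do not mean equal run times, since some curves take far longer than others.

```python
def split_into_ranges(items, n_workers):
    """Split items into n_workers contiguous slices whose sizes differ by at most one."""
    q, r = divmod(len(items), n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        size = q + (1 if w < r else 0)  # the first r slices get one extra item
        ranges.append(items[start:start + size])
        start += size
    return ranges

if __name__ == "__main__":
    print(split_into_ranges(["30502b1", "30503b1", "30518c1", "30518d1", "30519f1"], 2))
```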

Step 3: find the last curve you computed data for

import gc
import sys
gc.enable()
i = curves_list.index('30519f1')  # this is the curve at which it stopped

Step 4: put in the [i:] in the next line

for x in curves_list[i:]:  ## put in [i:]
    a = gc.collect()
    E = EllipticCurve(x)
    print E.cremona_label()
    sys.stdout.flush()
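The loop measures time with Sage's `cputime()` and memory with `get_memory_usage()`. For comparison, a plain Python 3 sketch of the same bookkeeping: `time.process_time()` gives CPU time and `resource.getrusage(...).ru_maxrss` gives the peak resident memory of the process (Unix only; kilobytes on Linux, bytes on macOS). Because the peak can only grow, a value that keeps climbing from one curve to the next suggests memory is being retained. The names `peak_rss_mb` and `measured` are hypothetical.

```python
import resource
import sys
import time
from contextlib import contextmanager

def peak_rss_mb():
    """Peak resident memory of this process so far, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS but in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0

@contextmanager
def measured(label, log):
    """Append (label, CPU seconds, peak MB) for the enclosed block to log."""
    start = time.process_time()
    try:
        yield
    finally:
        log.append((label, time.process_time() - start, peak_rss_mb()))

if __name__ == "__main__":
    log = []
    with measured("p=5", log):
        sum(k * k for k in range(10 ** 6))
    print(log)
```

Unlike the original loop, where `t2` is taken after a `print` and a `gc.collect()`, this records only the time spent inside the `with` block.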

Step 5: now add this as a case to pick up where you left off

if x == '30519f1':
    ##add in the starting prime
    for p in prime_range(224,1000):
        if E.is_good(p) and E.is_ordinary(p):
            t1 = cputime()
            output = E.sha().p_primary_bound(p)
            print 'memory usage: ', get_memory_usage()
            a=gc.collect()
            t2 = cputime()
            a=gc.collect()
            print 'bound at p=%s is %s'%(p,output)
            sys.stdout.flush()
            print 'memory usage: ', get_memory_usage()
            sys.stdout.flush()
            a=gc.collect()
            if output > 0:
                print 'BOUND IS > 0 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
                sys.stdout.flush()
            print 'time to compute: ', t2 - t1
            sys.stdout.flush()
            a=gc.collect()
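The `if` branch above and the `else` branch in Step 6 run the same loop and differ only in the starting prime. A sketch of folding them into one helper, written in plain Python so it can be tested outside Sage: `prime_range` here is a small stand-in for Sage's function of the same name (primes `p` with `start <= p < stop`), and `bound_for` is a hypothetical placeholder for the `is_good`/`is_ordinary` check plus `E.sha().p_primary_bound(p)`.

```python
def prime_range(start, stop):
    """Primes p with start <= p < stop (pure-Python stand-in for Sage's prime_range)."""
    if stop <= 2:
        return []
    is_prime = [True] * stop
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(stop ** 0.5) + 1):
        if is_prime[n]:
            # Sieve of Eratosthenes: strike out multiples of n from n*n upward.
            is_prime[n * n::n] = [False] * len(range(n * n, stop, n))
    return [p for p in range(max(start, 2), stop) if is_prime[p]]

def scan_curve(label, bound_for, start=5, stop=1000):
    """Collect (p, bound) for one curve over primes start <= p < stop.

    bound_for(label, p) is a hypothetical placeholder for the Sage call;
    it should return None for primes that are skipped (bad or non-ordinary).
    """
    results = []
    for p in prime_range(start, stop):
        bound = bound_for(label, p)
        if bound is not None:
            results.append((p, bound))
    return results
```

Inside the Sage loop, choosing `start = 224 if x == '30519f1' else 5` and calling one loop would then replace both branches.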

Step 6: add in an else and the code below

else:
    time_count_E_start = cputime()
    for p in prime_range(5,1000):
        if E.is_good(p) and E.is_ordinary(p):
            t1 = cputime()
            output = E.sha().p_primary_bound(p)
            print 'memory usage: ', get_memory_usage()
            a=gc.collect()
            t2 = cputime()
            a=gc.collect()
            print 'bound at p=%s is %s'%(p,output)
            sys.stdout.flush()
            print 'memory usage: ', get_memory_usage()
            sys.stdout.flush()
            a=gc.collect()
            if output > 0:
                print 'BOUND IS > 0 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
                sys.stdout.flush()
            print 'time to ...

Comments

Question: is this in the notebook, on the command line, or in the cloud? You can set timeouts for the notebook, and the command line should not have this issue.

kcrisman (2014-07-10 15:46:46 -0500)

This is in the cloud, using a notebook. I have an infinite timeout set in the cloud, so it's not a timeout issue. Looking at the results, the memory usage neither fully resets nor stays constant. Should that happen even though I already call gc.collect()?

yeohaikal (2014-07-10 18:06:23 -0500)
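On the follow-up question: `gc.collect()` can only free Python objects that are no longer reachable, so memory that is still referenced (for example by a cache) or that was allocated inside a C library is untouched by it. One way to tell these cases apart, sketched in Python 3 with the standard `tracemalloc` module: it traces only allocations made through Python's allocators, so if `get_memory_usage()` keeps climbing while the traced size stays flat, the growth is probably outside Python objects. Here `step` is a hypothetical stand-in for one iteration of the loop, and the module-level `cache` list deliberately simulates retained references.

```python
import tracemalloc

cache = []  # simulates objects kept alive between iterations

def step(i):
    """Hypothetical stand-in for one iteration; it retains about 100 kB per call."""
    cache.append(bytes(100_000))

def traced_growth(n_steps):
    """Return the traced Python-heap size in bytes after each step."""
    tracemalloc.start()
    sizes = []
    try:
        for i in range(n_steps):
            step(i)
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
    finally:
        tracemalloc.stop()
    return sizes

if __name__ == "__main__":
    print(traced_growth(5))
```

In this toy case the traced size grows by roughly 100 kB per step, which points at a Python-level reference being kept; a flat traced size alongside a growing process would point elsewhere.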