2021-04-09 01:22:53 +0200  received badge  ● Popular Question

2019-12-10 05:25:48 +0200  received badge  ● Notable Question

2018-03-24 10:29:10 +0200  received badge  ● Popular Question

2016-11-24 11:49:04 +0200  received badge  ● Nice Question

2016-11-23 02:33:33 +0200  asked a question  Parallel Interface to the Sage interpreter 2.0 The parallel Sage interface PSage() http://doc.sagemath.org/html/en/refer... works fine with the given example, but I have trouble with a more complex case, which would be a typical application of this very useful feature. The following code works exactly as advertised: >>> v = [PSage() for _ in range(5)]
>>> w = [x('factor(2**%s-1)' % randint(250, 310)) for x in v]
>>> print w
[127 * 13367 * 164511353 * 17137716527 * 51954390877748655744256192963206220919272895548843817842228913,
,
<<currently executing code>>,
3 * 5^2 * 11 * 31 * 41 * 53 * 131 * 157 * 521 * 1613 * 2731 * 8191 * 51481 * 409891 * 7623851 * 34110701 * 108140989558681 * 145295143558111,
]
[127 * 13367 * 164511353 * 17137716527 * 51954390877748655744256192963206220919272895548843817842228913,
7 * 73 * 16183 * 34039 * 1437967 * 2147483647 * 833732508401263 * 658812288653553079 * 2034439836951867299888617,
<<currently executing code>>,
3 * 5^2 * 11 * 31 * 41 * 53 * 131 * 157 * 521 * 1613 * 2731 * 8191 * 51481 * 409891 * 7623851 * 34110701 * 108140989558681 * 145295143558111,
]
[127 * 13367 * 164511353 * 17137716527 * 51954390877748655744256192963206220919272895548843817842228913,
7 * 73 * 16183 * 34039 * 1437967 * 2147483647 * 833732508401263 * 658812288653553079 * 2034439836951867299888617,
<<currently executing code>>,
3 * 5^2 * 11 * 31 * 41 * 53 * 131 * 157 * 521 * 1613 * 2731 * 8191 * 51481 * 409891 * 7623851 * 34110701 * 108140989558681 * 145295143558111,
7 * 78903841 * 28753302853087 * 618970019642690137449562111 * 24124332437713924084267316537353]
[127 * 13367 * 164511353 * 17137716527 * 51954390877748655744256192963206220919272895548843817842228913,
7 * 73 * 16183 * 34039 * 1437967 * 2147483647 * 833732508401263 * 658812288653553079 * 2034439836951867299888617,
131071 * 12761663 * 179058312604392742511009 * 3320934994356628805321733520790947608989420068445023,
3 * 5^2 * 11 * 31 * 41 * 53 * 131 * 157 * 521 * 1613 * 2731 * 8191 * 51481 * 409891 * 7623851 * 34110701 * 108140989558681 * 145295143558111,
7 * 78903841 * 28753302853087 * 618970019642690137449562111 * 24124332437713924084267316537353]
Printing w repeatedly shows the progress of the five factorizations running in parallel (watching top confirms five sage/python jobs running simultaneously).
The following example is a global optimization, starting from 5 different starting points, using the differential evolution algorithm available in SciPy. The setup is more complex, but still, only a single command string is passed to PSage(). >>> v = [PSage() for _ in range(5)]
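(Editorial aside, not from the original thread: the fan-out pattern these five PSage() instances implement can be sketched with Python's standard multiprocessing module. The prime-sieve worker below is a made-up CPU-bound stand-in for the factorization jobs, not Sage code.)

```python
from multiprocessing import Pool

def count_primes_below(n):
    """CPU-bound stand-in for one factorization job: sieve of Eratosthenes."""
    if n < 3:
        return 0
    sieve = bytearray([1]) * n
    sieve[0:2] = b'\x00\x00'  # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n, p)))
    return sum(sieve)

if __name__ == '__main__':
    # Five independent jobs running truly in parallel, like the five PSage() calls.
    jobs = [10 ** 5 + k * 10 ** 4 for k in range(5)]
    with Pool(5) as pool:
        print(pool.map(count_primes_below, jobs))
```

Unlike PSage(), pool.map blocks until every worker is done, so there is no partially-filled result list to poll.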
>>> w = [x('from scipy.optimize import rosen, differential_evolution; differential_evolution(rosen, [(0,2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2)])') for x in v]
>>> print w
[Sage, Sage, Sage, Sage, Sage]
Apparently something is wrong here; it doesn't work. But why? Let's see what happens with the serial Sage interpreter Sage(). >>> s = Sage()
>>> t = s('from scipy.optimize import rosen, differential_evolution;differential_evolution(rosen,[(0,2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2)])')
>>> print t
Sage
Looks like the same problem. However, Sage() can be made to work using the eval() method. >>> s = Sage()
>>> t = s.eval('from scipy.optimize import rosen, differential_evolution;differential_evolution(rosen,[(0,2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2)])')
>>> print t
fun: 1.2785524204717224e-18
PSage() also has an eval() method, which AFAIK uses Sage().eval() internally, but unfortunately it doesn't work. >>> v = [PSage() for _ in range(5)]
>>> w = [x.eval('from scipy.optimize import rosen, differential_evolution; differential_evolution(rosen, [(0,2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2)])') for x in v]
>>> print w
['<<currently executing code>>', '<<currently
executing code>>', '<<currently executing code>>',
'<<currently executing code>>', '<<currently executing
code>>']
Based on what I see in top, the five optimization jobs do run in parallel, but even after they finish, no matter how many times I print w, all I get is <<currently executing code>>. The bottom line is that PSage() either does not work at all without the eval() method, or it seems to work with eval() but gets stuck in a bad internal state and never produces the output. Any comment is highly appreciated; it would be great to get this to work consistently. 
2016-11-23 01:11:43 +0200  received badge  ● Enthusiast

2016-11-15 17:23:32 +0200  answered a question  Parallel Interface to the Sage interpreter It turns out that the problem has nothing to do with scalar vs. vector variables. The bottom line is that PSage() takes a single STRING as its parameter, enclosed in quotes. The following (somewhat awkward) modification makes the parallel interface work properly. from scipy.optimize import rosen
x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
rosen(x0)
848.22000000000003
v = [PSage() for _ in range(3)]
w = [x(eval('rosen(%s)' % str(x0))) for x in v]
w
[848.22, 848.22, 848.22]
Unfortunately, using eval() does not solve the problem. Because the example here is instantaneous to compute, I hadn't realized that the calculations were, in fact, executed sequentially and not in parallel. After looking at the PSage() code it is clear that the argument must be a single string; the fact that this example "worked", although not in parallel, must be some artifact of using an explicit eval(). Mea culpa. HOWEVER, I created a better example and opened a new ticket, because PSage() would be an extremely good tool for this kind of thing. 
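(Hedged editorial sketch, not part of the thread: since PSage() accepts only one self-contained command string, an array-like argument has to be serialized into that string itself, e.g. with repr(), so the remote interpreter needs no local variables. No Sage is involved here; rosen is just the name from the example above.)

```python
# Build a single self-contained command string: the vector is embedded
# via repr(), so the remote process does not need the local x0.
x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
cmd = 'from scipy.optimize import rosen; rosen(%s)' % repr(x0)
print(cmd)  # -> from scipy.optimize import rosen; rosen([1.3, 0.7, 0.8, 1.9, 1.2])
```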
2016-11-15 15:04:56 +0200  received badge  ● Nice Question

2016-11-15 01:16:14 +0200  asked a question  Parallel Interface to the Sage interpreter I am using the PSage() parallel interpreter as described here http://doc.sagemath.org/html/en/refer.... The interface works fine for evaluating multiple instances of calls to functions of one or more scalar variables; however, if a function argument is array-like, the interface doesn't seem to work any more. I wonder if any special quoting is necessary for array-like arguments. Here is a minimal example. For reference, I copy the example from the above page, which works just fine; factor() has a single scalar variable. v = [PSage() for _ in range(3)]
w = [x('factor(2^%s-1)' % randint(250, 310)) for x in v]
w
[4057 * 8191 * 6740339310641 * 3340762283952395329506327023033,
31 * 13367 * 2940521 * 164511353 * 70171342151 *
3655725065508797181674078959681,
31 * 13367 * 2940521 * 164511353 * 70171342151 *
3655725065508797181674078959681]
However, the rosen() function with a single array-like/vector argument doesn't seem to work in the parallel interface. (The example below just calculates the same function value three times, but that is not the point here.) from scipy.optimize import rosen
x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
rosen(x0)
848.22000000000003
v = [PSage() for _ in range(3)]
w = [x('rosen(x0)') for x in v]
w
[Sage, Sage, Sage]
Does anyone have any suggestion? 
2016-09-30 08:45:00 +0200  received badge  ● Popular Question

2016-09-30 08:45:00 +0200  received badge  ● Notable Question

2016-09-11 16:28:27 +0200  received badge  ● Notable Question

2016-09-11 16:28:27 +0200  received badge  ● Popular Question

2016-09-11 16:28:27 +0200  received badge  ● Famous Question

2016-08-08 09:02:08 +0200  received badge  ● Notable Question

2016-08-08 09:02:08 +0200  received badge  ● Popular Question

2016-08-08 09:02:08 +0200  received badge  ● Famous Question

2016-06-22 06:05:46 +0200  received badge  ● Necromancer

2016-06-22 05:00:13 +0200  received badge  ● Scholar

2016-06-21 23:18:16 +0200  answered a question  calling a parallel decorated function on an iterator Niles is right. I also ran into this problem recently but found a straightforward way to work around it. I have some sample code below with a detailed explanation. First, let's see a simple comparison between serial and parallel execution of a function. @parallel(p_iter='multiprocessing', ncpus=3)
def test_parallel(n):
    f = factor(2^128+n)
    return len(f)
t = walltime()
r = range(1000)
p = sorted(list(test_parallel(r)))
print p[-1]
print walltime(t)
t = walltime()
for i in range(1000):
    f = factor(2^128+i)
print f
print walltime(t)
(((999,), {}), 5)
6.359593153
5 * 23 * 383 * 1088533 * 7097431848554855065803619703
17.0849101543
test_parallel is a simple function that takes a nontrivial time to execute, for testing purposes: it returns the number of distinct factors in the factorization of 2^128+n. The argument of test_parallel is a list created by the range function. Note that this has to be a list; there is currently no alternative, so e.g. xrange cannot be used in place of range, because xrange generates numbers on the fly rather than creating a whole list of them. This can be a serious (mainly memory) problem, but it can be overcome, as shown further below.
test_parallel, like any parallel-decorated function, returns a special object, which is an iterator over 2-tuples, and the order of the 2-tuples is entirely random! So the output (((999,), {}), 5) above, representing the last item in the calculation p[-1], includes the input value n=999, an empty input keyword dictionary {}, and the return value 5. It should be noted that in order to be able to parse the output from test_parallel, it should be cast to a sorted list.
In this particular run the parallel calculation (using 3 cores) took some 6 seconds, whereas at the bottom of the sample code the serial equivalent took some 17 seconds to execute (and the result of the last factorization confirms that there were 5 distinct factors). This is all well, and my great appreciation goes to the developers.
Unfortunately, however, a serious problem arises when the list argument to a parallel function grows too big. One solution that has worked very well for me involves chunks, as Niles suggested, plus numpy arrays. The following code is a more robust and significantly more efficient alternative to the naive parallel code above. %time
import numpy as np
sizof_chunk = 10^3
numof_chunk = 10^2
np_array = np.zeros((sizof_chunk*numof_chunk,), dtype=np.uint8)
for i in range(numof_chunk):
    beg = i * sizof_chunk
    end = beg + sizof_chunk
    tuples = sorted(list(test_parallel(range(beg, end))))
    iter = [x[1] for x in tuples]
    np_array[beg:end] = np.fromiter(iter, np.uint8)
print np_array
[1 2 3 ..., 6 8 3]
CPU time: 13.88 s, Wall time: 670.06 s
sizof_chunk is set to the same number 1000 and numof_chunk can be set to anything. If it is set to 1 then the calculation will be the exact same as above (and will take about 6 seconds ... (more) 
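(A hedged, Sage-free sketch of the same chunking idea, added for illustration: an invented CPU-bound worker stands in for the factor()-based test_parallel, and the stdlib array module stands in for the preallocated numpy buffer. Names and chunk sizes here are made up; this is not the code from the answer.)

```python
from array import array
from multiprocessing import Pool

def distinct_factor_count(n):
    """Stand-in worker: number of distinct prime factors, by trial division."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    return count + (1 if n > 1 else 0)

if __name__ == '__main__':
    sizof_chunk, numof_chunk = 100, 5
    # Preallocated byte buffer, analogous to np.zeros(..., dtype=np.uint8).
    out = array('B', bytes(sizof_chunk * numof_chunk))
    with Pool(3) as pool:
        for i in range(numof_chunk):
            beg = i * sizof_chunk
            end = beg + sizof_chunk
            # Only one chunk of inputs/outputs is alive at any time,
            # which is what keeps the memory footprint bounded.
            out[beg:end] = array('B', pool.map(distinct_factor_count, range(2 + beg, 2 + end)))
    print(list(out[:10]))
```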
2016-06-18 04:19:21 +0200  commented answer  Performance issues with parallel decoration In version 7.2 I no longer see the load-balancing issue. Istvan 
2016-06-18 03:25:10 +0200  received badge  ● Popular Question

2016-04-23 02:37:10 +0200  received badge  ● Notable Question

2016-04-23 02:37:10 +0200  received badge  ● Popular Question

2015-11-17 11:49:28 +0200  received badge  ● Popular Question

2015-09-05 03:21:27 +0200  commented answer  How make Notebook not to write in /tmp I tried both; SAGENB_TMPDIR doesn't work, but setting TMPDIR does. Thanks! 
2015-09-04 22:03:39 +0200  asked a question  How make Notebook not to write in /tmp I am using Sage Notebook to generate very large files and then process them. The size of the files is tens of gigabytes, but that wouldn't be a problem the way things are set up in my worksheet. However, it seems that no matter where the worksheet is stored, the associated files are temporarily stored in /tmp while the worksheet is working. For example, $ ll /raid/istvan/Playground/Sage.sagenb/home/__store__/2/21/212/2123/admin/17/cells/4/
total 8
drwx------ 2 istvan istvan 4096 Sep  4 15:36 ./
drwxrwxr-x 9 istvan istvan 4096 Sep  4 15:34 ../
lrwxrwxrwx 1 istvan istvan   49 Sep  4 15:36 A.mmap -> /tmp/tmpa6CJmh/A.mmap
lrwxrwxrwx 1 istvan istvan   49 Sep  4 15:36 B.mmap -> /tmp/tmpa6CJmh/B.mmap
lrwxrwxrwx 1 istvan istvan   49 Sep  4 15:36 C.mmap -> /tmp/tmpa6CJmh/C.mmap
lrwxrwxrwx 1 istvan istvan   49 Sep  4 15:36 D.mmap -> /tmp/tmpa6CJmh/D.mmap
$ ll /tmp/tmpa6CJmh
total 68860
drwx------  2 istvan istvan        4096 Sep  4 15:35 ./
drwxrwxrwt 14 root   root         12288 Sep  4 15:35 ../
-rw-rw-r--  1 istvan istvan        1688 Sep  4 15:35 ___code___.py
-rw-rw-r--  1 istvan istvan 10000000000 Sep  4 15:37 A.mmap
lrwxrwxrwx  1 istvan istvan          54 Sep  4 15:35 data -> /raid/istvan/Playground/Sage.sagenb/home/admin/17/data/
-rw-rw-r--  1 istvan istvan 10000000000 Sep  4 15:37 B.mmap
-rw-rw-r--  1 istvan istvan 80000000000 Sep  4 15:37 C.mmap
-rw-rw-r--  1 istvan istvan 10000000000 Sep  4 15:37 D.mmap
-rw-rw-r--  1 istvan istvan        2219 Sep  4 15:35 _sage_input_6.py
The total size of the files is about 110 GB, and I have plenty of room on the /raid partition where the Sage Notebook resides, but my / partition, which includes /tmp, is way too small for that. How can I make the Notebook NOT write to /tmp? Is there an envar to set for that? Thanks for any suggestion, Istvan 
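(Editorial sketch, not from the thread: for plain Python code, the stdlib tempfile module answers the envar part of this question, since it consults TMPDIR before falling back to /tmp. Whether sagenb honors the same variable is what the accepted answer addressed. The freshly made directory below is a stand-in for something like /raid/scratch.)

```python
import os
import tempfile

# Point TMPDIR at a big partition (a throwaway directory stands in
# for e.g. /raid/scratch here), then force tempfile to re-read the env.
big_partition = tempfile.mkdtemp()
os.environ['TMPDIR'] = big_partition
tempfile.tempdir = None  # drop the cached default so TMPDIR is consulted again

scratch = tempfile.mkdtemp()  # lands under the new TMPDIR, not /tmp
print(scratch.startswith(big_partition))  # -> True
```

Resetting tempfile.tempdir matters: tempfile caches its default directory after the first lookup, so changing TMPDIR alone mid-session has no effect.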
2015-04-02 14:09:14 +0200  commented answer  Is HDF5 or the Python interface h5py supported in Sage? Download the package from http://www.hdfgroup.org/downloads/index.html and the installation is straightforward. On Linux: 1) untar the downloaded file
2) cd to the hdf5 directory
3) ./configure --prefix=/where/you/want/hdf5/to/be/installed (in my case it was /usr/local/hdf5)
4) make
5) make check
6) sudo make install (sudo needed if the location is not in your own user area)
7) sudo make check-install

2015-04-01 19:18:23 +0200  received badge  ● Self-Learner

2015-04-01 08:51:28 +0200  answered a question  Cannot add a comment to my own question I could answer my question, as opposed to commenting on it. It could be my browser, I don't know, but let's consider this ticket closed. 
2015-04-01 08:48:10 +0200  answered a question  Is HDF5 or the Python interface h5py supported in Sage? Of course, hdf5 must be installed first, and since it is not a Python package, pip will likely need explicit information about the hdf5 libraries and include files. The following command worked for me: $ sage -pip install --global-option=build_ext --global-option="-L/usr/local/hdf5/lib" --global-option="-l/usr/local/hdf5/lib" --global-option="-I/usr/local/hdf5/include" --global-option="-R/usr/local/hdf5/lib" h5py 
2015-03-31 22:24:14 +0200  asked a question  Cannot add a comment to my own question I am trying to get further help regarding "Is HDF5 or the Python interface h5py supported in Sage?" I added my related question as a comment (while logged in) twice; the site seemed to have processed it, but it won't show up under the above title. Should I open a new ticket? UPDATE: This is fixed now :) 
2015-03-18 14:06:25 +0200  commented answer  Is HDF5 or the Python interface h5py supported in Sage? Got it. Great, thank you! Istvan 
2015-03-17 18:37:47 +0200  asked a question  Is HDF5 or the Python interface h5py supported in Sage? I am saving very large list objects to disk using the save() command in Sage, which utilizes Python's pickle module. There is a known deficiency/bug that is unlikely to go away: deeply buried in the compression code in the Python standard library that pickle, and therefore save(), uses, there are legacy 32-bit integers that result in a serious limitation when using save() with even moderately large (hundreds of MB) objects. See OverflowError: size does not fit in an int. The h5py package provides a Python interface to the HDF5 library, and HDF5 can deal with multiple terabytes easily. Does anyone know if h5py is/will be implemented in Sage? Is there another alternative to save() in Sage for saving objects to disk? 
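(Hedged editorial sketch, not an answer from the thread: it is no h5py substitute, but one stdlib workaround for oversized single pickles is to stream a large list as many small pickle frames in one file, so no individual frame hits the 32-bit limits. The function names are invented.)

```python
import pickle

def save_chunked(big_list, path, chunk_size=1000):
    """Write big_list as a sequence of small pickle frames in one file."""
    with open(path, 'wb') as f:
        for i in range(0, len(big_list), chunk_size):
            pickle.dump(big_list[i:i + chunk_size], f)

def load_chunked(path):
    """Read frames back until EOF and reassemble the list."""
    out = []
    with open(path, 'rb') as f:
        while True:
            try:
                out.extend(pickle.load(f))
            except EOFError:
                return out

if __name__ == '__main__':
    import os, tempfile
    path = os.path.join(tempfile.mkdtemp(), 'big.pkl')
    data = list(range(5000))
    save_chunked(data, path)
    print(load_chunked(path) == data)  # -> True
```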
2015-03-17 17:33:14 +0200  commented question  How to change the prefix to SAGE_TMP? I am using v6.4.1. ~/.sage/temp/ is different: if I run a Notebook session, its files are stored under ~/.sage/notebook/..., but while a Sage calculation is running, all the files there are just links to /tmp/something, where /tmp/something is the value of SAGE_TMP in that session. I am saving large objects via the save() command, and that's where I have the problem, because they can't fit in /tmp. There has got to be a way in Sage to set the prefix for SAGE_TMP from /tmp to something else. 
2015-03-16 10:27:54 +0200  received badge  ● Nice Question

2015-03-13 23:43:56 +0200  received badge  ● Editor

2015-03-13 23:42:18 +0200  asked a question  How to change the prefix to SAGE_TMP? SAGE_TMP looks something like this by default: /tmp/tmpGMP2PR. My /tmp is full, but I have plenty of space in a different tmp directory on another disk. How can I change the default prefix in SAGE_TMP from /tmp to, say, /raid/scratch? Thanks. 
2015-03-04 21:15:56 +0200  commented answer  Performance issues with parallel decoration Thanks, Vincent. Of course, you are right, this is not an exact apples-to-apples comparison, and your point about the calculation being too fast is valid. I too experimented with very large numbers, and indeed parallel performance is better. Nonetheless, I do see 6 python jobs starting out, but very quickly four of them finish, and then only two, and then only one, is running for quite a while, which means that load balancing is far from ideal. I'll check the source code to see how the list is passed to the function. Thanks again, Istvan 
2015-03-03 05:39:23 +0200  commented answer  How to format questions in this forum 
2015-03-03 02:10:08 +0200  asked a question  How to format questions in this forum Sorry, I am new to this forum but already find it very helpful. I noticed that most posts have nicely formatted code snippets, but I couldn't figure out how to do it. When I cut-and-paste from notebook(), my code looks awful and unformatted. Thanks for any suggestions. 
2015-03-03 02:04:53 +0200  asked a question  Performance issues with parallel decoration Experimenting with @parallel resulted in unexpected performance issues in Sage 6.4.1. Here is a very simple example: @parallel(p_iter='multiprocessing', ncpus=6)
def f(n):
    return factor(n)
t = walltime()
r = range(1,1000000)
p = sorted(list(f(r)))
print walltime(t)
82.0724880695
t = walltime()
for i in range(1,1000000):
    factor(i)
print walltime(t)
12.1648099422
I have 6 physical cores, yet the serial calculation runs more than 6 times faster, even though I can see 6 instances of python running on my computer. Maybe it is pilot error; I have the following questions:
1) Does Sage require a special way of compiling it in order to take full advantage of @parallel?
2) In this case using 'fork' is even worse; it never completes the calculation.
3) How does @parallel distribute the calculations? Since, in general, it takes significantly longer for factor() to process larger numbers, it seems that assigning the cases n=1,7,13,... to core_0, n=2,8,14,... to core_1, etc., makes sense. Shuffling the original serial list given to f(n) also seems plausible. However, dividing the whole serial range into 6 intervals and assigning them to the 6 cores, respectively, would be a bad choice; for most of the time only one or two python processes would do anything. Does anyone know what scheme is used in Sage? Thanks for any suggestions. 
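(Editorial sketch for question 3, not from the thread: how @parallel actually schedules work is a question for its source, but the imbalance described can be illustrated by comparing contiguous-block partitioning with round-robin dealing under an invented cost model where processing n costs n units of work.)

```python
def block_partition(items, k):
    """Split items into k contiguous blocks (the bad choice described above)."""
    size = (len(items) + k - 1) // k
    return [items[i * size:(i + 1) * size] for i in range(k)]

def round_robin_partition(items, k):
    """Deal items out like cards: core_0 gets items 1, 7, 13, ..."""
    return [items[i::k] for i in range(k)]

# Invented cost model: processing n costs n units, like factor() taking
# longer on larger inputs. The most-loaded core bounds the wall time.
items = list(range(1, 1001))
for name, parts in [('block', block_partition(items, 6)),
                    ('round-robin', round_robin_partition(items, 6))]:
    loads = [sum(chunk) for chunk in parts]
    print(name, 'max load:', max(loads), 'mean load:', sum(loads) // len(loads))
```

Under this model the block scheme's last core carries far more than the mean load, while round-robin keeps every core within a fraction of a percent of it, matching the "only one or two python processes do anything" symptom.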