
ikol's profile - activity

2018-03-24 04:29:10 -0500 received badge  Popular Question (source)
2016-11-24 04:49:04 -0500 received badge  Nice Question (source)
2016-11-22 19:33:33 -0500 asked a question Parallel Interface to the Sage interpreter 2.0

The parallel Sage interface PSage() (http://doc.sagemath.org/html/en/refer...) works fine with the given example, but I have trouble with a more complex case, which would be a typical application of this very useful feature. The following code works exactly as advertised:

>>> v = [ PSage() for _ in range(5)]
>>> w = [x('factor(2**%s-1)'% randint(250,310)) for x in v]
>>> print w
[127 * 13367 * 164511353 * 17137716527 * 51954390877748655744256192963206220919272895548843817842228913,
 ,
 <<currently executing code>>,
 3 * 5^2 * 11 * 31 * 41 * 53 * 131 * 157 * 521 * 1613 * 2731 * 8191 * 51481 * 409891 * 7623851 * 34110701 * 108140989558681 * 145295143558111,
 ]
[127 * 13367 * 164511353 * 17137716527 * 51954390877748655744256192963206220919272895548843817842228913,
 7 * 73 * 16183 * 34039 * 1437967 * 2147483647 * 833732508401263 * 658812288653553079 * 2034439836951867299888617,
 <<currently executing code>>,
 3 * 5^2 * 11 * 31 * 41 * 53 * 131 * 157 * 521 * 1613 * 2731 * 8191 * 51481 * 409891 * 7623851 * 34110701 * 108140989558681 * 145295143558111,
 ]
[127 * 13367 * 164511353 * 17137716527 * 51954390877748655744256192963206220919272895548843817842228913,
 7 * 73 * 16183 * 34039 * 1437967 * 2147483647 * 833732508401263 * 658812288653553079 * 2034439836951867299888617,
 <<currently executing code>>,
 3 * 5^2 * 11 * 31 * 41 * 53 * 131 * 157 * 521 * 1613 * 2731 * 8191 * 51481 * 409891 * 7623851 * 34110701 * 108140989558681 * 145295143558111,
 7 * 78903841 * 28753302853087 * 618970019642690137449562111 * 24124332437713924084267316537353]
[127 * 13367 * 164511353 * 17137716527 * 51954390877748655744256192963206220919272895548843817842228913,
 7 * 73 * 16183 * 34039 * 1437967 * 2147483647 * 833732508401263 * 658812288653553079 * 2034439836951867299888617,
 131071 * 12761663 * 179058312604392742511009 * 3320934994356628805321733520790947608989420068445023,
 3 * 5^2 * 11 * 31 * 41 * 53 * 131 * 157 * 521 * 1613 * 2731 * 8191 * 51481 * 409891 * 7623851 * 34110701 * 108140989558681 * 145295143558111,
 7 * 78903841 * 28753302853087 * 618970019642690137449562111 * 24124332437713924084267316537353]

Printing w repeatedly shows the progress of the five factorizations running in parallel (monitoring with top shows five sage/python jobs running simultaneously). The following example performs global optimization from five different starting points, using the differential evolution algorithm available in SciPy. The setup is more complex, but still only a single command string is passed to PSage().

>>> v = [ PSage() for _ in range(5)]
>>> w = [x('from scipy.optimize import rosen, differential_evolution; differential_evolution(rosen, [(0,2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2)])') for x in v]
>>> print w
[Sage, Sage, Sage, Sage, Sage]

Apparently something is wrong here: it doesn't work. But why? Let's see what happens with the serial Sage interpreter Sage().

>>> s = Sage()
>>> t = s('from scipy.optimize import rosen, differential_evolution;differential_evolution(rosen,[(0,2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2)])')
>>> print t
Sage

It looks like the same problem. However, Sage() can be made to work using the eval() method.

>>> s = Sage()
>>> t = s.eval('from scipy.optimize import rosen, differential_evolution;differential_evolution(rosen,[(0,2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2)])')
>>> print t
fun: 1.2785524204717224e-18

PSage() also has an eval() method, which AFAIK uses Sage().eval() internally, but unfortunately it doesn't work either.

>>> v = [ PSage() for _ in range(5)]
>>> w = [x.eval('from scipy.optimize import rosen, differential_evolution; differential_evolution(rosen, [(0,2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2), (0, 2)])') for x in v]
>>> print w
['<<currently executing code>>', '<<currently executing code>>',
 '<<currently executing code>>', '<<currently executing code>>',
 '<<currently executing code>>']

Based on what I see in top, the five optimization jobs do run in parallel, but even after they finish, no matter how many times I print w, all I get is <<currently executing code>>.

The bottom line: without the eval() method, PSage() does not work at all; with eval(), the jobs do run, but the interface seems stuck in a bad internal state and never produces the output. Any comment is highly appreciated; it would be great to get this to work consistently.
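As a stopgap while PSage() misbehaves, the same fan-out can be sketched with Python's standard multiprocessing module. This is my own workaround, not part of the PSage interface, and the objective function below is a cheap stand-in for the differential_evolution call, since the point is only the process pool:

```python
from multiprocessing import Pool

def objective(seed):
    # stand-in for differential_evolution(rosen, ...); any CPU-bound,
    # pickleable function of one argument slots in here
    return sum((seed + i) ** 2 for i in range(5))

if __name__ == '__main__':
    # five workers, analogous to the five PSage() instances above
    with Pool(5) as pool:
        results = pool.map(objective, range(5))
    print(results)   # [30, 55, 90, 135, 190]
```

Unlike PSage(), pool.map blocks until all workers finish, so there is no polling of partially finished results.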

2016-11-22 18:11:43 -0500 received badge  Enthusiast
2016-11-15 10:23:32 -0500 answered a question Parallel Interface to the Sage interpreter

It turns out that the problem has nothing to do with scalar vs. vector variables. The bottom line is that PSage() takes a single STRING, enclosed in quotes, as its parameter. The following (somewhat awkward) modification makes the parallel interface work properly.

from scipy.optimize import rosen

x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
rosen(x0)
848.22000000000003

v = [PSage() for _ in range(3)]
w = [x(eval('rosen(%s)'% str(x0))) for x in v]
w
[848.22, 848.22, 848.22]

Unfortunately, using eval() does not solve the problem. Because the example here is instantaneous to compute, I hadn't realized that the calculations were in fact executed sequentially, not in parallel. After looking at the PSage() code it is clear that the argument must be a single string, and the fact that this example "worked" (although not in parallel) must be an artifact of using an explicit eval(). Mea culpa. HOWEVER, I created a better example and opened a new ticket, because PSage() would be an extremely good tool for this kind of thing.
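To make the single-string requirement concrete, here is a plain-Python sketch of serializing the whole call, list literal included, into the one command string a PSage() instance expects (rosen and the values are just the example above):

```python
x0 = [1.3, 0.7, 0.8, 1.9, 1.2]

# build the entire call as one string; %r embeds the list literal
cmd = 'rosen(%r)' % (x0,)
print(cmd)   # rosen([1.3, 0.7, 0.8, 1.9, 1.2])

# each PSage() instance would then receive cmd as its single string argument
```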

2016-11-15 08:04:56 -0500 received badge  Nice Question (source)
2016-11-14 18:16:14 -0500 asked a question Parallel Interface to the Sage interpreter

I am using the PSage() parallel interpreter as described here: http://doc.sagemath.org/html/en/refer.... The interface works fine for evaluating multiple instances of calls to functions of one or more scalar variables; however, if a function argument is array-like, the interface doesn't seem to work anymore. I wonder if any special quoting is necessary for array-like arguments. Here is a minimal example. For reference, I copy the example from the above page, which works just fine; factor() has a single scalar variable.

v = [PSage() for _ in range(3)]
w = [x('factor(2^%s-1)'% randint(250,310)) for x in v]
w
[4057 * 8191 * 6740339310641 * 3340762283952395329506327023033,
 31 * 13367 * 2940521 * 164511353 * 70171342151 * 3655725065508797181674078959681,
 31 * 13367 * 2940521 * 164511353 * 70171342151 * 3655725065508797181674078959681]

However, the rosen() function with a single array-like/vector argument doesn't seem to work in the parallel interface. (The example below just calculates the same function value three times, but that is not the point here.)

from scipy.optimize import rosen

x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
rosen(x0)
848.22000000000003

v = [PSage() for _ in range(3)]
w = [x('rosen(x0)') for x in v]
w
[Sage, Sage, Sage]

Does anyone have any suggestion?

2016-09-30 01:45:00 -0500 received badge  Notable Question (source)
2016-09-30 01:45:00 -0500 received badge  Popular Question (source)
2016-09-11 09:28:27 -0500 received badge  Popular Question (source)
2016-09-11 09:28:27 -0500 received badge  Famous Question (source)
2016-09-11 09:28:27 -0500 received badge  Notable Question (source)
2016-08-08 02:02:08 -0500 received badge  Famous Question (source)
2016-08-08 02:02:08 -0500 received badge  Notable Question (source)
2016-08-08 02:02:08 -0500 received badge  Popular Question (source)
2016-06-21 23:05:46 -0500 received badge  Necromancer (source)
2016-06-21 22:00:13 -0500 received badge  Scholar (source)
2016-06-21 16:18:16 -0500 answered a question calling a parallel decorated function on an iterator

Niles is right. I also ran into this problem recently but found a straightforward way to solve it. I have some sample code below with detailed explanation. First, let's see a simple comparison between serial and parallel execution of a function.

@parallel(p_iter='multiprocessing', ncpus=3)
def test_parallel(n):
    f = factor(2^128+n)
    return len(f)

t=walltime()
r = range(1000)
p = sorted(list( test_parallel(r)))
print p[-1]
print walltime(t)

t=walltime()
for i in range(1000):
  f = factor(2^128+i)
print f
print walltime(t)

(((999,), {}), 5)
6.359593153
5 * 23 * 383 * 1088533 * 7097431848554855065803619703
17.0849101543

test_parallel is a simple function that takes a nontrivial time to execute, for testing purposes: it returns the number of distinct factors of 2^128+n. The argument of test_parallel is a list created by the range function. Note that this has to be a list; there is currently no alternative, so e.g. xrange cannot be used in place of range, because xrange generates numbers on the fly rather than creating the whole list. This can be a serious (mainly memory) problem, but it can be overcome as shown further below.

test_parallel, like any @parallel-decorated function, returns a special object: an iterator over 2-tuples, and the order of the 2-tuples is entirely random! So the output (((999,), {}), 5) above, representing the last item p[-1] in the calculation, includes the input value n=999, an empty keyword dictionary {}, and the return value 5. It should be noted that in order to parse the output from test_parallel, it should be cast to a sorted list.
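The shape of those 2-tuples can be illustrated in plain Python without running Sage at all; the values below are invented for illustration:

```python
# mock of the ((args, kwargs), value) pairs a @parallel function yields,
# deliberately out of order to mimic the nondeterministic completion order
raw = [(((2,), {}), 4), (((0,), {}), 1), (((1,), {}), 2)]

p = sorted(list(raw))                        # sort by input args to restore input order
values = [val for ((args, kwargs), val) in p]

print(p[-1])     # (((2,), {}), 4) -- last input, its kwargs, its result
print(values)    # [1, 2, 4]
```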

In this particular run the parallel calculation (using 3 cores) took some 6 seconds whereas at the bottom of the sample code the serial equivalent took some 17 seconds to execute (and the result of the last factorization confirms that there were 5 distinct factors).

This is all well, and my great appreciation goes to the developers. Unfortunately, however, a serious problem arises when the list argument to a parallel function grows too big. One solution that has worked very well for me involves chunks, as Niles suggested, plus numpy arrays. The following code is a more robust and significantly more efficient alternative to the naive parallel code above.

%time
import numpy as np

sizof_chunk = 10^3
numof_chunk = 10^2
np_array    = np.zeros((sizof_chunk*numof_chunk,), dtype=np.uint8)

for i in range(numof_chunk):

  beg    = i *   sizof_chunk
  end    = beg + sizof_chunk
  tuples = sorted(list(test_parallel(range(beg,end))))
  values = [ x[1] for x in tuples ]
  np_array[beg:end] = np.fromiter(values, np.uint8)

print np_array

[1 2 3 ..., 6 8 3]
CPU time: 13.88 s,  Wall time: 670.06 s

sizof_chunk is set to the same number 1000 and numof_chunk can be set to anything. If it is set to 1 then the calculation will be the exact same as above (and will take about 6 seconds ...
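As an aside, the chunking idea generalizes to arbitrary (even lazy) iterables with a small helper; this is a generic plain-Python sketch, not part of the timed run above:

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable,
    so a huge (or lazy) input never has to be materialized at once."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

# each block could then be fed to test_parallel() in turn
for block in chunks(range(10), 4):
    print(block)    # [0, 1, 2, 3] then [4, 5, 6, 7] then [8, 9]
```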

2016-06-17 21:19:21 -0500 commented answer Performance issues with parallel decoration

In version 7.2 I no longer see the load balancing issue.

Istvan

2016-06-17 20:25:10 -0500 received badge  Popular Question (source)
2016-04-22 19:37:10 -0500 received badge  Notable Question (source)
2016-04-22 19:37:10 -0500 received badge  Popular Question (source)
2015-11-17 04:49:28 -0500 received badge  Popular Question (source)
2015-09-04 20:21:27 -0500 commented answer How make Notebook not to write in /tmp

I tried both and SAGENB_TMPDIR doesn't work, but setting TMPDIR does.

Thanks!

2015-09-04 15:03:39 -0500 asked a question How make Notebook not to write in /tmp

I am using Sage Notebook to generate very large files and then process them. The size of the files is tens of gigabytes, but that wouldn't be a problem the way things are set up in my worksheet. However, no matter where the worksheet is stored, the associated files are temporarily kept in /tmp while the worksheet is working. For example,

$ ll /raid/istvan/Playground/Sage.sagenb/home/__store__/2/21/212/2123/admin/17/cells/4/
total 8
drwx------ 2 istvan istvan 4096 Sep  4 15:36 ./
drwxrwxr-x 9 istvan istvan 4096 Sep  4 15:34 ../
lrwxrwxrwx 1 istvan istvan   49 Sep  4 15:36 A.mmap -> /tmp/tmpa6CJmh/A.mmap
lrwxrwxrwx 1 istvan istvan   49 Sep  4 15:36 B.mmap -> /tmp/tmpa6CJmh/B.mmap
lrwxrwxrwx 1 istvan istvan   49 Sep  4 15:36 C.mmap -> /tmp/tmpa6CJmh/C.mmap
lrwxrwxrwx 1 istvan istvan   49 Sep  4 15:36 D.mmap -> /tmp/tmpa6CJmh/D.mmap

$ ll /tmp/tmpa6CJmh
total 68860
drwx------  2 istvan istvan        4096 Sep  4 15:35 ./
drwxrwxrwt 14 root   root         12288 Sep  4 15:35 ../
-rw-rw-r--  1 istvan istvan        1688 Sep  4 15:35 ___code___.py
-rw-rw-r--  1 istvan istvan 10000000000 Sep  4 15:37 A.mmap
lrwxrwxrwx  1 istvan istvan          54 Sep  4 15:35 data -> /raid/istvan/Playground/Sage.sagenb/home/admin/17/data/
-rw-rw-r--  1 istvan istvan 10000000000 Sep  4 15:37 B.mmap
-rw-rw-r--  1 istvan istvan 80000000000 Sep  4 15:37 C.mmap
-rw-rw-r--  1 istvan istvan 10000000000 Sep  4 15:37 D.mmap
-rw-rw-r--  1 istvan istvan        2219 Sep  4 15:35 _sage_input_6.py

The total size of the files is about 110 GB, and I have plenty of room on the /raid partition where the Sage Notebook resides, but my / partition, which includes /tmp, is way too small for that. How can I make the Notebook NOT write to /tmp? Is there an environment variable to set for that?

Thanks for any suggestion,

Istvan
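For what it's worth, Python's standard tempfile module honors the TMPDIR environment variable, which is consistent with TMPDIR being the knob that works here (whether the Notebook goes through tempfile is my assumption); a quick plain-Python check:

```python
import os
import tempfile

scratch = tempfile.mkdtemp()   # stand-in for e.g. /raid/scratch
os.environ["TMPDIR"] = scratch
tempfile.tempdir = None        # force tempfile to re-read TMPDIR

print(tempfile.gettempdir())   # now points at the scratch directory
```

Setting TMPDIR before launching the Notebook would therefore redirect anything created through this machinery.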

2015-04-02 07:09:14 -0500 commented answer Is HDF5 or the Python interface h5py supported in Sage?

Download the package from http://www.hdfgroup.org/downloads/index.html and the installation is straightforward. On Linux:

1) untar the downloaded file
2) cd to the hdf5 directory
3) ./configure --prefix=/where/you/want/hdf5/to/be/installed (in my case it was /usr/local/hdf5)
4) make
5) make check
6) sudo make install (sudo needed if the location is not in your own user area)
7) sudo make check-install
2015-04-01 12:18:23 -0500 received badge  Self-Learner (source)
2015-04-01 01:51:28 -0500 answered a question Cannot add a comment to my own question

I could answer my question, as opposed to commenting on it. It could be my browser, I don't know, but let's consider this ticket closed.

2015-04-01 01:48:10 -0500 answered a question Is HDF5 or the Python interface h5py supported in Sage?

Of course, hdf5 must be installed first, and since it is not a Python package, pip will likely need explicit information about the hdf5 libraries and include files. The following command worked for me:

$ sage -pip install --global-option=build_ext --global-option="-L/usr/local/hdf5/lib" --global-option="-l/usr/local/hdf5/lib" --global-option="-I/usr/local/hdf5/include" --global-option="-R/usr/local/hdf5/lib" h5py

2015-03-31 15:24:14 -0500 asked a question Cannot add a comment to my own question

I am trying to get further help regarding "Is HDF5 or the Python interface h5py supported in Sage?" I added my related question as a comment (while logged in) twice; the site seemed to process it, but it won't show up under the above title. Should I open a new ticket?

UPDATE

This is fixed now :)

2015-03-18 08:06:25 -0500 commented answer Is HDF5 or the Python interface h5py supported in Sage?

Got it. Great, thank you!

Istvan

2015-03-17 12:37:47 -0500 asked a question Is HDF5 or the Python interface h5py supported in Sage?

I am saving very large list objects to disk using the save() command in Sage, which uses Python's pickle module. There is a known deficiency/bug that is unlikely to go away: deep in the compression code of the Python standard library that pickle, and therefore save(), relies on, there are legacy 32-bit integers, which results in a serious limitation when using save() with even moderately large (hundreds of MB) objects. See OverflowError: size does not fit in an int. The h5py package provides a Python interface to the HDF5 library, and HDF5 can deal with multiple terabytes easily. Does anyone know if h5py is or will be supported in Sage? Is there another alternative to save() in Sage for saving objects to disk?

2015-03-17 11:33:14 -0500 commented question How to change the prefix to SAGE_TMP?

I am using v6.4.1. ~/.sage/temp/ is different. If I run a Notebook session, its files are stored under ~/.sage/notebook/..., but while a Sage calculation is running, all the files there are just links to /tmp/something, where /tmp/something is the value of SAGE_TMP in that session. I am saving large objects via the save() command, and that's where I have the problem, because they can't fit in /tmp. There has got to be a way in Sage to set the prefix for SAGE_TMP from /tmp to something else.

2015-03-16 04:27:54 -0500 received badge  Nice Question (source)
2015-03-13 17:43:56 -0500 received badge  Editor (source)
2015-03-13 17:42:18 -0500 asked a question How to change the prefix to SAGE_TMP?

SAGE_TMP looks something like this by default: /tmp/tmpGMP2PR. My /tmp is full but I have plenty of space in a different tmp directory on another disk. How can I change the default prefix in SAGE_TMP from /tmp to, say, /raid/scratch?

Thanks.

2015-03-04 14:15:56 -0500 commented answer Performance issues with parallel decoration

Thanks, Vincent. Of course you are right: this is not an exact apples-to-apples comparison, and your point about the calculation being too fast is valid. I too experimented with very large numbers, and indeed the parallel performance is better. Nonetheless, I do see 6 python jobs starting out, but very quickly four of them finish, and then only two, and then only one, keeps running for quite a while, which means that load balancing is far from ideal. I'll check the source code to see how the list is passed to the function.

Thanks again,

Istvan

2015-03-02 22:39:23 -0500 commented answer How to format questions in this forum

Got it, thank you.

2015-03-02 19:10:08 -0500 asked a question How to format questions in this forum

Sorry, I am new to this forum but already find it very helpful. I noticed that most posts have nicely formatted code snippets, but I couldn't figure out how to do that. When I cut and paste from notebook(), my code looks awful and unformatted.

Thanks for any suggestions.

2015-03-02 19:04:53 -0500 asked a question Performance issues with parallel decoration

Experimenting with @parallel resulted in unexpected performance issues in Sage 6.4.1. Here is a very simple example:

@parallel(p_iter='multiprocessing', ncpus=6)
def f(n):
    return factor(n)
t=walltime()
r = range(1,1000000)
p = sorted(list( f(r)))
print walltime(t)
82.0724880695

t=walltime()
for i in range(1,1000000):
    factor(i)
print walltime(t)
12.1648099422

I have 6 physical cores, yet the serial calculation runs more than 6 times faster, even though I can see 6 instances of python running on my computer. Maybe it is pilot error; in any case I have the following questions:

1) Does Sage require a special way of compiling it in order to take full advantage of @parallel?

2) In this case using 'fork' is even worse: it never completes the calculation.

3) How does @parallel distribute the calculations? Since it generally takes significantly longer for factor() to process larger numbers, assigning n=1,7,13,... to core_0, n=2,8,14,... to core_1, etc., seems sensible; shuffling the original serial list given to f(n) also seems plausible. However, dividing the whole range into 6 contiguous intervals and assigning one to each core would be a bad choice: for most of the time only one or two python processes would be doing anything. Does anyone know which scheme Sage uses?

Thanks for any suggestions.
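The two distribution schemes described in question 3 are easy to sketch in plain Python (the helper names are mine, purely for illustration; this is not Sage's actual scheduler):

```python
def round_robin(items, ncpus):
    """Stripe items across workers: worker k gets items[k], items[k+ncpus], ..."""
    return [items[k::ncpus] for k in range(ncpus)]

def block_split(items, ncpus):
    """Give each worker one contiguous slice of the input."""
    size = -(-len(items) // ncpus)   # ceiling division
    return [items[k * size:(k + 1) * size] for k in range(ncpus)]

work = list(range(1, 13))
print(round_robin(work, 3))  # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
print(block_split(work, 3))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

With a cost that grows with n, striping mixes cheap and expensive inputs across workers, whereas contiguous blocks concentrate the expensive inputs on the last worker.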

2015-03-01 23:06:17 -0500 commented answer Why does append() overwrite/clobber every existing element of a list with the one that was just appended?

That's very helpful, thanks!

2015-03-01 20:34:35 -0500 asked a question Why does append() overwrite/clobber every existing element of a list with the one that was just appended?
M = []
L = [0 for i in range(10)]
print L
M.append(L)
print M
L[0] += 1
L[7] += 1
print L
M.append(L)
print M

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
[1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
[[1, 0, 0, 0, 0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]]
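For context, the transcript above shows Python's reference semantics rather than a bug in append(): both entries of M are the same list object. A minimal plain-Python sketch of the behavior and the usual fix (appending a copy):

```python
M = []
L = [0] * 3

M.append(L)          # M stores a reference to L, not a snapshot of it
L[0] += 1
print(M)             # [[1, 0, 0]] -- the already-appended entry changed too

M2 = []
L2 = [0] * 3
M2.append(list(L2))  # append a shallow copy instead
L2[0] += 1
print(M2)            # [[0, 0, 0]] -- the stored copy is unaffected
```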
2015-02-22 22:09:05 -0500 received badge  Teacher (source)