Loading a 5GB dictionary of matrices uses up all of 64GB RAM

asked 2019-01-16 00:42:46 +0200

Leon

In a 5GB file.sage, I stored a chain complex as a dictionary of sparse matrices (created in Mathematica with no problems). When I run load(file.sage), the program uses up all 64GB of RAM plus 64GB of swap and crashes. Why does Sage use that much memory for such a small file?

I tried splitting the file into three smaller ones and loading them one after another, but already with the first 1.3GB file the system crashes after using all the RAM, this time without touching swap. I get:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-1-8cecfac681b8> in <module>()
----> 1 load('/home/leon/file.sage');

sage/structure/sage_object.pyx in sage.structure.sage_object.load (build/cythonized/sage/structure/sage_object.c:12879)()

/usr/lib/python2.7/dist-packages/sage/repl/load.pyc in load(filename, globals, attach)
    245             if attach:
    246                 add_attached_file(fpath)
--> 247             exec(preparse_file(open(fpath).read()) + "\n", globals)
    248     elif ext == '.spyx' or ext == '.pyx':
    249         if attach:

MemoryError:

1 Answer

answered 2019-01-16 07:12:52 +0200

nbruin

updated 2019-01-20 22:14:05 +0200

See https://bugs.python.org/issue26415 . Python's parser is not memory-efficient when parsing large expressions. For one thing, it compiles the entire expression to bytecode that then produces the data structure. In principle that could be done with memory usage linear in the size of the input, though possibly with a nasty constant.
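To see the effect on a small scale, here is a minimal sketch that compares the peak memory traced while loading the same data as Python source and as JSON. It assumes Python 3 (for tracemalloc), leaves out the Sage preparser entirely, and the helper names and the test data are made up for illustration. The numbers will differ between Python versions, so treat it as an experiment to run, not as a benchmark.

```python
import json
import tracemalloc


def peak_memory(func, *args):
    """Run func(*args); return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def make_entries(n):
    # Hypothetical test data: n nonzero entries keyed by "i,j" strings.
    return {"%d,%d" % (k, 2 * k): 1 for k in range(n)}


def load_as_code(source):
    # Roughly what load() does, minus the preparser:
    # compile the whole text to bytecode, then execute it.
    namespace = {}
    exec(compile(source, "<data>", "exec"), namespace)
    return namespace["D"]


entries = make_entries(20000)
as_code = "D = " + repr(entries)   # the data written as Python source
as_json = json.dumps(entries)      # the same data written as JSON

from_code, peak_code = peak_memory(load_as_code, as_code)
from_json, peak_json = peak_memory(json.loads, as_json)
print("exec:", peak_code, "bytes; json:", peak_json, "bytes")
```

Increasing the argument of make_entries shows how each peak grows with the size of the input.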

If you want to read in expressions in an efficient way, you should probably consider a more restricted file format that has parsers implemented that work more efficiently. For a matrix, a "csv" file or a json file may well work better.

The Python parser (and the sage preparser!) make trade-offs that don't make them suitable to parse large data structures.

EDIT: for sparse matrices, CSV is probably not such a great solution, because it basically is a textual spreadsheet format. JSON would probably be fairly good at encoding a list of coordinate-and-value pairs that would be suitable for representing a sparse matrix in text form, but you'd have to read up on python JSON tools.

I would probably try to get a file in which each line contains i,j,A[i,j] and write a quick loop to read the lines from that file and fill in a matrix from it, but there might be more elegant solutions than that (and it may be easy to parse the file that you already have).

Mathematica can read the file without problem? That's a nice job. You have to take some care to parse data in such a way that extremely long data structures are parsed quickly and efficiently. Python's parser definitely does not have that property. I guess most python solutions decide to read/write special data formats (or exchange formats such as JSON or CSV instead).

Note that Python would probably also be able to write the file quite easily.

Using JSON is a bit hard-going, because only the basics are present, and I don't think sage types have particularly good JSON support. However, the following might give you some inspiration:

sage: import json
sage: D=dict( (str((i,j)),1r) for i in range(1000) for j in range(1000) )
sage: S=json.dumps(D) #encode as a json string (fast)
sage: D2=json.loads(S) #get a dictionary back (also fast)
sage: D == D2
True

(Note that the default json module has not been extended to handle anything beyond strings as keys, and it does not handle Sage integers either. There are probably better libraries out there. This is just what comes with Python by default.)
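One way around the strings-only keys is to store the nonzero entries as a list of [i, j, v] triples instead of a dictionary. The following is only a sketch: the function names and the "nrows"/"ncols"/"entries" field names are invented here, and the values are assumed to be plain Python ints (Sage integers would need an int(...) conversion before dumping).

```python
import json


def entries_to_json(nrows, ncols, entries):
    """Encode a sparse matrix given as {(i, j): value} as a JSON string."""
    # JSON has no tuple keys, so each entry becomes a three-element list.
    triples = [[i, j, v] for (i, j), v in sorted(entries.items())]
    return json.dumps({"nrows": nrows, "ncols": ncols, "entries": triples})


def entries_from_json(text):
    """Decode the string produced above back into (nrows, ncols, dict)."""
    data = json.loads(text)
    entries = {(i, j): v for i, j, v in data["entries"]}
    return data["nrows"], data["ncols"], entries


text = entries_to_json(7, 21, {(3, 3): -1, (3, 9): -1, (3, 14): -1})
nrows, ncols, entries = entries_from_json(text)
```

As far as I know, a dictionary of this shape can be passed to Sage's matrix constructor together with the dimensions, e.g. matrix(ZZ, nrows, ncols, entries, sparse=True).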

EDIT 2: if you want to produce a file with which you can do this, look at the string "S" and make sure to write your file in that format. You'd have to do some post-processing on the dictionary to make it suitable for input into the matrix constructor. Beware: a lot of reading tools in python tend to read the entire contents of the file into memory in one big string. For most cases that's pretty efficient and computers have a lot of memory nowadays, but for a 5GB file it's probably not a good idea.
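The post-processing could look roughly like the sketch below, which turns the string keys "(i, j)" back into tuples. The function name is hypothetical and the file is assumed to contain one JSON object in the same format as "S". Note that json.load takes an open file, while json.loads takes the JSON text itself and not a filename. This version still reads the whole file in one go, so it only suits files that fit comfortably in memory.

```python
import ast
import json


def read_entries(path):
    """Read a JSON file of {"(i, j)": value} and return {(i, j): value}."""
    with open(path) as fp:
        raw = json.load(fp)   # parses the file contents, keys are strings
    # literal_eval safely turns the text "(3, 9)" into the tuple (3, 9).
    return {ast.literal_eval(key): value for key, value in raw.items()}
```

The resulting dictionary has tuple keys and should be usable as the entries argument of a sparse matrix constructor.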

If you're going to write a custom routine to produce a file anyway, you might as well make up your own format. If I were to do this, I'd figure out on the mathematica side how to write a file that looks like

100 100
0 10 1
0 15 -3
1 17 1
...

consisting of lines i j v indicating that A[i,j] = v, of course only for the non-zero entries of the matrix. The program to construct the matrix on the sage side would then be

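# Read the file line by line; only the sparse matrix A is held in memory.
with open("matrix_file", "r") as F:
    # First line: number of rows and number of columns.
    ns, ms = F.readline().split()
    A = matrix(ZZ, int(ns), int(ms), sparse=True)
    # Remaining lines: "i j v", meaning A[i,j] = v.
    for line in F:
        i_s, js, vs = line.split()
        A[int(i_s), int(js)] = ZZ(vs)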

(Note: "is" is a reserved word in Python, so the code uses i_s instead.)

It may not be super-fast, but it is guaranteed to have only one copy in memory of the big object (the sparse matrix) and it really reads the file line-by-line from disk (the OS will buffer in larger blocks, though).
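To try the file format end to end without Sage, here is a plain-Python sketch of a writer and a lazy reader for it. Both function names are made up for illustration, and the values are assumed to be integers. The reader is a generator, so it never holds more than one line at a time.

```python
def write_matrix_file(path, nrows, ncols, entries):
    """Write {(i, j): v} as a header 'nrows ncols' followed by 'i j v' lines."""
    with open(path, "w") as out:
        out.write("%d %d\n" % (nrows, ncols))
        for (i, j), v in sorted(entries.items()):
            if v != 0:  # only nonzero entries belong in the file
                out.write("%d %d %d\n" % (i, j, v))


def iter_matrix_file(path):
    """Yield (nrows, ncols) first, then one (i, j, v) tuple per data line."""
    with open(path) as src:
        nrows, ncols = src.readline().split()
        yield int(nrows), int(ncols)
        for line in src:
            if line.strip():  # tolerate blank lines
                i, j, v = line.split()
                yield int(i), int(j), int(v)
```

On the Sage side one would take the dimensions with next() on the generator, create the empty sparse matrix, and then assign A[i, j] = v for each remaining triple, which keeps the single-copy property described above.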


Comments

Or for something that large, a binary format.

Iguananaut ( 2019-01-16 11:31:11 +0200 )

Oh, that's disappointing. I thought Sage was supposed to be much more efficient than Mathematica. Even when I try to import just the largest matrix (which takes up less than 800MB), Sage uses up 28GB of RAM and crashes.

Leon ( 2019-01-16 21:25:05 +0200 )

"I thought Sage was supposed to be much more efficient than Mathematica" it depends on what you mean by "efficient" and exactly what tasks you're judging on. For loading a huge matrix it may be just as efficient, you just have to be using the best data format for the task (which, generally, is not representing huge datasets as code).

Iguananaut ( 2019-01-17 17:05:50 +0200 )

Hmm, I still haven't been able to import the file, but probably due to my incompetence. Could you give me specific instructions on how (in what form) to export the file from Mathematica and how to import it in Sage? Do I export it to file.json? Should its content be e.g. [ "bdrs={", "1: matrix(ZZ,1,7,{}),", "2: matrix(ZZ,7,21,{(3,3):-1, (3,9):-1, (3,14):-1}),", "};" ] How do I import this into sage? If I run load('file.json'), I get No such file or directory: '/home/file.json.sobj'. If I run json.loads('file.json'), I get ValueError: No JSON object could be decoded.

Leon ( 2019-01-19 17:39:49 +0200 )

I do prefer the option in EDIT 2; however, I would include the dimensions on the first line of the file. Note that in your script you use is, js for the strings but wrote int(i), int(j) for the conversion.

vdelecroix ( 2019-01-20 10:40:33 +0200 )


Stats

Asked: 2019-01-16 00:42:46 +0200

Seen: 1,246 times

Last updated: Jan 20 '19