See https://bugs.python.org/issue26415 . Python's parser is not good at parsing large expressions memory-efficiently. For one thing, it compiles the entire expression to bytecode that builds the data structure. In principle that could be done with memory usage linear in the input, but possibly with a nasty constant.

If you want to read in expressions efficiently, you should probably consider a more restricted file format for which more efficient parsers exist. For a matrix, a CSV or JSON file may well work better.

The Python parser (and the Sage preparser!) make trade-offs that leave them ill-suited to parsing large data structures.
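To make the contrast concrete, here is a small plain-Python (not Sage) sketch that parses the same text twice: once with the general-purpose parser via eval, once with json.loads. On most machines the restricted JSON parser wins comfortably, though the exact timings will vary:

```python
import json
import time

# Build a large list and its textual form; the same characters can be
# parsed either by Python's own parser (eval) or by json.loads.
data = list(range(200000))
text = json.dumps(data)  # e.g. "[0, 1, 2, ...]"

t0 = time.perf_counter()
via_eval = eval(text)        # goes through the full Python compiler
t_eval = time.perf_counter() - t0

t0 = time.perf_counter()
via_json = json.loads(text)  # a restricted, purpose-built parser
t_json = time.perf_counter() - t0

assert via_eval == via_json == data
print(f"eval: {t_eval:.3f}s  json.loads: {t_json:.3f}s")
```

Both produce the same list; only the parsing machinery differs.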

EDIT: for sparse matrices, CSV is probably not such a great solution, because it is basically a textual spreadsheet format. JSON would be fairly good at encoding a list of coordinate-and-value pairs representing a sparse matrix in text form, but you'd have to read up on Python's JSON tools.

I would probably try to get a file whose lines contain i,j,A[i,j], and write a quick loop that reads the lines from that file and fills in a matrix, but there may be more elegant solutions (and it may be easy to parse the file that you already have).

Mathematica can read the file without problems? That's a nice job. You have to take some care to parse data in such a way that extremely long data structures are handled quickly and efficiently, and Python's parser definitely does not have that property. I suspect that is why most Python solutions read and write dedicated data formats (or exchange formats such as JSON or CSV) instead.

Note that Python would probably also be able to write the file quite easily.

Using JSON is a bit hard-going, because only the basics are present, and I don't think Sage types have particularly good JSON support. However, the following might give you some inspiration:

sage: import json
sage: D = dict((str((i,j)), 1r) for i in range(1000) for j in range(1000))
sage: S = json.dumps(D)   # encode as a JSON string (fast)
sage: D2 = json.loads(S)  # get a dictionary back (also fast)
sage: D == D2
True

(Note that the default json module has not been extended to handle anything beyond strings as keys, and it does not handle Sage integers either. There are probably better libraries out there; this is just what comes with Python by default.)

EDIT 2: if you want to produce a file you can read this way, look at the string S and write your file in that format. You'd have to do some post-processing on the dictionary to make it suitable as input to the matrix constructor. Beware: many reading tools in Python read the entire contents of the file into memory as one big string. In most cases that's efficient enough, and computers have a lot of memory nowadays, but for a 5GB file it's probably not a good idea.
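Here is a minimal sketch of that post-processing step, assuming the keys were written with str((i,j)) as in the transcript above. It runs in plain Python; the final matrix constructor call is Sage-only and shown as a comment:

```python
import ast
import json

# Round-trip a small coordinate dictionary through JSON, then undo the
# str((i, j)) key encoding with ast.literal_eval, so the result is again
# keyed by (i, j) tuples -- the shape a sparse matrix constructor wants.
D = {str((i, j)): i + j for i in range(3) for j in range(3) if i != j}
D2 = json.loads(json.dumps(D))  # keys come back as strings like "(0, 1)"
entries = {ast.literal_eval(k): v for k, v in D2.items()}

assert entries[(0, 1)] == 1
# In Sage one could now do: A = matrix(3, 3, entries, sparse=True)
```

ast.literal_eval only evaluates literals, so unlike eval it is safe to apply to keys read from a file.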

If you're going to write a custom routine to produce a file anyway, you might as well make up your own format. If I were to do this, I'd figure out on the Mathematica side how to write a file that looks like

100 100
0 10 1
0 15 -3
1 17 1
...

consisting of a first line with the dimensions n and m, followed by lines i j v indicating that A[i,j] = v, of course only for the non-zero entries of the matrix. The program to construct the matrix on the Sage side would then be

F = open("matrix_file", "r")
ns, ms = F.readline().split()          # first line: the dimensions
A = matrix(int(ns), int(ms), 0, sparse=True)
for line in F:
    i_s, js, vs = line.split()
    A[int(i_s), int(js)] = ZZ(vs)
F.close()

(Note: is is a reserved keyword in Python, which is why the variables are named i_s, js and vs.)

It may not be super-fast, but it is guaranteed to keep only one copy of the big object (the sparse matrix) in memory, and it really reads the file from disk line by line (though the OS will buffer reads in larger blocks).
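For completeness, here is a plain-Python sketch of both sides of that format (the names write_sparse and read_sparse are made up for illustration). In Sage you would feed the resulting dictionary to matrix(nrows, ncols, entries, sparse=True), and use ZZ(vs) instead of int(vs) if you want Sage integers:

```python
import os
import tempfile

def write_sparse(path, nrows, ncols, entries):
    # entries: dict mapping (i, j) -> nonzero integer value
    with open(path, "w") as f:
        f.write(f"{nrows} {ncols}\n")
        for (i, j), v in entries.items():
            f.write(f"{i} {j} {v}\n")

def read_sparse(path):
    with open(path) as f:
        nrows, ncols = map(int, f.readline().split())
        entries = {}
        for line in f:  # one line at a time; never the whole file at once
            i_s, js, vs = line.split()
            entries[(int(i_s), int(js))] = int(vs)
    return nrows, ncols, entries

entries = {(0, 10): 1, (0, 15): -3, (1, 17): 1}
path = os.path.join(tempfile.mkdtemp(), "matrix_file")
write_sparse(path, 100, 100, entries)
assert read_sparse(path) == (100, 100, entries)
```

The round trip preserves the dimensions and the nonzero entries exactly, and the reader never holds more than one line of the file in memory at a time.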