NEW VERSION: Sparse matrix stores something for each row?
NEW VERSION. This is what this question has boiled down to upon further investigation. Everything below this paragraph in boldface and the subsequent code is the old version and serves purely historical purposes =) I hope this was the right thing to do rather than creating a new question. The issue is that the amount of memory occupied by a sparse matrix seems to depend linearly on the number of rows in the matrix. In particular, my setup wont even let me create a zero $10^9\times 1$ sparse matrix but has no problem with a zero $1\times 10^9$ sparse matrix. This can all be illustrated by running the below code (and it's hardly the way things are supposed to be).
print get_memory_usage()
B=matrix(QQ, 1, 1000000000, {})
print get_memory_usage()
B=matrix(QQ, 10000000, 1, {})
print get_memory_usage()
B=matrix(QQ, 1000000000, 1, {})
print get_memory_usage()
I'm running the following code. Here I create a sparse $10^5\times 10^5$ identity matrix and then apply it repeatedly to a vector in $\mathbb R^{10^5}$ with 100 nonzero coordinates (which is stored as $10^5\times 1$ sparse matrix). Each time I add the result to a list until I run out of 2 GB of memory.
A=matrix(QQ, 100000, 100000, {(i,i):1. for i in range(100000)})
print get_memory_usage()
B=[matrix(QQ, 100000, 1, {(i,0):1. for i in range(100)})]
while (get_memory_usage()<2000): B.append(A*B[-1])
print len(B)
print get_memory_usage()
del B
del A
print get_memory_usage()
I'm receiving (on a freshly started kernel)
204.54296875
196
2003.51953125
1399.0234375
This raises two questions.
Why is there so much memory (1.4 GB) still in use after I successfully ran the code and deleted both objects I've created? That's a leak, right?
Why does deleting a list of 196 sparse matrices with 100 nonzero elements each free up 600 MB? Each such matrix should only take up a few KB, right?
I'm on Windows 8.1/SAGE 8.4.
UPDATE. Transposing the matrices, i.e. writing
...
B=[matrix(QQ, 1, 100000, {(0,i):1. for i in range(100)})]
for i in range(200): B.append(B[-1]*A)
...
seems to work well memory-wise, it returns
214.50390625
201
215.53125
186.62109375
However, it takes up much more time than the first version. This is probably due to the implementation of sparse matrix multiplication unfortunately containing a loop over the columns of the right matrix. Is there any simple way around this high memory/low speed dilemma?
UPDATE 2. This might not be a memory leak and have more to do with the implementation of sparse matrices in general rather than their multiplication in particular. Apparently, a sparse matrix stores something for each of its rows as shown by
print get_memory_usage()
B=matrix(QQ, 1, 10000000, {})
print get_memory_usage()
B=matrix(QQ, 10000000, 1, {})
print get_memory_usage()
165.45703125
165.51953125
1082.68359375
This has got to be a known issue. I was not able to find a discussion, however. (This might be what is known as csr_matrix
in scipy
but why this would be chosen as the general standard here is beyond me.)