Hi everyone,
I am currently running computations that involve many operations of the type matrix*vector, where both objects are sparse and the result can be expected to be sparse as well. The matrices' base field is the symbolic ring; if I dropped certain normalizations, I think I could also work over QQ.
These products appear sequentially. Although there are some options for parallelizing them at least in part, the biggest acceleration would be achieved if the multiplication itself could be parallelized. Is there a way to do this in Sage?
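For concreteness, here is a toy version of the operation pattern; the sizes and entries are made up:

# Toy version of the pattern (made-up sizes and entries): a sparse
# matrix over SR applied repeatedly to a sparse vector.
var('a')
M = matrix(SR, 3, 3, {(0, 1): a, (2, 0): 1/2}, sparse=True)
v = vector(SR, 3, {1: 1}, sparse=True)
for _ in range(4):   # the products appear one after the other
    v = M*v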
Greetings and thank you very much.
Edit: Work in progress
Below is the code I have produced so far. It is far from a solution and exhibits new slow-downs that I do not understand. The strategy is to store a sparse matrix as a list of its nonzero rows, which is done by the following snippet:
def default_to_row(M):
    # Collect the nonzero rows of M as pairs [row index, row]
    # (dot_sparse is defined further below).
    rows_of_M = M.rows()
    relevant = []
    for r in range(len(rows_of_M)):
        row = rows_of_M[r]
        if dot_sparse(row, row) != 0:
            relevant.append([r, row])
    return relevant
The individual row*vector products are computed by the function
# indexed_row is of type [i, v], where v is a sparse vector and i is
# the index of the row in the matrix it corresponds to.
def mult_sparse(indexed_row, v2):
    # Iterate only over the nonzero entries of the stored row.
    res = 0
    for k in indexed_row[1].dict():
        res = res + indexed_row[1][k]*v2[k]
    return [indexed_row[0], res]
and the regular dot product of two vectors is given by
def dot_sparse(v, w):
    # Iterate only over the nonzero entries of v.
    res = 0
    for k in v.dict():
        res = res + v[k]*w[k]
    return res
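As a quick sanity check of the three helpers on a tiny made-up example:

# Tiny sanity check (made-up data); assumes the three functions
# above have been defined.
A = matrix(QQ, 3, 3, {(0, 2): 5, (2, 2): -1}, sparse=True)
w = vector(QQ, 3, {2: 2}, sparse=True)
A_rows = default_to_row(A)                    # keeps rows 0 and 2
print([mult_sparse(r, w) for r in A_rows])    # expected: [[0, 10], [2, -2]]
print(dot_sparse(w, w))                       # expected: 4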
Now to the actual benchmarks:
import concurrent.futures
import time
import itertools

dim = 5000
M = random_matrix(QQ, nrows=dim, ncols=dim, sparse=True, density=0.0001)
M_by_rows = default_to_row(M)
v = vector(QQ, dim, list(range(dim)), sparse=True)
print(len(M_by_rows))
Using the concurrent.futures library yields the following performance:
start = time.perf_counter()
with concurrent.futures.ProcessPoolExecutor() as executor:
    start2 = time.perf_counter()
    # executor.map submits all tasks here; results is an iterator
    # over the outputs, in order
    results = executor.map(mult_sparse, M_by_rows, itertools.repeat(v))
    finish2 = time.perf_counter()
    coefficients = {}
    for result in results:
        if result[1] != 0:
            coefficients[result[0]] = result[1]
new_v = vector(QQ, v.degree(), coefficients, sparse=True)
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} seconds with Pooling.')
print(f'The executor alone took {round(finish2-start2, 2)} seconds.')
Here the dimensions are adjusted so that the computation does not exceed my RAM capacity, because every spawned subprocess takes copies (at least of v); a batched variant that might reduce this overhead is sketched at the very end. A comparison with the serial approaches, both the standard product and the serial version of my particular strategy, shows that the parallel version is significantly slower (18 and 7 seconds in the outputs above), while the serial versions do not differ (0.02 seconds in both cases):
start = time.perf_counter()
new_v = M*v
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s) for the regular computation')

start = time.perf_counter()
coefficients = {}
for row in M_by_rows:
    coefficients[row[0]] = mult_sparse(row, v)[1]
new_v = vector(QQ, v.degree(), coefficients, sparse=True)
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s) in a serial computation')
print('Check if the vectors coincide:')
test = new_v - M*v
n = dot_sparse(test, test)
if n == 0:
    print(True)
else:
    print(False)
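Edit 2: One thing I still plan to try is the chunksize argument of executor.map, which batches the argument pairs so that v is pickled once per chunk instead of once per row. A rough sketch; the chunk size of 200 is an arbitrary guess, not a tuned value:

start = time.perf_counter()
with concurrent.futures.ProcessPoolExecutor() as executor:
    # chunksize batches the (row, v) pairs sent to the workers
    results = executor.map(mult_sparse, M_by_rows,
                           itertools.repeat(v), chunksize=200)
    coefficients = {res[0]: res[1] for res in results if res[1] != 0}
new_v = vector(QQ, v.degree(), coefficients, sparse=True)
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s) with chunksize=200.')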