
elastica's profile - activity

2022-05-06 22:56:45 +0100 received badge  Famous Question (source)
2022-05-06 22:56:45 +0100 received badge  Notable Question (source)
2021-10-16 14:29:42 +0100 received badge  Famous Question (source)
2020-05-13 00:10:41 +0100 received badge  Popular Question (source)
2019-07-21 16:39:35 +0100 received badge  Nice Question (source)
2019-07-18 18:45:13 +0100 commented question Boost implementation of Dijkstra's algorithm

After reading the Boost docs more carefully, I changed the first template parameter of the adjacency_list graph from listS to vecS, and this gives a nice speedup: Boost Dijkstra now takes 0.15s instead of 0.5s on a RandomGNP(6000, 0.3) graph. Building the graph remains expensive (0.36s). The code is HERE and the dataset is HERE.

2019-07-18 18:45:13 +0100 received badge  Commentator
2019-07-17 22:55:38 +0100 commented answer Sage graph backend

That's unfortunate. Thanks for the answer.

2019-07-17 22:55:23 +0100 received badge  Scholar (source)
2019-07-17 22:55:21 +0100 received badge  Supporter (source)
2019-07-16 21:53:34 +0100 asked a question Sage graph backend

A Sage graph G has a backend graph (say GG) accessible through the private _backend attribute. The latter refers to two "C-graphs" (say G1 and G2) accessible via the c_graph() method. Here is some code with a weighted graph G to illustrate my question:

import string
ALPHA = string.ascii_uppercase

n = 10
wedges = [(0, 1, 7.0), (0, 3, 1.0), (0, 9, 9.0), (1, 7, 6.0), (2, 8, 7.0),
          (2, 9, 2.0), (3, 6, 7.0), (4, 9, 5.0), (5, 9, 6.0), (6, 9, 9.0),
          (6, 7, 1.0), (7, 8, 9.0), (7, 9, 5.0), (8, 9, 3.0)]

wedges = [(ALPHA[i], ALPHA[j], 10 * w) for (i, j, w) in wedges]
G = Graph()
G.add_edges(wedges)
G.weighted(True)

for v in G:
    print v, ''.join(G[v])

print

print "G type:", type(G)
print "-----------------------------"
GG = G._backend
print "GG type:", type(GG)
print list(GG.iterator_verts())

print "-----------------------------"
G1, G2 = GG.c_graph()
print "G1 type:", type(G1)
print

for i in range(n):
    print i, ''.join(map(str, G1.out_neighbors(i)))

outputting:

A BDJ
C JI
B AH
E J
D AG
G DJH
F J
I JHC
H BJIG
J AHCIGEF

G type: <class 'sage.graphs.graph.Graph'>
-----------------------------
GG type: <type 'sage.graphs.base.sparse_graph.SparseGraphBackend'>
['A', 'C', 'B', 'E', 'D', 'G', 'F', 'I', 'H', 'J']
-----------------------------
G1 type: <type 'sage.graphs.base.sparse_graph.SparseGraph'>

0 123
1 04
2 07
3 0456789
4 1367
5 36
6 345
7 234
8 3
9 3

The three graphs above, G, GG and G1, each have n=10 vertices. The difference is that G1's vertices are labelled from 0 to n-1. As you might expect, the graphs G and G1 are isomorphic.

So my question is simple: does anybody know how to access the mapping between the two vertex sets?

The documentation explains that two dictionaries, vertex_ints and vertex_labels, are available to translate vertex labels to integers and vice versa; unfortunately, both GG.vertex_ints and GG.vertex_labels raise an AttributeError.

As you can see, and contrary to what I was expecting, the correspondence is not simply $i\mapsto$ the $i$-th vertex of G in the order printed above.
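
In case it helps, here is the small probe I am using; it assumes nothing about the backend API and only lists which attributes of GG have names that look related to the vertex/integer translation, then checks the two names given in the documentation:

GG = G._backend

# Attribute names that look related to a vertex <-> integer translation table.
print([name for name in dir(GG) if 'vertex' in name or 'label' in name])

# Direct check of the two names mentioned in the documentation.
for name in ('vertex_ints', 'vertex_labels'):
    print("%s: %s" % (name, hasattr(GG, name)))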

2019-07-15 00:02:42 +0100 commented question Boost implementation of Dijkstra's algorithm

Surprisingly, the Boost library is fast but not terribly fast. Converting my Python code above to standard C++, I tested on a RandomGNP(6000, 0.3) graph: pure Boost takes 0.50s while my plain C++ code takes only 0.02s. But that works only for graphs whose vertices are 0, 1, 2, ...; you can always reduce to this case, at the price of converting node ids to integers (and back) with a good C/C++ hash table. Still, writing C/C++/Cython is complicated, so my inclination goes to Numba (with NumPy), where you can hope for performance even better than pure C++. If I achieve good results with Numba, I'll post again on this forum. It was a pleasure to talk with you, David.

2019-07-14 19:06:55 +0100 commented question Boost implementation of Dijkstra's algorithm

I tried graph-tool (a Python module wrapping code from the Boost Graph Library): it takes 0.50s on the same dataset.

2019-07-14 17:16:42 +0100 commented question Boost implementation of Dijkstra's algorithm

I tried Networkit (a C++ library wrapped with Cython): it takes 0.25s on the same dataset.

2019-07-14 16:39:54 +0100 commented question Boost implementation of Dijkstra's algorithm

Agreed about the conversion: in my edited code, 60% of the time is spent converting from the Sage graph G to the adjacency list adj. I could avoid the conversion in the following way:

    for v in G[u]:
        w=G.edge_label(u,v)
        dd=d+w

instead of

    for v, w in adj[u]:
        dd = d + w

but it is slower. So the problem is not only the conversion, but also fast access to the edge attributes. I will try Numba to speed up the code (they recently added heap queue and dictionary support).
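
For the record, the kind of Numba kernel I have in mind is sketched below (a sketch only, not tested in this thread; it assumes a Numba version whose nopython mode supports heapq, as mentioned above). It runs Dijkstra over a CSR-style adjacency stored in NumPy arrays (indptr, indices, weights), so the inner loop never touches Python objects; building those arrays from the adjacency list adj is a one-time conversion, as in my edited code.

import heapq
import numpy as np
from numba import njit

@njit(cache=True)
def dijkstra_csr(indptr, indices, weights, src):
    # indptr, indices and weights encode the weighted adjacency lists in CSR
    # form; vertices are assumed to be numbered 0..n-1 as in the edited code.
    n = indptr.shape[0] - 1
    dist = np.full(n, np.inf)
    dist[src] = 0.0
    # Heap entries are (distance, vertex) pairs; the vertex is stored as a
    # float so that every entry is a homogeneous pair of float64.
    heap = [(0.0, float(src))]
    while len(heap) > 0:
        d, uf = heapq.heappop(heap)
        u = int(uf)
        if d > dist[u]:
            continue  # stale heap entry
        for k in range(indptr[u], indptr[u + 1]):
            v = indices[k]
            dd = d + weights[k]
            if dd < dist[v]:
                dist[v] = dd
                heapq.heappush(heap, (dd, float(v)))
    return dist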

2019-07-14 11:07:27 +0100 commented question Boost implementation of Dijkstra's algorithm

Thanks for the clarification about the default algorithm.

Finally, I have written the test in C++/Boost with the dataset above: it takes 0.13s. Using the Boost Graph Library from within Sage therefore causes a significant slowdown (about a factor of 10).

2019-07-14 01:12:32 +0100 commented question Boost implementation of Dijkstra's algorithm

It depends on your starting point. If you start from a bunch of weighted edges, Sage graph + Boost is probably slower than pure Python code, which sounds like a problem to me (recall that Sage graphs are written in Cython for speed), doesn't it? If your starting point is a Sage graph (I avoid NetworkX graphs since their dict-of-dict-of-dict implementation is slow), my pure Python code first has to extract the weighted adjacency list and is then slower (cf. my update above), but the ratio (about 1.7x) is far from the speed ratio between pure Python and C++ code (about 25 in our case, I imagine), and that was my initial topic.

2019-07-13 22:44:24 +0100 commented question Boost implementation of Dijkstra's algorithm

You are right, the test is unfair ... for the pure Python code. Did you notice that, on the Boost side, I did not take into account the time needed to build the graph? On the contrary, on the pure Python side, I did count the corresponding time, since the Dijkstra function builds the graph from the weighted edge list and converts string nodes to integer nodes.

2019-07-13 00:12:40 +0100 received badge  Notable Question (source)
2019-07-12 14:08:13 +0100 asked a question Boost implementation of Dijkstra's algorithm

I have benchmarked Dijkstra's algorithm via the Boost Graph Library interface provided by SageMath against a basic, academic implementation written in pure Python, and the latter always performs better. This is very surprising, since the Boost Graph Library is a templated C++ library. Perhaps I'm using it the wrong way. Here is the code:

from time import clock
from heapq import heappush, heappop

def make_graph(n, p):
    import networkx as nx
    G=nx.erdos_renyi_graph(n, p)

    G=nx.relabel_nodes(G, lambda k:'v'+str(k))    
    edges=list(G.edges())
    wedges=[]
    for u, v in edges:
        w=float(randrange(1, 10))
        G[u][v]["weight"]=w
        wedges.append((u, v, w))

    return G, wedges

def wedges2adjw(nodes, wedges):
    "From weighted wedges to weighted adjacency list"
    n=len(nodes)
    node2int={node:i for (i,node) in enumerate(nodes)}
    adj=[[] for _ in range(n)]
    for a, b,w in wedges:
        i=node2int[a]
        j=node2int[b]        
        adj[i].append((j,w))
        adj[j].append((i,w))

    return adj, node2int

def dijkstra_int(adj, src):
    """
    adj: weighted adjacency lists of a graph with n nodes,
    vertices indexed from 0 to n-1.
    Returns the list of shortest-path distances from src to all vertices."""

    n=len(adj)
    processed=[0]*n
    processed[src]=1
    distances=[(0,src)]
    INF=float("inf")
    dist=[INF]*n
    dist[src]=0
    while distances:
        d, u=heappop(distances)
        if processed[u]==1:
            processed[u]=2

            for v,w in adj[u]:
                dd=d+w
                if processed[v]==0 or dd<dist[v]:
                    heappush(distances, (dd,v))
                    dist[v]=dd
                    if processed[v]==0:
                        processed[v]=1
    return dist

def dijkstra(nodes, wedges, n, src):
    adj, node2int=wedges2adjw(nodes, wedges)
    int2node={i:node for (node,i) in node2int.items()}
    src_int=node2int[src]
    return {int2node[i]:d for (i, d) in enumerate(dijkstra_int(adj, src_int)) if d!=float("inf")}



print("------------  Generating Graph --------------------")
debut=clock()

n=3000
p=0.3

Gnx, wedges=make_graph(n, p)
src=sample(Gnx.nodes(),1)[0]
nodes=Gnx.nodes()
gtime=(clock()-debut)
print "Number of nodes:", len(Gnx)
print "Number of edges:", len(Gnx.edges())
print("Graph generation: %.2fs" %gtime) 


print("------------  Boost --------------------")


G = Graph()
G.add_vertices(nodes)
G.add_edges(wedges)
G.weighted(True)

debut=clock()
dist_boost = G.shortest_path_lengths(src,by_weight = True, algorithm='Dijkstra_Boost')

boost=(clock()-debut)
print("Boost: %.2fs" %boost)  


print("------------  Pure Python --------------------")
debut=clock()
dist_pp=dijkstra(nodes, wedges, n, src)
py_time=(clock()-debut)
print("Pure Python: %.2fs" %py_time) 

print"----------"
print "Checking:", dist_boost==dist_pp

Output:

------------  Generating Graph --------------------
Number of nodes: 3000
Number of edges: 1350005
Graph generation: 11.76s
------------  Boost --------------------
Boost: 1.61s
------------  Pure Python --------------------
Pure Python: 1.48s
----------
Checking: True

EDIT

Now, the tested functions have the same interface:

from time import clock
from heapq import heappush, heappop

def make_graph(n, p):
    G = Graph()
    G.add_vertices(range(n))
    G.add_edges([(u, v, float(randrange(1, 10)))
                 for (u, v, _) in graphs.RandomGNP(n, p).edges()])
    G.weighted(True)

    return G

def dijkstra(G, src):
    adj = [[] for _ in range(len(G))]

    for i, j, w in G.edges():
        adj[i].append((j, w))
        adj[j].append((i, w))    

    n = len(adj)
    processed = [0] * n
    processed[src] = 1
    distances = [(0, src)]
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    while distances:
        d, u = heappop(distances)
        if processed[u] == 1:
            processed[u] = 2

            for v, w in adj[u]:
                dd = d + w
                if processed[v] == 0 or dd < dist[v]:
                    heappush(distances, (dd, v))
                    dist[v] = dd
                    if processed[v] == 0:
                        processed[v] = 1
    return {i: dist[i] for i in G if dist[i] != INF}


print("------------  Generating Graph --------------------")
debut = clock()

n = 3000
p = 0.3

G = make_graph(n, p)
src = randrange(n)
nodes = G.vertices()
gtime = (clock() - debut)
print "Number of nodes:", len(G)
print "Number of edges:", len(G.edges())
print("Graph generation: %.2fs" % gtime)


print("------------  Sage --------------------")

debut = clock()
dist_sage = G.shortest_path_lengths(
    src, by_weight=True)
default_time = (clock() - debut)
print("Default: %.2fs" % default_time)


print("------------  Boost --------------------")

debut = clock()
dist_boost = G.shortest_path_lengths(
    src, by_weight=True, algorithm='Dijkstra_Boost')
boost_time = (clock() - debut)
print("Boost: %.2fs" % boost_time)

print("------------  Pure Python --------------------")
debut = clock()
dist_pp = dijkstra(G, src)
py_time = (clock() - debut)
print("Pure Python: %.2fs" % py_time)

print "----------"
print "Checking:", dist_sage == dist_boost == dist_pp

print "Time ratio: %.2f" %(py_time/boost_time)

output:

------------  Generating Graph --------------------
Number of nodes: 3000
Number of edges: 1349390
Graph generation: 7.74s
------------  Sage --------------------
Default: 2.23s
------------  Boost --------------------
Boost: 1.53s
------------  Pure Python --------------------
Pure Python: 2.54s
----------
Checking: True
Time ratio: 1.66

By the way, the docs explain this about the default algorithm:

(default): Sage chooses the best algorithm: 'BFS' if by_weight is False, 'Dijkstra_Boost' if all weights are positive, 'Bellman-Ford_Boost' otherwise.

This is not consistent with the timings above. [As explained by David Coudert in the comments, the difference is due to the sign check on the weights.]
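
For what it is worth, the cost of one full scan over the edge weights (the kind of check mentioned above; not necessarily the exact code path Sage uses) can be estimated directly on the same graph G:

from time import clock

debut = clock()
all_positive = all(w >= 0 for (_, _, w) in G.edges())
scan_time = clock() - debut
print("Weight scan: %.2fs (all positive: %s)" % (scan_time, all_positive))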

2019-04-06 10:10:19 +0100 received badge  Popular Question (source)
2019-01-05 11:54:46 +0100 received badge  Nice Answer (source)
2019-01-04 11:09:17 +0100 received badge  Editor (source)
2019-01-03 16:06:23 +0100 commented answer Sage Floyd algorithm in Cython

Thanks for the suggestion; I will think about it, but I don't have much time at the moment. I have never done this before, so it could be time-consuming to dive into the process. Moreover, the Python implementation of Floyd's algorithm above needs some wrapping to match the Sage interface (sketched below), which might cost some performance, so this needs testing first.
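
To make this concrete, the kind of wrapping I have in mind is sketched below (a sketch only: floyd_scipy_as_dict is a name made up for the example; it reuses the wedges2matrix helper from my answer and repackages SciPy's distance matrix as the dict of dicts that shortest_path_all_pairs returns as its first value, ignoring predecessors):

import numpy as np
from scipy.sparse.csgraph import floyd_warshall

def floyd_scipy_as_dict(wedges, n):
    # Run SciPy's Floyd-Warshall, then convert the resulting distance matrix
    # into a dict of dicts keyed by vertex, dropping unreachable pairs.
    M = wedges2matrix(wedges, n)   # helper defined in my answer
    D = floyd_warshall(np.array(M))
    return {i: {j: D[i, j] for j in range(n) if D[i, j] != float('inf')}
            for i in range(n)}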

2019-01-03 12:10:16 +0100 received badge  Self-Learner (source)
2019-01-03 12:10:16 +0100 received badge  Teacher (source)
2019-01-02 23:25:31 +0100 answered a question Sage Floyd algorithm in Cython

I have searched more ;)

In the SageMath source code, we can spot the following, which leaves no room for hope:

    Traceback (most recent call last):
    ...
    ValueError: Algorithm 'Floyd-Warshall-Cython' does not work with weights.

But all is not lost. Both NumPy (via the NetworkX library) and SciPy provide an efficient implementation of Floyd's algorithm (the SciPy one is as efficient as pure C code). The following code shows how to use them within SageMath:

import networkx as nx
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import floyd_warshall
import numpy as np

def wedges2matrix(wedges, n, loop=True):

    M=[[float('inf')]*n for _ in range(n)]
    for a, b, w in wedges:
        M[a][b]=w
    if loop:
        for a in range(n):
            M[a][a]=0
    return M

def floyd_scipy(wedges, n):
    M=wedges2matrix(wedges, n)
    return floyd_warshall(np.array(M)) 


def floyd_np(wedges, n):
    G = nx.DiGraph()
    G.add_weighted_edges_from(wedges)
    dist = nx.floyd_warshall_numpy(G, nodelist=range(n))
    return dist


n=10 #nb of vertices
wedges=[(9, 3, 38.0), (6, 0, 6.0), (0, 2, 53.0), (4, 7, 66.0), 
        (1, 9, 84.0), (8, 5, 24.0), (2, 0, 73.0), (4, 2, 79.0), 
         (6, 7, 78.0), (2, 4, 90.0), (7, 2, 81.0), (7, 0, 57.0), 
        (1, 3, 89.0), (0, 3, 17.0)]

A=floyd_np(wedges, n)
B=floyd_scipy(wedges, n)

print A
print "-------------"
print B

outputting identical matrices.

This is an opportunity to benchmark the Sage implementation:

from time import clock

def dist2matrix(dist,n):
    M=matrix(n, n, [[float('inf')]*n for _ in range(n)])
    for a in dist:
        for b in dist[a]:
            M[a,b]=dist[a][b]
    for i in range(n):
            M[i,i]=0
    return M   

def random_edges(n, density, max_weight=100):
    M=n*(n-1)//2
    m=int(density*M)
    edges=set()
    wedges=[]
    while len(edges)<m:
        L=(randrange(n),randrange(n))
        if L[0]!=L[1] and L not in edges:
            w=float(randrange(max_weight))
            wedges.append(L+(w,))
            edges.add(L)
    return wedges


def floyd(wedges, n):
    M=wedges2matrix(wedges, n)
    for k in range(n):
        A=M[k]
        for i in range(n):
            B=M[i]
            for j in range(n):                
                if B[k] + A[j]<B[j]:
                    B[j]=B[k] + A[j]
    return M

def floyd_sage(wedges, n):
    G = DiGraph(wedges)
    dist, _ = G.shortest_path_all_pairs(by_weight = True, algorithm='Floyd-Warshall-Python')
    return dist    

n=300
density=0.33

wedges=random_edges(n, density)

start=clock()
M=floyd_scipy(wedges, n)
end=clock()
print("Scipy Floyd \t: %.2f" %(end-start))


start=clock()
M=floyd_np(wedges, n)
end=clock()
print("Numpy Floyd  \t: %.2f" %(end-start))

start=clock()
A=floyd(wedges, n)
end=clock()
print("Usual Floyd  \t: %.2f" %(end-start))

start=clock()
dist=floyd_sage(wedges, n)
end=clock()
print("Sage Floyd \t: %.2f" %(end-start))

matrix(RR, A)==dist2matrix(dist,n)

outputting on my machine:

Scipy Floyd     : 0.03
Numpy Floyd     : 0.11
Usual Floyd     : 2.52
Sage Floyd  : 10.43
True

It seems that Sage's pure Python implementation is not very efficient...

Edit: final check was missing.

2019-01-02 20:35:05 +0100 received badge  Nice Question (source)
2018-12-31 12:48:36 +0100 received badge  Student (source)
2018-12-31 12:47:43 +0100 asked a question Sage Floyd algorithm in Cython

Hi

The classical Floyd-Warshall algorithm computes all shortest paths in a weighted graph and has $O(n^3)$ complexity (so it can be slow). I'm therefore a bit surprised to see that the Cython version provided in Sage works only if by_weight==False, meaning that every edge weight is taken to be 1 (it computes the transitive closure of the graph).

As a consequence, does that mean the only way to compute the "true" weighted all-pairs shortest paths in Sage is to use the pure Python implementation?
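
For example, here is a minimal illustration of what I mean (the weighted call is the one that gets rejected):

# A small weighted digraph.
G = DiGraph([(0, 1, 2.0), (1, 2, 5.0), (0, 2, 9.0)])

# Unweighted: accepted, every edge counts as 1.
d_unweighted, _ = G.shortest_path_all_pairs(algorithm='Floyd-Warshall-Cython')

# Weighted: rejected with a ValueError.
d_weighted, _ = G.shortest_path_all_pairs(by_weight=True,
                                          algorithm='Floyd-Warshall-Cython')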