# Generate Maximal subsets based on mutual/subset property

Sorry for the imprecise language - I'm not a mathematician.

Given a list/set/dict/etc., is there a function/method available that will generate subsets, with the possibility of non-null intersections, based on some user defined mutual/subset property?

E.g., given a list [0,1,2,3,4,5], generate the maximal lists such that the maximum difference within each list is 3. For the given list, the function would produce [[0,1,2,3],[1,2,3,4],[2,3,4,5]]. For the list [0,3,9,10,17,30,33], this function would produce [[0,3],[9,10],,[30,33]].

This is different than the usual filtering operation in that the defining property is about the returned lists/sets/etc., not just about the individual elements.

Function should work with members of ZZ, QQ, RR, RDF, CC, CDF, and others if possible.

edit retag close merge delete

Sort by » oldest newest most voted

You can also notice that your property defining the set B is a property that is tested on pairs of elements of A. Hence, you can construct the graph corresponding to admissible pairs:

sage: A = Set([0,3,9,10,17,30,33])
sage: G = Graph()
sage: for u,v in CartesianProduct(A,A):
....:    if u != v and max(u,v) - min(u,v) <=3:


What you are looking for is the set of maximal cliques of G:

sage: G.cliques_maximal()
[[0, 3], [9, 10], , [33, 30]]


I let you benchmark between the two methods and tell us which is better in which cases !

To understand the link between the cliques of the graph G and the simplicial complex C in my previous answer, you can read about "Rips complex".

EDIT : This method works, but there is a Sage bug when the set is made of RDF elements (see the following discussion and trac ticket 14853). Here is a dirty workaround until the bug is fixed: we replace a RDF element by a tuple made of a single RDF element.

sage: A = Set([RDF(0),RDF(3),RDF(9),RDF(10),RDF(17),RDF(30),RDF(33)])
sage: Atuple = map(lambda x:(x,), A)
sage: G = Graph()
sage: for u,v in CartesianProduct(Atuple, Atuple):
....:    if u != v and max(u,v) - min(u,v) <=3:


Then,

sage: C = [[c for c in clique] for clique in G.cliques_maximal()] ; C
[[0.0, 3.0], [9.0, 10.0], [17.0], [33.0, 30.0]]


EDIT : The bug has been fixed in sage 5.12, so you can now use this method as well without having to use the workaround, even with vertices in RDF.

more

It seems to work for me with RDF entries: sage: A = Set([RDF(0),RDF(3),RDF(9),RDF(10),RDF(17),RDF(30),RDF(33)]) sage: G = Graph() sage: G.add_vertices(A) sage: G Graph on 7 vertices sage: G.vertices() [0, 3, 9, 10.0, 17.0, 30.0, 33.0] sage: for u,v in CartesianProduct(A,A): ....: if u != v and max(u,v) - min(u,v) <=3: ....: G.add_edge(u,v) ....: sage: G.cliques_maximal() [[0, 3], [9, 10.0], [17.0], [33.0, 30.0]] Could you give an example where it fails ?

Sorry, should have been more explicit. Try: A=Set([RDF.random_element(min=0,max=10) for k in range(10)])

Oh! You are right, there seems to be a problem in the construction of graphs: sage: A=Set([RDF.random_element(min=0,max=10) for k in range(10)]) ; A {6.42320967152, 1.77698693175, 2.95922392964, 9.50745089775, 4.60546838289, 3.67300191731, 5.21254750195, 5.90579400282, 7.55309974188, 0.442799267782} sage: G = Graph() sage: G.add_vertices(A) sage: G.vertices() [0, 1, 2, 3, 4, 5, 6, 7, 9] Nathann i am lost, could you explain that behaviour (and help us with a workaround) ?

1) That's clearly a bug 2) I have no idea where it comes from, but probably from Robert Miller's attempt to avoid label if it is possible 3) I have no way to receive emails when comments are added to a question on ASK Sage (I never received any mail from AskSage in general) so asking me questions there is bound to fail :-P

First, if your want to filter (non-necessary maximal) subsets with a condition that implies testing all subsets of your given set A, you can try something like:

sage: A = Set([0,1,2,3,4,5])
sage: B = Set(a for a in A.subsets() if len(a) != 0 and max(a) - min(a) <= 3)


Note that i had to exclude the empty set to be able to pick the minimal and the maximal element of the iterated subsets of A. Note that i hd to use a Sage Set and not a python set to be able to get its subsets.

Then, you could notice that your property is stable by subsets, so you have more than a set of subsets, you have a simplicial complex:

sage: SimplicialComplex(B)
Simplicial complex with vertex set (0, 1, 2, 3, 4, 5) and facets {(1, 2, 3, 4), (2, 3, 4, 5), (0, 1, 2, 3)}


From which you can get maximal faces:

sage: C = SimplicialComplex(B).maximal_faces() ; C
{(1, 2, 3, 4), (2, 3, 4, 5), (0, 1, 2, 3)}


In your second example, you get:

sage: A = Set([0,3,9,10,17,30,33])
sage: B = Set(a for a in A.subsets() if len(a) != 0 and max(a) - min(a) <= 3)
sage: C = SimplicialComplex(B).maximal_faces() ; C
{(9, 10), (17,), (30, 33), (0, 3)}

more

This also works with RDF members of set A. Important to me.

Selecting this as the best answer for the examples given, mostly for its relative simplicity. Thanks!

Changed my mind about preferred solution after doing some performance testing. Determining maximal cliques seems to be *** orders of magnitude *** faster.