Ask Your Question
1

Generate Maximal subsets based on mutual/subset property

asked 2013-07-03 21:54:17 -0500

rickhg12hs gravatar image

updated 2013-10-19 11:13:51 -0500

tmonteil gravatar image

Sorry for the imprecise language - I'm not a mathematician.

Given a list/set/dict/etc., is there a function/method available that will generate subsets, with the possibility of non-null intersections, based on some user defined mutual/subset property?

E.g., given a list [0,1,2,3,4,5], generate the maximal lists such that the maximum difference within each list is 3. For the given list, the function would produce [[0,1,2,3],[1,2,3,4],[2,3,4,5]]. For the list [0,3,9,10,17,30,33], this function would produce [[0,3],[9,10],[17],[30,33]].

This is different than the usual filtering operation in that the defining property is about the returned lists/sets/etc., not just about the individual elements.

Function should work with members of ZZ, QQ, RR, RDF, CC, CDF, and others if possible.

edit retag flag offensive close merge delete

2 answers

Sort by ยป oldest newest most voted
2

answered 2013-07-03 23:18:10 -0500

tmonteil gravatar image

updated 2013-10-19 11:09:52 -0500

You can also notice that your property defining the set B is a property that is tested on pairs of elements of A. Hence, you can construct the graph corresponding to admissible pairs:

sage: A = Set([0,3,9,10,17,30,33])
sage: G = Graph()
sage: G.add_vertices(A)
sage: for u,v in CartesianProduct(A,A):
....:    if u != v and max(u,v) - min(u,v) <=3:
....:        G.add_edge(u,v)

What you are looking for is the set of maximal cliques of G:

sage: G.cliques_maximal()
[[0, 3], [9, 10], [17], [33, 30]]

I let you benchmark between the two methods and tell us which is better in which cases !

To understand the link between the cliques of the graph G and the simplicial complex C in my previous answer, you can read about "Rips complex".



EDIT : This method works, but there is a Sage bug when the set is made of RDF elements (see the following discussion and trac ticket 14853). Here is a dirty workaround until the bug is fixed: we replace a RDF element by a tuple made of a single RDF element.

sage: A = Set([RDF(0),RDF(3),RDF(9),RDF(10),RDF(17),RDF(30),RDF(33)])
sage: Atuple = map(lambda x:(x,), A)
sage: G = Graph()
sage: G.add_vertices(Atuple)
sage: for u,v in CartesianProduct(Atuple, Atuple):
....:    if u != v and max(u[0],v[0]) - min(u[0],v[0]) <=3:
....:        G.add_edge(u,v)

Then,

sage: C = [[c[0] for c in clique] for clique in G.cliques_maximal()] ; C
[[0.0, 3.0], [9.0, 10.0], [17.0], [33.0, 30.0]]


EDIT : The bug has been fixed in sage 5.12, so you can now use this method as well without having to use the workaround, even with vertices in RDF.

edit flag offensive delete link more

Comments

This does not seem to work with RDF members of Set A.

rickhg12hs gravatar imagerickhg12hs ( 2013-07-03 23:34:49 -0500 )edit

It seems to work for me with RDF entries: sage: A = Set([RDF(0),RDF(3),RDF(9),RDF(10),RDF(17),RDF(30),RDF(33)]) sage: G = Graph() sage: G.add_vertices(A) sage: G Graph on 7 vertices sage: G.vertices() [0, 3, 9, 10.0, 17.0, 30.0, 33.0] sage: for u,v in CartesianProduct(A,A): ....: if u != v and max(u,v) - min(u,v) <=3: ....: G.add_edge(u,v) ....: sage: G.cliques_maximal() [[0, 3], [9, 10.0], [17.0], [33.0, 30.0]] Could you give an example where it fails ?

tmonteil gravatar imagetmonteil ( 2013-07-03 23:42:45 -0500 )edit

Sorry, should have been more explicit. Try: `A=Set([RDF.random_element(min=0,max=10) for k in range(10)])`

rickhg12hs gravatar imagerickhg12hs ( 2013-07-03 23:44:41 -0500 )edit

Oh! You are right, there seems to be a problem in the construction of graphs: sage: A=Set([RDF.random_element(min=0,max=10) for k in range(10)]) ; A {6.42320967152, 1.77698693175, 2.95922392964, 9.50745089775, 4.60546838289, 3.67300191731, 5.21254750195, 5.90579400282, 7.55309974188, 0.442799267782} sage: G = Graph() sage: G.add_vertices(A) sage: G.vertices() [0, 1, 2, 3, 4, 5, 6, 7, 9] Nathann i am lost, could you explain that behaviour (and help us with a workaround) ?

tmonteil gravatar imagetmonteil ( 2013-07-03 23:56:25 -0500 )edit

1) That's clearly a bug 2) I have no idea where it comes from, but probably from Robert Miller's attempt to avoid label if it is possible 3) I have no way to receive emails when comments are added to a question on ASK Sage (I never received any mail from AskSage in general) so asking me questions there is bound to fail :-P

Nathann gravatar imageNathann ( 2013-07-04 00:02:05 -0500 )edit
2

answered 2013-07-03 22:51:13 -0500

tmonteil gravatar image

updated 2013-07-03 22:54:47 -0500

First, if your want to filter (non-necessary maximal) subsets with a condition that implies testing all subsets of your given set A, you can try something like:

sage: A = Set([0,1,2,3,4,5])
sage: B = Set(a for a in A.subsets() if len(a) != 0 and max(a) - min(a) <= 3)

Note that i had to exclude the empty set to be able to pick the minimal and the maximal element of the iterated subsets of A. Note that i hd to use a Sage Set and not a python set to be able to get its subsets.

Then, you could notice that your property is stable by subsets, so you have more than a set of subsets, you have a simplicial complex:

sage: SimplicialComplex(B)
Simplicial complex with vertex set (0, 1, 2, 3, 4, 5) and facets {(1, 2, 3, 4), (2, 3, 4, 5), (0, 1, 2, 3)}

From which you can get maximal faces:

sage: C = SimplicialComplex(B).maximal_faces() ; C
{(1, 2, 3, 4), (2, 3, 4, 5), (0, 1, 2, 3)}

In your second example, you get:

sage: A = Set([0,3,9,10,17,30,33])
sage: B = Set(a for a in A.subsets() if len(a) != 0 and max(a) - min(a) <= 3)
sage: C = SimplicialComplex(B).maximal_faces() ; C
{(9, 10), (17,), (30, 33), (0, 3)}
edit flag offensive delete link more

Comments

This also works with RDF members of set A. Important to me.

rickhg12hs gravatar imagerickhg12hs ( 2013-07-03 23:34:12 -0500 )edit

Selecting this as the best answer for the examples given, mostly for its relative simplicity. Thanks!

rickhg12hs gravatar imagerickhg12hs ( 2013-07-07 15:52:40 -0500 )edit

Changed my mind about preferred solution after doing some performance testing. Determining maximal cliques seems to be *** orders of magnitude *** faster.

rickhg12hs gravatar imagerickhg12hs ( 2013-10-19 22:00:31 -0500 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

Stats

Asked: 2013-07-03 21:54:17 -0500

Seen: 608 times

Last updated: Oct 19 '13