Finding subset of lists with common entries

Suppose I have a list of lists. For example,

[ [1, 2, 3, 4], [1, 3, 1, 3], [4, 1, 4, 1], [1, 3, 2, 1], [4, 1, 2, 4] ]

I want SageMath to give me a sub-list of this list that consist of lists where any two have at least one common entry. For example,

[ [1, 2, 3, 4], [1, 3, 1, 3], [1, 3, 2, 1] ]

or

[ [4, 1, 4, 1], [1, 3, 2, 1], [4, 1, 2, 4] ] .

What would be even more amazing is if SageMath could find the largest possible such sub-list. How could I get SageMath to do this? Thank you in advance!

edit retag close merge delete

Does "having a common entry" for two lists A and B mean set(A).intersection(set(B)) != set() or any(a == b for a,b in zip(A,B)) (assuming A and B have the same length)?

( 2023-01-04 20:00:53 +0200 )edit

If you can describe an algorithm, then the code should not be hard to write. What algorithm do you have in mind?

( 2023-01-04 20:02:01 +0200 )edit

By "having a common entry" I mean that the lists have to have the same value in the same column. For example, [1, 2, 3, 4] and [1, 3, 1, 3] both have 1 as entry 1. I honestly don't even know where to begin to come up with an algorithm for this. Whatever I come up with seems convoluted.

( 2023-01-04 20:36:57 +0200 )edit

Sort by ยป oldest newest most voted

Give this one a try:

def find_common_lists(lists):
common_lists = []
for i in range(len(lists)):
for j in range(i+1, len(lists)):
for k in range(len(lists[i])):
if lists[i][k] == lists[j][k]:
common_lists.append(lists[i])
common_lists.append(lists[j])
break
return common_lists

def find_largest_common_lists(lists):
largest_common_lists = []
for i in range(len(lists)):
for j in range(i+1, len(lists)):
for k in range(len(lists[i])):
if lists[i][k] == lists[j][k]:
common_lists = [lists[i], lists[j]]
for l in range(j+1, len(lists)):
for m in range(len(lists[l])):
if lists[l][m] == lists[i][m]:
common_lists.append(lists[l])
break
if len(common_lists) > len(largest_common_lists):
largest_common_lists = common_lists
return largest_common_lists

# Test the find_common_lists function
lists = [ [1, 2, 3, 4], [1, 3, 1, 3], [4, 1, 4, 1], [1, 3, 2, 1], [4, 1, 2, 4] ]
print(find_common_lists(lists))

# Test the find_largest_common_lists function
lists = [lists = [ [1, 2, 3, 4], [1, 3, 1, 3], [4, 1, 4, 1], [1, 3, 2, 1], [4, 1, 2, 4] ]
print(find_largest_common_lists(lists))

more

1

@cmark Thank you. This, however, finds lists with common elements, but not necessarily the same entries in the same column. Sorry, I was unclear about this. Is there a way to find a sub-list that consists of lists that share at least one entry in the same column? For example, [1, 2, 3, 4] and [1, 3, 1, 3] both have entry 1 in column 1.

( 2023-01-04 20:50:22 +0200 )edit

I've edited the code, please try that (I'm not at a computer at the moment so I couldn't try, I just hope it does work as you expected, please excuse me if it doesn't).

( 2023-01-04 21:59:43 +0200 )edit

Name the list:

sage: L = [ [1, 2, 3, 4], [1, 3, 1, 3], [4, 1, 4, 1], [1, 3, 2, 1], [4, 1, 2, 4] ]


Generate all desirable sublists:

sage: [list(map(list,S)) for S in Subsets(map(tuple,L)) if all(any(a == b for a, b in zip(A,B)) for A, B in Combinations(S, 2))]
[[],
[[1, 2, 3, 4]],
[[1, 3, 1, 3]],
[[4, 1, 4, 1]],
[[1, 3, 2, 1]],
[[4, 1, 2, 4]],
[[1, 3, 1, 3], [1, 2, 3, 4]],
[[1, 3, 2, 1], [1, 2, 3, 4]],
[[4, 1, 2, 4], [1, 2, 3, 4]],
[[1, 3, 2, 1], [1, 3, 1, 3]],
[[1, 3, 2, 1], [4, 1, 4, 1]],
[[4, 1, 2, 4], [4, 1, 4, 1]],
[[1, 3, 2, 1], [4, 1, 2, 4]],
[[1, 3, 2, 1], [1, 3, 1, 3], [1, 2, 3, 4]],
[[1, 3, 2, 1], [4, 1, 2, 4], [1, 2, 3, 4]],
[[1, 3, 2, 1], [4, 1, 2, 4], [4, 1, 4, 1]]]


Or just some longest one:

sage: max((list(map(list,S)) for S in Subsets(map(tuple,L)) if all(any(a == b for a, b in zip(A,B)) for A, B in Combinations(S, 2))), key=lambda C: len(C))
[[1, 3, 2, 1], [1, 3, 1, 3], [1, 2, 3, 4]]

more

1

An improvement of your method is to build a graph with one vertex per list and edges between lists with a common entry.

sage: n = len(L)
sage: G = Graph(n)
sage: G.add_edges((i, j) for i, j in Combinations(n, 2)  if any(a == b for a, b in zip(L[i], L[j])))


The longest sequence is the maximum clique

sage: [L[i] for i in G.clique_maximum()]
[[4, 1, 4, 1], [1, 3, 2, 1], [4, 1, 2, 4]]

( 2023-01-05 18:47:07 +0200 )edit