# More efficient methods for finding subsets of Subsets()

Calling SS = Subsets([1..100000], 4) takes 50 ms to generate a list with about $4 \times 10^{18}$ elements. %%prun suggests the majority of that time is used by misc.py:556(_stable_uniq)

(Note: trying this with $> \sim 10^5$ elements causes an overflow.)

But getting a subset of that list... hoo boy.

NList = [i for i in Subsets([1..100], 4) if sum(i) == 100] takes 45 s to generate a list with 5952 elements, out of $3.9 \times 10^6$ possibilities. About half of that time is used by set.py:465(__init__)

Note that the size of the main set has been decreased by a factor of 1000. That's because this list comprehension seems to run at about $O(n^4)$, so going from 100 to 1000 multiplies the time by around $10^4$; that increases the time from ~45 s to 5... days. (Creating the list, then filtering it, doesn't seem to change the time much.)

So my question is, is there any faster way to filter through such a long list that I don't know of? I'd like to be working in the [1..2000] range, but I don't think my kernel will hold out for 80 days to make that happen. (Probably not even 5 days).

Any suggestions welcome!

edit retag close merge delete

Note that :

sage: binomial(10000,4)
416416712497500


i. e.

sage: binomial(10000,4).n()
4.16416712497500e14


sage: r.choose(10000,4).sage()
416416712497500


So does combinatorics :

sage: Subsets(10000,4).cardinality()
416416712497500


Something may be amiss in your computation...

( 2023-02-11 04:40:37 +0200 )edit

Can you formulate more clearly what problem you are solving?

( 2023-02-11 04:53:56 +0200 )edit

@Emmanuel Charpentier: Your calculations are correct, but the initial calculation up there uses $100000$, not $10000$.

@Max Alekseyev: The context is attempting to make this algorithm work. The initial step requires finding all subsets of [1..n] with a given sum, then finding sets of three distinct subsets whose squares have the same sum (which takes a fraction of the time of the first step).

( 2023-02-11 09:10:47 +0200 )edit

When using your algorithm, you postulate a=100. Do you know (or postulate) b ?

You do not need to build your sets : building iterators in them should suffice to exhibit a solution. Finding all of them is another question... I doubt that it can be done realistically by brute force...

( 2023-02-11 09:50:29 +0200 )edit

Sort by » oldest newest most voted

If you need partitions of an integerk into 4 distinct parts, you can rely on Sage Partitions() generator:

for a,b,c,d in Partitions(k, length=4, max_slope=-1):
...


If you want additionally to bound the parts, then add max_part=... parameter. See this documentation for all available options.

more

I hadn't even thought about just doing it as a partition. Perfect, thanks!

( 2023-02-12 08:38:31 +0200 )edit

Here is an iterator over 4-tuples (a, b, c, d) such that a + b + c + d = k and a < b < c < d. You can also ensure that d <= nmax, which seems a requirement of your problem.

def get_series(k, nmax=None):
"""
Iterator over all series (a, b, c, d) such that:
- a + b + c + d = k
- a < b < c < d

When nmax is specified, we furthermore ensure that a,b,c,d <= nmax.
"""
if nmax is None:
for a in range(1, (k - 6) // 4 + 1):
for b in range(a + 1, (k - a - 3) // 3 + 1):
for c in range(b + 1, (k - a - b - 1) // 2 + 1):
yield (a, b, c, k - a - b - c)
else:
for a in range(1, min(nmax, (k - 6) // 4) + 1):
for b in range(a + 1, min(nmax, (k - a - 3) // 3) + 1):
for c in range(b + 1, min(nmax, (k - a - b - 1) // 2) + 1):
d = k - a - b - c
if d <= nmax:
yield (a, b, c, d)


you can check that

This is way faster than your brute force attempt

sage: %time NList = [i for i in Subsets([1..100], 4) if sum(i) == 100]
CPU times: user 31 s, sys: 83.3 ms, total: 31.1 s
Wall time: 31.1 s
sage: %time L = list(get_series(100))
CPU times: user 7.23 ms, sys: 136 µs, total: 7.36 ms
Wall time: 7.35 ms
sage: len(L) == len(NList)
True
sage: L == [tuple(sorted(i)) for i in NList]
True


This is of course not sufficient for solving efficiently your problem since you have plenty of 4-tuples to consider and then triples of 4-tuples...

sage: %time L = list(get_series(1000))
CPU times: user 2.82 s, sys: 99.1 ms, total: 2.92 s
Wall time: 2.92 s
sage: len(L)
6840777


Another useful idea is to store the tuples in a dictionary with key a^2 + b^2 + c^2 + d^2 and value the list of corresponding tuples (so with same sum and same sum of squares).

more

This is reinventing of Partitions() functionality. See my answer below.

( 2023-02-11 15:22:39 +0200 )edit

Right, thanks.

( 2023-02-12 11:12:41 +0200 )edit