ASKSAGE: Sage Q&A Forum - RSS feed
https://ask.sagemath.org/questions/
Q&A Forum for Sage (en)
Copyright Sage, 2010. Some rights reserved under creative commons license.
Sun, 12 Feb 2023 11:12:41 +0100

More efficient methods for finding subsets of Subsets()
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/

Calling `SS = Subsets([1..100000], 4)` takes 50 ms to construct an object representing about $4 \times 10^{18}$ subsets. `%%prun` suggests the majority of that time is spent in `misc.py:556(_stable_uniq)`.
(Note: trying this with more than about $10^5$ elements causes an overflow.)
But getting a subset of *that* list... hoo boy.
`NList = [i for i in Subsets([1..100], 4) if sum(i) == 100]` takes 45 s to generate a list with 5952 elements, out of $3.9 \times 10^6$ possibilities. About half of that time is spent in `set.py:465(__init__)`.
Note that the size of the main set has been decreased by a factor of *1000*. That's because this list comprehension seems to run at about $O(n^4)$, so going from 100 to 1000 multiplies the time by around $10^4$; that increases the time from ~45 s to 5... days. (Creating the list, then filtering it, doesn't seem to change the time much.)
So my question is, is there any faster way to filter through such a long list that I don't know of? I'd like to be working in the `[1..2000]` range, but I don't think my kernel will hold out for 80 days to make that happen. (Probably not even 5 days).
Any suggestions welcome!
Fri, 10 Feb 2023 23:48:29 +0100
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/

Comment by Emmanuel Charpentier:
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66340#post-id-66340
When using [your algorithm](http://www.multimagie.com/English/Morgenstern24.htm#Note), you postulate `a=100`. Do you know (or postulate) `b`?
You do not need to build your sets: building *iterators* over them should suffice to *exhibit* a solution. Finding all of them is another question... I doubt that it can be done realistically by brute force.
Sat, 11 Feb 2023 09:50:29 +0100
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66340#post-id-66340

Comment by Eric Snyder:
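The iterator point above can be illustrated in plain Python (a sketch, with `itertools.combinations` standing in for Sage's `Subsets`): exhibiting a single solution requires no list at all.

```python
from itertools import combinations

# Lazily scan 4-subsets of [1..100]; stop at the first one summing to 100.
# Nothing beyond the tuples inspected so far is ever materialized.
candidates = (s for s in combinations(range(1, 101), 4) if sum(s) == 100)
first = next(candidates)
print(first)  # -> (1, 2, 3, 94), found without building the full list
```

Calling `next(candidates)` again would yield the next solution, so one can also consume just as many solutions as needed.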
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66339#post-id-66339
@Emmanuel Charpentier: Your calculations are correct, but the initial calculation up there uses $100000$, not $10000$.
@Max Alekseyev: The context is attempting to make [this algorithm](http://www.multimagie.com/English/Morgenstern24.htm#Note) work. The initial step requires finding all subsets of `[1..n]` with a given sum, then finding sets of three distinct subsets whose squares have the same sum (which takes a fraction of the time of the first step).
Sat, 11 Feb 2023 09:10:47 +0100
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66339#post-id-66339

Comment by Max Alekseyev:
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66338#post-id-66338
Can you formulate more clearly what problem you are solving?
Sat, 11 Feb 2023 04:53:56 +0100

Comment by Emmanuel Charpentier:
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66337#post-id-66337
Note that:
sage: binomial(10000,4)
416416712497500
*i. e.*
sage: binomial(10000,4).n()
4.16416712497500e14
about 10000 times less than your result. R agrees:
sage: r.choose(10000,4).sage()
416416712497500
So does combinatorics:
sage: Subsets(10000,4).cardinality()
416416712497500
Something may be amiss in your computation...
Sat, 11 Feb 2023 04:40:37 +0100
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66337#post-id-66337

Answer by Max Alekseyev:
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?answer=66343#post-id-66343
If you need partitions of an integer `k` into 4 distinct parts, you can rely on the Sage `Partitions()` generator:
for a,b,c,d in Partitions(k, length=4, max_slope=-1):
...
If you additionally want to bound the parts, add the `max_part=...` parameter. See [this documentation](https://doc.sagemath.org/html/en/reference/combinat/sage/combinat/partition.html#sage.combinat.partition.Partitions) for all available options.
Sat, 11 Feb 2023 15:22:03 +0100
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?answer=66343#post-id-66343

Comment by Eric Snyder:
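As a cross-check of the partition approach in pure Python (a sketch, since `Partitions` is Sage-specific; `distinct_partitions` is a hypothetical helper written here to mimic `Partitions(k, length=4, max_slope=-1, max_part=n)`):

```python
from itertools import combinations

def distinct_partitions(k, max_part=None):
    """Yield (a, b, c, d) with a < b < c < d and a + b + c + d == k,
    mimicking Sage's Partitions(k, length=4, max_slope=-1, max_part=...)."""
    hi = max_part if max_part is not None else k
    for a in range(1, k // 4 + 1):
        for b in range(a + 1, k):
            for c in range(b + 1, k):
                d = k - a - b - c
                if d <= c:          # d only shrinks as c grows
                    break
                if d <= hi:         # a < b < c < d <= hi follows
                    yield (a, b, c, d)

# Brute force over 4-subsets of [1..100], as in the original question:
brute = [s for s in combinations(range(1, 101), 4) if sum(s) == 100]
fast = list(distinct_partitions(100, max_part=100))
print(len(brute), len(fast))  # both should be 5952, matching the question
```

The distinct-parts partitions of `k` into 4 parts correspond exactly to the 4-element subsets of `[1..n]` with sum `k`, which is why the two counts agree.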
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66350#post-id-66350
I hadn't even thought about just doing it as a partition. Perfect, thanks!
Sun, 12 Feb 2023 08:38:31 +0100

Answer by David Coudert:
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?answer=66342#post-id-66342
Here is an iterator over 4-tuples `(a, b, c, d)` such that `a + b + c + d = k` and `a < b < c < d`. You can also ensure that `d <= nmax`, which seems to be a requirement of your problem.
def get_series(k, nmax=None):
"""
Iterator over all series (a, b, c, d) such that:
- a + b + c + d = k
- a < b < c < d
When nmax is specified, we furthermore ensure that a,b,c,d <= nmax.
"""
if nmax is None:
for a in range(1, (k - 6) // 4 + 1):
for b in range(a + 1, (k - a - 3) // 3 + 1):
for c in range(b + 1, (k - a - b - 1) // 2 + 1):
yield (a, b, c, k - a - b - c)
else:
for a in range(1, min(nmax, (k - 6) // 4) + 1):
for b in range(a + 1, min(nmax, (k - a - 3) // 3) + 1):
for c in range(b + 1, min(nmax, (k - a - b - 1) // 2) + 1):
d = k - a - b - c
if d <= nmax:
yield (a, b, c, d)
You can check that this is way faster than your brute force attempt:
sage: %time NList = [i for i in Subsets([1..100], 4) if sum(i) == 100]
CPU times: user 31 s, sys: 83.3 ms, total: 31.1 s
Wall time: 31.1 s
sage: %time L = list(get_series(100))
CPU times: user 7.23 ms, sys: 136 µs, total: 7.36 ms
Wall time: 7.35 ms
sage: len(L) == len(NList)
True
sage: L == [tuple(sorted(i)) for i in NList]
True
This is of course not sufficient to solve your problem efficiently, since you have plenty of 4-tuples to consider, and then triples of 4-tuples...
sage: %time L = list(get_series(1000))
CPU times: user 2.82 s, sys: 99.1 ms, total: 2.92 s
Wall time: 2.92 s
sage: len(L)
6840777
Another useful idea is to store the tuples in a dictionary with key `a^2 + b^2 + c^2 + d^2` and value the list of corresponding tuples (so with the same sum and the same sum of squares).
Sat, 11 Feb 2023 15:05:30 +0100
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?answer=66342#post-id-66342

Comment by David Coudert:
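The dictionary idea can be sketched in plain Python (for self-containment, a brute-force `combinations` scan stands in for the `get_series` iterator; `group_by_square_sum` is a hypothetical helper name):

```python
from collections import defaultdict
from itertools import combinations

def group_by_square_sum(k, n):
    """Bucket the 4-subsets of [1..n] with sum k by their sum of squares.
    Tuples in the same bucket share both the sum and the sum of squares."""
    buckets = defaultdict(list)
    for t in combinations(range(1, n + 1), 4):
        if sum(t) == k:
            buckets[sum(x * x for x in t)].append(t)
    return buckets

buckets = group_by_square_sum(100, 100)
# Buckets holding >= 3 tuples are the only candidates for triples of
# distinct 4-tuples sharing both the sum and the sum of squares.
candidates = {s: ts for s, ts in buckets.items() if len(ts) >= 3}
print(len(candidates), "square-sums admit at least 3 tuples")
```

Grouping this way turns the search for matching triples from a scan over all pairs/triples of tuples into independent, much smaller per-bucket searches.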
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66352#post-id-66352
Right, thanks.
Sun, 12 Feb 2023 11:12:41 +0100

Comment by Max Alekseyev:
https://ask.sagemath.org/question/66336/more-efficient-methods-for-finding-subsets-of-subsets/?comment=66344#post-id-66344
This is reinventing the `Partitions()` functionality. See my answer below.
Sat, 11 Feb 2023 15:22:39 +0100