More efficient methods for finding subsets of Subsets()
Calling SS = Subsets([1..100000], 4)
takes 50 ms to generate a list with about 4×10^18 elements. %%prun
suggests the majority of that time is used by misc.py:556(_stable_uniq)
(Note: trying this with more than ~10^5 elements causes an overflow.)
But getting a subset of that list... hoo boy.
NList = [i for i in Subsets([1..100], 4) if sum(i) == 100]
takes 45 s to generate a list with 5952 elements, out of 3.9×10^6 possibilities. About half of that time is used by set.py:465(__init__)
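The filtering step can be reproduced in plain Python with itertools.combinations (a sketch only: Sage's Subsets([1..100], 4) is replaced by the stdlib equivalent, so the per-element overhead and timings will differ from Sage's):

```python
from itertools import combinations
import math

# Brute-force filter: all 4-subsets of {1, ..., 100} whose sum is 100.
matches = [c for c in combinations(range(1, 101), 4) if sum(c) == 100]

print(len(matches))       # 5952, matching the count reported above
print(math.comb(100, 4))  # 3921225 candidates scanned, about 3.9 * 10^6
```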
Note that the size of the main set has been decreased by a factor of 1000. That's because this list comprehension seems to run in about O(n^4) time, so going from 100 to 1000 multiplies the time by around 10^4; that increases the time from ~45 s to roughly 5 days. (Creating the list first and then filtering it doesn't seem to change the time much.)
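The O(n^4) scaling follows from the candidate count alone: the filter must scan every 4-subset, and C(n, 4) = n(n-1)(n-2)(n-3)/24 grows as n^4/24. A quick check of the ratio (an illustration with Python's math.comb, not a timing measurement):

```python
import math

# Candidate counts for n = 100 versus n = 1000: a 10x larger n means
# roughly 10^4 times as many subsets to scan.
candidates_100 = math.comb(100, 4)
candidates_1000 = math.comb(1000, 4)
ratio = candidates_1000 / candidates_100
print(ratio)  # a little over 10^4
```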
So my question is, is there any faster way to filter through such a long list that I don't know of? I'd like to be working in the [1..2000]
range, but I don't think my kernel will hold out for 80 days to make that happen. (Probably not even 5 days).
Any suggestions welcome!
Note that:
i.e.
about 10000 times less than your result. R agrees:
So does combinatorics:
Something may be amiss in your computation...
Can you formulate more clearly what problem you are solving?
@Emmanuel Charpentier: Your calculations are correct, but the initial calculation up there uses 100000, not 10000.
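The factor-of-10^4 discrepancy is consistent with that mix-up; a quick check with Python's math.comb (an illustration, not the original Sage/R computation):

```python
import math

# Cardinalities for the two set sizes being discussed.
big = math.comb(100000, 4)   # about 4.17 * 10^18, as in the question
small = math.comb(10000, 4)  # about 4.16 * 10^14
print(big // small)          # roughly 10^4: the reported discrepancy
```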
@Max Alekseyev: The context is attempting to make this algorithm work. The initial step requires finding all subsets of
[1..n]
with a given sum, then finding sets of three distinct subsets whose squares have the same sum (which takes a fraction of the time of the first step).

When using your algorithm, you postulate
a=100
. Do you know (or postulate)
b
?

You do not need to build your sets: building iterators over them should suffice to exhibit a solution. Finding all of them is another question... I doubt that it can be done realistically by brute force...
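The iterator suggestion can be sketched in plain Python: a generator that yields sum-constrained 4-subsets lazily, with pruning so it never scans all C(n, 4) candidates. The function name and the choice of n = 2000, s = 2000 are illustrative assumptions, not code from the thread:

```python
def four_subsets_with_sum(n, s):
    """Lazily yield (a, b, c, d) with 1 <= a < b < c < d <= n and a+b+c+d == s."""
    for a in range(1, n + 1):
        if 4 * a + 6 > s:          # a + (a+1) + (a+2) + (a+3) already exceeds s
            break
        for b in range(a + 1, n + 1):
            rest = s - a - b       # what c + d must add up to
            if rest < 2 * b + 3:   # even the minimum (b+1) + (b+2) overshoots
                break
            if rest > 2 * n - 1:   # even the maximum (n-1) + n falls short
                continue
            for c in range(b + 1, n + 1):
                d = rest - c
                if d <= c:         # larger c only pushes d below c; stop
                    break
                if d <= n:
                    yield (a, b, c, d)

# The first solution in the [1..2000] range appears immediately,
# with no 10^11-element list ever materialized.
gen = four_subsets_with_sum(2000, 2000)
print(next(gen))
```

Because it is a generator, finding one witness costs almost nothing; exhausting it for all solutions is where the combinatorial explosion returns, as the comment above warns.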