Ask Your Question
1

Sample a large random number

asked 2023-11-06 02:24:09 +0100

dmishins gravatar image

I am trying to randomly (uniformly) sample an integer between 0, and say 2^1000 (or ideally an arbitrary large number). Attempting to do by calling random sample yields an overflow error:

sample(range(2^130), 7)

OverflowError: Python int too large to convert to C ssize_t Is it possible to do this?

edit retag flag offensive close merge delete

1 Answer

Sort by » oldest newest most voted
1

answered 2023-11-06 04:58:12 +0100

updated 2023-11-06 05:00:33 +0100

First, range uses the Python int class, which can overflow (as you have experienced). The function srange uses the Sagemath Integer class instead, which will not overflow. Second, using sample is probably not a good idea. On my somewhat old computer:

sage: %time sample(srange(2^24), 7)
CPU times: user 4.11 s, sys: 556 ms, total: 4.66 s
Wall time: 6 s
sage: %time sample(srange(2^25), 7)
CPU times: user 8.95 s, sys: 1.34 s, total: 10.3 s
Wall time: 14.2 s
[14635440, 2287708, 14649651, 26289169, 18124119, 30825037, 17257883]
sage: %time sample(srange(2^26), 7)
CPU times: user 17.1 s, sys: 2.34 s, total: 19.5 s
Wall time: 23 s
[9787083, 12313568, 7761365, 8998698, 55334892, 50923481, 65747342]

It seems to be roughly doubling in time as the argument to srange doubles. (The issue is that srange actually constructs the whole list first.) Needless to say, I don't want to try 2^130. Much faster is to use randint repeatedly (and the changes of getting the same integer twice are pretty small:

sage: %time [randint(1, 2^130) for n in range(7)]
CPU times: user 63 µs, sys: 1 µs, total: 64 µs
Wall time: 67 µs
[1341203791323696537546187325304798025292,
 1106243203902472361349055754890433865742,
 1038420201477214469366016121149956168427,
 1242849022616461088502016645925189018763,
 286347095973575864665046785261064404737,
 286152873235140220458041877475917123637,
 41617504738350918940022502684766969623]

This is almost instant. The command A = [randint(1, 2^100000000) for n in range(7)] took almost a second for me. It actually takes longer to print the elements of A. Also, you could write a loop instead to make absolutely sure you don't get any repetitions, or test it afterwards, but it's probably not necessary if you're using such large sample sizes.

edit flag offensive delete link more

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

Stats

Asked: 2023-11-06 02:24:09 +0100

Seen: 70 times

Last updated: Nov 06 '23