Ask Your Question
0

Best practice to save results of a function as a database

asked 2020-05-02 00:44:02 +0200

jin gravatar image

updated 2020-05-02 16:56:29 +0200

slelievre gravatar image

I have a function defined inductively that taxes my computer quite a lot. It will help a lot if I can save its results somewhere, preventing it from unnecessary calculations.

To simplify, assume that it's a function $f: \mathbb{N} \to \mathbb{N}$, where $f(0)=1$ and $f(i)$ depends on the value of $f$ at lower $i$'s. Imagine that I want to calculate $f(n)$ where $n \sim 10^{1000}$.

Of course, I can do it by

for i in range(1000):
    f(i).save("./f_value_{}".format(i))

But I feel like this can be improved. Any idea?

edit retag flag offensive close merge delete

2 Answers

Sort by » oldest newest most voted
2

answered 2020-05-02 09:34:19 +0200

Sébastien gravatar image

updated 2020-05-02 11:57:56 +0200

One option is to use the @cached_function decorator which saves the result of already computed values:

sage: @cached_function
....: def fib(n):
....:     print('computation performed')
....:     if n == 0 or n == 1:
....:         return 1
....:     else:
....:         return fib(n-1) + fib(n-2)
....:     
sage: fib(5)
computation performed
computation performed
computation performed
computation performed
computation performed
computation performed
8
sage: fib(5)   # no computation is performed
8

Internally, results are saved in a dictionary where keys are of the form (*args,**kwds):

sage: fib.cache
{((1,), ()): 1, ((0,), ()): 1, ((2,), ()): 2, ((3,), ()): 3, ((4,), ()): 5, ((5,), ()): 8}

See the documentation of

sage: cached_function?

Then, you can save the fib function locally:

sage: save(fib, 'fib')
sage: ls fib*
fib.sobj

Then, in the same or in another Sage session, you can reload that function and its previous cache is still there:

sage: gib = load('fib.sobj')
sage: gib.cache
{((1,), ()): 1, ((0,), ()): 1, ((2,), ()): 2, ((3,), ()): 3, ((4,), ()): 5, ((5,), ()): 8}
sage: gib(5)
8

Which means that the default behavior is to include the cache in the pickle (documentation says one should write @cached_function(do_pickle=True) which does not seem necessary). Behind the scene, save is using dumps to create the file fib.sobj which contains binary code looking something like :

sage: dumps(fib)
b'x\x9cm\x8c\xbb\x12@0\x10\x00\xe39&\x7f\xa2\xc9W\xa8\x15*\xddM\x9c\xe0\x06\xc7M\xe8u~\xdb\xab\xd5\xed\x16\xbbG\x88\xde\xf6\xce\xcc\xe4\xd1\xa0\xc5\xc1u;\xa3\x86\x17[xd\xa3\x85a\xe7\x95p\x9c\x9c\x16UgJ)\x80\xd9\x12\x03HPG\xb7v\xd4\xc8\xff\xa9\\\xbe\x92\xb8/\x087-Q^I|JRIj.\xbe\xa8)\xf2'

which can also be loaded:

sage: hib = loads(dumps(fib))
sage: hib.cache
{((1,), ()): 1, ((0,), ()): 1, ((2,), ()): 2, ((3,), ()): 3, ((4,), ()): 5, ((5,), ()): 8}

If the cache becomes very heavy in memory, then saving and loading the function will take long time. I think I already heard of a decorator that query automatically a file if result is already computed or saves automatically to the file (or database) if it is a new computation, but I can't remember what/where I saw that, whether it was a ticket or just a discussion. I am sure people involved in LMFDB project (which computes stuff on the fly if not already in a database) have an idea about that.

edit flag offensive delete link more
1

answered 2020-05-02 16:40:45 +0200

slelievre gravatar image

One needs to decide on some balance between the cost of storing and the cost of computing. Find out what values are worth storing and will make other calculations easier. This might depend on $f$.

Would storing the values at primes or prime powers make more sense? Values at multiple of $2^{30}$?

In any case, $10^{1000}$ is a lot, there is not much hope to compute $f(n)$ for all values of $n$ up to that bound.

edit flag offensive delete link more

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

Stats

Asked: 2020-05-02 00:44:02 +0200

Seen: 188 times

Last updated: May 02 '20