Loading [MathJax]/jax/output/HTML-CSS/jax.js
Ask Your Question
0

Best practice to save results of a function as a database

asked 4 years ago

jin gravatar image

updated 4 years ago

slelievre gravatar image

I have a function defined inductively that taxes my computer quite a lot. It will help a lot if I can save its results somewhere, preventing it from unnecessary calculations.

To simplify, assume that it's a function f:NN, where f(0)=1 and f(i) depends on the value of f at lower i's. Imagine that I want to calculate f(n) where n101000.

Of course, I can do it by

for i in range(1000):
    f(i).save("./f_value_{}".format(i))

But I feel like this can be improved. Any idea?

Preview: (hide)

2 Answers

Sort by » oldest newest most voted
2

answered 4 years ago

Sébastien gravatar image

updated 4 years ago

One option is to use the @cached_function decorator which saves the result of already computed values:

sage: @cached_function
....: def fib(n):
....:     print('computation performed')
....:     if n == 0 or n == 1:
....:         return 1
....:     else:
....:         return fib(n-1) + fib(n-2)
....:     
sage: fib(5)
computation performed
computation performed
computation performed
computation performed
computation performed
computation performed
8
sage: fib(5)   # no computation is performed
8

Internally, results are saved in a dictionary where keys are of the form (*args,**kwds):

sage: fib.cache
{((1,), ()): 1, ((0,), ()): 1, ((2,), ()): 2, ((3,), ()): 3, ((4,), ()): 5, ((5,), ()): 8}

See the documentation of

sage: cached_function?

Then, you can save the fib function locally:

sage: save(fib, 'fib')
sage: ls fib*
fib.sobj

Then, in the same or in another Sage session, you can reload that function and its previous cache is still there:

sage: gib = load('fib.sobj')
sage: gib.cache
{((1,), ()): 1, ((0,), ()): 1, ((2,), ()): 2, ((3,), ()): 3, ((4,), ()): 5, ((5,), ()): 8}
sage: gib(5)
8

Which means that the default behavior is to include the cache in the pickle (documentation says one should write @cached_function(do_pickle=True) which does not seem necessary). Behind the scene, save is using dumps to create the file fib.sobj which contains binary code looking something like :

sage: dumps(fib)
b'x\x9cm\x8c\xbb\x12@0\x10\x00\xe39&\x7f\xa2\xc9W\xa8\x15*\xddM\x9c\xe0\x06\xc7M\xe8u~\xdb\xab\xd5\xed\x16\xbbG\x88\xde\xf6\xce\xcc\xe4\xd1\xa0\xc5\xc1u;\xa3\x86\x17[xd\xa3\x85a\xe7\x95p\x9c\x9c\x16UgJ)\x80\xd9\x12\x03HPG\xb7v\xd4\xc8\xff\xa9\\\xbe\x92\xb8/\x087-Q^I|JRIj.\xbe\xa8)\xf2'

which can also be loaded:

sage: hib = loads(dumps(fib))
sage: hib.cache
{((1,), ()): 1, ((0,), ()): 1, ((2,), ()): 2, ((3,), ()): 3, ((4,), ()): 5, ((5,), ()): 8}

If the cache becomes very heavy in memory, then saving and loading the function will take long time. I think I already heard of a decorator that query automatically a file if result is already computed or saves automatically to the file (or database) if it is a new computation, but I can't remember what/where I saw that, whether it was a ticket or just a discussion. I am sure people involved in LMFDB project (which computes stuff on the fly if not already in a database) have an idea about that.

Preview: (hide)
link
1

answered 4 years ago

slelievre gravatar image

One needs to decide on some balance between the cost of storing and the cost of computing. Find out what values are worth storing and will make other calculations easier. This might depend on f.

Would storing the values at primes or prime powers make more sense? Values at multiple of 230?

In any case, 101000 is a lot, there is not much hope to compute f(n) for all values of n up to that bound.

Preview: (hide)
link

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

Stats

Asked: 4 years ago

Seen: 303 times

Last updated: May 02 '20