Fit model data

asked 2021-03-03 15:41:04 +0100

phcosta

updated 2021-03-03 15:45:25 +0100

I have a very naive question about the find_fit model. Is there some way to choose the "best" one? For example, in this simple situation:

R = [[0,1],[4,2],[8,4],[12,8],[16,16],[20,32],[24,64],[28,128],[32,256]]
p = scatter_plot(R)
p.show()
a = var('a')  # the parameter must be declared as a symbolic variable
model(x) = 2^(a*x)
find_fit(R,model)

This returns a == 0.25. But to choose the model 2^(a*x) I must, of course, already know the answer. If you choose something more generic, which makes more sense, like

model(x) = a^(b*x)

It will return a == 1.1017945777232188, b == 1.7875622665869968. So I was wondering if there is a way to get the optimized model for it. Many thanks.


Comments

Up to a small numerical difference the results are the same, because for the second set of parameters $a^b \approx 2^{1/4}$. What is the question?

rburing ( 2021-03-03 17:13:21 +0100 )
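The equivalence pointed out in this comment can be checked numerically. The following is a minimal sketch in plain Python (it should also run as-is in a Sage session); the variable names `combined_base`, `target_base`, and `equivalent_exponent` are illustrative and do not come from the thread. It relies on the identity a^(b*x) = (a^b)^x = 2^(b*log2(a)*x), which means the data can only pin down the combination a^b, not a and b separately.

```python
import math

# Parameters reported in the question for model(x) = a^(b*x).
a = 1.1017945777232188
b = 1.7875622665869968

# a^(b*x) = (a^b)^x, so only the combined base a^b is determined by the data.
combined_base = a ** b
target_base = 2 ** 0.25

# Rewriting with base 2: a^(b*x) = 2^(b*log2(a)*x).
equivalent_exponent = b * math.log2(a)

print(combined_base, target_base, equivalent_exponent)
```

Any other pair (a, b) with the same value of a^b describes the same curve, which is why the two-parameter fit can return different a and b from run to run while still fitting equally well.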

You are trying to reduce to a computation the choice among a (non-enumerable) infinite set of possible models. This is the subject of statistics (among others); it is usually foolish to tackle this question without prior subject-matter knowledge about the origin of the data you are trying to model.

Furthermore, the meaning of "best fit" is far from obvious: smallest error? Smallest squared error? Smallest predictive error?

Do you want to use it for further prediction? If so, can you express your desired performance in terms of (predictive) error metrics? In terms of the distribution of these errors? In terms of maximal error? This is highly dependent on the goal(s) you have...

If you take into account the complexity (or (lack of) parsimony) of a model, lots of choices have to be made.

Emmanuel Charpentier ( 2021-03-03 18:49:09 +0100 )
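To make the point about competing notions of "best fit" concrete, here is a minimal sketch in plain Python that scores a candidate model against the question's data under three different criteria. The helper names (`sum_squared_error`, `max_abs_error`, `max_rel_error`, `exact`, `fitted`) are hypothetical and chosen for illustration; the fitted parameters are the ones quoted in the question.

```python
# Data from the question.
R = [[0, 1], [4, 2], [8, 4], [12, 8], [16, 16],
     [20, 32], [24, 64], [28, 128], [32, 256]]


def sum_squared_error(f, data):
    """Sum of squared residuals (a least-squares criterion)."""
    return sum((f(x) - y) ** 2 for x, y in data)


def max_abs_error(f, data):
    """Largest absolute residual (worst-case criterion)."""
    return max(abs(f(x) - y) for x, y in data)


def max_rel_error(f, data):
    """Largest residual relative to the observed value (all y are nonzero here)."""
    return max(abs(f(x) - y) / abs(y) for x, y in data)


def exact(x):
    # One-parameter model 2^(a*x) with a = 0.25.
    return 2 ** (0.25 * x)


def fitted(x):
    # Two-parameter model a^(b*x) with the values quoted in the question.
    return 1.1017945777232188 ** (1.7875622665869968 * x)


for name, f in [("exact", exact), ("fitted", fitted)]:
    print(name, sum_squared_error(f, R), max_abs_error(f, R), max_rel_error(f, R))
```

On this data both candidates are essentially the same curve, so all three scores are tiny. With noisy data or genuinely different model families, the criteria can rank candidates differently, and none of them penalizes the number of parameters.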

1 Answer

answered 2021-03-03 17:17:18 +0100

slelievre

To obtain the model with the best-fit parameter values found by find_fit substituted in, use subs.

If that was not the question, can you clarify it?

Example of applying subs to the result of find_fit:

sage: R = [[0, 1], [4, 2], [8, 4], [12, 8], [16, 16],
....:      [20, 32], [24, 64], [28, 128], [32, 256]]
sage: p = scatter_plot(R)

sage: a = var('a')
sage: model(x) = 2^(a*x)
sage: fit = find_fit(R, model)

sage: f(x) = model(x).subs(fit)
sage: f
x |--> 2^(0.25*x)

sage: a, b = SR.var('a, b')
sage: model(x) = a^(b*x)
sage: fit = find_fit(R, model)
sage: fit
[a == 1.1017948048163804, b == 1.7875584659241468]

sage: g(x) = model(x).subs(fit)
sage: g
x |--> 1.1017948048163804^(1.7875584659241468*x)
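
As an independent cross-check on the value a = 0.25, the one-parameter model can be linearized: y = 2^(a*x) is equivalent to log2(y) = a*x, a straight line through the origin, whose least-squares slope has a closed form. The sketch below is plain Python and uses a different technique from find_fit (it minimizes squared error in log2(y) rather than in y), so on noisy data the two estimates would generally differ; the name `a_hat` is illustrative.

```python
import math

# Data from the question.
R = [[0, 1], [4, 2], [8, 4], [12, 8], [16, 16],
     [20, 32], [24, 64], [28, 128], [32, 256]]

xs = [x for x, _ in R]
log_ys = [math.log2(y) for _, y in R]  # requires all y > 0

# Least-squares slope for a line through the origin:
# a_hat = sum(x * log2(y)) / sum(x^2)
a_hat = sum(x * ly for x, ly in zip(xs, log_ys)) / sum(x * x for x in xs)

print(a_hat)
```

Because this data lies exactly on y = 2^(x/4), the linearized estimate agrees with the value returned by find_fit.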

Comments

It is exactly the point. Thank you very much for the suggestion.

phcosta ( 2021-03-03 17:31:32 +0100 )
