Conditional sum

calculus

asked 2023-03-31 19:12:12 +0200

Cyrille
1409 ●45 ●96 ●157

I have this data [income, percentage]

D=[[1300.0, 0.0476],
 [1350.0, 0.142],
 [1500.0, 0.142],
 [1600.0, 0.0476],
 [1700.0, 0.0476],
 [1800.0, 0.0476],
 [1820.0, 0.0476],
 [1900.0, 0.0476],
 [2000.0, 0.0952],
 [2400.0, 0.0952],
 [4500.0, 0.0476],
 [4900.0, 0.0952],
 [5000.0, 0.0952]]

I would like to calculate the sum of the incomes for which the cumulative percentage is $\leq 40\%$, and likewise for $\leq 10\%$ starting from the end of the list.

Of course, this is not a very complex task, but I would like to know whether it can be done inside a conditional sum, that is, a sum conditioned by another sum. I have something like

sum(x[0] for x in D while sum(x[1]  for x in D) <= 0.4)

which for obvious reasons cannot work.


2 Answers


answered 2023-03-31 21:51:11 +0200

Emmanuel Charpentier
7834 ●10 ●54 ●158

updated 2023-04-04 10:42:17 +0200

What you say you want to compute can be done as follows:

sage: MD=matrix(D)
# R's cumsum is a swell way to compute this "cumulative sum", a nice alternative to itertools.accumulate...
sage: MD=MD.augment(vector(r.cumsum(MD.column(1).list()).sage()))
sage: sum([MD.column(0)[u]*(MD.column(2)[u]<=0.4) for u in range(MD.nrows())])
5750.00000000000
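The same cumulative-sum-then-filter idea can be sketched without R or matrices, using only `itertools.accumulate` from the standard library. This is a minimal sketch, not the answer's own method; the names `incomes`, `shares`, `cumulative` and `total` are introduced here for illustration.

```python
from itertools import accumulate

# Data from the question: [income, share of the population]
D = [[1300.0, 0.0476], [1350.0, 0.142], [1500.0, 0.142], [1600.0, 0.0476],
     [1700.0, 0.0476], [1800.0, 0.0476], [1820.0, 0.0476], [1900.0, 0.0476],
     [2000.0, 0.0952], [2400.0, 0.0952], [4500.0, 0.0476], [4900.0, 0.0952],
     [5000.0, 0.0952]]

incomes = [row[0] for row in D]
shares = [row[1] for row in D]

# Running total of the shares, one value per row (plays the role of r.cumsum)
cumulative = list(accumulate(shares))

# Sum the incomes whose cumulative share is still at most 40 %
total = sum(inc for inc, cum in zip(incomes, cumulative) if cum <= 0.4)
print(total)
```

Because the cumulative shares only increase, filtering on `cum <= 0.4` keeps exactly the leading rows, so this should agree with the 5750 obtained above.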

But I think that's not what you're aiming at, which seems to be the 0.4 quantile of the income distribution. That is about 1650 by ophthalmic (eyeball) interpolation, not the sum you compute.

I leave that to you for now, as an exercise. Yell for help if necessary.

Hint: R and R libraries have tons of utilities for working with distributions and their cumulatives... and the semantics of R boolean indexing are quite helpful for rolling your own.

A Sage alternative :

sage: g = spline(zip(MD.column(2), MD.column(0)))
sage: g(0.4)
1642.4180721758974
sage: g(0.9)
4917.845951606835

Explanations on request...

HTH,

EDIT : Since Sagemath includes Scipy, its interpolation functions are a good resource to solve your problem...
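For illustration, here is a minimal sketch of the piecewise-linear version of that interpolation, written in plain Python so that it needs no extra library. The helper `income_at` is a hypothetical name, and clamping to the first or last income outside the observed range is an assumption made here (it matches the default behaviour of `numpy.interp(q, cumulative, incomes)`, which could be used instead).

```python
from bisect import bisect_left
from itertools import accumulate

# Data from the question: [income, share of the population]
D = [[1300.0, 0.0476], [1350.0, 0.142], [1500.0, 0.142], [1600.0, 0.0476],
     [1700.0, 0.0476], [1800.0, 0.0476], [1820.0, 0.0476], [1900.0, 0.0476],
     [2000.0, 0.0952], [2400.0, 0.0952], [4500.0, 0.0476], [4900.0, 0.0952],
     [5000.0, 0.0952]]

incomes = [row[0] for row in D]
cumulative = list(accumulate(row[1] for row in D))

def income_at(q):
    """Hypothetical helper: income at cumulative share q, by linear interpolation."""
    # Assumption: clamp to the end values outside the observed range
    if q <= cumulative[0]:
        return incomes[0]
    if q >= cumulative[-1]:
        return incomes[-1]
    i = bisect_left(cumulative, q)          # first index with cumulative[i] >= q
    x0, x1 = cumulative[i - 1], cumulative[i]
    y0, y1 = incomes[i - 1], incomes[i]
    return y0 + (y1 - y0) * (q - x0) / (x1 - x0)

print(income_at(0.4), income_at(0.9))
```

The linear values differ slightly from the spline values above, since a straight segment between two neighbouring points ignores the curvature that the cubic spline introduces.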


Comments

Emmanuel, thanks. You often tell me to learn Python. I have decided to program some statistical formulas. Some are easy to find in existing software; others are not.

Cyrille ( 2023-04-03 16:04:57 +0200 )

Emmanuel, I need an explanation. As I read it, spline needs as input a list of pairs of the type (x, y). But when it is called, you used only one coordinate. How can such a function be written?

Cyrille ( 2023-04-03 16:39:14 +0200 )

Supposing x and y are lists (or other iterables) of the same length,

zip(x, y) returns an iterator of two-element tuples made of the corresponding x and y elements.
spline(zip(x, y)) returns a function interpolating these correspondences by piecewise cubic polynomials, minimizing
- the sum of squares between the function and the original points, and
- the total curvature (integral of squared second derivatives) of the curve.

Calling the result with a single x value, as in g(0.4), returns the interpolated y.

Try this :

sage: set_random_seed(1789)
sage: M=matrix(RDF, 10,1, range(10))
sage: M=M.augment(vector(map(lambda u:u^2/5-u+0.5-random(), M.column(0))))
sage: g2=spline(zip(M.column(0), M.column(1)))  # spline through the points of M
sage: points(M)+plot(g2, (0, 9))

and explore the resulting objects...

There are many types of splines ; R has implementations for a bunch of them.

HTH,

Emmanuel Charpentier ( 2023-04-03 17:39:37 +0200 )

The spline interpolation may give unreasonable values at points of strong curvature variation; you may prefer a piecewise linear interpolation, such as the one given by R's approxfun function. Compare:

sage: rf=r.approxfun(MD.column(2).list(), MD.column(0).list())
sage: f=lambda x:rf(x).sage()
sage: plot([f, g], (0, 1), legend_label=["linear", "spline"])

Emmanuel Charpentier ( 2023-04-04 09:36:22 +0200 )

answered 2023-03-31 20:19:42 +0200

rburing

11094 ●6 ●81 ●223 https://www.rburing.nl/

Probably this can only be done by defining some auxiliary function(s), or importing such functions.

My one-line solution using the (standard) itertools library:

sage: import itertools
sage: reduce(lambda z, w: w, itertools.takewhile(lambda x: x[1] <= 0.4, itertools.accumulate(reversed(D), lambda a, b: [a[0] + b[0], a[1] + b[1]])))
[16800.0000000000, 0.333200000000000]

Here,

reversed(D) allows iterating over the list in reverse,
itertools.accumulate allows accumulating a running total,
itertools.takewhile allows stopping when the accumulated percentage reaches some value,
reduce is used with lambda z, w: w as a trick to get the last element.
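The steps listed above can also be unrolled into an explicit loop. This is a sketch under stated assumptions rather than a replacement for the one-liner: the function name `sum_while_share_at_most` is hypothetical, and returning `(0.0, 0.0)` when even the first row exceeds the limit is a choice made here (the `reduce` version would raise an error on an empty sequence).

```python
# Data from the question: [income, share of the population]
D = [[1300.0, 0.0476], [1350.0, 0.142], [1500.0, 0.142], [1600.0, 0.0476],
     [1700.0, 0.0476], [1800.0, 0.0476], [1820.0, 0.0476], [1900.0, 0.0476],
     [2000.0, 0.0952], [2400.0, 0.0952], [4500.0, 0.0476], [4900.0, 0.0952],
     [5000.0, 0.0952]]

def sum_while_share_at_most(rows, limit):
    """Hypothetical helper: add up incomes while the running share stays <= limit."""
    total_income = 0.0
    total_share = 0.0
    for income, share in rows:
        # Stop before the running share would exceed the limit
        if total_share + share > limit:
            break
        total_income += income
        total_share += share
    return total_income, total_share

# From the end of the list, as in the one-liner above
print(sum_while_share_at_most(reversed(D), 0.4))
print(sum_while_share_at_most(reversed(D), 0.1))
# From the start of the list
print(sum_while_share_at_most(D, 0.4))
```

Passing `D` or `reversed(D)` selects the direction, which covers both cases asked about in the question.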

