# Conditional sum

I have this data, as [income, percentage] pairs:

D=[[1300.0, 0.0476],
[1350.0, 0.142],
[1500.0, 0.142],
[1600.0, 0.0476],
[1700.0, 0.0476],
[1800.0, 0.0476],
[1820.0, 0.0476],
[1900.0, 0.0476],
[2000.0, 0.0952],
[2400.0, 0.0952],
[4500.0, 0.0476],
[4900.0, 0.0952],
[5000.0, 0.0952]]


I would like to calculate the sum of the incomes for which the cumulative percentage is $\leq 40\%$, and likewise for $\leq 10\%$ starting from the end of the list.

Of course this is not a very complex task, but I would like to know whether it can be done within a single conditional sum, that is, a sum conditioned by another sum. I have something like

sum(x[0] for x in D while sum(x[1]  for x in D) <= 0.4)


which for obvious reasons cannot work.


What you say you want to compute can be done as :

sage: MD=matrix(D)
# R's cumsum is a swell way to compute this "cumulative sum", a nice alternative to itertools.accumulate...
sage: MD=MD.augment(vector(r.cumsum(MD.column(1).list()).sage()))
sage: sum([MD.column(0)[u]*(MD.column(2)[u]<=0.4) for u in range(MD.nrows())])
5750.00000000000
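
For readers without R at hand, the same cumulative-sum filter can be sketched in plain Python with `itertools.accumulate`, which the comment above mentions as an alternative. The helper name `conditional_sum` is made up for this illustration and is not a Sage function.

```python
from itertools import accumulate

D = [[1300.0, 0.0476], [1350.0, 0.142], [1500.0, 0.142], [1600.0, 0.0476],
     [1700.0, 0.0476], [1800.0, 0.0476], [1820.0, 0.0476], [1900.0, 0.0476],
     [2000.0, 0.0952], [2400.0, 0.0952], [4500.0, 0.0476], [4900.0, 0.0952],
     [5000.0, 0.0952]]

def conditional_sum(rows, threshold):
    """Sum the incomes of the rows whose running cumulative share is <= threshold."""
    # Running total of the second column, in the order the rows are given
    cumulative = accumulate(share for _, share in rows)
    # Keep an income only while its cumulative share stays within the threshold
    return sum(income for (income, _), c in zip(rows, cumulative) if c <= threshold)

print(conditional_sum(D, 0.4))                   # 5750.0, as in the Sage session
print(conditional_sum(list(reversed(D)), 0.1))   # 5000.0, counting from the end
```

Because the shares are all positive, the cumulative column is increasing, so filtering on it selects a prefix of the list.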


But I think that's not what you're aiming at, which seems to be the 0.4th quantile of the income distribution. That is about 1650 by ophthalmic interpolation, not what you compute.

Which I leave to you for now, as an exercise. Yell for help if necessary.

Hint: R and R libraries have tons of utilities for working with distributions and their cumulatives... and the semantics of R boolean indexing are quite helpful to roll your own.

A Sage alternative :

sage: g = spline(zip(MD.column(2), MD.column(0)))
sage: g(0.4)
1642.4180721758974
sage: g(0.9)
4917.845951606835


Explanations on request...

HTH,

EDIT : Since Sagemath includes Scipy, its interpolation functions are a good resource to solve your problem...


Emmanuel, thanks. You often tell me to learn Python. I have decided to program some statistical formulas myself. Some of them I can find easily in existing software; others I cannot.

( 2023-04-03 16:04:57 +0100 )

Emmanuel, I need an explanation. As I read it, spline needs as input a list of pairs of the type (x, y). But when it is called, you used only one coordinate. How can such a function be written?

( 2023-04-03 16:39:14 +0100 )

Supposing x and y are lists (or other iterables) of the same length,

• zip(x, y) returns an iterator of two-element tuples composed of corresponding x and y elements.

• spline(zip(x, y)) returns an interpolation function g of these correspondences: g takes a single abscissa and returns the interpolated ordinate. It is built from piecewise cubic polynomials, minimizing

• the sum of squares between the function and the original points, and

• the total curvature (integral of squared second derivatives) of the curve.
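
To make the first bullet concrete, here is a small plain-Python sketch of what `zip` hands over to `spline`. The lists `x` and `y` are made-up toy values, not the income data.

```python
x = [0.0, 1.0, 2.0]
y = [0.5, -0.3, 0.1]

pairs = zip(x, y)          # a lazy iterator: no tuple has been built yet
pair_list = list(pairs)    # consuming it yields the (x, y) tuples in order
print(pair_list)           # [(0.0, 0.5), (1.0, -0.3), (2.0, 0.1)]

# The iterator is now exhausted, so a second pass produces nothing
leftover = list(pairs)
print(leftover)            # []
```

This one-shot behaviour is why the `zip(...)` call is written directly inside `spline(...)`: the pairs are consumed once, when the interpolation object is built.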

Try this :

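sage: set_random_seed(1789)
sage: M=matrix(RDF, 10,1, range(10))
sage: M=M.augment(vector(map(lambda u:u^2/5-u+0.5-random(), M.column(0))))
sage: # fit a spline to these noisy points (named gM so that the earlier g is kept)
sage: gM=spline(zip(M.column(0), M.column(1)))
sage: points(M)+plot(gM, (0, 9))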


and explore the resulting objects...

There are many types of splines ; R has implementations for a bunch of them.

HTH,

( 2023-04-03 17:39:37 +0100 )

The spline interpolation may give unreasonable values at points of strong curvature variation ; you may prefer a piecewise linear interpolation, such as the one given by R's approxfun function. Compare :

sage: rf=r.approxfun(MD.column(2).list(), MD.column(0).list())
sage: f=lambda x:rf(x).sage()
sage: plot([f, g], (0, 1), legend_label=["linear", "spline"])
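
If R is not available, the piecewise linear variant can be sketched in plain Python. The factory `make_linear_interpolant` below is a hypothetical helper written for this illustration, not R's or Sage's API; raising an error outside the knot range is a choice made here (R's approxfun returns NA there by default). It also shows how a function built from pairs ends up taking a single coordinate.

```python
from bisect import bisect_right
from itertools import accumulate

D = [[1300.0, 0.0476], [1350.0, 0.142], [1500.0, 0.142], [1600.0, 0.0476],
     [1700.0, 0.0476], [1800.0, 0.0476], [1820.0, 0.0476], [1900.0, 0.0476],
     [2000.0, 0.0952], [2400.0, 0.0952], [4500.0, 0.0476], [4900.0, 0.0952],
     [5000.0, 0.0952]]

def make_linear_interpolant(xs, ys):
    """Return a function f(t) interpolating linearly between the knots (xs[i], ys[i])."""
    knots = sorted(zip(xs, ys))
    kx = [x for x, _ in knots]
    ky = [y for _, y in knots]

    def f(t):
        if not kx[0] <= t <= kx[-1]:
            raise ValueError("t is outside the range of the knots")
        i = bisect_right(kx, t)        # kx[i-1] <= t < kx[i]
        if i == len(kx):               # t is exactly the last knot
            return ky[-1]
        x0, x1 = kx[i - 1], kx[i]
        y0, y1 = ky[i - 1], ky[i]
        return y0 + (y1 - y0) * (t - x0) / (x1 - x0)

    return f

# Knots: cumulative share on the x axis, income on the y axis
cum_share = list(accumulate(share for _, share in D))
incomes = [income for income, _ in D]
quantile = make_linear_interpolant(cum_share, incomes)

print(round(quantile(0.4), 1))   # 1643.7
print(round(quantile(0.9), 1))   # 4888.2
```

The linear values stay close to the spline ones (about 1642.4 and 4917.8) on this data.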

( 2023-04-04 09:36:22 +0100 )

Probably this can only be done by defining some auxiliary function(s), or importing such functions.

My one-line solution using the (standard) itertools library:

sage: import itertools
sage: reduce(lambda z, w: w, itertools.takewhile(lambda x: x[1] <= 0.4, itertools.accumulate(reversed(D), lambda a, b: [a[0] + b[0], a[1] + b[1]])))
[16800.0000000000, 0.333200000000000]


Here,

• reversed(D) allows iterating over the list in reverse,
• itertools.accumulate allows accumulating a running total,
• itertools.takewhile allows stopping as soon as the accumulated percentage exceeds some value,
• reduce is used with lambda z, w: w as a trick to get the last element.
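
The four steps above can also be unrolled into a short plain-Python function, which may be easier to adapt. The name `tail_sum` is made up for this sketch, and returning `None` when even the last row exceeds the threshold is a choice made here.

```python
from itertools import accumulate, takewhile

D = [[1300.0, 0.0476], [1350.0, 0.142], [1500.0, 0.142], [1600.0, 0.0476],
     [1700.0, 0.0476], [1800.0, 0.0476], [1820.0, 0.0476], [1900.0, 0.0476],
     [2000.0, 0.0952], [2400.0, 0.0952], [4500.0, 0.0476], [4900.0, 0.0952],
     [5000.0, 0.0952]]

def tail_sum(rows, threshold):
    """Walk rows from the end; return (income total, share total) for the longest
    tail whose accumulated share is <= threshold, or None if there is none."""
    running = accumulate(
        (tuple(row) for row in reversed(rows)),
        lambda a, b: (a[0] + b[0], a[1] + b[1]),   # add both columns pairwise
    )
    last = None
    # Keep the latest running total that still satisfies the condition
    for total in takewhile(lambda t: t[1] <= threshold, running):
        last = total
    return last

income_40, share_40 = tail_sum(D, 0.4)
print(income_40)              # 16800.0
print(tail_sum(D, 0.1)[0])    # 5000.0
```

The explicit loop plays the role of the `reduce` trick: it simply remembers the last element produced by `takewhile`.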