
Conditional sum

asked 2023-03-31 19:12:12 +0100

Cyrille

I have this data [income, percentage]

D=[[1300.0, 0.0476],
 [1350.0, 0.142],
 [1500.0, 0.142],
 [1600.0, 0.0476],
 [1700.0, 0.0476],
 [1800.0, 0.0476],
 [1820.0, 0.0476],
 [1900.0, 0.0476],
 [2000.0, 0.0952],
 [2400.0, 0.0952],
 [4500.0, 0.0476],
 [4900.0, 0.0952],
 [5000.0, 0.0952]]

I would like to calculate the sum of the incomes for which the cumulative percentage is $\leq 40\%$, and likewise for $\leq 10\%$ starting from the end of the list.

Of course this is not a very complex task, but I would like to know whether it can be done inside a conditional sum, that is, a sum conditioned by another sum. I have something like

sum(x[0] for x in D while sum(x[1]  for x in D) <= 0.4)

which for obvious reasons cannot work.


2 Answers


answered 2023-03-31 21:51:11 +0100

Emmanuel Charpentier

updated 2023-04-04 10:42:17 +0100

What you say you want to compute can be done as follows:

sage: MD=matrix(D)
# R's cumsum is a swell way to compute this "cumulative sum", a nice alternative to itertools.accumulate...
sage: MD=MD.augment(vector(r.cumsum(MD.column(1).list()).sage()))
sage: sum([MD.column(0)[u]*(MD.column(2)[u]<=0.4) for u in range(MD.nrows())])
5750.00000000000
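
For readers without the R interface, the same cumulative-sum filter can be sketched in plain Python with itertools.accumulate. This is only an illustration, assuming D is the list from the question; the helper name head_sum is made up here and is not part of Sage.

```python
from itertools import accumulate

# Data from the question: [income, percentage]
D = [[1300.0, 0.0476], [1350.0, 0.142], [1500.0, 0.142], [1600.0, 0.0476],
     [1700.0, 0.0476], [1800.0, 0.0476], [1820.0, 0.0476], [1900.0, 0.0476],
     [2000.0, 0.0952], [2400.0, 0.0952], [4500.0, 0.0476], [4900.0, 0.0952],
     [5000.0, 0.0952]]

def head_sum(data, threshold):
    """Sum the incomes whose cumulative percentage is <= threshold.

    Hypothetical helper: the cumulative percentages are computed once
    and paired with the rows, then used as the filter condition.
    """
    cumulative = accumulate(pct for _, pct in data)
    return sum(income for (income, _), c in zip(data, cumulative)
               if c <= threshold)

print(head_sum(D, 0.4))  # 5750.0, the same total as the matrix version
```

The point of the sketch is that the inner sum is computed once, as a running total, instead of being re-evaluated inside the condition.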

But I think that's not what you're aiming at, which seems to be the 0.4 quantile of the income distribution. That quantile is about 1650 by ophthalmic interpolation (that is, by eye), which is not what you compute.

Which I leave to you for now, as an exercise. Yell for help if necessary.

Hint: R and R libraries have tons of utilities for working with distributions and their cumulatives... and the semantics of R boolean indexing are quite helpful to roll your own.

A Sage alternative:

sage: g = spline(zip(MD.column(2), MD.column(0)))
sage: g(0.4)
1642.4180721758974
sage: g(0.9)
4917.845951606835

Explanations on request...

HTH,

EDIT: Since SageMath includes SciPy, its interpolation functions are a good resource for solving your problem...


Comments

Emmanuel, thanks. You often tell me to learn Python. I have decided to program some statistical formulas. Some can easily be found in existing software; others cannot.

Cyrille ( 2023-04-03 16:04:57 +0100 )

Emmanuel, I need an explanation. As I read it, spline needs as input a list of pairs of the type (x, y). But when it is called, you use only one coordinate. How can such a function be written?

Cyrille ( 2023-04-03 16:39:14 +0100 )

Supposing x and y are lists (or other iterables) of the same length,

  • zip(x, y) returns an iterator of two-element tuples pairing the corresponding x and y elements.

  • spline(zip(x, y)) returns an interpolation function built from piecewise cubic polynomials. It takes a single x value and returns the interpolated y, which is why g is called with one coordinate only. The curve

    • passes through the original points, and

    • for a natural cubic spline, minimizes the total curvature (integral of the squared second derivative) among such smooth interpolants.

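Try this:

sage: set_random_seed(1789)
sage: M=matrix(RDF, 10,1, range(10))
sage: M=M.augment(vector(map(lambda u:u^2/5-u+0.5-random(), M.column(0))))
# Build the spline on these new points (the earlier g was fitted to MD, not M)
sage: g=spline(zip(M.column(0), M.column(1)))
sage: points(M)+plot(g, (0, 9))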

and explore the resulting objects...

There are many types of splines; R has implementations for a bunch of them.

HTH,

Emmanuel Charpentier ( 2023-04-03 17:39:37 +0100 )

The spline interpolation may give unreasonable values at points of strong curvature variation; you may prefer a piecewise linear interpolation, such as the one given by R's approxfun function. Compare:

sage: rf=r.approxfun(MD.column(2).list(), MD.column(0).list())
sage: f=lambda x:rf(x).sage()
sage: plot([f, g], (0, 1), legend_label=["linear", "spline"])
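
The piecewise linear alternative can also be sketched without R. The following is a minimal plain-Python illustration, assuming D is the list from the question; linear_quantile is a hypothetical name, and raising an error outside the data range is a choice made for this sketch.

```python
from bisect import bisect_left
from itertools import accumulate

# Data from the question: [income, percentage]
D = [[1300.0, 0.0476], [1350.0, 0.142], [1500.0, 0.142], [1600.0, 0.0476],
     [1700.0, 0.0476], [1800.0, 0.0476], [1820.0, 0.0476], [1900.0, 0.0476],
     [2000.0, 0.0952], [2400.0, 0.0952], [4500.0, 0.0476], [4900.0, 0.0952],
     [5000.0, 0.0952]]

def linear_quantile(data, p):
    """Income at cumulative percentage p, by piecewise linear interpolation.

    Hypothetical helper; assumes every percentage is positive, so that
    the cumulative percentages are strictly increasing.
    """
    incomes = [income for income, _ in data]
    cum = list(accumulate(pct for _, pct in data))
    if not cum[0] <= p <= cum[-1]:
        raise ValueError("p lies outside the range of cumulative percentages")
    i = bisect_left(cum, p)  # first index with cum[i] >= p
    if cum[i] == p:
        return incomes[i]
    # Interpolate on the segment between knots i-1 and i.
    t = (p - cum[i - 1]) / (cum[i] - cum[i - 1])
    return incomes[i - 1] + t * (incomes[i] - incomes[i - 1])

print(linear_quantile(D, 0.4))  # roughly 1643.7
print(linear_quantile(D, 0.9))  # roughly 4888.2
```

With this data the linear value at 0.4 is close to the spline value, while at 0.9 the two differ by about 30, which illustrates the remark about regions of strong curvature variation.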
Emmanuel Charpentier ( 2023-04-04 09:36:22 +0100 )

answered 2023-03-31 20:19:42 +0100

rburing

Probably this can only be done by defining some auxiliary function(s), or importing such functions.

My one-line solution using the (standard) itertools library:

sage: import itertools
sage: reduce(lambda z, w: w, itertools.takewhile(lambda x: x[1] <= 0.4, itertools.accumulate(reversed(D), lambda a, b: [a[0] + b[0], a[1] + b[1]])))
[16800.0000000000, 0.333200000000000]

Here,

  • reversed(D) allows iterating over the list in reverse,
  • itertools.accumulate allows accumulating a running total,
  • itertools.takewhile allows stopping as soon as the accumulated percentage exceeds some value,
  • reduce is used with lambda z, w: w as a trick to get the last element.
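
The four steps above can also be spelled out as a small function, which may be easier to read than the one-liner. This is a sketch in plain Python, assuming D is the list from the question; tail_sum is a hypothetical name, and returning [0.0, 0.0] when no row qualifies is a choice made here (the reduce version raises an error in that case).

```python
from itertools import accumulate, takewhile

# Data from the question: [income, percentage]
D = [[1300.0, 0.0476], [1350.0, 0.142], [1500.0, 0.142], [1600.0, 0.0476],
     [1700.0, 0.0476], [1800.0, 0.0476], [1820.0, 0.0476], [1900.0, 0.0476],
     [2000.0, 0.0952], [2400.0, 0.0952], [4500.0, 0.0476], [4900.0, 0.0952],
     [5000.0, 0.0952]]

def tail_sum(data, threshold):
    """Return [income sum, cumulative percentage], accumulating from the
    end of the list while the cumulative percentage stays <= threshold."""
    # Running totals of both columns, starting from the last row.
    running = accumulate(
        reversed(data),
        lambda acc, row: [acc[0] + row[0], acc[1] + row[1]],
    )
    # Keep the running totals until the percentage exceeds the threshold.
    kept = list(takewhile(lambda acc: acc[1] <= threshold, running))
    # The last kept total is the answer; fall back to zeros if none qualifies.
    return kept[-1] if kept else [0.0, 0.0]

print(tail_sum(D, 0.4))  # income sum 16800.0, as in the one-liner
print(tail_sum(D, 0.1))  # [5000.0, 0.0952], the 10% case from the question
```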


Stats

Asked: 2023-03-31 19:12:12 +0100

Seen: 301 times

Last updated: Apr 04 '23