Splitting the Gini index in two: the Pareto-power-law interpolation Lorenz curve

Robert T Jantzen and Klaus Volpert, Dept of Mathematics and Statistics, Villanova University, Villanova, PA 19085 USA

A two-parameter model for the Lorenz curve describing income distribution interpolating between self-similar behavior at the low (power law Lorenz) and high (Pareto Lorenz) ends of the income spectrum naturally leads to two separate Gini indices describing the low (G0) and high (G1) ends individually. These new indices accurately capture realistic data on income distribution and give a better picture of how income data is shifting over time than the Gini index (G) alone.

Here are the three indices for US income data 1967-2010, showing that the low end stagnated while the high end squeezed upwards as the overall inequality measured by the Gini index G grew.


In layman's terms, the shape of the income distribution at the low end measured by G0 has not really changed, while the "super rich" at the very high end measured by G1 have been grabbing more and more for fewer and fewer. The real losers are the middle class in between.  
[See Robert Reich's economic common sense movie Inequality for All from his book Beyond Outrage: What has gone wrong with our economy and our democracy and how to fix it.]

2014 update. This has come to be known as fractal inequality by some pundits:

2016 update. Here are the data and the Maple worksheet that produce the above graphic:
   executed Maple worksheet gini-data-import.mw, unexecuted Maple worksheet with data files: gini-data-import.zip.
   Maybe we will update this with a few more years of data.

See also

The Mathematics of Income Distribution

Modeling income distribution with this two parameter family of functions, whose parameters have a direct geometrical interpretation.

The Pareto-power-law interpolation Lorenz curve

 G versus G0 and G1 in the unit cube

Article and Maple worksheets:

Lorenz curve and the Gini index

The Gini index (coefficient) is a measure of income inequality in a society. [See http://en.wikipedia.org/wiki/Gini_coefficient for its history and use, as well as http://en.wikipedia.org/wiki/Lorenz_curve.] It is based on the Lorenz curve, which plots the percentage of the total income of a population (y axis) that is cumulatively earned by the bottom x% of the population in income levels. Instead of using percentages, we can use the corresponding proper fractions, so that its graph becomes a curve from the origin (0,0) to the point (1,1) in the unit square representing the two anchoring points: none of the people have none of the wealth, while all of the people have all of the wealth. [Mathematicians like the interval (0,1) much better than (0,100)!]

If everyone had the same income, then 40% of the population would have 40% of the total income, for example, corresponding to the point (0.4, 0.4) when proper fractions are used instead of percentages. Thus income equality corresponds to the graph y = x, a straight line connecting the endpoints (0,0) and (1,1), while inequality necessarily pushes the curve down from this straight line towards the x axis, always passing through the same endpoints. This corresponds to lower income levels at the bottom amounting to a lower total fraction of the income of the whole population. Extreme income inequality concentrates the rise of the curve near the upper end of the interval at x = 1, pulling the Lorenz curve far below the line y = x of "perfect income equality". In the limiting case, y is nearly zero all the way to the right endpoint, where it suddenly rises to 1; for this reason the bottom edge y = 0 joined to the right edge x = 1 of the unit square is referred to as the "line of perfect inequality".

 The ratio of the pink area to the area of the whole triangle is the Gini index.
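
The area ratio just described can be estimated numerically. The following is a minimal sketch, not the authors' Maple code: since the triangle under y = x has area 1/2, the ratio reduces to G = 1 - 2 × (area under the Lorenz curve). The function name `gini` and the midpoint-rule step count are illustrative assumptions.

```python
def gini(lorenz, n=100_000):
    """Approximate the Gini index of a Lorenz curve on [0, 1].

    Assumes `lorenz` is a function L(x) with L(0) = 0 and L(1) = 1.
    Uses the midpoint rule with n equal steps (an illustrative choice).
    """
    h = 1.0 / n
    area_under = sum(lorenz((i + 0.5) * h) for i in range(n)) * h
    # Triangle under y = x has area 1/2, so
    # (1/2 - area_under) / (1/2) = 1 - 2 * area_under.
    return 1.0 - 2.0 * area_under


print(gini(lambda x: x))       # perfect equality: close to 0
print(gini(lambda x: x ** 2))  # a sample convex Lorenz curve: close to 1/3
```

Any increasing convex function through (0,0) and (1,1) can be passed in place of the two sample curves.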


Two simple Lorenz curves are the power function curves y = x^p where p > 1 (with Gini index G = (p-1)/(p+1)), and the Pareto curves y = 1 - (1 - x)^q where 0 < q < 1 (with Gini index G = (1-q)/(1+q)).
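
The two Gini formulas follow from a short integration. As a worked sketch, using the fact that the triangle under y = x has area 1/2, so that the Gini index of a Lorenz curve L is 1 - 2∫L:

```latex
% Power function curve L(x) = x^p, p > 1
G = 1 - 2\int_0^1 x^p \, dx
  = 1 - \frac{2}{p+1}
  = \frac{p-1}{p+1}

% Pareto curve L(x) = 1 - (1-x)^q, 0 < q < 1
G = 1 - 2\int_0^1 \bigl[\, 1 - (1-x)^q \,\bigr] dx
  = 1 - 2\left( 1 - \frac{1}{q+1} \right)
  = \frac{1-q}{1+q}
```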

Symmetry of the power and Pareto curves

The power function Lorenz curve and the Pareto Lorenz curve with the reciprocal power are related by a simple reflection of the unit square into itself across the diagonal y = 1 - x, namely: (x,y) → (y,x) → (1-y, 1-x)


This reflection about the line y = 1 - x can be accomplished in two successive steps:
             (x, y) → (y, x) → (1 - y, 1 - x).
In other words, first interchange x and y (well known to be a reflection about the line y = x from inverting functions in calculus), then let the updated values of x go to 1 - x and y go to 1 - y (which are reflections across the vertical line x = 1/2 and across the horizontal line y = 1/2 respectively). Under the interchange of x and y, the power function (with exponent p > 1) goes to its inverse, the power function with the reciprocal power q = 1/p < 1:

y = x^p  →  x = y^p  ↔  y = x^{1/p}  →  1 - y = (1 - x)^{1/p}  ↔  y = 1 - (1 - x)^{1/p} .

Substituting q = 1/p into the Gini index for the Pareto curve thus gives the Gini index for the corresponding power curve: (1 - 1/p)/(1 + 1/p) = (p-1)/(p+1).
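
The reflection argument can be checked numerically. This is a small sketch, with illustrative helper names that are not from the original worksheets: it reflects sample points of a power curve across y = 1 - x and measures how far they land from the Pareto curve with q = 1/p.

```python
def power_lorenz(x, p):
    """Power function Lorenz curve y = x**p."""
    return x ** p


def pareto_lorenz(x, q):
    """Pareto Lorenz curve y = 1 - (1 - x)**q."""
    return 1.0 - (1.0 - x) ** q


def reflect(point):
    """Reflect (x, y) across the diagonal y = 1 - x: (x, y) -> (1 - y, 1 - x)."""
    x, y = point
    return (1.0 - y, 1.0 - x)


p = 3.0          # sample exponent, chosen only for illustration
q = 1.0 / p      # reciprocal power for the Pareto curve

max_gap = 0.0
for i in range(101):
    x = i / 100
    u, v = reflect((x, power_lorenz(x, p)))
    # The reflected point (u, v) should lie on the Pareto curve.
    max_gap = max(max_gap, abs(v - pareto_lorenz(u, q)))

gini_power = (p - 1) / (p + 1)
gini_pareto = (1 - q) / (1 + q)

print(max_gap)                  # should be at rounding-error level
print(gini_power, gini_pareto)  # the two closed-form Gini indices agree
```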

Thus all the self-similar behavior of the power functions at the origin is repeated for the Pareto functions at the opposite extreme of the unit square. Multiplying a power function by a Pareto function gives a 2-parameter family of Lorenz curves with the same asymptotic self-similarity of the power and Pareto functions at the endpoints. Note that although there is this formal symmetry relating the power and Pareto functions at the high and low ends, the fundamental difference between the variables on the horizontal and vertical axes breaks the symmetry as far as interpretation goes.
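
As a rough illustration of the product construction, the sketch below multiplies a power function by a Pareto function and estimates the Gini index numerically. The parameter names `a` and `q`, the sample values, and the midpoint-rule integrator are assumptions made for this example only. The values are chosen with a ≥ 1 and 0 < q < 1 so that both factors are increasing and convex, which makes the product a valid Lorenz curve.

```python
def interpolation_lorenz(x, a, q):
    """Product of a power function x**a and a Pareto curve 1 - (1 - x)**q."""
    return x ** a * (1.0 - (1.0 - x) ** q)


def gini(lorenz, n=100_000):
    """Midpoint-rule estimate of G = 1 - 2 * (area under the Lorenz curve)."""
    h = 1.0 / n
    area_under = sum(lorenz((i + 0.5) * h) for i in range(n)) * h
    return 1.0 - 2.0 * area_under


A, Q = 1.5, 0.5  # hypothetical sample parameters, not fitted to any data


def L(x):
    """Sample two-parameter Lorenz curve used below."""
    return interpolation_lorenz(x, A, Q)


print(L(0.0), L(1.0))  # endpoints of the curve: 0.0 and 1.0
print(gini(L))         # overall Gini index of the sample curve
```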

Self-similarity at the origin for the power Lorenz curve


If we cut off the income at the value x = X, and consider the restricted income problem for only those incomes below X, then the fraction of the area below the restricted line of equality y = X^p (x/X) and above the Lorenz curve (red area) is just 1 minus the fraction below the Lorenz curve (yellow region):

1 - (∫_0^X x^p dx) / (X^{p+1}/2) = 1 - 2/(p+1) = (p-1)/(p+1).

This is independent of X and equals the Gini index for the whole interval. This limiting behavior will be true for any Lorenz function which has a well-defined constant limit for L(x)/x^p for some p. This is true of the Pareto-power-law interpolation function. Because the Pareto function is the reflection of the power function across the diagonal, it has the same self-similarity property at the right endpoint, and the Pareto-power-law interpolation function inherits that asymptotic property.
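
The independence from the cutoff X can be seen numerically. The following is a minimal sketch, with a hypothetical helper name `restricted_gini` and a sample exponent: it computes the restricted Gini index of a power curve for several cutoffs and compares each with (p-1)/(p+1).

```python
def restricted_gini(lorenz, X, n=100_000):
    """Gini index of the restricted problem for the population fraction below X.

    Computed as 1 - (area under the curve on [0, X]) divided by the area of
    the triangle with vertices (0, 0), (X, 0), (X, L(X)).
    """
    h = X / n
    area_under = sum(lorenz((i + 0.5) * h) for i in range(n)) * h
    triangle = 0.5 * X * lorenz(X)
    return 1.0 - area_under / triangle


p = 2.5  # sample exponent, chosen only for illustration


def power_curve(x):
    return x ** p


for X in (1.0, 0.5, 0.1, 0.01):
    # Each line shows the cutoff, the numerical value, and (p - 1)/(p + 1).
    print(X, restricted_gini(power_curve, X), (p - 1) / (p + 1))
```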

One can use the limiting left Gini index defined by the limit of this expression with x^p replaced by any Lorenz function L(x) to define a parameter G0. Similarly one can extend the Gini parameter G1 to any Lorenz curve. For any differentiable monotonically increasing Lorenz function, the constraint that it lie between y = 0 and y = x forces it to behave like a constant times a power of x in the limit approaching x = 0, so such a limit G0 must exist. A similar argument reflected across the diagonal shows that a right Gini limit G1 also must exist under such assumptions, omitting differentiability at the right endpoint of course. What is interesting is that somehow these two limiting behaviors seem to describe so well the interpolated behavior between the endpoints for actual data.
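
A numerical sketch of these two limits follows. It rests on stated assumptions rather than the authors' worksheets. The left index uses the restricted triangle with vertices (0,0), (X,0), (X,L(X)). The right index uses its mirror image under (x,y) → (1-y, 1-x), the triangle with vertices (X,L(X)), (1,L(X)), (1,1), which is one natural reading of "reflected across the diagonal". The sample curve is the product x^a (1 - (1-x)^q) with illustrative parameters a = 2, q = 0.5. Near the origin this curve behaves like a constant times x^(a+1), so the left limit should be the power-law Gini index for exponent a + 1, namely a/(a+2). The right limit should be the Pareto value (1-q)/(1+q).

```python
import math


def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return math.fsum(f(lo + (i + 0.5) * h) for i in range(n)) * h


def left_gini(lorenz, X):
    """Restricted Gini index for the population fraction below X."""
    return 1.0 - 2.0 * integrate(lorenz, 0.0, X) / (X * lorenz(X))


def right_gini(lorenz, X):
    """Mirror-image index for the population fraction above X (assumed form)."""
    LX = lorenz(X)
    under_chord = 0.5 * (1.0 - X) * (1.0 + LX)  # chord from (X, L(X)) to (1, 1)
    under_curve = integrate(lorenz, X, 1.0)
    triangle = 0.5 * (1.0 - X) * (1.0 - LX)
    return (under_chord - under_curve) / triangle


A, Q = 2.0, 0.5  # hypothetical sample parameters


def L(x):
    return x ** A * (1.0 - (1.0 - x) ** Q)


for X in (0.1, 0.01, 0.001):
    print("left ", X, left_gini(L, X), A / (A + 2))
for X in (0.9, 0.99, 0.9999):
    print("right", X, right_gini(L, X), (1 - Q) / (1 + Q))
```

For this sample curve the right-hand values approach their limit slowly, since the leading correction shrinks only like a fractional power of 1 - X.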


The right slider Gini index (high end income distribution), whose limit is the Gini index of the Pareto exponent of the product power-law-Pareto Lorenz curve. Similarly the left slider Gini index has as its limit the Gini index of the power law exponent.