Nassim Taleb On the Super-Additivity and Estimation Biases of Quantile Contributions
New York University-Poly School of Engineering
Riskdata; CES Univ. Paris
Sample measures of top centile contributions to the total (concentration) are downward biased, unstable estimators, extremely sensitive to sample size and concave in accounting for large deviations. This makes them particularly unfit in domains with power law tails, especially for low values of the exponent. These estimators can vary over time and increase with the population size, as shown in this article, thus providing the illusion of structural changes in concentration. They are also inconsistent under aggregation and mixing distributions, as the weighted average of concentration measures for $A$ and $B$ will tend to be lower than that from $A \cup B$. In addition, it can be shown that under such fat tails, increases in the total sum need to be accompanied by increased sample size of the concentration measurement. We examine the estimation superadditivity and bias under homogeneous and mixed distributions.
I. Introduction
Vilfredo Pareto noticed that 80% of the land in Italy belonged to 20% of the population, and vice-versa, thus giving birth both to the power law class of distributions and to the popular saying 80/20. The self-similarity at the core of power laws allows us to recurse and reapply the 80/20 rule to that top 20%, and so forth, until one obtains the result that the top percent of the population will own about 53% of the total wealth.
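The arithmetic behind the 53% figure can be checked with a short sketch. It assumes an exact Pareto distribution, for which the share of the total held by the top fraction q is q^((α−1)/α); the function names are illustrative and do not come from the paper.

```python
import math


def pareto_tail_exponent(pop_share, total_share):
    """Tail exponent alpha such that the top `pop_share` holds `total_share`.

    Assumes an exact Pareto law, where share = pop_share ** ((alpha - 1) / alpha).
    """
    ratio = math.log(total_share) / math.log(pop_share)  # equals (alpha - 1) / alpha
    return 1.0 / (1.0 - ratio)


def top_share(q, alpha):
    """Share of the total held by the top fraction q under an exact Pareto law."""
    return q ** ((alpha - 1.0) / alpha)


if __name__ == "__main__":
    alpha = pareto_tail_exponent(0.20, 0.80)  # exponent implied by the 80/20 rule
    print(f"alpha implied by 80/20: {alpha:.4f}")
    print(f"share of the top 1%:    {top_share(0.01, alpha):.4f}")
```

Applying the 80/20 split three times gives the same picture from another angle: the top 0.2³ = 0.8% would hold 0.8³ = 51.2% of the total, which is in line with roughly 53% for the top 1%.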
Such a measure of concentration can be seriously biased, depending on how it is measured, so it is very likely that the true concentration ratio for what Pareto observed, that is, the share of the top percentile, was closer to 70%; year-on-year changes would then drift higher, converging to that level as samples grow larger. In fact, as we will show in this discussion, for wealth, say, more complete samples resulting from technological progress, along with larger populations and economic growth, will make such a measure converge by increasing over time, for no other reason than expansion in sample space or aggregate value.
The core of the problem is that, for the class of one-tailed fat-tailed random variables, that is, those bounded on the left and unbounded on the right, where the random variable $X \in [x_{\min}, \infty)$, the in-sample quantile contribution is a biased estimator of the true quantile contribution.
Let us define the quantile contribution
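One way to write the true quantile contribution and its in-sample counterpart is sketched here. The notation is an assumption made for illustration: $h(q)$ denotes the exceedance threshold for probability $q$, $\hat{h}(q)$ its estimate from a sample $(X_i)_{1 \le i \le n}$, and $\mathbb{1}$ the indicator function.

```latex
\begin{align*}
\kappa_q &= \frac{q\,\mathbb{E}\left[X \mid X > h(q)\right]}{\mathbb{E}[X]},
  & h(q) &= \inf\left\{h \in [x_{\min}, +\infty) : \mathbb{P}(X > h) \le q\right\}, \\
\hat{\kappa}_q &= \frac{\sum_{i=1}^{n} \mathbb{1}_{X_i > \hat{h}(q)}\, X_i}{\sum_{i=1}^{n} X_i},
  & \hat{h}(q) &= \inf\left\{h : \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}_{X_i > h} \le q\right\}.
\end{align*}
```

In words, $\hat{\kappa}_q$ is the sum of the observations above the empirical threshold divided by the sample total, which is the ratio usually reported as the share of the top $q$.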
We shall see that the observed variable $\hat{\kappa}_q$ is a downward biased estimator of the true ratio $\kappa_q$, the one that would hold out of sample, and that such bias is in proportion to the fatness of tails and, for very fat-tailed distributions, remains significant even for very large samples.
II. Estimation For Unmixed Pareto-tailed Distributions
Let X be a random variable belonging to the class of distributions with a “power law” right tail, that is:
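A standard way to state this tail condition is sketched below. The symbol $L$ for a slowly varying function is an assumption introduced here, as is the exact Pareto special case shown on the last line.

```latex
\begin{align*}
\mathbb{P}(X > x) &\sim L(x)\, x^{-\alpha} \quad \text{as } x \to +\infty,
  \qquad \lim_{x \to +\infty} \frac{L(kx)}{L(x)} = 1 \ \text{ for any } k > 0, \\
\text{exact Pareto, } \alpha > 1:\quad
\mathbb{P}(X > x) &= \left(\frac{x_{\min}}{x}\right)^{\alpha}
  \ \Longrightarrow\ h(q) = x_{\min}\, q^{-1/\alpha},
  \qquad \kappa_q = q^{\frac{\alpha - 1}{\alpha}}.
\end{align*}
```

The last expression follows from $\mathbb{E}[X \mid X > h] = \alpha h / (\alpha - 1)$ and $\mathbb{E}[X] = \alpha x_{\min} / (\alpha - 1)$, so that $\kappa_q = q\, h(q) / x_{\min}$.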
A. Bias and Convergence
Table I shows the bias of $\hat{\kappa}_q$ as an estimator of $\kappa_q$ in the case of an $\alpha$-Pareto distribution for $\alpha = 1.1$, a value chosen to be compatible with practical economic measures, such as the wealth distribution in the world or in a particular country, including developed ones. In such a case, the estimator is extremely sensitive to “small” samples, “small” meaning in practice $10^8$. We ran up to a trillion simulations across varieties of sample sizes. While $\kappa_{0.01} \approx 0.657933$, even a sample size of 100 million remains severely biased, as seen in the table.
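The direction of this bias can be reproduced at toy scale with a minimal Monte Carlo sketch. It is not the simulation behind Table I: it assumes an exact Pareto law with minimum 1 (what Python's `random.paretovariate` draws), uses the closed form q^((α−1)/α) for the true value, and runs sample sizes and trial counts far smaller than those in the text. All function names are illustrative.

```python
import random


def true_kappa(q, alpha):
    """Top-q share of the total for an exact Pareto law with exponent alpha > 1."""
    return q ** ((alpha - 1.0) / alpha)


def sample_kappa(sample, q):
    """In-sample share of the total held by the largest round(q * n) observations."""
    k = max(1, round(q * len(sample)))
    ordered = sorted(sample, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def mean_sample_kappa(n, q, alpha, trials, rng):
    """Average the in-sample estimator over `trials` independent samples of size n."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.paretovariate(alpha) for _ in range(n)]
        total += sample_kappa(sample, q)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so runs are repeatable
    q, alpha = 0.01, 1.1
    print(f"true kappa: {true_kappa(q, alpha):.6f}")
    for n in (100, 1_000, 10_000):
        estimate = mean_sample_kappa(n, q, alpha, trials=100, rng=rng)
        print(f"n = {n:>6}: mean in-sample kappa = {estimate:.4f}")
```

Averaging over trials matters here because a single sample can land above or below the true value; the bias shows up in the mean of the estimator, not in any one draw.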
Naturally the bias is rapidly (and nonlinearly) reduced for $\alpha$ further away from 1, and becomes weak in the neighborhood of 2 for a constant $\alpha$, though not under a mixture distribution for $\alpha$, as we shall see later. It is also weaker outside the top 1%, hence this discussion focuses on the famed “one percent” and on low values of the exponent.
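The dependence on the exponent can be illustrated with a similar sketch that holds the sample size fixed and varies a constant α. The same assumptions apply (exact Pareto law with minimum 1, closed-form true value, illustrative names, small trial counts), and the mixture case mentioned above is not covered.

```python
import random


def relative_bias(alpha, n, q, trials, rng):
    """Relative shortfall of the mean in-sample top-q share versus the true share.

    Positive values mean the in-sample estimator undershoots on average.
    """
    true_value = q ** ((alpha - 1.0) / alpha)  # exact Pareto closed form
    k = max(1, round(q * n))                   # number of top observations kept
    total = 0.0
    for _ in range(trials):
        sample = sorted((rng.paretovariate(alpha) for _ in range(n)), reverse=True)
        total += sum(sample[:k]) / sum(sample)
    mean_estimate = total / trials
    return (true_value - mean_estimate) / true_value


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so runs are repeatable
    for alpha in (1.1, 1.5, 2.0):
        bias = relative_bias(alpha, n=5_000, q=0.01, trials=100, rng=rng)
        print(f"alpha = {alpha}: relative bias = {bias:.3f}")
```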