The Problem with Averages and Medians
27 June 2017 | Pago Pago, American Samoa
Assume we have a country with 5 people in it, and we want to look at income. The person with the least has $1/year, the next has $2/year, the third has $3/year, the fourth has $4/year, and the fifth has $5,000,000/year.
Now, the average income for the country is $1,000,002 ($5,000,010 divided by 5) and we think, wow, this country is really well off. We are wrong. One person is really well off and everyone else is doing badly. You see this kind of game played in a hundred charts and narratives a day. Beware.
Well, let's use a median instead. The median is the middle value when incomes are ranked from highest to lowest. The median here is $3/year. That is a better descriptor, but still not good. It does not disclose that one guy has an income of $5,000,000 or that another has an income of only $1. The mode, the income figure which appears most frequently, is no help either: every income here appears exactly once, so there is no mode.
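The arithmetic for the five-person country is easy to check. Here is a minimal sketch in Python, using only the standard library; the variable names are just for illustration.

```python
from statistics import mean, median, multimode

# The five incomes from the example, in dollars per year.
incomes = [1, 2, 3, 4, 5_000_000]

avg = mean(incomes)         # total of 5,000,010 divided by 5 people
mid = median(incomes)       # middle value of the ranked list
modes = multimode(incomes)  # every value that ties for most frequent

print(f"average: ${avg:,}")
print(f"median:  ${mid:,}")

# If every income ties for "most frequent", no figure stands out as a mode.
if len(modes) == len(incomes):
    print("mode:    none (each income appears exactly once)")
```

The average lands a little over a million dollars while the median is $3, and neither number by itself hints at the other.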
A much better approach than a single average or median is to use a quintile distribution and then give the median and average within each quintile. Here that is:
bottom quintile -- $1 = median and average
second quintile -- $2 = median and average
middle quintile -- $3 = median and average
fourth quintile -- $4 = median and average
top quintile -- $5,000,000 = median and average
(Alternatively, the variance and mean in each quintile could be provided, but most would not understand that.)
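The quintile breakdown can be sketched the same way. This is one simple way to do it, not a standard recipe: the helper name quintile_summary is made up for this example, and it assumes the ranked list is just cut into five groups of near-equal size. Statistical packages define quintile boundaries in several slightly different ways.

```python
from statistics import mean, median

def quintile_summary(values):
    """Rank the values, cut them into five near-equal groups,
    and return (quintile number, average, median) for each group."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 5:
        raise ValueError("need at least five values to form quintiles")
    summary = []
    for q in range(5):
        # Integer arithmetic keeps the group sizes within one of each other.
        start = q * n // 5
        stop = (q + 1) * n // 5
        group = ordered[start:stop]
        summary.append((q + 1, mean(group), median(group)))
    return summary

incomes = [1, 2, 3, 4, 5_000_000]
rows = quintile_summary(incomes)
for q, avg, mid in rows:
    print(f"quintile {q}: average ${avg:,}, median ${mid:,}")
```

With only five people, each quintile holds one person, so the average and median in each group are the same figure. With a real population the two would differ inside a quintile, which is exactly the detail a single national average hides.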
Bogus averages are slipped to us every day. Often we are not even clearly told they are averages. We are just told "per capita income is $1,000,002" or "income is $1,000,002." We get sucker punched all the time.
Then, in addition, there is the core problem of graphical presentation. The ordinate, or vertical axis, can have its scale shrunk or stretched hugely to make differences look tiny or huge. This is another common game.
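The axis game can be put in numbers too. In this rough sketch, the two values (100 and 105) and the axis ranges are hypothetical, chosen only to illustrate the effect, and drawn_height is a made-up helper that computes what fraction of the chart's height a bar takes up.

```python
def drawn_height(value, axis_min, axis_max):
    """Fraction of the chart's vertical extent that a bar for `value` fills."""
    return (value - axis_min) / (axis_max - axis_min)

# Hypothetical figures: the second is 5% larger than the first.
a, b = 100, 105

# Axis starting at zero: the bars look about as different as the numbers are.
honest = drawn_height(b, 0, 110) / drawn_height(a, 0, 110)

# Axis starting at 99: the same two numbers, drawn with a truncated scale.
truncated = drawn_height(b, 99, 106) / drawn_height(a, 99, 106)

print(f"zero-based axis: second bar is {honest:.2f}x the height of the first")
print(f"truncated axis:  second bar is {truncated:.2f}x the height of the first")
```

The data never change. Only the bottom of the axis moves, and a 5% difference goes from looking like almost nothing to looking like a sixfold gap.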
We are beset with these problems every day and are too often led into a point of view without realizing it. And this is just with descriptive statistics, to say nothing of inferential statistics, which is much more complicated.