The Skinny on Simpson’s Paradox

So much of the narrative that is offered on the economy is built on statistics.  And, as the often-quoted line goes, there are lies, damned lies, and statistics.  One particularly annoying misuse combines (aggregates) individual statistics to tell a story that they don’t tell on their own.  This is at the heart of Simpson’s Paradox.

To illustrate the paradox, consider two demographic groups labeled ‘A’ and ‘B’.  Each is trying for a position at a large corporation ‘U’ with many divisions or departments.  Suppose that the hiring percentage for each group at the company is:

U
A 50.0%
B 40.0%

Can we conclude that the company discriminates against group ‘B’ in favor of group ‘A’?  At first glance, one may be inclined to say that ‘U’ clearly favors ‘A’ over ‘B’ and maybe has violated equal opportunity laws and is being unethical and unfair.

But suppose that we actually drill down to examine the hiring by division and that, for simplicity, ‘U’ is made up of two divisions, ‘S’ and ‘H’.  Also suppose that, upon request, the hiring percentages for the two divisions are:

S
A 62.5%
B 100.0%

and

H
A 0.0%
B 25.0%

At this point, we may be tempted to say that the company ‘U’ has cooked the books.  But a simple table shows that the statistics presented above can be understood very easily.  Again for simplicity, assume that 10 members of ‘A’ and 10 of ‘B’ apply for jobs but that 8 members of ‘A’ apply to ‘S’ and 2 to ‘H’ while the reverse is true for group ‘B’.

                S        H        U
A  applied      8        2       10
   hired        5        0        5
   rate      62.5%     0.0%    50.0%
B  applied      2        8       10
   hired        2        2        4
   rate     100.0%    25.0%    40.0%

Note that by combining the statistics for ‘S’ and ‘H’ into one whole under ‘U’, the combined statistic tells a much different story than is told by tracking the two divisions separately.
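The arithmetic behind the table can be checked with a few lines of Python.  This is only a minimal sketch: the names `applications`, `division_rate`, and `overall_rate` are made up for illustration, and the numbers are simply the applicant and hire counts assumed above.

```python
from fractions import Fraction

# (applicants, hires) for each group and division, copied from the table above
applications = {
    "A": {"S": (8, 5), "H": (2, 0)},
    "B": {"S": (2, 2), "H": (8, 2)},
}

def division_rate(group, division):
    """Hiring rate for one group within one division."""
    applied, hired = applications[group][division]
    return Fraction(hired, applied)

def overall_rate(group):
    """Hiring rate for one group after pooling both divisions."""
    applied = sum(a for a, _ in applications[group].values())
    hired = sum(h for _, h in applications[group].values())
    return Fraction(hired, applied)

for group in ("A", "B"):
    by_division = {d: float(division_rate(group, d)) for d in ("S", "H")}
    print(group, by_division, "overall:", float(overall_rate(group)))
```

Group ‘B’ has the higher rate in each division taken separately, yet the lower rate once the divisions are pooled, because most of ‘B’ applied to the division that hires the smaller share of its applicants.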

The situation becomes more interesting when salary is factored into the analysis.  Suppose that each member of ‘A’ is paid on average $100K for his position in ‘S’ and that each member of ‘B’ is paid on average $125K for his position in ‘S’ and $60K for his position in ‘H’.  Members of ‘B’ seem to be doing quite well.  But when the statistics are combined into one roll-up, one would conclude that ‘B’s are paid only 92 cents for every dollar that an ‘A’ makes.

               S         H        Ave
A  hired       5         0       $100K
   payroll   $500K      $0
B  hired       2         2       $92.5K
   payroll   $250K     $120K
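The roll-up average is just total payroll divided by headcount.  A small sketch of that calculation follows; the names `positions` and `average_salary` are hypothetical, and salaries are in thousands of dollars as assumed in the example above.

```python
# (number hired, average salary in $K) for each occupied cell of the table above
positions = {
    "A": {"S": (5, 100)},                # nobody from 'A' was hired in 'H'
    "B": {"S": (2, 125), "H": (2, 60)},
}

def average_salary(group):
    """Headcount-weighted average salary (in $K) across divisions."""
    payroll = sum(n * salary for n, salary in positions[group].values())
    headcount = sum(n for n, _ in positions[group].values())
    return payroll / headcount

avg_a = average_salary("A")
avg_b = average_salary("B")
print(f"A: ${avg_a:.1f}K, B: ${avg_b:.1f}K, ratio B/A: {avg_b / avg_a:.3f}")
```

The ratio comes out to 0.925, which is where the roughly 92 cents on the dollar figure comes from, even though ‘B’ out-earns ‘A’ in the one division where both are employed.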

Okay, one may be willing to concede that the combined statistic doesn’t tell the whole story, but one may object that there is still unfairness in the system.  After all, only 4 members of ‘B’ have been employed whereas 5 of ‘A’ have been.  This objection can also be addressed by considering a simple modification of the results shown above.

                S        H        U
A  applied      8        2       10
   hired        5        0        5
   rate      62.5%     0.0%    50.0%
B  applied      2        8       10
   hired        2        4        6
   rate     100.0%    50.0%    60.0%

Now ‘B’ clearly has the upper hand in employment, not just at the division level but at the corporate one as well.  But if the same average salaries are used ($100K and $125K for ‘A’ and ‘B’ in ‘S’ and $60K for ‘B’ in ‘H’) and then all the statistics are combined into one measure, the story told is that members of ‘A’ are paid on average more than those in ‘B’.  In fact, the margin between the average pay of ‘A’ and that of ‘B’ is now larger, even though more members of ‘B’ are now employed.

               S         H        Ave
A  hired       5         0       $100K
   payroll   $500K      $0
B  hired       2         4       $81.7K
   payroll   $250K     $240K
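To see why hiring more of ‘B’ into ‘H’ widens the gap, it helps to treat the number of ‘H’ hires as a variable.  The sketch below does that; the function name `b_average_k` and its parameters are illustrative only, with the default values taken from the example above.

```python
A_AVERAGE_K = 100.0  # every employed member of 'A' works in 'S' at $100K

def b_average_k(hired_in_h, hired_in_s=2, pay_s_k=125.0, pay_h_k=60.0):
    """Average pay (in $K) for 'B' given how many of its members work in 'H'."""
    payroll = hired_in_s * pay_s_k + hired_in_h * pay_h_k
    return payroll / (hired_in_s + hired_in_h)

# Compare the original scenario (2 hired in 'H') with the modified one (4 hired)
for hired_in_h in (2, 4):
    avg = b_average_k(hired_in_h)
    gap = A_AVERAGE_K - avg
    print(f"{hired_in_h} hired in H: B averages ${avg:.1f}K, gap to A is ${gap:.1f}K")
```

Every additional ‘B’ hire at $60K pulls the group’s average further toward $60K, so better employment numbers for ‘B’ make the combined pay statistic look worse.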

This is the heart of Simpson’s paradox.  What is not being accounted for is why members of ‘B’ preferentially apply for the lower-paying jobs in division ‘H’ rather than for the higher-paying jobs in ‘S’.

By now it should be clear that this situation has real-world applications.  The most famous example of this type of situation is the Berkeley gender bias case, in which the aggregate graduate admission figures appeared to favor men while the department-by-department figures did not.

Another example is the oft-quoted statistic that women make 77 cents for every dollar a man makes.  This statistic can be quite true and yet quite misleading.  The common interpretation, that women are being widely discriminated against, is not supported by that statistic alone.  There are surely pockets of discrimination out there, but more likely explanations are that women preferentially enter different fields or that they interrupt their working years for various reasons, such as raising a family.  (That last is a personal choice, and one which I wish I could have pursued, but it is not addressed here.)

If society really wants women to make on average the same as men, then steps should be taken to address why so few women, comparatively speaking, enter high-paying STEM jobs.  This is where our focus should be, and not on trying to fix what is mostly an imaginary problem caused by Mr Simpson and his paradox.
