{"id":13,"date":"2014-11-01T03:55:07","date_gmt":"2014-11-01T03:55:07","guid":{"rendered":"http:\/\/commoncents.blogwyrm.com\/?p=13"},"modified":"2023-03-26T12:20:44","modified_gmt":"2023-03-26T16:20:44","slug":"the-skinny-on-simpsons-paradox-2","status":"publish","type":"post","link":"https:\/\/commoncents.blogwyrm.com\/?p=13","title":{"rendered":"The Skinny on Simpson\u2019s Paradox"},"content":{"rendered":"<p>So much of the narrative that is offered on the economy is built on statistics.\u00a0 And, as the saying goes, there are lies, damned lies, and statistics.\u00a0 One particularly annoying set of statistics is built by aggregating individual statistics into a single figure that tells a story the individual figures don\u2019t tell on their own.\u00a0 This is at the heart of Simpson\u2019s Paradox.<\/p>\n<p>To illustrate the paradox, consider two demographic groups labeled \u2018A\u2019 and \u2018B\u2019.\u00a0 Members of each group apply for positions at a large corporation \u2018U\u2019 with many divisions or departments.\u00a0 Suppose that the hiring percentage for each group at the company is:<\/p>\n<table style = \"border-style:none !important; width: 20%;   margin-left: auto; margin-right: auto;\">\n<tr>\n<th style = \"background-color: #ffffff !important; width: 20% !important; border-style: none !important;\"><\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">U<\/th>\n<\/tr>\n<tbody>\n<tr>\n<td>A<\/td>\n<td>50.0%<\/td>\n<\/tr>\n<tr>\n<td>B<\/td>\n<td>40.0%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Can we conclude that the company discriminates against group \u2018B\u2019 in favor of group \u2018A\u2019?\u00a0 At first glance, one may be inclined to say that \u2018U\u2019 clearly favors \u2018A\u2019 over \u2018B\u2019, may have violated equal-opportunity laws, and is being unethical and unfair.<\/p>\n<p>But suppose that we drill down to examine the hiring by division and that, for 
simplicity, \u2018U\u2019 consists of just two divisions, \u2018S\u2019 and \u2018H\u2019.\u00a0 Suppose further that, upon request, the company reports these hiring percentages for the two divisions:<\/p>\n<table style = \"border-style:none !important; width: 20%;   margin-left: auto; margin-right: auto;\">\n<tr>\n<th style = \"background-color: #ffffff !important; width: 20% !important; border-style: none !important;\"><\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">S<\/th>\n<\/tr>\n<tbody>\n<tr>\n<td>A<\/td>\n<td>62.5%<\/td>\n<\/tr>\n<tr>\n<td>B<\/td>\n<td>100.0%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>and<\/p>\n<table style = \"border-style:none !important; width: 20%;   margin-left: auto; margin-right: auto;\">\n<tr>\n<th style = \"background-color: #ffffff !important; width: 20% !important; border-style: none !important;\"><\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">H<\/th>\n<\/tr>\n<tbody>\n<tr>\n<td>A<\/td>\n<td>0.0%<\/td>\n<\/tr>\n<tr>\n<td>B<\/td>\n<td>25.0%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>At this point, we may be tempted to say that the company \u2018U\u2019 has cooked the books: \u2018B\u2019 is hired at a higher rate than \u2018A\u2019 in both divisions, yet at a lower rate overall.\u00a0 But a simple table of head counts shows that these figures are entirely consistent.\u00a0 Again for simplicity, assume that 10 members of \u2018A\u2019 and 10 of \u2018B\u2019 apply for jobs, with 8 members of \u2018A\u2019 applying to \u2018S\u2019 and 2 to \u2018H\u2019, while the reverse is true for group \u2018B\u2019.\u00a0 For each group, the three rows of the table give the number of applicants, the number hired, and the hiring rate.<\/p>\n<table style = \"border-style:none !important; width: 20%;   margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<th style = \"background-color: #ffffff !important; width: 20% !important; border-style: none !important;\"><\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">S<\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% 
!important; border-style: solid !important;\">H<\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">U<\/th>\n<\/tr>\n<tr>\n<td rowspan=\"3\" width=\"31\" style = \"vertical-align : middle;\">A<\/td>\n<td width=\"78\">8<\/td>\n<td width=\"96\">2<\/td>\n<td width=\"60\">10<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">5<\/td>\n<td width=\"96\">0<\/td>\n<td width=\"60\">5<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">62.5%<\/td>\n<td width=\"96\">0%<\/td>\n<td width=\"60\">50%<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"3\" width=\"31\" style = \"vertical-align : middle;\">B<\/td>\n<td width=\"78\">2<\/td>\n<td width=\"96\">8<\/td>\n<td width=\"60\">10<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">2<\/td>\n<td width=\"96\">2<\/td>\n<td width=\"60\">4<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">100%<\/td>\n<td width=\"96\">25%<\/td>\n<td width=\"60\">40%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Note that combining the statistics for \u2018S\u2019 and \u2018H\u2019 into a single figure for \u2018U\u2019 tells a very different story from the one told by the two divisions separately.\u00a0 \u2018B\u2019 fares better in each division, but most \u2018B\u2019 applicants apply to \u2018H\u2019, where hiring rates are low for everyone, while most \u2018A\u2019 applicants apply to \u2018S\u2019, where hiring rates are high.<\/p>\n<p>The situation becomes more interesting when salary is factored into the analysis.\u00a0 Suppose that members of \u2018A\u2019 are paid $100K on average for their positions in \u2018S\u2019, while members of \u2018B\u2019 are paid $125K on average in \u2018S\u2019 and $60K in \u2018H\u2019.\u00a0 Members of \u2018B\u2019 seem to be doing quite well: in \u2018S\u2019, the only division employing both groups, they out-earn \u2018A\u2019.\u00a0 But when the statistics are combined into one roll-up, one would conclude that \u2018B\u2019s are paid only 92.5 cents for every dollar that an \u2018A\u2019 makes, since ($250K + $120K)\/4 = $92.5K versus $500K\/5 = $100K.\u00a0 In the table below, the first row for each group gives the number hired in each division and the second gives the total pay.<\/p>\n<table style = \"border-style:none !important; width: 20%;   margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<th style = \"background-color: #ffffff !important; width: 20% !important; border-style: none !important;\"><\/th>\n<th style = \"background-color: #f5f5dc 
!important; width: 20% !important; border-style: solid !important;\">S<\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">H<\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">Ave<\/th>\n<\/tr>\n<tr>\n<td rowspan=\"2\" width=\"31\" style = \"vertical-align : middle;\">A<\/td>\n<td width=\"78\">5<\/td>\n<td width=\"96\">0<\/td>\n<td rowspan=\"2\" width=\"60\" style = \"vertical-align : middle;\">$100K<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">$500K<\/td>\n<td width=\"96\">$0<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\" width=\"31\" style = \"vertical-align : middle;\">B<\/td>\n<td width=\"78\">2<\/td>\n<td width=\"96\">2<\/td>\n<td rowspan=\"2\" width=\"60\" style = \"vertical-align : middle;\">$92.5K<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">$250K<\/td>\n<td width=\"96\">$120K<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Okay, one may concede that the combined statistic doesn\u2019t tell the whole story but still object that the system is unfair.\u00a0 After all, only 4 members of \u2018B\u2019 have been hired, whereas 5 members of \u2018A\u2019 have.\u00a0 This objection can be addressed with a simple modification of the results shown above: suppose that \u2018H\u2019 hires 4 of its 8 \u2018B\u2019 applicants instead of 2, with everything else unchanged.<\/p>\n<table style = \"border-style:none !important; width: 20%;   margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<th style = \"background-color: #ffffff !important; width: 20% !important; border-style: none  !important;\"> <\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">S<\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">H<\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">U<\/th>\n<\/tr>\n<tr>\n<td rowspan=\"3\" width=\"31\" style = \"vertical-align : 
middle;\">A<\/td>\n<td width=\"78\">8<\/td>\n<td width=\"96\">2<\/td>\n<td width=\"60\">10<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">5<\/td>\n<td width=\"96\">0<\/td>\n<td width=\"60\">5<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">62.5%<\/td>\n<td width=\"96\">0%<\/td>\n<td width=\"60\">50%<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"3\" width=\"31\" style = \"vertical-align : middle;\">B<\/td>\n<td width=\"78\">2<\/td>\n<td width=\"96\">8<\/td>\n<td width=\"60\">10<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">2<\/td>\n<td width=\"96\">4<\/td>\n<td width=\"60\">6<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">100%<\/td>\n<td width=\"96\">50%<\/td>\n<td width=\"60\">60%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Now \u2018B\u2019 clearly has the upper hand in employment, not just at the division level but at the corporate one as well (60% versus 50%).\u00a0 But if the same average salaries are used ($100K for \u2018A\u2019 and $125K for \u2018B\u2019 in \u2018S\u2019, and $60K for \u2018B\u2019 in \u2018H\u2019) and all the statistics are then combined into one measure, the story told is that members of \u2018A\u2019 are paid more on average than members of \u2018B\u2019.\u00a0 In fact, the gap between the average pay of \u2018A\u2019 and that of \u2018B\u2019 is now larger (roughly 82 cents on the dollar rather than 92.5), even though more members of \u2018B\u2019 are now employed: both extra hires are in the lower-paying division \u2018H\u2019, which pulls the \u2018B\u2019 average down to ($250K + $240K)\/6, or about $81.7K.<\/p>\n<table style = \"border-style:none !important; width: 20%;   margin-left: auto; margin-right: auto;\">\n<tbody>\n<tr>\n<th style = \"background-color: #ffffff !important; width: 20% !important; border-style: none  !important;\"><\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">S<\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">H<\/th>\n<th style = \"background-color: #f5f5dc !important; width: 20% !important; border-style: solid !important;\">Ave<\/th>\n<\/tr>\n<tr>\n<td rowspan=\"2\" width=\"31\" style = \"vertical-align: middle;\">A<\/td>\n<td 
width=\"78\">5<\/td>\n<td width=\"96\">0<\/td>\n<td rowspan=\"2\" width=\"60\" style = \"vertical-align : middle;\">$100K<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">$500K<\/td>\n<td width=\"96\">$0<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\" width=\"31\" style = \"vertical-align : middle;\">B<\/td>\n<td width=\"78\">2<\/td>\n<td width=\"96\">4<\/td>\n<td rowspan=\"2\" width=\"60\" style = \"vertical-align : middle;\">$81.7K<\/td>\n<\/tr>\n<tr>\n<td width=\"78\">$250K<\/td>\n<td width=\"96\">$240K<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This is the heart of Simpson\u2019s paradox: a comparison that holds within every division can weaken or even reverse once the divisions are pooled, because the two groups are spread across the divisions in very different proportions.\u00a0 What the pooled figures do not account for is why members of \u2018B\u2019 preferentially apply for the lower-paying jobs in division \u2018H\u2019 rather than for the higher-paying jobs in \u2018S\u2019.<\/p>\n<p>By now it should be clear that this situation has real-world applications.\u00a0 The most famous example of this kind to work its way through the courts is the Berkeley gender-bias case, which concerned graduate admissions at the University of California, Berkeley.<\/p>\n<p>Another example is the oft-quoted statistic that women make 77 cents for every dollar a man makes.\u00a0 This statistic can be quite true and yet quite misleading.\u00a0 The common interpretation, that women are being widely discriminated against, is not supported by that statistic.\u00a0 There are surely pockets of discrimination out there, but more likely explanations are that women preferentially enter different fields or that they interrupt their working years for various reasons, such as raising a family (a personal choice, and one I wish I could have pursued, that is not addressed here).<\/p>\n<p>If society really wants women to make on average the same as men, then steps should be taken to address why so few women, comparatively speaking, enter high-paying STEM jobs.\u00a0 This is where our focus should be, rather than on trying to fix what is mostly an imaginary problem caused by Mr Simpson and his 
paradox.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So much of the narrative that is offered on the economy is built on statistics.\u00a0 And, as the saying goes, there are lies, damned lies, and statistics.\u00a0 One particularly annoying set&#8230; <a class=\"read-more-button\" href=\"https:\/\/commoncents.blogwyrm.com\/?p=13\">Read more &gt;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-13","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/13","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=13"}],"version-history":[{"count":7,"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/13\/revisions"}],"predecessor-version":[{"id":1078,"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=\/wp\/v2\/posts\/13\/revisions\/1078"}],"wp:attachment":[{"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=13"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=13"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/commoncents.blogwyrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=13"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]
}}