{"id":403,"date":"2018-11-01T10:00:33","date_gmt":"2018-11-01T10:00:33","guid":{"rendered":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/2018\/11\/01\/building-better-predictions-from-polls\/"},"modified":"2018-11-01T10:00:33","modified_gmt":"2018-11-01T10:00:33","slug":"building-better-predictions-from-polls","status":"publish","type":"post","link":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/2018\/11\/01\/building-better-predictions-from-polls\/","title":{"rendered":"Building Better Predictions from Polls"},"content":{"rendered":"<p>A challenge from his son led NC&#160;State statistician Fred Wright to analyze exactly what went wrong with political polling leading up to the 2016 Presidential election. His analysis of polling methods reveals that inaccurate election predictions weren\u2019t due to shy voters or flawed surveys, but to excessive averaging over multiple polls. Now Wright (and his son) propose a more accurate model.<\/p>\n<p>Polling predictions come in two parts: the prediction itself (in this instance, the overall spread in support between Clinton and Trump), and the confidence in that prediction, usually expressed as a percentage or a probability of winning.<\/p>\n<p>Pollsters arrive at these probabilities by aggregating, or averaging, polls. So at the very basic level, more polls equals more precision. But polling organizations also have to decide how far back to go in the polling data they aggregate. Most sites include poll data from roughly the last month or more. They average the data and then project the winner.<\/p>\n<p>\u201cDespite all the attention to what \u2018went wrong\u2019 in the election prediction, there was little attention to the <em>quantitative<\/em>\u00a0performance of prediction sites like fivethirtyeight or HuffPost,\u201d Wright says. \u201cThe popular press reacted to the fact that many sites had been highly confident of a Clinton victory, but how wrong were they, really? To answer that question, we looked at the state-by-state performance of predictions instead of looking only at the swing states.\u201d<\/p>\n<p>Wright reverse-engineered reported results from the prediction sites to tease out state-specific data and put everything on the same scale, then developed a regression model in which each state\u2019s calculation for candidate support reflected both a national component and a state-specific deviation from that national component.<\/p>\n<p>\u201cThe idea was to be able to view both state-specific information as well as national trends,\u201d Wright says. \u201cDoing so showed that the main thing distinguishing the \u2018good\u2019 from \u2018bad\u2019 prediction sites in 2016 was that on a state-by-state basis, the latter were overconfident, and their models were insufficiently sensitive to a late change before the election.\u201d<\/p>\n<p>Wright\u2019s model also detected the decline in Clinton\u2019s overall support as starting earlier than generally recognized. \u201cSince our model is designed to be more sensitive, it picked up a late, strong decline in support for Clinton that pre-dated the Comey letter,\u201d Wright says. \u201cAccording to our results, had we run our model just prior to the election, it would have given Trump a 47 percent chance of victory.<\/p>\n<p>\u201cOn a related note, our data suggest that polls were not very biased against Trump,\u201d Wright continues. \u201cWe show that the polls may have been roughly correct at the time, but when they were averaged to obtain a consensus, the dropping support for Clinton was inappropriately smoothed over in the final week.\u201d<\/p>\n<p>Wright credits his son and co-author, Alec Wright, an undergraduate at the University of North Carolina at Chapel Hill, with the idea for the research and the resulting model. \u201cThis work started as a challenge from my son, who made some snarky comments about statisticians after the election. I claimed I could do better, and so roped him in to work on this and propose a new model. I think our effort has been successful.\u201d The research appears in <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0261379417303001\"><em>Electoral Studies<\/em><\/a>.<\/p>\n<p><em>This post was <a href=\"https:\/\/news.ncsu.edu\/2018\/11\/wright-poll\/\">originally published<\/a> in NC&#160;State News.<\/em><\/p>","protected":false,"raw":"A challenge from his son led NC State statistician Fred Wright to analyze exactly what went wrong with political polling leading up to the 2016 Presidential election. His analysis of polling methods reveals that inaccurate election predictions weren\u2019t due to shy voters or flawed surveys, but to excessive averaging over multiple polls. Now Wright (and his son) propose a more accurate model.\r\n\r\nPolling predictions come in two parts: the prediction itself (in this instance, the overall spread in support between Clinton and Trump), and the confidence in that prediction, usually expressed as a percentage or a probability of winning.\r\n\r\nPollsters arrive at these probabilities by aggregating, or averaging, polls. So at the very basic level, more polls equals more precision. But polling organizations also have to decide how far back to go in the polling data they aggregate. Most sites include poll data from roughly the last month or more. They average the data and then project the winner.\r\n\r\n\u201cDespite all the attention to what \u2018went wrong\u2019 in the election prediction, there was little attention to the <em>quantitative<\/em>\u00a0performance of prediction sites like fivethirtyeight or HuffPost,\u201d Wright says. \u201cThe popular press reacted to the fact that many sites had been highly confident of a Clinton victory, but how wrong were they, really? To answer that question, we looked at the state-by-state performance of predictions instead of looking only at the swing states.\u201d\r\n\r\nWright reverse-engineered reported results from the prediction sites to tease out state-specific data and put everything on the same scale, then developed a regression model in which each state\u2019s calculation for candidate support reflected both a national component and a state-specific deviation from that national component.\r\n\r\n\u201cThe idea was to be able to view both state-specific information as well as national trends,\u201d Wright says. \u201cDoing so showed that the main thing distinguishing the \u2018good\u2019 from \u2018bad\u2019 prediction sites in 2016 was that on a state-by-state basis, the latter were overconfident, and their models were insufficiently sensitive to a late change before the election.\u201d\r\n\r\nWright\u2019s model also detected the decline in Clinton\u2019s overall support as starting earlier than generally recognized. \u201cSince our model is designed to be more sensitive, it picked up a late, strong decline in support for Clinton that pre-dated the Comey letter,\u201d Wright says. \u201cAccording to our results, had we run our model just prior to the election, it would have given Trump a 47 percent chance of victory.\r\n\r\n\u201cOn a related note, our data suggest that polls were not very biased against Trump,\u201d Wright continues. \u201cWe show that the polls may have been roughly correct at the time, but when they were averaged to obtain a consensus, the dropping support for Clinton was inappropriately smoothed over in the final week.\u201d\r\n\r\nWright credits his son and co-author, Alec Wright, an undergraduate at the University of North Carolina at Chapel Hill, with the idea for the research and the resulting model. \u201cThis work started as a challenge from my son, who made some snarky comments about statisticians after the election. I claimed I could do better, and so roped him in to work on this and propose a new model. I think our effort has been successful.\u201d The research appears in <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0261379417303001\"><em>Electoral Studies<\/em><\/a>."},"excerpt":{"rendered":"<p>Inaccurate 2016 election predictions weren\u2019t due to shy voters or flawed surveys, but to excessive averaging over multiple polls. An NC State statistician proposes a better model.<\/p>\n","protected":false},"author":4,"featured_media":404,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"source":"ncstate_wire","ncst_dynamicHeaderBlockName":"","ncst_dynamicHeaderData":"","ncst_content_audit_freq":"","ncst_content_audit_date":"","ncst_content_audit_display":false,"ncst_backToTopFlag":"","footnotes":""},"categories":[1],"tags":[5],"class_list":["post-403","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-_from-newswire-collection-6"],"displayCategory":null,"acf":[],"_links":{"self":[{"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/posts\/403","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/comments?post=403"}],"version-history":[{"count":0,"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/posts\/403\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/media\/404"}],"wp:attachment":[{"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/media?parent=403"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/categories?post=403"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dev.ucomm.ncsu.edu\/web-platform-free-tier\/wp-json\/wp\/v2\/tags?post=403"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}