Last week's column on the statistical model by the anonymous blogger "Poblano" of FiveThirtyEight.com, who relies mostly on population demographics rather than polling to predict the results of the Democratic primary contests, drew some reaction worthy of further comment.
The model did better than pre-election polls in predicting the candidate percentages in Indiana and North Carolina (the raw vote was another story -- see below). In West Virginia this week, the model again came close, predicting a 39-point victory (67.4 percent to 28.6 percent) for Hillary Rodham Clinton over Barack Obama; in fact, Clinton won by 41 points (67.0 percent to 25.7 percent).
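For readers who want to check those margins, here is a short illustrative sketch using only the West Virginia vote shares quoted above:

```python
# Compare Poblano's projected West Virginia margin with the actual result.
# All figures are the vote shares cited in the column.
predicted_margin = 67.4 - 28.6   # model's projected Clinton lead, in points
actual_margin = 67.0 - 25.7      # certified Clinton lead, in points

print(f"Predicted margin: {predicted_margin:.1f} points")  # 38.8, rounds to 39
print(f"Actual margin:    {actual_margin:.1f} points")     # 41.3, rounds to 41
print(f"Model error:      {abs(actual_margin - predicted_margin):.1f} points")
```

The model's candidate percentages were thus off by about two and a half points on the margin, close by any reasonable standard.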
However, in this case, two pre-election polls did as well or better. The American Research Group (ARG) had Clinton leading by 43 points (66 percent to 23 percent), and a survey by political consultants TSG Consulting and Orion Strategies had Clinton ahead by 40 points (63 percent to 23 percent).
Here are some questions and answers raised by last week's column:
Is the success of Poblano's model in predicting candidate percentages an argument for abandoning polling?
Hardly. The model depends on a unique combination of circumstances: (a) the first presidential primary contest in 24 years on the verge of going wire-to-wire, featuring (b) deep divisions in candidate preference that (c) have remained remarkably stable across the vast majority of primaries. Moreover, (d) the model's success in the last four states depends on a rich pool of congressional-district data from previous contests, available only at the end of the process.
These preconditions are simply absent in virtually all other political races. Also, Poblano's model estimates only the "horse race" preference. It provides no measurement of voter opinions on issues or their perceptions of the candidates and their messages. So pollsters need not worry about losing their jobs to this particular technique anytime soon.
What about the failure of Poblano's model to predict turnout?
As both ARG pollster Dick Bennett and anonymous commenter "StatsProf" pointed out, the model "missed the mark" in its turnout projection by wide margins. In West Virginia, the total vote for Clinton, Obama and John Edwards (356,790) was 32 percent higher than what Poblano had predicted (270,000). According to Bennett, Poblano similarly underestimated turnout by 37 percent in Indiana and 38 percent in Pennsylvania.
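The 32 percent figure follows directly from the two vote totals; a quick illustrative calculation, using only the numbers reported above:

```python
# Verify the West Virginia turnout shortfall: actual vote vs. Poblano's projection.
actual_votes = 356_790     # total vote for Clinton, Obama and Edwards
projected_votes = 270_000  # Poblano's pre-election turnout projection

shortfall = (actual_votes - projected_votes) / projected_votes
print(f"Actual turnout was {shortfall:.0%} higher than projected")  # 32%
```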
Looking at the turnout estimate, "StatsProf" concludes that the models "are not a good representation of the actual vote," much like the "terribly flawed" work of students who merely "stumble onto the correct answers." "You should insist that the modelers you promote show their work and not just the results," he or she writes.
Now, I will grant that I made little effort to assess the mechanics of Poblano's model. While he describes the model and the data used in considerable detail, he does not provide some of the specifics that an academic journal would require for peer review. Whether Poblano's work would pass such review, I cannot say.
But that was not my purpose. My interest was sparked by what I assume to be the underlying reason that the model succeeds with candidate percentages: the continuing stability of vote preferences in the Clinton-Obama race.
As long as we are on the subject, how well do public opinion polls predict turnout?
If a "good representation" of the election outcome requires accurate assumptions about the level of turnout, then pre-election polls may have problems of their own.
Survey researchers have long understood that identifying likely voters is a lot more complicated than simply asking respondents whether they plan to vote. Since Americans tend to exaggerate their intent to vote, pollsters have developed sometimes elaborate procedures for selecting or representing the likely electorate. Some depend more than others on current survey measures of voter enthusiasm and intent, but all tend to rely to some degree on the pollster's judgment.
While a few pollsters produce quantitative turnout estimates, most do not. In fact, many pollsters have reported consistently "accurate" readings of the percentages that candidates received based on surveys that appear to assume unrealistically high levels of turnout.
Consider the automated surveys conducted this year by SurveyUSA in California, Maryland and Ohio, which predicted the exact margin separating the top two candidates. The Democratic "likely voters" that SurveyUSA selected in those states represented 44 percent, 46 percent and 44 percent, respectively, of the adults in their samples. By comparison, the actual turnout of Democrats in those states amounted to 23 percent, 20 percent and 26 percent, respectively, of all eligible adults.
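To make the gap concrete, here is an illustrative sketch comparing the likely-voter share of adults in SurveyUSA's samples with the actual Democratic turnout as a share of eligible adults, using the figures cited above:

```python
# Likely-voter share of sampled adults vs. actual Democratic turnout
# as a share of eligible adults, for the three states cited above.
states = {
    "California": (0.44, 0.23),
    "Maryland":   (0.46, 0.20),
    "Ohio":       (0.44, 0.26),
}

for state, (likely_share, actual_share) in states.items():
    ratio = likely_share / actual_share
    print(f"{state}: likely-voter sample implied roughly {ratio:.1f}x actual turnout")
```

In each case, the "likely electorate" represented in the sample was roughly double the share of adults who actually turned out, even though the candidate percentages came out right.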
I do not mean to pick on SurveyUSA on this point, since it is among the few polling organizations to routinely disclose the "filtering" statistics necessary to determine the percentage of adults represented by their likely voter samples. Other pollsters, including ARG, routinely fail to "show their work" in this respect, as some might put it.