After 1,070 public polls conducted at the state level, our unexpectedly long primary season has finally come to an end. Now, those of us who consume polling data ought to take a few moments to consider the rocky season we have just been through.
We started with a bang in New Hampshire, where every poll showed Barack Obama gaining over the last weekend and every poll showed him with at least a numerical advantage over Hillary Rodham Clinton. Obama's lead in the last round of polls averaged out to between 6 and 9 percentage points (depending on which averaging method you prefer), but Clinton surprised everyone, winning New Hampshire by 3 points, 39 percent to 36 percent.
Some of the most prominent voices in the polling world saw disaster. Andrew Kohut, president of the Pew Research Center, called it "one of the most significant miscues in modern polling history." ABC polling director Gary Langer agreed, labeling it a "fiasco" and adding, "It is simply unprecedented for so many polls to have been so wrong."
But then it got worse.
As conservative blogger Patrick Ruffini recounted this week, still bigger polling miscues occurred later in South Carolina, Georgia, Alabama and Wisconsin. To Ruffini, these problems exposed "how limited polling has been as a tool," especially given demographic voting patterns that appeared remarkably stable across the primaries.
The frustration with primary election polling as a forecasting tool leads me to one caution as well as a theory or two.
First, the caution. Pollsters have always had a harder time with primary elections than with other polling subjects. Surveys conducted by 12 different pollsters just before the 2000 New Hampshire primary showed John McCain leading George W. Bush by an average of five percentage points (38 percent to 33 percent). McCain ended up winning by 19 points (49 percent to 30 percent). That is a 14-point miss on the margin, an even bigger error than this year's miss of 9 to 12 points in New Hampshire.
The discrepancy led the National Council on Public Polls to warn eight years ago that the "problems" of "forecasting a primary" are "almost insurmountable." The council recalled that George Gallup had "shunned" pre-primary polling for most of his career.
Political campaigns hire pollsters to conduct pre-primary surveys, but their purpose is less about forecasting the outcome than it is about developing strategies to persuade potential voters and mobilize supporters.
Second, a theory. One big clue comes from the fact that the polling errors were generally bigger on the Democratic side this year. To check, I used the pollster "Report Card" data compiled by SurveyUSA to compare the errors for all pollsters. The average difference between the final poll margin (for the top two candidates) and the actual result was greater for the Democratic primaries (average 7.0 points, median 6) than for the Republican primaries (average 6.0, median 4).
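For readers who want to see the arithmetic, here is a minimal sketch of that error measure: the absolute gap between a poll's top-two margin and the actual margin, summarized by mean and median. The function name `margin_error` is made up for illustration, and the inputs are the New Hampshire figures quoted in this column, not the SurveyUSA Report Card data, so the output does not reproduce the 7.0 and 6.0 averages.

```python
from statistics import mean, median


def margin_error(poll_margin, actual_margin):
    """Absolute gap, in points, between the polled and actual top-two margins."""
    return abs(poll_margin - actual_margin)


# (label, final poll margin, actual margin), both signed toward the poll leader.
# A negative actual margin means the poll leader lost.
# Illustrative inputs only: the New Hampshire numbers cited in this column.
examples = [
    ("NH 2000 Republican, McCain over Bush", 5, 19),
    ("NH 2008 Democratic, Obama over Clinton (low average)", 6, -3),
    ("NH 2008 Democratic, Obama over Clinton (high average)", 9, -3),
]

errors = [margin_error(poll, actual) for _, poll, actual in examples]

for (label, _, _), err in zip(examples, errors):
    print(f"{label}: {err} points")

print(f"mean {mean(errors):.1f}, median {median(errors)}")
```

Reporting the median next to the mean is useful because a single large miss pulls the average up, which may be part of why the averages and medians quoted above differ.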
My hunch is that the pollsters had a tougher time in the Democratic primaries because of the same sharp and persistent demographic patterns that Ruffini highlights. Almost every state showed big differences in the Obama versus Clinton vote preference along lines of race, gender, age and class (usually measured by education or income) -- differences that were typically more pronounced when these variables were combined.
The cross-demographic group differences in the Obama-Clinton race are unusual. In primary polling, subgroup differences of 10 or more points on nongeographic variables are rare.
When vote preference shows big differences in demographic subgroups, and when the demographic composition of the "likely voters" varies from survey to survey, the potential for variation and error is considerable. And the latter variation did seem to occur this year, as we learned in collecting such data on Pollster.com for states like Iowa, Texas, Ohio and North Carolina.
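A rough sketch of why this matters: a poll's topline margin is a weighted average of its subgroup margins, so shifting the subgroup mix moves the result whenever the subgroups disagree. Every number below is hypothetical and chosen only to make the arithmetic easy to follow. The group names and the function `topline_margin` are likewise invented for illustration and do not come from any pollster's data.

```python
def topline_margin(composition, margin_by_group):
    """Weighted average of subgroup margins, weighted by each group's sample share."""
    total = sum(composition.values())
    weighted = sum(share * margin_by_group[group] for group, share in composition.items())
    return weighted / total


# Hypothetical "likely voter" samples: percent of interviews in each subgroup.
sample_1 = {"group_x": 30, "group_y": 70}
sample_2 = {"group_x": 40, "group_y": 60}

# Hypothetical margins (candidate A minus candidate B, in points) within each subgroup.
polarized = {"group_x": 40, "group_y": -20}  # sharp demographic split
flat = {"group_x": 5, "group_y": 5}          # no demographic pattern

for name, margins in [("polarized", polarized), ("flat", flat)]:
    m1 = topline_margin(sample_1, margins)
    m2 = topline_margin(sample_2, margins)
    print(f"{name}: sample 1 margin {m1:+.1f}, sample 2 margin {m2:+.1f}, gap {abs(m1 - m2):.1f}")
```

In this toy setup, a 10-point shift in sample composition moves the polarized race by 6 points and flips the leader, while the race with no demographic pattern does not move at all.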
The unfortunate reality is that many public pollsters are cutting corners when it comes to the way they gather interviews. The shortcuts have less consequence when the results show little or no demographic pattern. But in a race like this one, these flaws are more exposed.
Already, this election season has produced many new challenges to consider and methodological issues to chew over. And we still have five months of polling to go.