Robert Groves, the current Census director and a much-acclaimed survey methodologist, proposed a distinction in 1989 between "describers" (those who use surveys to measure characteristics or opinions) and "modelers" (those who use surveys to test theories about "causal relationships" -- whether one characteristic causes another).
Why kick off this column with arcane terminology from a 20-year-old textbook? Because the describers-versus-modelers debate Groves wrote about raises an important point about online "panel" surveys and about much of the pre-election data that political junkies consume. All told, there is quite a bit of modeling going on, if not precisely the variety that Groves had in mind.
Let's start with online panel surveys (a subject I have discussed here previously). True random samples, like those used in the first step of telephone surveys, are impossible on the Internet. So Internet polls draw respondents from panels of volunteers, recruited through various non-random means, who have agreed to be interviewed online. Pollsters routinely survey members of the panel and use special weighting or selection procedures to produce a representative sample. Put another way, they try to "model" a larger population.
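The core weighting idea can be sketched in a few lines: if a panel over-represents some groups, each respondent is weighted by the ratio of the group's population share to its panel share. The numbers below are entirely hypothetical, chosen only to illustrate the arithmetic -- no actual panel or pollster is depicted.

```python
# Toy sketch of how a panel poll "models" a population: the panel
# over-represents some age groups, so each respondent is weighted by
# population share / panel share. All figures are invented.

panel = {"18-34": 100, "35-64": 500, "65+": 400}          # panelists per age group
population = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}  # true population shares

total = sum(panel.values())  # 1000
weights = {g: population[g] / (panel[g] / total) for g in panel}
# weights == {"18-34": 3.0, "35-64": 1.0, "65+": 0.5}

# After weighting, the panel's effective shares match the population:
weighted_shares = {g: weights[g] * panel[g] / total for g in panel}
# weighted_shares == {"18-34": 0.3, "35-64": 0.5, "65+": 0.2}
```

The catch, as Couper notes below, is the assumption baked into this arithmetic: that a weighted 18-to-34-year-old volunteer resembles the average 18-to-34-year-old in the population. If that assumption fails, no amount of reweighting fixes it.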
The connection to the Groves book was made by another academic, University of Michigan professor Mick Couper, who analogized the modelers-versus-describers debate to a similar debate about online panel surveys at last month's conference of the American Association for Public Opinion Research.
"If the model works," Couper said, "if the assumptions are held, then it should work. But if not, you're kind of screwed."
That's why Couper and many colleagues argue, in a recently released AAPOR Report on Online Panels, against the use of such panels for surveys that seek "to accurately estimate population values." That's a fancy way of describing what most surveys do. In other words, the AAPOR report recommends sticking with more traditional sampling methods for measuring the characteristics or opinions of some larger population.
Just last week, blogger Nate Silver produced data that vividly illustrate Couper's point. In examining the accuracy of final pre-election polls conducted for various offices since 1998, Silver found that two online panel pollsters -- Harris Interactive and YouGov/Polimetrix -- produced results that were at least as accurate as telephone polls conducted in the same years. (Disclosure: YouGov/Polimetrix is the owner and primary sponsor of my website, Pollster.com.)
Yet the final Internet panel surveys released by Zogby International since 2004 were "perhaps two orders of magnitude worse than typical telephone polls." Where comparable polls produced errors of 3-5 points on the margin between the two top candidates, Zogby's online results were off by an average of 7-8 points.
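The "error on the margin" that these comparisons tally is simply the gap between the polled margin and the actual margin between the two top candidates. A quick illustration, with invented numbers rather than any real race:

```python
# "Error on the margin": how far a poll's margin between the top two
# candidates missed the actual result's margin, in percentage points.
# The example figures are hypothetical.

def margin_error(poll, result):
    """Both arguments are (top_candidate_pct, runner_up_pct) pairs."""
    return abs((poll[0] - poll[1]) - (result[0] - result[1]))

# A poll showing 48-45 in a race decided 51-44 misses the margin by 4 points:
print(margin_error((48, 45), (51, 44)))  # 4
```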
In other words, panel surveys rest on the assumption that statistical adjustment can remove the bias of non-random recruitment. The precise mechanisms vary from pollster to pollster. When the assumptions behind their models hold, the pre-election forecasts have been as accurate as other polls. But when they don't, those who relied on the data are "kind of screwed."
But let's not confine this logic to online panel surveys. The reality is that much of pre-election polling also depends heavily on modeled assumptions to create representative samples of likely electorates.
Consider the testimony of another respected Michigan academic, professor Michael Traugott. In December 2009, I attended another AAPOR-sponsored session where Traugott presented data on the accuracy of polling in the 2008 campaign. The data showed automated telephone surveys (those that do not use a live interviewer) had been "increasing steadily in terms of accuracy."
How could that be, an audience member asked, given the qualms Traugott and others had previously expressed about shortcomings in the automated methodology? He answered that pre-election polls have become "heavily modeled."
AAPOR's online panel report makes a similar point: A pre-election pollster "must make numerous decisions about how to identify likely voters, how to handle respondents who decline to answer vote choice questions, how to weight data, how to order candidate names on questionnaires and more." Since the assumptions and judgments that pollsters make have considerable bearing on the accuracy of their forecasts, the report questions the "usefulness" of accuracy data as a way to evaluate online polls.
But again, let's focus on a broader point: Whatever the sampling method, horse-race results from pre-election polls are often only as good as the assumptions a pollster makes about the nature of the likely electorate.
Finally, remember that most telephone surveys are now also "modeled" in one important way: Most begin with some sort of random sample of phone numbers -- though some fail to cover those without landline phones or whose numbers do not appear on the sampled list -- and then interview those they can reach who are willing to be interviewed.
For most public media polls, that means pollsters now ultimately interview someone in fewer than 20 percent of the randomly selected households. Their unweighted samples typically show statistical bias: younger Americans, men, nonwhites, the less-educated and those living in urban areas are harder to reach and interview, so the unweighted data skews accordingly. Pollsters therefore weight by these demographic characteristics, on the assumption that doing so also corrects for any other bias that differentiates respondents from non-respondents.
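One common mechanism for that demographic weighting is raking (iterative proportional fitting): cycle through the weighting variables, each time scaling respondents' weights so the weighted sample matches that variable's population target. A minimal sketch follows, with invented cell counts and targets -- this is the general technique, not any particular pollster's procedure:

```python
# Raking (iterative proportional fitting) over two hypothetical
# demographics, age and gender. Alternately rescale weights so the
# weighted margins match each variable's population target.

def rake(counts, row_targets, col_targets, iters=50):
    """counts[r][c] are raw respondent counts; targets are population shares."""
    w = {r: {c: 1.0 for c in counts[r]} for r in counts}
    for _ in range(iters):
        for r in counts:  # match age (row) targets
            tot = sum(w[r][c] * counts[r][c] for c in counts[r])
            for c in counts[r]:
                w[r][c] *= row_targets[r] / tot
        for c in col_targets:  # match gender (column) targets
            tot = sum(w[r][c] * counts[r][c] for r in counts)
            for r in counts:
                w[r][c] *= col_targets[c] / tot
    return w

# A sample that skews older and female (all figures invented):
counts = {"18-44": {"men": 80, "women": 120},
          "45+":   {"men": 140, "women": 160}}
row_targets = {"18-44": 0.45, "45+": 0.55}   # population age shares
col_targets = {"men": 0.48, "women": 0.52}   # population gender shares

weights = rake(counts, row_targets, col_targets)
```

After raking, the weighted age and gender breakdowns match the targets -- but, as with online panels, only under the assumption that the reachable respondents within each demographic cell resemble the unreachable ones.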
Usually, that "model" seems to work. Just keep in mind that when it doesn't... we're kind of screwed.