Whenever I'm asked to speak about polling, someone always asks about the Internet: When will pollsters start conducting their surveys online rather than by telephone?
The question is no longer hypothetical. Surveys based on "panels" of respondents recruited online to answer questions on the Internet have been rapidly gaining acceptance in commercial market research. While most political opinion surveys are still done by telephone, we now see Internet surveys conducted by organizations like Harris Interactive, Zogby International and YouGov/Polimetrix cropping up more often. (Disclosure: YouGov/Polimetrix is the owner and primary sponsor of my Web site, Pollster.com.)
The growing use of Internet panel surveys has created significant controversy because the sampling method most of them use breaks with a critical element of survey orthodoxy. That debate got a significant jolt last month when ABC News polling director Gary Langer published a new study conducted by a team of Stanford University researchers that "finds trouble" in the form of consistently "less accurate" results for surveys conducted using opt-in panels, the approach most online polls take.
I want to review that study and the challenges that have been made to its conclusions. Before doing that, however, it is worth stepping back to consider the underlying argument about random probability sampling, what Langer describes as the "theoretical underpinning on which valid and reliable survey research is based."
The concept behind random sampling is straightforward and familiar to anyone who has taken a statistics class: Every member of the population of interest needs to have some chance of being selected into the sample (or to be technical about it, they must have a known, non-zero probability of selection). The selection procedure also needs to be mechanical and objective. Interviewers cannot pick respondents arbitrarily, and respondents cannot select themselves.
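The selection procedure described above can be sketched in a few lines of code. This is a toy illustration with an invented population, not any pollster's actual method: given a complete list of the population (a "sample frame"), a mechanical draw gives every member the same known, non-zero chance of selection.

```python
import random

# Hypothetical sample frame: every member of the population appears
# exactly once, so each has a known, equal probability of selection.
population = [f"person_{i}" for i in range(10_000)]

random.seed(42)  # fixed seed so the draw is reproducible
sample = random.sample(population, k=500)  # simple random sample, no replacement

# Each member's selection probability is k / N = 500 / 10,000 = 0.05.
# No interviewer discretion, no self-selection: the draw is mechanical.
print(len(sample), len(set(sample)))  # 500 distinct respondents
```

The point of the exercise is what the Internet lacks: the `population` list itself. Without a comprehensive frame, no mechanical draw of this kind is possible.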
In reality, the goal of obtaining a true random sample has always been difficult. No matter how rigorous the methodology, some selected respondents will be unavailable or refuse to participate. By "opting out," they create the potential to introduce some statistical bias into the survey results.
The Internet, however, poses two even bigger barriers to true random sampling. The first is that not all Americans are online, although the share that now use the Internet (79 percent according to the Pew Internet and American Life Project) nearly matches the share with landline phone service (80 percent according to the National Center for Health Statistics). So while coverage is still a problem for Internet surveys, it is a growing issue for landline-only telephone surveys as well.
The bigger problem is that pollsters lack a "sample frame" for the Internet. In other words, there is no comprehensive list of Internet users and no mechanism comparable to 10-digit telephone numbers that allows us to select online Americans at random. Even if such a mechanism were available, most e-mail providers ban unsolicited e-mail, and most of us routinely delete those messages from unknown addresses that manage to slip through our spam filters.
To get around that barrier, most of the companies that conduct Internet surveys use non-random methods to recruit panelists. While the various companies have experimented with a variety of techniques, most now rely on advertisements that appear on Web sites. They recruit as many panelists as they can, knowing that the pool itself will be non-random and non-representative. They then interview samples drawn from the panel that they attempt to make representative either by the mechanism used to draw respondents from their pool (sometimes called sample balancing) or statistically weighting after interviewing is completed.
And that's the nub of the controversy. Can we get reliable, valid results from a survey that does not begin as a random sample?
Five years ago, Morris Fiorina and Jon Krosnick, two Stanford professors who were then providing advice and analysis to the YouGov/Economist opt-in Internet poll, authored a paper that aptly described the fundamental question. The opt-in panel approach, they wrote, "entails a reexamination of first principles, namely, the principle that a probability sample is the only means to achieve a representative sample. The advocates of this approach argue that probability sampling may be a sufficient condition, but it is not a necessary condition."
Fiorina and Krosnick observed at the time that "some results produced by this approach have been surprising to the skeptics," yet other evidence had suggested "that volunteer Internet survey samples can be much less representative" than those obtained by more traditional sampling methods. The answer, they said, would require more study: "Only as more such research is done can we understand how volunteer Internet polling unfolds and what determines its accuracy in particular contexts."
The same Jon Krosnick was also a lead author of the study published last month, which found that opt-in surveys are "not as accurate" as more conventional methods "and are sometimes strikingly inaccurate." Other voices have challenged those findings, and Krosnick and his colleagues responded. Next week, I will review the results of that study and the ensuing debate.