It was gratifying to see that late last week, blogger Nate Silver took the advice offered by my old boss Harrison Hickman (and presumably many others) and established a procedure to allow pollsters to audit the data from their polls that appear in the database on Silver's website, FiveThirtyEight. Good move.
This week, I want to look at the flip side of that issue, a finding that I mentioned in passing in last week's column. Silver recently scored the accuracy of 4,670 horse race polls from 264 pollsters conducted in the final three weeks of hundreds of campaigns going back to 1998. He then calculated "raw" accuracy scores using a regression model that he says effectively levels the playing field among pollsters by controlling for factors that can affect accuracy, such as the number of interviews, the type of election and the time between the poll and the election.
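Silver has not published his model's exact specification, but the general idea of "leveling the playing field" with a regression adjustment can be sketched with toy data. Everything below — the covariates, coefficients, and simulated errors — is an illustrative assumption, not Silver's actual model:

```python
import numpy as np

# Toy illustration (assumed setup, not Silver's actual model): regress raw
# poll errors on structural factors like sample size, race type, and time
# before the election, then score pollsters on what is left over.
rng = np.random.default_rng(0)
n = 200
log_interviews = rng.normal(6.5, 0.5, n)   # log of number of interviews
days_out = rng.uniform(0, 21, n)           # days between poll and election
is_statewide = rng.integers(0, 2, n)       # 1 = statewide race, 0 = national

# Simulated absolute error: smaller samples and earlier polls err more.
error = (5.0 - 0.4 * log_interviews + 0.08 * days_out
         + 0.5 * is_statewide + rng.normal(0, 1, n))

# Ordinary least squares fit of error on the structural factors.
X = np.column_stack([np.ones(n), log_interviews, days_out, is_statewide])
beta, *_ = np.linalg.lstsq(X, error, rcond=None)

# A pollster's "adjusted" score would be its average residual: the error
# remaining after the structural factors are accounted for.
residuals = error - X @ beta
```

The point of the adjustment is that a pollster's residual-based score no longer penalizes it for polling harder-to-call races or polling further from Election Day.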
Silver then took a close look at two sets of pollsters that "have made a public commitment to disclosure and transparency": Members of the National Council on Public Polls (NCPP), who by definition abide by NCPP's Principles of Disclosure, and pollsters that have endorsed the Transparency Initiative launched this year by the American Association for Public Opinion Research (AAPOR).
Silver says his regression models show that pollsters with the NCPP/AAPOR designation were more accurate than the rest, and the relationship grew stronger for some over time: "If they were strong before," he writes, "they were more likely to remain strong; if they were weak before, they were more likely to improve."
This finding raised a few cautious eyebrows in the survey world. Frank Newport, AAPOR's current president and Gallup Poll editor-in-chief, refrained from evaluating Silver's findings except to underscore AAPOR's "long-standing commitment" to disclosure and transparency. He also shared, via e-mail, the assumption "that polling organizations that adhere to AAPOR standards will tend to be among the best in the business."
NCPP President Evans Witt spoke more directly to the linkage between transparency and accuracy: "Those who conduct and sponsor quality research are often first in line to disclose relevant information about their surveys.
"On the other end of the spectrum," he added, "there is no way to assess the quality of surveys about which the relevant information has not been disclosed."
However, Witt also offered a caution: "As much as we might like to think otherwise, disclosure does not equal quality. Full and complete disclosure does not turn a poorly crafted survey with shoddy methodology into quality research."
Separately, Silver made clear that neither AAPOR nor NCPP endorses his ratings "in any implicit or explicit way."
I have to admit that I want to believe the effect Silver identifies is real. I am a strong supporter of AAPOR's Transparency Initiative and have advocated scoring the quality of pollster disclosure. The pollsters I know that are most active in AAPOR and NCPP are typically the first to disclose information when I request it. They also tend to be more grounded in "tested practices" and the true science of survey research.
So in that sense, I assume that if Silver's NCPP/AAPOR effect is real, it is less about transparency per se than about other pollster values and practices that often correlate with it.
We should be clear, of course, that whatever one makes of Silver's findings, they are not about a literal cause-and-effect relationship between transparency and accuracy. I don't think anyone would argue that endorsing the transparency initiative last month "caused" pollsters' surveys to be more accurate a year or 10 years ago.
But as much as I want to believe in the effect, I still have questions. Virtually all of the observed effect rests on a number of regression models. The biggest open question is whether Silver's initial raw score model leveled the playing field among pollsters as advertised, so that the error scores for, say, a pollster that conducts mostly polls in presidential general elections are comparable to those from a pollster that conducts a lot of statewide polls in governor and U.S. Senate races.
I asked Silver about two possibilities: Would the effect still be present if he examined only general elections for Senate and governor (as many of the NCPP/AAPOR organizations poll mostly in national elections for president)? And would it still be present if he did the analysis without super-accurate SurveyUSA, which accounts for more than half the polls in his NCPP/AAPOR category?
The short answer is that he thinks the effect holds up in both cases, although he concedes that when SurveyUSA is treated as a separate variable in a straightforward regression analysis of all the raw scores, the effect "dips closer to the 90 percent confidence threshold in some circumstances, but it depends on exactly which assumptions you make."
Patrick Murray, director of the Monmouth University Polling Institute, did his own analysis of Silver's raw error scores and says he found more variation within each group (NCPP/AAPOR versus the rest) than between them. The group differences "appear to be idiosyncratic," he said via e-mail, if limited to pollsters with 10 or more surveys in Silver's database, "thus arguing for evaluating each pollster on its own merits, not by group evaluation."
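Murray's within- versus between-group comparison can be sketched in miniature. The numbers below are invented for illustration — they are not his figures or Silver's scores:

```python
import numpy as np

# Toy sketch (assumed data) of comparing the spread of per-pollster mean
# error scores *within* each group against the gap *between* group means.
rng = np.random.default_rng(1)
ncpp_aapor = rng.normal(4.8, 1.2, 30)  # mean errors, NCPP/AAPOR pollsters
others = rng.normal(5.2, 1.4, 60)      # mean errors, all other pollsters

between = abs(ncpp_aapor.mean() - others.mean())          # group gap
within = (ncpp_aapor.std(ddof=1) + others.std(ddof=1)) / 2  # typical spread

# If the within-group spread dwarfs the between-group gap, group
# membership tells you little about any individual pollster's accuracy.
print(between < within)
```

That is the intuition behind "evaluating each pollster on its own merits": when variation inside each group swamps the difference between the groups, the group label carries little predictive information.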
Confused? Me too. I'd like to believe in the transparency effect, but I need to see more evidence.