Should pollsters weight their voter samples by age? And if so, by how much? That's an awfully wonky question, and yet it has been getting quite a bit of attention over the last few days. The answer may help explain why some polls show different results in the race for president.
The most recent story involves the daily "Battleground poll" sponsored by George Washington University and conducted by the team of Republican pollster Ed Goeas, of the Tarrance Group, and Democratic pollster Celinda Lake, of Lake Research Partners.
Their tracking survey had been showing John McCain with a consistent lead of 1 to 2 percentage points over Barack Obama during the last 10 days, even while three other daily tracking polls showed Obama regaining a lead. Democratic partisans were asking: Why is the Battleground survey different?
On Monday, blogger Nate Silver found what looked to be a likely culprit: The questionnaire and cross-tabulations published online by the Battleground poll showed a much older electorate than the one he found when he checked the age of the 2004 electorate as reported by the Current Population Survey (CPS) of the U.S. Census.
For example, Silver found that the percentage of 18-to-34-year-olds on the Battleground poll (17 percent) was much smaller than on the CPS survey of voters from 2004 (24 percent*). He considered it "highly unlikely" that the Battleground pollsters were "deliberately" weighting down younger voters. "Instead," he concluded, "very probably they simply aren't weighting by age groups at all."
At about the same time Silver was questioning the age distribution on the Battleground poll, Ed Goeas was making a change in its weighting procedure.
According to Goeas, until Tuesday they had been weighting by a combination of age, race and party identification. Starting on Tuesday morning, they stopped weighting by party, which allowed the Democratic advantage on the base party identification question to rise from 3 points to 7 points. Overnight, Obama went from trailing McCain by 2 points to leading by the same margin, 48 percent to 46 percent.
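Goeas did not describe the mechanics of his weighting, but a common way to match a sample to several targets at once (here age, race and party) is raking, also called iterative proportional fitting: adjust the weights so one variable matches its targets, then the next, and repeat until all of them line up. The sketch below assumes that technique; the function name `rake` and the data layout are hypothetical, and it is not the Battleground poll's actual procedure.

```python
from collections import defaultdict


def rake(respondents, targets, max_iter=100, tol=1e-6):
    """Raking (iterative proportional fitting) to marginal target shares.

    respondents: list of dicts mapping a variable name to a category,
        e.g. {"age": "18-34", "party": "D"}.
    targets: dict mapping a variable name to {category: target share};
        every category a respondent reports must appear here.
    Returns one weight per respondent. If each target set sums to 1,
    the weights sum to the sample size.
    """
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = defaultdict(float)
            for r, w in zip(respondents, weights):
                totals[r[var]] += w
            # Rescale so this variable's weighted shares hit the targets.
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * n / totals[r[var]]
                weights[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        # Stop once a full pass barely changes any weight.
        if largest_adjustment < tol:
            break
    return weights
```

Dropping the `"party"` entry from `targets` mimics, in spirit, the change Goeas described: age (and race, if included) are still matched, but party identification is left to float wherever the interviews put it, and that in turn can shift the weighted mix of other characteristics, such as age.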
The change, according to Goeas, had the net effect of increasing the percentage of 18-to-34-year-olds from 17 percent to 22 percent of their likely voter sample.
So this story touches on two issues: (1) weighting by party and (2) weighting by age. I took up the topic of party weighting two weeks ago. Let's consider age.
The issue of statistically adjusting a poll sample by age has taken on greater urgency in recent years because of the growing number of younger Americans who choose to use cellular phones rather than landline telephone service. Since these so-called "cell phone onlys" are out of reach of standard landline telephone samples, younger Americans have become increasingly hard for pollsters to interview. The Pew Research Center, for example, found that 18-to-34-year-olds fell from just over 30 percent of its unweighted samples in 2000 to 20 percent in 2006.
For pollsters who begin with a sample of all adults, weighting by age and other demographics is relatively easy: They can compare their percentages with the Census estimates for gender, age and race, and adjust accordingly.
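In its simplest form, that adjustment gives each demographic cell a weight equal to its target share divided by its share of the sample. As an illustration only, the sketch below reuses the two figures cited above, collapsing age into two cells (18-to-34 and everyone older) and treating the 2004 CPS estimate as the target; the function names are hypothetical, and real pollsters typically weight many more cells, often to Census adult-population figures rather than to past voters.

```python
def cell_weights(sample_shares, target_shares):
    """Cell (post-stratification) weights: target share / sample share."""
    if set(sample_shares) != set(target_shares):
        raise ValueError("sample and target must use the same cells")
    return {cell: target_shares[cell] / sample_shares[cell] for cell in sample_shares}


def weighted_shares(sample_shares, weights):
    """Each cell's share of the sample after applying the weights."""
    total = sum(sample_shares[c] * weights[c] for c in sample_shares)
    return {c: sample_shares[c] * weights[c] / total for c in sample_shares}


# Illustrative inputs: the 17 percent and 24 percent figures from the column.
battleground = {"18-34": 0.17, "35+": 0.83}
cps_2004 = {"18-34": 0.24, "35+": 0.76}

w = cell_weights(battleground, cps_2004)
print({cell: round(value, 2) for cell, value in w.items()})
print(weighted_shares(battleground, w))
```

Under these assumptions, each 18-to-34-year-old respondent would count about 1.41 times and each older respondent about 0.92 times, which brings the weighted young share from 17 percent up to 24 percent.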
However, things get a bit trickier for the subgroups of registered or "likely" voters, particularly when pollsters hang up on unregistered or unlikely voters at the beginning of the interview, because they are then left without a sample of all adults to compare with the Census.
One challenge is a lack of official demographic statistics for past voters. The Current Population Survey of the Census comes close at the national level, but it is still a survey and, more important, it relies on respondents to report accurately whether they voted. As a result, the turnout rates reported by the CPS are typically higher than official vote counts.
Exit polls provide another potential source. In theory they do better by restricting their samples to actual voters, but exit polls have their own issues: they miss many absentee voters as well as those who simply avoid the exit poll interviewer.
Official voter files provide a third source of age data within individual states but, given the inconsistency across states, not for the nation as a whole. Voter lists also have the potential problem of missing records or of including inactive "deadwood" registrants.
In a 2007 article in Public Opinion Quarterly, George Mason University professor Michael McDonald compared the three sources of data in 10 states and the District of Columbia and found "general agreement between the CPS and voter files," but found that the exit polls typically reported a younger electorate.
The bottom line is that no pollster knows exactly what the "right" answer is when it comes to the age of likely voters. Even setting aside the wonky question of how old voters were four years ago, we can only guess at the demographics of the electorate on Nov. 4.
The Obama campaign has invested a small fortune in boosting turnout among younger voters. Given the increased youth turnout in the primaries (as measured by exit polls), many Democratic partisans are convinced that turnout among young voters will increase again in November.
That may be so, but the best pollsters will rely on their measurements, not on hunches or best guesses. The answer to how much to weight by age may come down to how pollsters deal with the missing cell-phone-only respondents. I will try to take that up in next week's column.
* Silver used 26 percent as the 2004 percentage of 18-to-34-year-olds, but according to data on the CPS website the correct figure is 24 percent.