Have you smoked at least 100 cigarettes in your life? Have you practiced yoga in the past year? On average, how many hours of sleep do you get in a day?
At face value, these questions are not directly related to the topics that the Pew Research Center is most committed to studying. Yet our researchers have been periodically asking questions like these for years. Why? These are examples of benchmarking questions, which the center uses as a check to ensure that our surveys are accurate.
Why and how we use benchmarking questions
Determining the accuracy of a survey requires some sort of objective standard against which the survey can be compared. In election polls and other measures of voting intent, the standard is the outcome of the election. But for surveys that don’t ask about elections or voting intent, researchers need to find another way to benchmark their findings. This is often done with the help of other surveys—usually large, expensive government surveys conducted with great attention to data quality.
Pew Research Center surveys occasionally include questions about economic, demographic, and lifestyle characteristics for which government statistics are available as a benchmark. This not only helps us check the accuracy of our findings but also helps us study how surveys themselves can be better conducted.
Take, for example, a Pew Research Center study from last year that examined what low response rates—many potential respondents being contacted but far fewer of them participating—mean for the accuracy of telephone surveys. To help answer this question, the study compared the results of a telephone survey by the center with those of high-response benchmark surveys by the federal government to see what, if any, differences existed.
The report found that Pew Research Center surveys were closely aligned with federal surveys on key demographic and lifestyle benchmarks. Across 14 questions about personal traits, the average difference between the government estimate and the center’s telephone survey estimate was 3 percentage points. Differences on individual questions ranged from 0 to 8 points. The largest was on a measure asking respondents about their health status: The government found that 59 percent of people rated their health as very good or excellent, while the center’s telephone survey found 51 percent doing so.
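The comparison behind these figures is simple arithmetic: take the absolute gap, in percentage points, between the benchmark estimate and the survey estimate for each question, then average the gaps. Below is a minimal sketch in Python, not the center's actual procedure. Only the health-status pair (59 and 51) comes from the report; the other two items and the function names are placeholders invented for illustration.

```python
def point_difference(benchmark_pct, survey_pct):
    """Absolute gap between two estimates, in percentage points."""
    return abs(benchmark_pct - survey_pct)


def average_difference(items):
    """Mean absolute gap across a dict of (benchmark_pct, survey_pct) pairs."""
    gaps = [point_difference(b, s) for b, s in items.values()]
    return sum(gaps) / len(gaps)


items = {
    # From the report: government 59 percent, telephone survey 51 percent.
    "health very good or excellent": (59, 51),
    # Hypothetical placeholders, NOT figures from the report.
    "hypothetical item A": (20, 18),
    "hypothetical item B": (45, 45),
}

health_gap = point_difference(*items["health very good or excellent"])
mean_gap = average_difference(items)
print(f"health gap: {health_gap} points, mean gap: {mean_gap:.1f} points")
```

With the full set of 14 benchmark and survey estimates in place of the placeholders, the same calculation would yield the kind of average difference the report describes.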
The other 13 items were quite close to the benchmarks, most with differences of 3 percentage points or fewer, which was generally within the margin of error. These questions included measures of family income, employment status, household size, citizenship, health insurance, length of residence at current address, marital and parenthood status, smoking frequency, place of birth (among Hispanics), and having a driver’s license. In other words, on these measures, the low-response telephone survey provided results quite comparable to those of the high-response government survey used as a benchmark.
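To see what "within the margin of error" can mean in practice, here is a rough sketch using the textbook 95 percent margin of error for a proportion under simple random sampling. The sample size of 1,000 is an assumption chosen for illustration, not the size of the survey in the report. The sketch also ignores weighting effects and the benchmark survey's own sampling error, both of which would widen the margin in a real analysis.

```python
import math


def margin_of_error(pct, n, z=1.96):
    """Approximate 95% margin of error, in percentage points, for an
    estimate of `pct` percent from a simple random sample of size n."""
    p = pct / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)


def within_margin(benchmark_pct, survey_pct, n):
    """True if the gap to the benchmark is no larger than the survey's
    margin of error (benchmark sampling error is ignored here)."""
    return abs(benchmark_pct - survey_pct) <= margin_of_error(survey_pct, n)


ASSUMED_N = 1000  # hypothetical sample size, not taken from the report

moe = margin_of_error(51, ASSUMED_N)
print(f"margin of error at 51 percent, n={ASSUMED_N}: {moe:.1f} points")
print("8-point health gap within margin:", within_margin(59, 51, ASSUMED_N))
```

Under these assumptions the margin is roughly 3 points, so a 3-point gap would fall inside it while the 8-point gap on health status would not.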
Overall, the report showed that bias introduced into surveys due to low response rates remains limited in scope. And, critically, telephone poll estimates for party affiliation, political ideology, and religious affiliation continue to track well with estimates from high-response-rate benchmark surveys.
However, as the center and other survey researchers have discussed extensively, telephone surveys continue to yield large biases on measures of civic and, to a lesser extent, political engagement. This discrepancy probably stems from nonresponse bias, in which the people who agree to participate in surveys differ systematically from those who can’t be contacted or who refuse to participate. As previous work has found, the people who answer surveys are likely to be the same people who are involved in community life: they are joiners, and participating in surveys is a kind of pro-social behavior related to other behaviors such as volunteering. Fortunately for pollsters, civic engagement is not strongly correlated with political attitudes or most other measures researchers study in surveys.
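The mechanism can be made concrete with a small calculation. In the sketch below, every number is hypothetical: it assumes a population in which 25 percent of people volunteer, and that volunteers respond to surveys at twice the rate of everyone else. It then compares the share of volunteers among respondents with the true share, and does the same for an attitude that is equally common among volunteers and non-volunteers.

```python
def respondent_share(pop_share, rate_members, rate_others):
    """Share of survey respondents who belong to a group, given the
    group's population share and the two groups' response rates."""
    responding_members = pop_share * rate_members
    responding_others = (1 - pop_share) * rate_others
    return responding_members / (responding_members + responding_others)


def observed_rate(member_share, rate_among_members, rate_among_others):
    """Rate of some trait among respondents, mixing the two groups."""
    return (member_share * rate_among_members
            + (1 - member_share) * rate_among_others)


# All values below are hypothetical, chosen only to illustrate the idea.
TRUE_VOLUNTEER_SHARE = 0.25
RESPONSE_RATE_VOLUNTEERS = 0.12
RESPONSE_RATE_OTHERS = 0.06

volunteers_in_sample = respondent_share(
    TRUE_VOLUNTEER_SHARE, RESPONSE_RATE_VOLUNTEERS, RESPONSE_RATE_OTHERS
)

# An attitude held by 50 percent of both groups, i.e. uncorrelated
# with volunteering.
attitude_in_sample = observed_rate(volunteers_in_sample, 0.50, 0.50)

print(f"volunteers: true {TRUE_VOLUNTEER_SHARE:.0%}, "
      f"in sample {volunteers_in_sample:.0%}")
print(f"attitude: true 50%, in sample {attitude_in_sample:.0%}")
```

In this toy setup the survey overstates volunteering substantially, while the estimate for the uncorrelated attitude is unaffected, which is the pattern the paragraph above describes.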
Caveats about benchmarks
Although large government surveys are generally considered to have high data quality, they’re not immune to some of the same problems every survey researcher faces. For example, while government surveys tend to have very high response rates (on the order of 60 percent or more) compared with opinion polls conducted by other organizations, the risk of nonresponse bias still exists.
Government surveys, while carefully developed and tested, are also still subject to measurement error, which can arise from how questions are asked: for example, which questions come immediately before a given question, or whether the survey was conducted by phone or online. Pew Research Center questionnaires that include benchmarking questions do not replicate the exact context in which the original questions were asked, particularly because the center tends to focus on topics different from those in benchmark surveys. Benchmarks are also generally unavailable for questions about attitudes and behaviors that the government does not study.
All surveys can also face response bias issues, including social desirability bias, where respondents may modify answers to certain questions to present themselves more favorably. This is especially a risk when an interviewer asks sensitive questions: Respondents may, for example, overstate their voting frequency.
All of these factors can affect the comparability of seemingly identical questions asked on different surveys, including government surveys. That said, benchmarking questions continue to be a valuable tool for survey researchers checking and assessing accuracy. They are especially vital for the center’s studies on survey methodology.