What Drives Program Evaluation Costs?

Factors policymakers can consider when measuring the impact of government services

Fact Sheet, October 22, 2020

Projects: Results First

This fact sheet was updated Dec. 3, 2020, to report the statewide data sample size for Study 2 rather than the sample size for only its Portland portion, as originally stated under the study design. The update also clarifies that foundation and government grants funded all staff time, not just the time of staff from the local hospital system, as the personnel section originally suggested.

Evaluations are a powerful tool that you, as a policymaker, can use when determining how to allocate limited public resources toward effective programs. Though many types of evaluations exist, this fact sheet focuses on impact evaluations, which allow you to know whether a program achieved its intended effects, such as an increase in employment or a decrease in crime. Impact evaluations can help you make decisions to invest in what works, scale back what doesn’t, or look for ways to improve a program with results that weren’t as positive as expected.¹

One of the most common questions about evaluations is how much they cost. Unfortunately, no single, straightforward dollar amount can be cited, because the cost of an evaluation depends on many factors. However, understanding the following cost drivers can help you estimate how much to budget for an evaluation.

Four key drivers of evaluation costs

What drives evaluation costs? What questions do you need to answer to determine how each factor contributes to costs? Your state agency and research partners can help you answer these questions.

	Outcomes: What do you want to learn? The number and complexity of the outcomes you want to measure can influence the cost of an evaluation. For example, suppose you want to know the impact of an eight-week summer school program for third graders. If you are interested in only one outcome, such as improvement in reading comprehension, the data you need may be straightforward and easy to obtain. However, if you also want to learn whether the program affects peer relationships, empathy, and rule-following, you will likely need more time and more sophisticated data-gathering methods. Evaluations that examine multiple and complex outcomes may therefore be more expensive.
	Data: How will information be collected? Obtaining the data needed to answer your questions can be one of the biggest cost drivers for evaluations. If the outcomes of interest can be studied using available and accessible administrative data, such as standardized test scores, data collection costs will be lower. However, if other data collection efforts are needed, such as surveys of students or interviews with teachers, costs will increase because these tools require more time and money to design and implement.
	Design: How will the evaluation be carried out? In addition to the outcomes of interest and the availability of data, several other factors shape the design of an evaluation, including size (number of participants), setting (type and number of locations), and length of the study (how long the outcomes are monitored). All of these are likely to affect costs. For instance, an evaluation that studies employment outcomes of 900 trainees from nine vocational education centers across the state will require more time, staff, and travel than one that studies 100 trainees at one center. Similarly, a study that looks only at the short-term impacts of a program, such as whether job training services improved participants’ ability to keep a job for six months, will likely be cheaper than one that also assesses whether these gains were sustained for two years.
	Personnel: Who will conduct the evaluation? Two key questions need to be addressed when deciding who will conduct the evaluation. First, does your state have the internal capacity, such as a research unit within an agency or the legislature, to do an evaluation? If so, the cost will likely be much lower than contracting with an external entity, whether a public university, a nonprofit organization, or a private research firm. Second, if your state has internal capacity, is it sufficient to carry out the evaluation as envisioned? Research units may be able to conduct simpler evaluations that rely on administrative data, but they may not have the resources to carry out more complex studies. Regardless of who conducts the evaluation, it is also important to factor in the cost of asking program staff to take part in evaluation activities such as recruitment or data collection, which takes time away from their primary work of delivering programs.

Examples of real-life evaluations

Let’s identify these factors in two actual evaluations and consider how key drivers might have contributed to costs in each one.

	STUDY 1: Preventing Youth Violence and Dropout, which examined the effect of an after-school program called Becoming a Man (BAM) for middle and high school boys in Chicago public schools.²	STUDY 2: The Oregon Health Insurance Experiment, which examined the impact of expanded Medicaid coverage for low-income adults in Oregon.³
Outcomes: What do you want to learn?	$ Few outcomes: BAM’s effects on youth delinquency, violence, and dropout rates	$$$ Many outcomes: Medicaid’s impact on patients’ physical and mental health, rates of diagnoses, use of prescription medications, financial circumstances, health care utilization, and civic participation
Data: How will information be collected?	$ All administrative data: Student attendance and enrollment data from the school system and arrest records from the state police	$$$ Mixture of data sources: Mail surveys, in-person interviews, and administrative data sources (e.g., emergency department data, commercial credit reports, voter records)
Design: How will the evaluation be carried out?	$ Fewer people studied, shorter duration: Because administrative data sources were used, the sample size of 2,740 boys at various schools throughout a single city likely had little bearing on the cost. Evaluators collected data during the intervention and one year after it.	$$$ More people studied, longer duration: Over a two-year period, evaluators collected administrative data on a statewide sample of 74,922 people. They also gathered in-person survey data from 12,229 of these individuals living in the Portland area.
Personnel: Who will conduct the evaluation?	$ Small-scale funding, simpler study: All personnel were university affiliated, funded by a mix of university salaries or stipends and foundation grants. Having administrative data on hand, a smaller sample, and a shorter study duration probably lowered personnel costs.	$$$ Large-scale funding, more complex study: Universities and a nonprofit research organization led the study in collaboration with a local hospital system, and foundation and government grants funded analytic time and data collection. The study’s complexity, combined with more resource-intensive data collection methods and a larger sample, probably resulted in higher costs for both personnel and data collection.

Funding resources for evaluations

The following organizations offer grants to state governments to fund evaluations in a variety of policy areas:

Abdul Latif Jameel Poverty Action Lab, https://www.povertyactionlab.org/initiative/state-and-local-innovation-initiative
Arnold Ventures, https://www.arnoldventures.org/grantees
Federal government, https://www.grants.gov/web/grants
Institute of Education Sciences, https://ies.ed.gov/funding/
John D. and Catherine T. MacArthur Foundation, https://www.macfound.org/info-grantseekers/
Robert Wood Johnson Foundation, https://www.rwjf.org/en/how-we-work/grants-explorer/funding-opportunities.html
Smith Richardson Foundation, https://www.srf.org/
William T. Grant Foundation, http://wtgrantfoundation.org/grants

Endnotes

1. The Pew Charitable Trusts, “Targeted Evaluations Can Help Policymakers Set Priorities” (2018), https://www.pewtrusts.org/en/research-and-analysis/issue-briefs/2018/03/targeted-evaluations-can-help-policymakers-set-priorities.
2. S. Heller et al., “Preventing Youth Violence and Dropout: A Randomized Field Experiment,” National Bureau of Economic Research Working Paper 19014 (2013), https://www.nber.org/papers/w19014.
3. K. Baicker et al., “The Oregon Experiment: Effects of Medicaid on Clinical Outcomes,” The New England Journal of Medicine 368, no. 18 (2013): 1713-22, https://www.nejm.org/doi/full/10.1056/NEJMsa1212321.

This fact sheet benefited from the insights and expertise of J-PAL North America, a regional office of the Abdul Latif Jameel Poverty Action Lab (J-PAL), a global network of researchers who use randomized evaluations to answer critical policy questions in the fight against poverty. Although they supported content development and reviewed various drafts of this publication, neither they nor their organizations necessarily endorse its conclusions.
