Why FDA Must Increase Transparency of Medical Devices Powered by Artificial Intelligence

More insight into product development and limitations could help providers and patients make informed decisions

When used in health care, artificial intelligence (AI) holds the promise of improving patient outcomes, reducing costs, and advancing medical research. These tools can analyze patient images for disease, detect patterns in large sets of health data, and automate certain administrative tasks. But many companies develop AI-enabled medical products in what is essentially a “black box,” disclosing little to the public about their inner workings. Just as doctors and patients need to know what’s in a prescription medication, AI users need information about the tools that may be used to help make life-or-death medical decisions.

Not all AI-enabled tools fall under the purview of the Food and Drug Administration, but the agency regulates any software intended to treat, diagnose, cure, mitigate, or prevent disease or other conditions before it can be marketed and sold commercially. In recent years, FDA has been considering an updated approach to oversight of these products, including steps to improve how developers communicate about four key factors: a product’s intended use, how it was developed, how well it performs, and the logic it uses to generate a result or recommendation.

If companies do not disclose these details, prescribers and patients may be more likely to use the products inappropriately, and that can lead to inaccurate diagnoses, improper treatment, and harm. Here’s how and why this information matters to patients and prescribers:

  • Intended use. AI developers should clearly communicate how their products should be used—such as specifying exact intended populations and clinical settings—because these factors can greatly affect their accuracy. For example, researchers at the Mayo Clinic developed an AI-enabled tool to predict atrial fibrillation using data from the general population receiving care at the facility. Although it was highly accurate when used on that general population, it performed only slightly better than random chance in higher-risk clinical scenarios, such as on patients who had just undergone a certain type of heart surgery.
  • Development. Clinicians need information about the data used to develop and train AI systems so they can better determine whether and how to use certain tools for specific patients. If data comes from a limited population, for example, the product may incorrectly detect or miss disease in people who are underrepresented, or not represented at all, in the training data. AI-based smartphone apps designed to detect skin cancer, for instance, may be trained mostly on images of lighter-skinned patients. As a result, the products may not work as well on darker-skinned patients, which can lead to inappropriate treatment and exacerbate existing health disparities.
  • Performance. Prescribers and patients need to know whether AI tools have been independently validated and, if so, how they were evaluated and how well they performed. Currently, this information can be difficult to obtain and compare across tools because there are no set standards on how these products should be evaluated and no independent organization to oversee their proper use. In one case, researchers at a hospital system found that an AI tool developed to predict sepsis missed two-thirds of cases and was associated with a high rate of false alarms. The developer asserted, however, that the “researchers’ analysis didn’t take into account the required tuning that should precede real-world deployment of the tool.” Performance issues also arise when AI developers use the same data to train and validate their products. That can lead to inflated accuracy rates, akin to students using the same test for practice and the final exam.
  • Logic. Some AI tools, especially those enabled by machine learning techniques, are referred to as “black-box” models because the way they arrive at a result or recommendation cannot be explained. In other cases, a developer may keep this kind of information confidential. However, if clinicians and researchers are unable to understand the logic that a tool uses in reaching its conclusion, then they might not trust the recommendations it makes or be able to identify potential flaws or limitations in its performance. For example, one AI model used to analyze X-ray images made predictions based in part on the type of equipment used to take the image, rather than on the image’s actual contents. Had the model’s logic been more transparent at the outset, this flaw might have been corrected earlier.
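
The point about using the same data to train and validate a product can be illustrated with a small sketch. The numbers below are synthetic and made up for illustration, and the nearest-neighbor "model" is a stand-in chosen for simplicity, not any real product's method. It shows how a model that memorizes its training examples can look perfect on that data while doing no better than chance on cases it has not seen.

```python
# Synthetic (measurement, label) pairs, invented for illustration only.
TRAIN = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (5.0, 1), (6.0, 1)]
HELD_OUT = [(1.5, 0), (2.9, 0), (4.1, 1), (5.5, 1)]


def predict(x, train):
    """1-nearest-neighbor: return the label of the closest training example."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def accuracy(data, train):
    """Fraction of examples in `data` whose label the model predicts correctly."""
    correct = sum(1 for x, y in data if predict(x, train) == y)
    return correct / len(data)


# Validating on the training data: every example is its own nearest neighbor.
same_data_accuracy = accuracy(TRAIN, TRAIN)

# Validating on data the model has never seen.
held_out_accuracy = accuracy(HELD_OUT, TRAIN)

print(f"accuracy on training data: {same_data_accuracy:.2f}")
print(f"accuracy on held-out data: {held_out_accuracy:.2f}")
```

In this toy case the tool scores 1.00 on the data it was trained on and 0.50 on held-out data, which is why knowing how a product was validated matters as much as the headline accuracy figure.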

FDA can promote increased transparency by requiring more and better information on AI-enabled tools in the agency’s public database of approvals. Currently, the details that companies publicly report about their products vary. For example, in an analysis of public summaries for the 10 FDA-cleared AI products for breast imaging, only one provided information about the racial demographics of the data used to validate the product. Requiring developers to publicly report basic demographic information—and where appropriate, data on how the product performed in key subgroups—could help providers and patients select the most appropriate products. This is especially important when treating conditions with disparate impacts on underserved populations, such as breast cancer, a disease more likely to be fatal for Black women.
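
As a rough sketch of what subgroup reporting could look like, the example below computes sensitivity (the share of true disease cases a tool flags) overall and per subgroup. The records, group names, and function names are hypothetical and invented for illustration. They are not drawn from any FDA summary or real product.

```python
from collections import defaultdict

# Hypothetical validation records: (subgroup, has_disease, tool_flagged_disease).
RECORDS = [
    ("group_a", True, True),
    ("group_a", True, True),
    ("group_a", True, True),
    ("group_a", True, False),
    ("group_b", True, True),
    ("group_b", True, False),
    ("group_a", False, False),
    ("group_b", False, False),
]


def sensitivity_by_subgroup(records):
    """Share of true disease cases flagged by the tool, per subgroup."""
    found = defaultdict(int)
    total = defaultdict(int)
    for group, has_disease, flagged in records:
        if has_disease:
            total[group] += 1
            found[group] += int(flagged)
    return {group: found[group] / total[group] for group in total}


def overall_sensitivity(records):
    """Share of all true disease cases flagged by the tool."""
    flags = [flagged for _, has_disease, flagged in records if has_disease]
    return sum(flags) / len(flags)


by_group = sensitivity_by_subgroup(RECORDS)
overall = overall_sensitivity(RECORDS)

print(f"overall sensitivity: {overall:.2f}")
for group, value in sorted(by_group.items()):
    print(f"{group}: {value:.2f}")
```

In this made-up data, the overall figure of about 0.67 hides a gap between 0.75 for one group and 0.50 for the other, the kind of difference a single summary number would not reveal.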

Similar to its requirements for drug labeling, the agency could also require developers to provide more detailed information on product labels so that these tools can be properly evaluated before being purchased by health care facilities or patients. Researchers at Duke University and the Mayo Clinic have suggested an approach akin to a nutrition label that would describe how an AI tool was developed and tested and how it should be used. This would allow end users to better assess products before they are used on patients. The information could also be integrated into an institution’s electronic health record system to help make the data easily available for busy providers at the point of care.

AI can save lives and reduce health care costs, but providers and patients need to know more about these products to use them safely and effectively. FDA should continue its crucial work to increase the transparency of these revolutionary tools.

Liz Richardson directs The Pew Charitable Trusts’ health care products project.