If Your AI Can’t Explain Itself, Can FDA Authorize It?

Consider a hypothetical scenario that reflects a pattern increasingly reported by manufacturers: a 510(k) submission for an AI-powered diagnostic tool with 99% sensitivity. Strong numbers. Solid validation. And yet — a deficiency letter comes back from FDA.

Not because the model underperforms. Because reviewers cannot determine what it is actually doing or why.

This is the reality manufacturers are running into as AI/ML-based Software as a Medical Device (SaMD) enters the 510(k) and De Novo pathways. High performance is necessary, but it is not sufficient on its own. FDA is placing increasing weight on a different question. It is no longer just “How precise is the AI?” but “Can you describe what this AI does, and can the clinician rely on its results?”

FDA is not asking manufacturers to make complex algorithms simple. It is asking manufacturers to make their intended use, logic, limitations, performance, and clinical role understandable enough to support safe use.

Performance Alone No Longer Closes the Case:

Until recently, most manufacturers developing AI devices focused on building strong machine learning models, validating them on large datasets, and reporting detailed performance metrics. That work still matters, but it does not address what regulators are now scrutinizing: how the device uses its algorithm, and what evidence supports the assumption that clinicians can trust its output.

When an AI system makes a diagnosis or modifies treatment, the clinician sits downstream of that decision. FDA’s role is not only to confirm that the model is accurate, but also to ensure that clinicians can understand, question, and where necessary intervene in its output.

A black-box result, even a highly accurate one, does not give clinicians that ability. Deficiency letters for AI/ML 510(k) submissions increasingly cite the same gaps: no explanation of how the model derives its output, insufficient subgroup performance analysis, and no clear account of how the device performs at the edges, such as in minority populations, on out-of-distribution inputs, or under real clinical practice conditions.
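As an illustration of the subgroup analysis these deficiency letters ask for, the following is a minimal Python sketch, not a prescribed FDA method. It reports sensitivity per subgroup with a 95% Wilson score interval, so that small subgroups show their uncertainty. The function names, the record layout, and the sample data are assumptions made for this example.

```python
import math
from collections import defaultdict


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the proportion is unconstrained
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def sensitivity_by_subgroup(records):
    """records: iterable of (subgroup, truth, prediction), where 1 = condition present."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for subgroup, truth, prediction in records:
        if truth == 1:
            positives[subgroup] += 1
            if prediction == 1:
                true_pos[subgroup] += 1
    report = {}
    for subgroup, n in positives.items():
        report[subgroup] = {
            "n_positive": n,
            "sensitivity": true_pos[subgroup] / n,
            "ci95": wilson_interval(true_pos[subgroup], n),
        }
    return report


# Made-up illustrative records, not data from any real device.
example_records = (
    [("age<65", 1, 1)] * 9 + [("age<65", 1, 0)] * 1
    + [("age>=65", 1, 1)] * 2 + [("age>=65", 1, 0)] * 2
    + [("age>=65", 0, 0)] * 5
)
example_report = sensitivity_by_subgroup(example_records)
```

In this made-up sample the older subgroup has only four positive cases, so its interval is wide. Reporting the interval next to the point estimate makes that limitation visible instead of burying it in an aggregate figure.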

What FDA’s Guidance Actually Requires:

FDA does not have a single document titled “Explainability Guidance.” Instead, several key documents each call for transparency, and together they establish clear expectations for AI and machine learning systems.

The Good Machine Learning Practice (GMLP) guiding principles from 2021 establish transparency as a foundational element: AI and ML systems must clearly convey their purpose, assumptions, and limitations to the end user. This requirement is not discretionary.

The 2025 Marketing Submission Recommendations for a Predetermined Change Control Plan (PCCP) for Artificial Intelligence-Enabled Device Software Functions take this a step further. To be allowed to modify an AI model after FDA has cleared it, without going through resubmission, manufacturers must document how the algorithm operates and the limits on changes to it, so that FDA can evaluate future updates to the model. That level of documentation is not possible without a transparent, well-characterized algorithm.

IMDRF N41, Software as a Medical Device (SaMD): Clinical Evaluation, requires developers to demonstrate a valid clinical association between the AI’s output (the result of its calculations) and the clinical condition it targets. Demonstrating that through clinical evidence requires a model the developer can explain.

Read together, GMLP principles, PCCP requirements, and clinical evaluation frameworks create an effective explainability standard — even without a standalone guidance document. Manufacturers who miss this are sitting on a compliance gap.

Three Things FDA Needs to See:

When a reviewer examines an AI/ML submission for transparency, three areas come up consistently:

1) Traceability: Can you trace a model output back to the data, features, and design decisions that produced it? FDA requires a documented chain covering the composition of the training dataset, the preprocessing applied and the rationale for it, feature selection decisions, the model’s version history, and known failure modes. A performance metric presented without this context is just a number.

2) Accountability: When the AI gets something wrong, who is responsible? In practice, this means defining your human oversight model: how clinicians should work with the AI’s outputs, what the pathway is when they choose to override an output, and how the AI’s performance will be monitored from pre-market validation through its time on the market.

The GMLP principles that FDA published call for an interdisciplinary development team: people who practice medicine must be involved while the AI is being built, rather than reviewing it only at the end of the build.

3) Comprehensibility: Will your intended end users (a radiologist, a nurse, or a general practitioner) understand what the AI is telling them and be able to act on it? This does not mean exposing the AI’s internals. It means making the output clinically interpretable, conveying the confidence associated with each output, and making users aware of the AI’s limitations.
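One way to make the traceability item concrete is a machine-readable record that ties a model version to its training data, preprocessing, feature rationale, and known failure modes. The sketch below is a hypothetical structure, not an FDA-specified format. Every class, field name, and value in it is an assumption chosen for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TraceabilityRecord:
    """Hypothetical record linking a model version to what produced it."""
    model_version: str
    training_data_sha256: str   # content hash of the frozen training dataset
    preprocessing_steps: tuple  # ordered (step, rationale) pairs
    feature_rationale: dict     # feature name -> why it was included
    known_failure_modes: tuple


def sha256_hex(data: bytes) -> str:
    """Content hash used to pin an exact dataset or artifact."""
    return hashlib.sha256(data).hexdigest()


def record_fingerprint(record: TraceabilityRecord) -> str:
    """Stable hash of the whole record, so any later edit to it is detectable."""
    canonical = json.dumps(asdict(record), sort_keys=True)
    return sha256_hex(canonical.encode("utf-8"))


# Illustrative values only; none of these describe a real device.
example = TraceabilityRecord(
    model_version="1.2.0",
    training_data_sha256=sha256_hex(b"placeholder training data bytes"),
    preprocessing_steps=(("resample_to_1mm", "harmonize scanner resolution"),),
    feature_rationale={"lesion_diameter_mm": "established clinical predictor"},
    known_failure_modes=("low-contrast images", "pediatric patients"),
)
example_id = record_fingerprint(example)
```

Storing the fingerprint next to each reported performance metric gives a reviewer a fixed reference to the exact data and design decisions behind that number.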

Where Submissions Break Down:

Failures in AI/ML submissions appear to follow the same general pattern. A strong technical team builds a solid model, often with prior validation work or experience developing similar products.

The submission describes the model’s architecture and metrics in detail yet devotes only a few lines to explainability: a single statement about the use of SHAP values, or a statement that clinicians will be trained.

The submitting team did not view explainability as a must-have and treated it as a secondary issue. Then a deficiency letter arrives requesting exactly what was treated as secondary: how the outputs are generated, how clinicians should interpret them, and how well the model’s performance holds up across different groups of patients. The remediation that follows typically lasts from six months to a year.

This remediation could be avoided if teams built transparency in from the start, both while creating the model and while preparing the submission, rather than adding it at the end.

What a Transparent Submission Looks Like:

In general, submissions that fare well in review share several characteristics:

Algorithm description document: describes what the algorithm does, how it was trained and validated, and its limitations, in plain clinical language rather than only a citation to a published paper.
Training and validation dataset documentation: goes beyond the number of samples or patients to cover the demographic makeup of the sample, how the data was collected and quality-controlled, and how known biases were addressed.
Subgroup performance analysis: breaks out performance by age, sex, race/ethnicity, disease severity, and other relevant clinical characteristics instead of reporting only aggregate statistics.
Post-market surveillance plan: sets well-defined performance thresholds and clear triggers for when to implement a change or file a new submission.
Human factors documentation: shows how AI outputs are displayed, how the uncertainty of an output is communicated, and how this was tested with the target user group.
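The post-market surveillance item can be sketched as a small rule set: a metric, its thresholds, and the status each threshold triggers. This is a minimal illustration under assumed names and numbers. The thresholds, the minimum case count, and the status labels are hypothetical and would need to be justified for a real device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringPlan:
    """Hypothetical post-market rule for one performance metric."""
    metric: str
    alert_threshold: float   # below this: open an internal investigation
    action_threshold: float  # below this: trigger change control or regulatory assessment
    min_cases: int           # do not judge a window with fewer confirmed positives


def evaluate_window(plan: MonitoringPlan, true_positives: int, positives: int) -> str:
    """Classify one monitoring window as insufficient_data, ok, alert, or action."""
    if positives < plan.min_cases:
        return "insufficient_data"
    observed = true_positives / positives
    if observed < plan.action_threshold:
        return "action"
    if observed < plan.alert_threshold:
        return "alert"
    return "ok"


# Illustrative numbers only.
plan = MonitoringPlan(
    metric="sensitivity",
    alert_threshold=0.95,
    action_threshold=0.90,
    min_cases=50,
)
statuses = [
    evaluate_window(plan, 97, 100),   # 0.97, above both thresholds
    evaluate_window(plan, 93, 100),   # 0.93, between the two thresholds
    evaluate_window(plan, 85, 100),   # 0.85, below the action threshold
    evaluate_window(plan, 10, 10),    # too few cases to judge
]
```

Writing the triggers down in this form before submission forces the team to decide in advance what level of drift warrants a change or a new submission, which is the commitment a surveillance plan is meant to capture.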

With De Novo submissions, where FDA is also creating the regulatory framework for future devices within the same class, the standard is even more demanding. Reviewers must have enough understanding of the algorithm to create appropriate special controls for the entire device class. This is impossible to accomplish if there is no transparency on the part of the submitter.

One additional consideration worth flagging: as of early 2026, FDA has fully transitioned to the new Quality Management System Regulation (QMSR), which aligns 21 CFR Part 820 with ISO 13485. This has practical implications for AI/ML submissions.

Data origin documentation, algorithm version control, and explainability records must now fit within the QMS rather than exist only as technical files. Manufacturers building transparency infrastructure should confirm it integrates directly with their QMS from the start.

Conclusion:

The era in which a black-box AI could be carried by robust metrics alone has ended. FDA’s guidance, spread across multiple documents but consistent in intent, clearly indicates that algorithmic transparency is a precondition for authorization, not a bonus.

What FDA is truly asking is not whether your AI is accurate, but whether it is accountable. An accurate AI that cannot account for itself will not be authorized. The first step toward accountability is being able to communicate, in specific clinical terms, exactly how the algorithm operates and why.

Ask yourself before you submit:

Can you explain what your model does to a clinician who has never seen a neural network?
Can you show FDA exactly which inputs drove each output?
Do you have a plan for when the model is wrong?
If not — explainability is where your submission prep should start, not finish.

The post If Your AI Can’t Explain Itself, Can FDA Authorize It? appeared first on MedTech Intelligence.
