
Is explainability necessary for using AI in medicine? I argue that, because we cannot understand what happens in its opaque reasoning process, interpretable AI (IAI) is impractical in many facets of healthcare without human redundancy.
Suppose a patient asked their doctor for a diagnosis and the doctor could not explain the reasoning behind it until after the diagnosis had already been made. This is precisely how opaque reasoning works in highly complex models, such as deep learning networks, whose billions of parameters interact in ways that humans cannot understand in real time. There are many methods for interpreting opaque reasoning (not to be confused with explainability), such as LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), PDP (Partial Dependence Plots), and ICE (Individual Conditional Expectation) plots, but none explains the exact process by which the algorithm generated an output (hence its opacity). Some simulate a simpler model that mimics the behavior in an understandable way, whereas others identify the importance of various features in the decision-making process.[7] Herein lies the crux of the matter: are IAI outputs actionable?
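To make that distinction concrete, here is a minimal sketch of the two families of post-hoc methods just described: a global surrogate model that mimics an opaque model's behavior, and a permutation-based measure of feature importance. It uses scikit-learn and synthetic data purely for illustration; nothing here is a clinical pipeline, and neither output reveals the exact process by which the opaque model produced its predictions.

```python
# Minimal sketch of post-hoc interpretation of an "opaque" model.
# Data and model choices are hypothetical illustrations, not a clinical system.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score

# Stand-in for tabular clinical data (e.g., labs, vitals); synthetic here.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)

# The "opaque" model: hundreds of trees whose joint behavior is hard to trace.
opaque_model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Family 1: a global surrogate -- fit a shallow, human-readable tree to mimic
# the opaque model's *predictions*, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, opaque_model.predict(X))
fidelity = accuracy_score(opaque_model.predict(X), surrogate.predict(X))
print(f"Surrogate fidelity to opaque model: {fidelity:.2f}")

# Family 2: feature importance -- permute each feature and measure how much
# the opaque model's performance degrades (similar in spirit to SHAP/LIME summaries).
result = permutation_importance(opaque_model, X, y, n_repeats=10, random_state=0)
for idx in result.importances_mean.argsort()[::-1][:3]:
    print(f"feature_{idx}: importance {result.importances_mean[idx]:.3f}")
```

Both outputs are approximations or summaries of the opaque model's behavior; neither is a trace of the computation itself, which is exactly the gap at issue.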
Opaque reasoning alone is not actionable because we have a duty to verify its outputs. Until its insights can be justified in human terms, there are severe risks in using them to make decisions that will have profound impacts on well-being.
As we know, a model’s outputs depend on the data it was trained on, and since training data reflects human knowledge, it has the potential to be biased, flawed, or inaccurate.[8][9] While there are mitigation strategies, without being able to point specifically to how weights were developed during training and how they relate to each other, the outputs of an interpretable model may perpetuate issues found in the data it was trained on.[10] While this is also a risk with human beings, IAI use might facilitate inequitable outcomes due to insufficiently diverse or poorly collected training data.[11] Moreover, IAI has difficulty recognizing its biases, as do users, further exacerbating the issue.[12][13] While IAI might conditionally improve quality of care for some, it might also decrease trust or contribute to inequity in medicine, which is itself negatively correlated with trust. Crucially, low trust leads to non-disclosure (and worsens the patient experience), which worsens the quality of care and therefore outcomes.[14] Some might argue that our healthcare system is already inequitable and that IAI implementation will, broadly speaking, improve outcomes; to avoid a tangent on the goals of our healthcare system, I will say only that we should reflect carefully before implementing technology that might worsen outcome disparity. Regardless, even when sufficiently “good” training data is available, IAI is intrinsically not conducive to building trust.
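One common mitigation step, offered here only as a hypothetical illustration of why such disparities are easy to miss without deliberate checking, is to audit a model's error rates across patient subgroups. The sketch below uses synthetic data and an invented "group" attribute, not any real clinical dataset.

```python
# Hypothetical sketch: auditing a model's errors across patient subgroups.
# Synthetic data with an invented demographic "group" column; illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, size=n)   # 0 = well-represented, 1 = under-represented
X = rng.normal(size=(n, 6))

# The outcome depends on the features, but the signal is noisier for group 1,
# mimicking poorly collected data for that subgroup.
noise = np.where(group == 1, 1.5, 0.5)
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=noise) > 0).astype(int)

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Aggregate performance can look acceptable while one subgroup fares worse.
print(f"overall sensitivity: {recall_score(y_te, pred):.2f}")
for g in (0, 1):
    mask = g_te == g
    print(f"group {g} sensitivity: {recall_score(y_te[mask], pred[mask]):.2f}")
```

An audit like this can surface a disparity, but it does not explain why the model errs more often for one group, which is the interpretability gap described above.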
There is already a great deal of mistrust in medicine today, some of which is due to low health literacy among patients and their families. While this has historically been unrelated to AI, IAI implementation might worsen the health literacy problem by adding another layer of abstraction. Up until this point, providers have been trying to explain the explainable, but they cannot discuss opaque reasoning processes in the same manner due to a fundamental lack of transparency.
Something is lost in the patient-provider relationship when the provider cannot leverage their expertise toward patient care. We value providers’ opinions because they have expertise that they wield for our benefit, but their credibility is diminished when their expertise is not the source of the information. A provider’s expertise comes from verifiable knowledge, in that their medical education is a credential commensurate with trustworthiness. The credentialing mechanisms that we put in place, such as licensure for professionals, agency approval for interventions, and so on, were created (in part) for this purpose.
Another key component of credibility is accountability, in that providers face consequences when they fail to meet required standards. There are several compelling reasons for this to be so, in that it incentivizes a minimum level of performance and offers restitution to wronged parties, but it is unclear who is responsible for similar failures in AI use cases.[15][16] Without a system of accountability, patients may be less incentivized to use IAI-integrated healthcare services. Designing such an accountability system is outside the scope of this paper, but the simplest remedy to make IAI trustworthy is to verify its claims with human oversight, which should have the effect of supplementing opaque reasoning with transparent reasoning methods.
There are some instances where full explainability might not be necessary in supplementary roles. For example, an IAI highlighting anomalies in diagnostic imaging might help radiologists identify tumors. Here IAI proponents might say that the process of finding abnormalities isn’t as important (though it should be studied) because the output is verifiable. However, what are providers to do when the IAI and the radiologist disagree on the diagnosis, or when the tumor is too small to determine malignancy? There are pros and cons to this type of IAI supplementation, but I would characterize it overall as a case of IAI outputs being verified with human supervision, i.e., oversight.
In conclusion, interpretable AI requires human oversight for actionability in healthcare settings due to the need for concrete reasoning to justify consequential decisions. While IAI may supplement traditional medical practices, it cannot be used on its own due to a lack of transparency, which makes it untrustworthy and lacking in credibility. IAI is therefore impractical in some healthcare applications, though it may be implemented conditionally so long as steps are taken to ensure proper oversight and accountability. To responsibly integrate AI in healthcare, we must engage with stakeholders (industry, academia, and the public) to construct regulatory frameworks and develop transparency standards that minimize the potential harms of IAI integration.
All opinions are those of the author and do not reflect the position of the IU Center for Bioethics, the IU School of Medicine, or the Indiana CTSI.
[1] Zhang, Weng, and Lund, “Applications of Explainable Artificial Intelligence in Diagnosis and Surgery.”
[2] Alowais et al., “Revolutionizing Healthcare.”
[3] Quinn et al., “The Three Ghosts of Medical AI.”
[4] Miller, “Explanation in Artificial Intelligence.”
[5] Mittelstadt, Russell, and Wachter, “Explaining Explanations in AI.”
[6] Lamberti, “An Overview of Explainable and Interpretable AI.”
[7] IBM, “What Is AI Interpretability?”
[8] Jones et al., “A Causal Perspective on Dataset Bias in Machine Learning for Medical Imaging.”
[9] Hort et al., “Bias Mitigation for Machine Learning Classifiers.”
[10] González-Sendino, Serrano, and Bajo, “Mitigating Bias in Artificial Intelligence.”
[11] “The Limits of Fair Medical Imaging AI in Real-World Generalization,” Nature Medicine.
[12] Babic et al., “Beware Explanations from AI in Health Care.”
[13] Anderson and Anderson, “How Should AI Be Developed, Validated, and Implemented in Patient Care?”
[14] Nong et al., “Discrimination, Trust, and Withholding Information from Providers.”
[15] Procter, Tolmie, and Rouncefield, “Holding AI to Account.”
[16] Kiseleva, Kotzinos, and De Hert, “Transparency of AI in Healthcare as a Multilayered System of Accountabilities.”

Nicolas Oliver
Mr. Oliver is the Program Manager of the IU Center for Bioethics and the Bioethics and Subject Advocacy Program.