In the Western world, approximately one in ten medical diagnoses is estimated to be incorrect, which means that patients do not receive the right treatment. One explanation may be a lack of experience and training on the part of the medical staff.
Together with clinicians, this project aims to develop explainable AI that can help medical staff make qualified decisions by taking the role of a mentor who provides feedback and advice to clinicians. It is important that the explainable AI provides good explanations that are easy to understand and use within the medical staff's workflow.
Project period: 2021-2025
Budget: DKK 28.44 million
AI is widely deployed in assistive medical technologies, such as image-based diagnosis, to solve highly specific tasks that lend themselves to tractable model optimization. However, AI is rarely designed as a collaborator for healthcare professionals, but rather as a mechanical substitute for part of a diagnostic workflow. From the AI researcher's point of view, the goal of development is to beat the state of the art on narrow performance metrics, which the AI may achieve with superhuman accuracy.
However, for more general problems such as full diagnosis, treatment execution, or explaining the background for a diagnosis, the AI still cannot be trusted. Hence, clinicians do not always perceive AI solutions as helpful in solving their clinical tasks, as they solve only part of the problem sufficiently well. The EXPLAIN-ME initiative seeks to create AI systems that help solve the overall, general tasks in collaboration with human healthcare professionals.
To do so, we need not only to provide interpretability in the form of explainable AI models — we need to provide models whose explanations are easy to understand and utilize during the clinician’s workflow. Put simply, we need to provide good explanations.
Unmet technical needs
It is not hard to agree that good explanations are better than bad ones. In this project, however, we aim to establish methods and collect data that allow us to train explainable clinical AI and validate the quality of its explanations in terms of how understandable and useful they are.
AI support should neither distract from nor hinder ongoing tasks, and the need for AI support fluctuates, e.g. throughout a surgical procedure. As such, the relevance and utility of AI explanations are highly context- and task-dependent. Through collaboration with Zealand University Hospital, we will develop explainable AI (XAI) feedback for human-AI collaboration in static clinical procedures, where data is collected and analyzed independently, e.g. when diagnosing cancer from scans collected beforehand in a different unit.
In collaboration with CAMES and NordSim, we will implement human-AI collaboration in simulation centers used to train clinicians in dynamic clinical procedures, where data is collected on the fly, e.g. during ultrasound scanning of pregnant women or robotic surgery. We will monitor the clinicians' behavior and performance as a function of the feedback provided by the AI. As no actual patients are involved in medical simulation, we are also free to provide clinicians with potentially bad explanations, and we may use the clinicians' responses to train and evaluate the AI's ability to explain.
Unmet clinical needs
In the Western world, medical errors are exceeded only by cancer and heart disease in the number of fatalities they cause. About one in ten diagnoses is estimated to be wrong, resulting in inadequate and even harmful care. Errors occur in clinical practice for several reasons, but most importantly because clinicians often work alone with minimal expert supervision and support. The EXPLAIN-ME initiative aims to create AI decision support systems that take the role of an experienced mentor providing advice and feedback.
This initiative seeks to optimize the utility of feedback provided by healthcare explainable AI (XAI). We will approach this problem both in static healthcare applications, where clinical decisions are based on data already collected, and in dynamic applications, where data is collected on the fly to continually improve confidence in the clinical decision. Through an interdisciplinary effort spanning XAI, medical simulation, participatory design, and human-computer interaction (HCI), we aim to optimize the explanations provided by the XAI to be of maximal utility to clinicians, supporting technology acceptance in the clinic.
Case 1: Renal tumor classification
Classification of a renal tumor as malignant or benign is an example of a decision that must be made under time pressure. If malignant, the patient should be operated on immediately to prevent the cancer from spreading to the rest of the body, so a false positive diagnosis may lead to the unnecessary removal of a kidney and other complications. While AI methods can be shown statistically to be more precise than an expert physician, they need to be extended with explanations for their decisions, and only the physicians know what "a good explanation" is. This motivates a collaborative design and development process to find the best balance between what is technically possible and what is clinically needed.
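To make the idea of extending a classifier with an explanation concrete, here is a minimal, purely illustrative sketch in Python. It uses occlusion sensitivity, one common model-agnostic technique, to highlight which image regions drive a (dummy) malignancy score; the model, image size, and patch size are assumptions, not the project's chosen method.

```python
# Illustrative sketch only: a hypothetical occlusion-sensitivity explanation for a
# binary tumor classifier. The model, image shape, and patch size are assumptions.
import numpy as np

def predict_malignant_prob(image: np.ndarray) -> float:
    """Stand-in for a trained classifier returning P(malignant) for one image."""
    # A real system would call a trained model here; this dummy scores mean intensity.
    return float(image.mean())

def occlusion_explanation(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Importance map: how much the malignancy score drops when a region is hidden."""
    baseline = predict_malignant_prob(image)
    heatmap = np.zeros_like(image, dtype=float)
    for r in range(0, image.shape[0], patch):
        for c in range(0, image.shape[1], patch):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = 0.0   # hide one region
            drop = baseline - predict_malignant_prob(occluded)
            heatmap[r:r + patch, c:c + patch] = drop   # large drop = influential region
    return heatmap

if __name__ == "__main__":
    scan = np.random.rand(128, 128)                    # placeholder for a CT slice
    print("P(malignant) =", round(predict_malignant_prob(scan), 3))
    heatmap = occlusion_explanation(scan)
    print("most influential region:", np.unravel_index(heatmap.argmax(), heatmap.shape))
```

Whether a heatmap of this kind actually constitutes "a good explanation" for the physician is exactly the question the collaborative design process is meant to answer.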
Case 2: Ultrasound Screening
Even before birth, patients suffer from erroneous decisions made by healthcare workers. In Denmark, 95% of all pregnant women participate in the national ultrasound screening program aimed at detecting severe maternal-fetal disease. Correct diagnosis is directly linked to the skills of the clinicians, and only about half of all serious conditions are detected before birth. AI feedback therefore has the potential to standardize care across clinicians and hospitals. At DTU, KU and CAMES, ultrasound imaging will be the main case for development, as data access and management, as well as manual annotations, are already in place. We seek to give the clinician feedback during scanning, such as whether the current image shows a standard ultrasound plane, whether it has sufficient quality, whether it can be used to predict clinical outcomes, or how to move the probe to improve image quality.
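As a minimal sketch of what such per-frame feedback could look like in software, assuming a hypothetical model that returns a plane label, a confidence, a quality score, and a probe hint (none of these names or thresholds come from the project):

```python
# Illustrative sketch only: mapping hypothetical per-frame model outputs to short
# feedback messages during scanning. Labels, thresholds, and messages are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FrameAssessment:
    standard_plane: Optional[str]  # e.g. "femur", or None if no standard plane is recognized
    plane_confidence: float        # model confidence that a standard plane is shown, 0..1
    quality_score: float           # estimated image quality, 0..1
    probe_hint: str                # suggested probe adjustment, e.g. "reduce probe pressure"

def feedback_message(a: FrameAssessment) -> str:
    """Turn one frame's assessment into a single short message for the clinician."""
    if a.standard_plane is None or a.plane_confidence < 0.6:
        return f"No standard plane yet; try to {a.probe_hint}."
    if a.quality_score < 0.5:
        return f"{a.standard_plane} plane found, but image quality is low; {a.probe_hint}."
    return f"{a.standard_plane} plane acquired with sufficient quality."

if __name__ == "__main__":
    frame = FrameAssessment("femur", 0.82, 0.41, "reduce probe pressure")
    print(feedback_message(frame))
```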
Case 3: Robotic Surgery
AAU and NordSim will collaborate on the assessment and development of robotic surgeons’ skills, associated with an existing clinical PhD project. Robotic surgery allows surgeons to do their work with more precision and control than traditional surgical tools, thereby reducing errors and increasing efficiency. AI-based decision support is expected to have a further positive effect on outcomes. The usability of AI decision support is critical, and this project will study temporal aspects of the human-AI collaboration, such as how to present AI suggestions in a timely manner without interrupting the clinician; how to hand over tasks between a member of the medical team and an AI system; and how to handle disagreement between the medical expert and the AI system.
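As a minimal sketch of the kind of timing policy such a study might evaluate, assuming a hypothetical workload estimate and urgency score (the rule, names, and thresholds are illustrative, not the project's design):

```python
# Illustrative sketch only: a toy timing policy for when to surface AI suggestions
# during robotic surgery. Phase names, thresholds, and the rule itself are assumptions.
from dataclasses import dataclass

@dataclass
class SurgicalContext:
    phase: str        # e.g. "dissection", "suturing", "idle"
    workload: float   # estimated clinician workload, 0 (low) .. 1 (high)

def should_present(urgency: float, ctx: SurgicalContext) -> bool:
    """Surface urgent alerts immediately; defer routine advice to low-workload moments."""
    if urgency > 0.9:                 # e.g. an imminent safety issue
        return True
    if ctx.phase == "idle":
        return True
    return ctx.workload < 0.3         # hold back advice while the surgeon is busy

if __name__ == "__main__":
    print(should_present(0.4, SurgicalContext("suturing", 0.8)))   # False: deferred
    print(should_present(0.95, SurgicalContext("suturing", 0.8)))  # True: urgent
```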
In current healthcare AI research and development, there is often a gap between the needs of clinicians and the developed solutions. This represents a lost opportunity for added value: we miss out on the potential clinical value of standardized, high-quality care across demographic groups. Just as importantly, we miss out on added business value: if the first, research-based step in the development chain is unsuccessful, there will also be fewer spin-offs and start-ups, less knowledge dissemination to industry, and overall less innovation in healthcare AI.
The EXPLAIN-ME initiative will address this problem.
This comes with great potential value: while AI has transformed many aspects of society, its impact on the healthcare sector has so far been limited. Diagnostic AI is a key topic in healthcare research but is only marginally deployed in clinical care. This is partly explained by the low interpretability of state-of-the-art AI, which negatively affects both patient safety and clinicians' technology acceptance. It is also explained by the typical workflow in healthcare AI research and development, which is often structured as parallel tracks in which AI researchers independently develop technical solutions to a predefined clinical problem, while only occasionally interacting with the clinical end-users.
This often results in a gap between the clinicians' needs and the developed solution. The EXPLAIN-ME initiative aims to close this gap by developing AI solutions that are designed to interact with clinicians in every step of the design, training, and implementation process.
The project will develop explainable AI that can help medical staff make qualified decisions by taking the role of a mentor.
Technical University of Denmark
DTU Compute
University of Copenhagen
Department of Computer Science
Aalborg University
Department of Computer Science
Roskilde University
Department of People and Technology
CAMES Rigshospitalet
Department of Clinical Medicine
Aalborg University
NordSim
Aalborg University
Department of Urology
Zealand University Hospital
University of Southern Denmark