FDA medical device loophole could cause patient harm, study warns
Doctors and researchers from the University of Maryland School of Medicine, the UMD Institute for Health Computing and the VA Maryland Healthcare System are concerned that large language models summarizing clinical data could meet the U.S. Food and Drug Administration’s device-exemption criteria yet still cause patient harm.
WHY IT MATTERS
Artificial intelligence that summarizes clinical notes, medications and other patient data without FDA oversight will soon reach patients, the doctors and researchers said in a new viewpoint published Monday on the JAMA Network.
The FDA has interpreted clinical decision support software involved in “time-critical” decision-making as a regulated device function – a category the authors said could possibly include LLM generation of a clinical summary. But when they analyzed the FDA’s final guidance on clinical decision support software, published about two months before ChatGPT’s release, the researchers said the guidance “provides an unintentional ‘roadmap’ for how LLMs could avoid FDA regulation.”
Generative AI will change everyday clinical tasks and has earned a great deal of attention for its promise to reduce physician and nurse burnout and improve healthcare operational efficiency. But LLMs that summarize clinical notes, medications and other forms of patient data “could exert important and unpredictable effects on clinician decision-making,” the researchers said.
They conducted tests using ChatGPT and anonymized patient record data and examined the summarization outputs, concluding that the results raise questions that go beyond “accuracy.”
“In the clinical context, sycophantic summaries could accentuate or otherwise emphasize facts that comport with clinicians’ preexisting suspicions, risking a confirmation bias that could increase diagnostic error,” they said.
“For example, when prompted to summarize previous admissions for a hypothetical patient, summaries varied in clinically meaningful ways, depending on whether there was concern for myocardial infarction or pneumonia.”
Lead author Katherine Goodman, a legal expert with the UMD School of Medicine Department of Epidemiology and Public Health, studies clinical algorithms and the laws and regulations governing them to understand their potential for adverse effects on patients.
She and her research team found LLM-generated summaries to be highly variable, and while the models may be developed to avoid full-blown hallucinations, their summaries could still include small errors with significant clinical impact.
In one example from their study, a chest radiography report noted “indications of chills and nonproductive cough,” but the LLM summary added “fever.”
“Including ‘fever,’ although a [one-word] mistake, completes an illness script that could lead a physician toward a pneumonia diagnosis and initiation of antibiotics when they might not have reached that conclusion otherwise,” they said.
This sycophancy is a danger that generally arises “when LLMs tailor responses to perceived user expectations,” turning them into virtual AI yes-men for clinicians – “like the behavior of an eager personal assistant,” the researchers said.
THE LARGER TREND
Others have said that the FDA’s regulatory framework for AI as a medical device could be curtailing innovation.
During a December discussion in London on the practical application of AI in the medical device industry, Tim Murdoch, business development lead for digital products at the Cambridge Design Partnership, argued that FDA regulations would shut out genAI innovation.
“The FDA allows AI as a medical device,” he said, according to a story by the Medical Device Network.
“They are still focused on locking the algorithm down. It is not a continuous learning exercise.”
One year ago, the CDS Coalition asked the FDA to rescind its clinical decision support guidance and better balance regulatory oversight with the healthcare sector’s need for innovation.
The coalition suggested that, in the final guidance, the FDA compromised its ability to enforce the law, a situation it said would lead to public health harm.
ON THE RECORD
“Large language models summarizing clinical data promise powerful opportunities to streamline information-gathering from the EHR,” the researchers acknowledged in their report. “But by dealing in language, they also bring unique risks that are not clearly covered by existing FDA regulatory safeguards.”
“As summarization tools speed closer to clinical practice, transparent development of standards for LLM-generated clinical summaries, paired with pragmatic clinical studies, will be critical to the safe and prudent rollout of these technologies.”
Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.