In medicine, the cautionary tales about the unintended consequences of artificial intelligence are already legendary.
There was a program designed to predict when patients would develop sepsis, a potentially deadly bloodstream infection, which triggered a litany of false positives. Another, intended to improve follow-up care for the sickest patients, appeared to deepen worrying health disparities.
Wary of such shortcomings, doctors have kept AI working on the sidelines: as an assistant scribe, an occasional second opinion and a back-office organizer. But the field has gained investment and momentum for applications in medicine and beyond.
AI is a hot topic within the Food and Drug Administration, which plays a key role in approving new medical devices. It helps discover new drugs. It could pinpoint unexpected side effects. And it’s even being discussed as a way to help employees who are overwhelmed with repetitive, routine tasks.
But the FDA’s role has been sharply criticized in one crucial respect: how carefully it reviews and describes the programs it approves to help doctors detect everything from tumors to blood clots to collapsed lungs.
“We will have many options. It’s exciting,” said Dr. Jesse Ehrenfeld, president of the American Medical Association, a leading lobbying group for doctors, in an interview. “But if physicians are going to integrate these things into their workflows, if they’re going to pay for them, and if they’re going to use them, we have to have some confidence that these tools will work.”
President Biden issued an executive order on Monday calling for regulations across a wide range of agencies to try to address the security and privacy risks of AI, including in healthcare. The order calls for more funding for AI research in medicine as well as a safety program to collect reports of harm or unsafe practices. A meeting with world leaders will take place later this week to discuss the issue.
At an event on Monday, Mr. Biden said it was important to oversee AI development and security and build systems that people can trust.
“To protect patients, for example, we will use AI to develop cancer drugs that work better and cost less,” Mr. Biden said. “We will also launch a safety program to ensure AI healthcare systems do no harm.”
No single US agency governs the entire landscape. Senator Chuck Schumer, Democrat of New York and majority leader, summoned technology executives to Capitol Hill in September to discuss ways to advance the field and also identify pitfalls.
Google has already caught the attention of Congress with its pilot of a new chatbot for healthcare workers. It’s called Med-PaLM 2 and is intended to answer medical questions, but has raised concerns about patient privacy and informed consent.
The way the FDA will monitor such “large language models,” or programs that mimic expert advisors, is just one area where the agency is lagging behind rapidly evolving advances in AI. Agency officials have only begun to talk about testing a technology that continues to “learn” as it processes thousands of diagnostic scans. And the agency’s existing rules encourage developers to focus on one problem at a time – such as a heart murmur or a brain aneurysm – a contrast to the AI tools used in Europe, which look for a range of problems.
The agency’s scope is limited to products that are approved for sale. It has no authority over programs that health systems create and use internally. Large health systems like Stanford, Mayo Clinic and Duke — as well as health insurers — can develop their own AI tools that impact care and insurance decisions for thousands of patients without direct government oversight.
Still, doctors are raising more questions as they try to use the roughly 350 FDA-approved software tools to help detect blood clots, tumors or a hole in the lung. They have found few answers to basic questions: How was the program built? How many people was it tested on? Is it likely to detect something that a typical doctor would miss?
The lack of publicly available information, perhaps paradoxical in a world full of data, leads doctors to be cautious, fearful that an exciting-sounding technology may lead patients down a path to more biopsies, higher medical bills and toxic medications without significantly improving care.
Dr. Eric Topol, author of a book about AI in medicine, is a near-unwavering optimist about the technology’s potential. But he said the FDA erred by allowing AI developers to keep their “secret sauce” under wraps and by failing to require careful studies to assess meaningful benefits.
“You need really compelling, great data to change the practice of medicine and give confidence that this is the right way to go,” said Dr. Topol, executive vice president of Scripps Research in San Diego. Instead, he added, the FDA has allowed “shortcuts.”
Large studies are starting to tell more of the story: One found benefits from using AI to detect breast cancer, and another pointed out flaws in a skin cancer detection app, Dr. Topol said.
Dr. Jeffrey Shuren, head of the FDA’s medical device division, has acknowledged the need for continued efforts to ensure that AI programs deliver on their promises after his division approves them. While drugs and some devices are tested on patients before approval, that is not typically required of AI software programs.
A new approach could be to build labs where developers could access massive amounts of data and create or test AI programs, said Dr. Shuren during the National Organization for Rare Disorders conference on October 16.
“If we really want to ensure this right balance, we need to change federal law because the framework we use for these technologies is almost 50 years old,” said Dr. Shuren. “It really wasn’t designed for AI.”
Other forces are complicating efforts to adapt machine learning for large hospital and healthcare networks. Software systems do not communicate with each other. No one agrees on who should pay for them.
By one estimate, about 30 percent of radiologists – a field where AI is widely used – use AI technology. Simple tools that can make an image sharper are easy to sell. But higher-risk tools, like those that choose whose brain scans should get priority, worry doctors when they don’t know, for example, whether the program is designed to detect the illnesses of a 19-year-old or those of a 90-year-old.
Aware of these shortcomings, Dr. Nina Kottler is leading a multi-year, multi-million-dollar project to review AI programs. She is chief medical officer of clinical AI at Radiology Partners, a Los Angeles-based practice that evaluates approximately 50 million scans annually for about 3,200 hospitals, freestanding emergency rooms and imaging centers across the United States.
She knew getting into AI would be tricky given the practice’s 3,600 radiologists. After all, Geoffrey Hinton, known as the “Godfather of AI,” caused an uproar in the field in 2016 when he predicted that machine learning would replace radiologists entirely.
Dr. Kottler said she began evaluating approved AI programs by interviewing their developers and then testing some to see which programs missed relatively obvious problems or identified subtle ones.
She rejected an approved program that didn’t detect lung abnormalities beyond the cases her radiologists found — and missed some obvious ones.
Another program that scanned images of the head for aneurysms, a potentially life-threatening condition, proved impressive, she said. Although it flagged many false positives, it detected approximately 24 percent more cases than radiologists had identified. More people with an apparent brain aneurysm received follow-up care, including a 47-year-old with a bulging vessel in an unexpected corner of the brain.
At the end of a telemedicine appointment in August, Dr. Roy Fagan realized he was having trouble speaking to the patient. Suspecting a stroke, he hurried to a hospital in rural North Carolina for a CT scan.
The image went to Greensboro Radiology, a Radiology Partners practice, where it triggered an alert in an AI stroke-triage program. A radiologist did not have to sift through the cases ahead of Dr. Fagan’s or click through more than 1,000 image slices; the one showing the brain clot appeared immediately.
The radiologist had Dr. Fagan transferred to a larger hospital where the clot could be quickly removed. He woke up feeling normal.
“It doesn’t always work that well,” said Dr. Sriyesh Krishnan of Greensboro Radiology, who is also director of innovation development at Radiology Partners. “But when it works so well, it changes the lives of these patients.”
Dr. Fagan planned to return to work the following Monday, but agreed to rest for a week. He was impressed by the AI program and said: “It’s a real step forward to have it here now.”
Radiology Partners has not published its findings in medical journals. However, some researchers have highlighted less inspiring examples of the impact of AI in medicine.
Researchers at the University of Michigan examined a widely used AI tool in an electronic health records system designed to predict which patients would develop sepsis. They found that the program triggered alarms in one in five patients – although only 12 percent later developed sepsis.
Another program that analyzed health care costs as a proxy for predicting medical needs ended up withholding treatment from Black patients who were just as sick as white patients. A study in the journal Science found that the cost data proved to be a poor stand-in for illness because less money is typically spent on Black patients.
Neither of those programs was reviewed by the FDA. But given such uncertainty, doctors have turned to the agency’s approval documents for reassurance. They found little. One research team looking at AI programs for critically ill patients found that evidence of real-world use was “completely lacking” or based on computer models. The team, from the University of Pennsylvania and the University of Southern California, also found that some of the programs were approved because of their similarity to existing medical devices – including some that didn’t even use artificial intelligence.
Another study of programs approved by the FDA through 2021 found that of 118 AI tools, only one described the geographic and ethnic breakdown of the patients the program was trained on. Most of the programs were tested on 500 or fewer cases – not enough, the study concluded, to justify widespread use.
Dr. Keith Dreyer, study author and chief data science officer at Massachusetts General Hospital, is currently leading a project at the American College of Radiology to close the information gap. With the help of AI vendors who have been willing to share information, he and his colleagues plan to release an update on the agency’s approved programs.
That would allow physicians, for example, to see how many pediatric cases a program is expected to detect, informing them of blind spots that could affect care.
James McKinney, an FDA spokesman, said the agency’s staff reviews thousands of pages before clearing AI programs, but acknowledged that software makers may write the publicly released summaries. These are not “intended to make purchasing decisions,” he said, adding that more detailed information is provided on product labels, which are not readily available to the public.
Getting AI oversight right in medicine, a task that involves multiple agencies, is critical, said Dr. Ehrenfeld, the AMA president. He said doctors have studied the role of AI in fatal plane crashes to warn about the dangers of automated safety systems that override a pilot’s – or a doctor’s – judgment.
He said the investigations into the 737 Max plane crashes showed that pilots were not trained to override a safety system that contributed to the deadly collisions. He fears that doctors could encounter a similar use of AI running in the background of patient care that could prove harmful.
“Just understanding that AI is there should be an obvious starting point,” said Dr. Ehrenfeld. “But it’s not clear that that will always happen unless we have the right regulatory framework.”