Limitations of AI Chatbots as Health Information Providers
Many people now use artificial intelligence (AI) chatbots as a source of health information, but their answers should not be trusted blindly.
A recently published study casts serious doubt on whether AI chatbots can reliably give medical advice to the general public.
The experiment found that chatbots perform no better than search engines such as Google, long considered a limited source of health information, at helping users arrive at an accurate diagnosis or determine appropriate next steps.
Furthermore, the technology carries its own risks: it sometimes presents incorrect information, and its recommendations can change drastically with minor differences in how a question is phrased.
The study is described as the first randomised study specifically designed to test AI chatbots’ ability to provide medical advice to the public.
In the three years since AI chatbots became publicly available, health questions have become one of the most common topics users ask them about.
Some doctors report frequently encountering patients who have consulted AI to obtain a second opinion.
A survey found that approximately one in six adults use chatbots to seek health information at least once a month.
Major AI technology companies, including Amazon and OpenAI, have launched products specifically designed to answer health questions from users.
However, Adam Mahdi, a professor at the Oxford Internet Institute and senior author of the latest study, published in Nature Medicine, argues that neat, straightforward medical questions of the kind tested in research are unlikely to give an accurate picture of how well the technology performs when patients use it in real-world scenarios.
“Medicine is not like that. Medicine is messy, incomplete, and stochastic,” Mahdi said.
The researchers asked participants to chat with an AI chatbot to determine appropriate next steps, such as whether to call an ambulance or treat themselves at home. They tested commercially available chatbots, including OpenAI's ChatGPT and Meta's Llama.