A MARTÍNEZ, HOST:
Hundreds of millions of people are turning to chatbots for advice on health and wellness these days. Now, that's according to OpenAI, the maker of ChatGPT. But several recent studies published in the journal Nature Medicine suggest AI medical advice frequently leads people astray. NPR's Katia Riddle is here to explain. So a lot of people are incorporating these AI tools into their decisions around health. You're saying, though, Katia, that consulting AI chatbots doesn't help people correctly identify medical problems.
KATIA RIDDLE, BYLINE: Less than half the time - that was the finding in one of these studies. In another study, researchers proposed different medical scenarios to the bots, and they found that even when the bots did correctly identify the problem, they often did not express an appropriate amount of urgency about seeking help for potentially dangerous conditions.
MARTÍNEZ: All right. So that sounds problematic. What went wrong in these scenarios?
RIDDLE: I spoke with an author of one study, Andrew Bean. He studies AI systems at Oxford University. In his study, he and his colleagues tried to simulate the way people actually use AI by giving participants scripts to discuss with the bots - hypothetical medical issues they were having. He talked about one scenario where two different users gave slightly different descriptions of the same problem - a headache.
ANDREW BEAN: One of them said, it's the worst headache I've ever had, and that person was told, go to the ER immediately. Now, it turns out this was actually a life-threatening condition. And the other one was told, take aspirin. Stay home.
MARTÍNEZ: All right. So two different sets of directives there. So who's to blame for this communication failure - the person or the bot itself?
RIDDLE: Well, it's both. Humans aren't always the best reporters of their own symptoms. And in this scenario, the AI was not curious enough to ask the questions that would get at the information the human wasn't volunteering.
MARTÍNEZ: All right. So does that mean, then, that AI isn't replacing medical professionals anytime soon?
RIDDLE: Well, it's complicated. In controlled studies, large language models have sometimes matched or even outperformed physicians on diagnostic reasoning tasks. But these two studies suggest that human doctors are still better at evaluating patients and also better at recommending next steps for treatment.
MARTÍNEZ: All right. So what does OpenAI say about all this?
RIDDLE: I did reach out to them. They point out that in one study, the version of ChatGPT that the researchers evaluated is outdated. They say they've course-corrected with newer versions. However, another study did look at the most recent version. In that case, the company argues that the methodology did not reflect how people typically use ChatGPT.
MARTÍNEZ: So should all of us just stop using AI as a cheap and accessible doctor?
RIDDLE: Not necessarily. First of all, AI is here to stay. Even if doctors wanted their patients to stop using it, there's no reason to think they would. AI can sometimes be very useful or even lifesaving. One person I talked to is Robert Wachter. He's a doctor and researcher at UC San Francisco. He just wrote a book on AI and medicine. ChatGPT isn't perfect, he says, but he argues that it is often better than the alternatives.
ROBERT WACHTER: I encourage patients to use these tools 'cause it's not easy to get in to see a doctor. And often the advice you get from the tools is substantially better than nothing and better than what you would get from your second cousin.
RIDDLE: Wachter says we're in an awkward new relationship stage with AI now. We're just getting to know each other. He thinks both humans and AI will figure out how, over time, to communicate better with each other.
MARTÍNEZ: So our relationship with AI is still complicated.
RIDDLE: That's right.
MARTÍNEZ: That's NPR's Katia Riddle. Thanks a lot.
RIDDLE: Thank you.
(SOUNDBITE OF PLEIJ'S "INSIDEOUT") Transcript provided by NPR, Copyright NPR.
NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.