Study Reveals a Diagnosis Gap: Why AI Chatbots Shouldn't Be Your Doctor

A study published in Nature Medicine delivers a sobering assessment of using AI chatbots for medical self-diagnosis. While these tools can pass medical exams on paper, they failed to correctly identify medical conditions in over 65% of real-world cases examined.

The research involved 1,298 UK participants who sought advice from models such as ChatGPT and Meta's Llama 3. The AI provided an accurate initial diagnosis less than 34.5% of the time. Even after continued conversation, performance did not reliably improve; in some instances, providing more patient details led the AI to introduce new, incorrect information. Correct follow-up recommendations were given only 44.2% of the time.

This gap exists despite the models' known ability to produce clinical text rated as equivalent to a doctor's notes. The core issue, the researchers found, is conversational: users frequently provided incomplete symptom details in their initial questions, and the AI systems struggled to navigate these imperfect, real-world interactions.

The findings are particularly relevant as public adoption grows. An OpenAI survey indicates 60% of U.S. adults have used AI for health-related inquiries, from interpreting symptoms to understanding doctor's instructions. While services carry disclaimers, the study underscores that for serious health concerns, these tools are not substitutes for professional medical guidance.

Source: CNET
