People struggle to tell AI from doctors, and often trust it more
A new paper in NEJM AI shows that many people can't tell AI-written medical advice from a physician's, and they often rate the AI as more trustworthy, even when it's wrong. For healthcare leaders, that gap between fluency and factual accuracy is a patient safety problem waiting to happen.
What the researchers did
Researchers gathered 150 real, anonymized questions and answers from HealthTap across six clinical areas. They generated parallel answers with GPT-3, then had four independent physicians rate the AI outputs as high or low accuracy.
From this, they built a balanced set: 30 physician answers, 30 high-accuracy AI answers, and 30 low-accuracy AI answers. Three controlled online experiments with 300 adults tested source detection, perceived quality, and the impact of labeling.
What people saw and believed
Participants could identify AI versus doctor authorship only about half the time, which is no better than chance. When sources were hidden, AI answers were rated clearer and more persuasive.
High-accuracy AI scored highest on validity, trustworthiness, and completeness. Even low-accuracy AI, despite factual errors, was rated nearly on par with physicians.
Labels nudge trust
When answers were labeled as "doctor," "AI," or "doctor assisted by AI," the "doctor" label boosted trust in high-accuracy AI responses but didn't rescue low-accuracy ones. Authority cues and confident tone shaped perception more than factual precision.
Even experts show bias
Physicians evaluating the same content rated AI and doctor answers similarly when blinded to the source. Once labels were visible, they judged AI as less accurate and complete, evidence of label-driven bias even among experts.
Why this matters for care
Confident, well-structured language can mask errors. The study found that people who trusted low-accuracy AI said they would be likely to follow it, even when doing so risked harm or unnecessary visits. If patients or staff equate fluency with expertise, error rates can rise quietly.
Practical guardrails for healthcare teams
- Keep a clinician in the loop: Require human review for any patient-facing medical information.
- Use clear labeling and uncertainty cues: State when AI is used, include confidence qualifiers, and show references where possible.
- Restrict to low-risk use cases first: Education drafts, visit summaries, and non-urgent triage suggestions; avoid autonomous clinical decisions.
- Force grounding: Use retrieval with citations from institution-approved guidelines and formularies.
- Calibrate tone: Configure prompts to avoid overconfident phrasing; prefer conditional language and next-step options.
- Escalation rules: Auto-flag red-flag symptoms, pediatrics, oncology, and drug-drug interactions for immediate clinician review (a minimal sketch follows this list).
- Audit and feedback: Log AI use, sample outputs weekly, and run safety drills for near-miss cases.
- Patient-facing disclaimers: Short, readable statements that AI content is informational and not a diagnosis.
- Train your staff: Provide ongoing education on prompt design, verification, and bias.
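To make the labeling, escalation, and audit guardrails concrete, here is a minimal Python sketch of a wrapper around a patient-messaging model. It is an illustration under stated assumptions, not the study authors' tooling: the generate_draft stub, the red-flag term list, the disclaimer text, and the JSONL log format are hypothetical placeholders that a real deployment would replace with an institution-approved model gateway, a clinically governed trigger list, and your existing audit infrastructure.

```python
# Minimal guardrail sketch: label AI drafts, flag red-flag content for
# clinician review, and log every interaction for weekly audits.
# All trigger terms, labels, and file paths below are illustrative assumptions.

import json
import re
from datetime import datetime, timezone

# Hypothetical escalation triggers; a real list comes from clinical governance.
RED_FLAG_PATTERNS = [
    r"\bchest pain\b", r"\bshortness of breath\b", r"\bsuicid",
    r"\banaphyla", r"\bstroke\b", r"\binfant\b", r"\bpregnan",
    r"\bchemotherapy\b", r"\bwarfarin\b",
]

AI_DISCLAIMER = (
    "This response was drafted with AI assistance. It is informational only "
    "and is not a diagnosis; contact your care team for medical decisions."
)


def generate_draft(question: str) -> str:
    """Placeholder for your organization's model call (e.g., an internal LLM gateway)."""
    return "Drafted answer pending clinician review."


def needs_escalation(text: str) -> bool:
    """Return True if the text contains content that must go straight to a clinician."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in RED_FLAG_PATTERNS)


def answer_with_guardrails(question: str) -> dict:
    draft = generate_draft(question)
    escalate = needs_escalation(question) or needs_escalation(draft)

    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "draft": draft,
        "escalated": escalate,
        "label": "AI-assisted draft, clinician review required",
    }
    # Append-only log so weekly audits and near-miss reviews have a trail.
    with open("ai_answer_log.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")

    if escalate:
        return {"status": "escalated", "message": "Routed to clinician queue."}
    return {"status": "draft", "message": f"{draft}\n\n{AI_DISCLAIMER}"}


if __name__ == "__main__":
    print(answer_with_guardrails("I have chest pain and shortness of breath since yesterday."))
```

The design point is that the model never reaches a patient directly: every draft carries an AI label and disclaimer, red-flag content is routed to a clinician queue, and every interaction leaves a record that audits can sample.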
Where AI can help, safely
- Drafting patient education for clinician edit and approval.
- Summarizing long charts or inbox threads for faster review.
- Routing non-urgent messages and preparing response templates.
- Creating after-visit summaries and instruction checklists.
Bottom line
Clarity is not correctness. AI can assist with communication and workflow, but trust in medicine should be earned through human judgment and verified sources, not confident prose from a model.