Evaluating AI-Generated Patient Education Guides: A Comparative Study of ChatGPT and DeepSeek
Abstract
Artificial intelligence chatbots such as ChatGPT and DeepSeek are increasingly used to create patient education materials for chronic diseases. While these tools can usefully supplement traditional counseling, they lack the empathy and clinical intuition of healthcare professionals, and they deliver the most value when used alongside them. This study compares patient education guides generated by ChatGPT-4o and DeepSeek V3 for epilepsy, heart failure, chronic obstructive pulmonary disease (COPD), and chronic kidney disease (CKD).
Introduction
Chronic illnesses such as epilepsy, CKD, COPD, and heart failure affect millions globally and require ongoing patient education to manage symptoms and treatment effectively. Proper education enables patients to recognize complications early, adhere to medication, and adopt necessary lifestyle changes, reducing morbidity and mortality.
AI tools can deliver immediate, customized, evidence-based information, enhancing patient engagement and easing healthcare workers’ workloads by handling common queries. However, AI lacks empathy, may produce inaccurate information, and often struggles with complex questions. Trust and privacy concerns also limit patient reliance on AI.
ChatGPT, developed by OpenAI, uses machine learning to generate human-like text for queries and conversations. DeepSeek is an open-source AI platform offering continuous learning, offline deployment for privacy, and customization for healthcare, but it raises concerns about data security and accuracy.
AI chatbots assist with tasks like medication reminders and basic health information, supporting healthcare professionals but not replacing the human touch necessary for effective counseling. They provide cost-effective, non-judgmental support and contribute to more efficient healthcare delivery.
While ChatGPT’s role in generating patient education materials has been studied, DeepSeek's capabilities in this area remain less explored. This study aims to compare the readability, quality, similarity, and suitability of patient education guides produced by both AI tools for the four chronic conditions mentioned.
Materials & Methods
Study Design
This cross-sectional study evaluated the readability, quality, and similarity of AI-generated patient education guides for epilepsy, heart failure, COPD, and CKD. Conducted over one week in April 2025, it used outputs from ChatGPT (GPT-4o) and DeepSeek (DeepSeek-V3). No human participants were involved, so ethics approval was not necessary. Data collection and analysis were performed online using digital tools.
Data Collection
To keep conditions consistent, identical prompts were entered into fresh accounts of both AI tools on the same day. The prompts were:
- "Write a patient education guide for epilepsy"
- "Write a patient education guide for heart failure"
- "Write a patient education guide for chronic obstructive pulmonary"
- "Write a patient education guide for chronic kidney disease"
The AI-generated guides were saved in Word documents for evaluation.
Readability Assessment
Readability was measured using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Higher FRES indicates easier text; FKGL shows the US school grade level needed to comprehend the material. These metrics are standard in healthcare to ensure materials are patient-friendly.
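Both metrics are simple functions of word, sentence, and syllable counts. As an illustration (not part of the study's workflow), the short Python sketch below applies the standard Flesch formulas; the example counts are rounded from ChatGPT's averages in Table 1, and in practice the counts would come from a readability calculator.

```python
# Standard Flesch formulas; counts would normally come from a readability tool.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRES: higher values mean easier-to-read text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKGL: approximate US school grade level required to understand the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Illustrative counts, rounded from ChatGPT's averages in Table 1
print(round(flesch_reading_ease(500, 68, 975), 1))   # roughly 34, i.e. "difficult"
print(round(flesch_kincaid_grade(500, 68, 975), 1))  # roughly a 10th-grade level
```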
Evaluation of Similarity
Turnitin software checked the guides for textual similarity to existing sources as a proxy for originality. The Overall Similarity Index (OSI) reflects the percentage of text matching previously published content, excluding quoted material and references.
Evaluation of Quality
The DISCERN instrument assessed the quality and reliability of the information, rating 16 aspects on a scale of 1-5. Two independent reviewers scored the guides, resolving disagreements by consensus.
Suitability Assessment
Suitability was measured using the Patient Education Materials Assessment Tool (PEMAT), which assesses understandability and actionability. Scores range from 0 to 100%, with higher scores indicating better suitability. Two observers independently rated the guides and agreed on final scores.
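For context, each PEMAT item is rated agree (1), disagree (0), or not applicable, and each domain score is the percentage of applicable items rated agree. A minimal sketch with hypothetical item ratings (not the study's data):

```python
# Hypothetical PEMAT item ratings: 1 = agree, 0 = disagree, None = not applicable.
understandability_items = [1, 1, 1, 0, 1, 1, None, 1, 1, 0, 1, 1, 1]
actionability_items = [1, 0, None, 1, 0, 1, 0]

def pemat_score(items):
    """Percentage of applicable items rated 'agree'."""
    applicable = [i for i in items if i is not None]
    return 100 * sum(applicable) / len(applicable)

print(f"Understandability: {pemat_score(understandability_items):.1f}%")
print(f"Actionability: {pemat_score(actionability_items):.1f}%")
```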
Statistical Analysis
Data were analyzed using SPSS software. Cohen's kappa assessed inter-observer agreement, and unpaired t-tests compared the two AI tools, with significance set at p < 0.05.
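The study used SPSS, but the same tests are straightforward to reproduce with open tools. A minimal Python sketch with illustrative numbers (not the study's raw data), using scipy for the unpaired t-test and scikit-learn for Cohen's kappa:

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.metrics import cohen_kappa_score

# Illustrative DISCERN totals for the four guides from each tool (not the study's raw data).
chatgpt_scores = np.array([48, 47, 49, 46])
deepseek_scores = np.array([47, 46, 48, 46])

# Unpaired (independent-samples) t-test, significance threshold p < 0.05.
t_stat, p_value = ttest_ind(chatgpt_scores, deepseek_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Cohen's kappa for inter-observer agreement, e.g. on dichotomized item ratings.
observer_1 = [1, 1, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 1, 0, 0, 1, 1]
print(f"kappa = {cohen_kappa_score(observer_1, observer_2):.2f}")
```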
Results
Table 1 summarizes the key characteristics of the patient education guides generated by ChatGPT and DeepSeek. Both tools produced similar content in most areas, including word count, sentence structure, readability, quality, and suitability. The only statistically significant difference was in similarity scores, where DeepSeek showed lower overlap with existing sources.
| Parameter | ChatGPT (mean) | DeepSeek (mean) | P-value* |
|---|---|---|---|
| Word count | 500.25 | 425.25 | 0.132 |
| Sentence count | 68.75 | 75.75 | 0.323 |
| Average words per sentence | 7.43 | 5.65 | 0.117 |
| Average syllables per word | 1.95 | 1.90 | 0.363 |
| Flesch-Kincaid Grade Level (FKGL) | 10.33 | 9.03 | 0.051 |
| Flesch Reading Ease Score (FRES) | 34.35 | 40.38 | 0.180 |
| Overall Similarity Index (%) | 46.00 | 32.50 | 0.049 |
| DISCERN score | 47.63 | 46.75 | 0.829 |
| PEMAT understandability (%) | 81.81 | 80.15 | 0.789 |
| PEMAT actionability (%) | 50.00 | 65.63 | 0.194 |
*P-values <0.05 indicate statistical significance.
Both ChatGPT and DeepSeek produced patient education guides with comparable readability levels, close to a 9th-10th grade reading level. Their DISCERN scores showed similar quality and reliability. The understandability of the materials was high, with actionability scores somewhat lower but still acceptable.
DeepSeek's lower similarity percentage suggests it may generate more original content compared to ChatGPT, which could be an advantage in producing unique patient materials. However, both tools require human oversight to ensure accuracy and empathy in communication.
Conclusion
This study shows that the AI chatbots ChatGPT and DeepSeek can generate patient education guides of similar quality and readability for chronic diseases. While DeepSeek may offer somewhat more original content, both tools fall short on the empathetic and intuitive aspects essential for effective patient education.
For educators and healthcare professionals, integrating AI-generated materials with human guidance can improve efficiency without compromising quality. AI tools serve best as assistants rather than replacements in patient education.
For those interested in expanding skills in AI applications in education and healthcare, exploring courses on Complete AI Training can provide valuable insights and practical knowledge.