A practical look at a RAG-based AI teaching assistant in medical education
Medical programs are scaling faster than individualized support can keep pace. Students study after hours, prefer quick answers, and many already use chatbots. The problem: general-purpose models can hallucinate, which makes them risky for clinical content.
A retrieval-augmented generation (RAG) assistant grounds answers in instructor-curated materials. That constraint increases accuracy and keeps responses aligned with the course. This study deployed a RAG-based "NeuroBot TA" in a preclerkship neuroscience course across two cohorts to see how students actually used it, and what educators should do next.
What is RAG? It fetches relevant course content first, then lets the model respond within that context. It's a simple way to reduce off-target answers while retaining a conversational interface.
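Under the hood, the loop is small: embed the student's question, rank instructor-curated chunks by similarity, and hand the top matches to the model as its only context. Here is a minimal sketch, assuming a chunked corpus with source labels; `embed` and `llm` stand in for whatever embedding and chat models your platform provides.

```python
from math import sqrt

def cosine(a, b):
    # Similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(question, corpus, embed, top_k=3):
    # Rank instructor-curated chunks by similarity to the question.
    q_vec = embed(question)
    ranked = sorted(corpus, key=lambda c: cosine(q_vec, embed(c["text"])), reverse=True)
    return ranked[:top_k]

def answer(question, corpus, embed, llm):
    # Build a prompt that confines the model to the retrieved material.
    chunks = retrieve(question, corpus, embed)
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    prompt = (
        "Answer using ONLY the course material below. Cite the bracketed source "
        "for every claim. If the material does not cover the question, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```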
What happened in the course
Usage patterns
Students started 360 conversations and sent 2,946 messages. Conversations were short and focused (mean 3.6 turns). Activity spiked mid-week and after 5 pm, when access to faculty is usually limited.
In the three days before exams, conversation volume increased by about 330%. Cohort 1 used the bot more consistently; cohort 2 usage dropped overall, likely because students had adopted other AI tools by then.
What students asked
Most queries stuck to the curriculum. Neuroanatomy and physiology dominated (66%), followed by clinical syndromes (54%). Students also asked about study resources and course details (about 28-31%). Complex clinical reasoning and imaging questions were less frequent.
How students felt
About 31% of students tried the bot. Usefulness ratings were modest (2.8/5 and 3.3/5). Students liked the source-grounded answers and the around-the-clock availability. They were frustrated when the bot refused questions outside the knowledge base. A few noted that answers could be long and that prompt quality mattered.
The tension was clear: constraint builds trust, but it also limits breadth.
The trade-off you must manage
Two forces pull against each other: accuracy and comprehensiveness. RAG boosts reliability but narrows what the bot can answer. Students value the added trust, but they also want breadth, especially near exams. Be explicit about these boundaries and set expectations upfront.
Adoption followed the Technology Acceptance Model: students used the tool when it felt useful (pre-exam, after-hours) and easy to access. You can boost both perceived usefulness and ease of use through design and messaging.
Implementation playbook for educators
- Start early. Introduce the assistant in the first courses, before study habits calcify. Position it as a supplemental tool.
- Curate for relevance. Include lecture slides, prework, objectives, study tips, exam policies, and logistics. Students asked these questions often.
- Ground and cite. Require source citations and link directly to the slide or document section so students can verify and review.
- Set refusal behavior. If content isn't in the knowledge base, the bot should say so plainly and suggest where to look next (see the system-prompt sketch after this list).
- Provide prompt templates. Offer quick-starters (e.g., "Explain slide X in simple terms," "Contrast A vs B from Lecture Y," "Locate objectives related to Z").
- Measure and iterate. Track usage peaks, refusal rates, topic mix, and student ratings. Expand the corpus based on unmet demand.
- Align with assessment cadence. Expect usage surges before exams. Seed targeted summaries, FAQs, and objectives ahead of those dates.
- Plan a hybrid mode. Allow the bot to answer beyond the corpus with clear labels: "Grounded" vs "General knowledge (verify)." Include a quick accuracy disclaimer for the latter.
- Run accuracy spot checks. Have faculty review a small random sample of answers weekly. Fix errors at the source; document changes.
- Onboarding and norms. Teach appropriate use cases, verification habits, and course policies for AI use.
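Several of these items, such as grounding, citations, refusal behavior, hybrid labels, and concise answers, ultimately live in the system prompt. Below is a minimal sketch of how those rules might be encoded; the wording is illustrative, not the prompt used in the study.

```python
# Illustrative system prompt encoding the playbook's rules (adapt to your platform).
SYSTEM_PROMPT = """
You are a teaching assistant for a preclerkship neuroscience course.

Rules:
1. Ground every answer in the retrieved course materials and cite the exact slide
   or document section you used, e.g. (Lecture 4, slide 12).
2. If the retrieved materials do not cover the question, say so plainly, label the
   reply "General knowledge (verify)", and suggest where in the course to look next.
3. Label fully grounded replies "Grounded".
4. Default to concise answers; expand only when the student asks for more detail.
5. Never invent citations, diagnoses, or exam content.
"""
```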
Prompt patterns to teach students
- "Explain slide [number] from [lecture]. Summarize in 5 bullet points and cite the slide."
- "Compare [concept A] vs [concept B] using only course materials. Keep it concise and cite."
- "Point me to the learning objectives related to [topic], with links to the source."
- "I'm confused about [pathway]. Give a step-by-step description grounded in the notes."
What to track (and why)
- Daily/hourly usage: Confirms access patterns and after-hours demand.
- Pre-exam surges: Indicates perceived usefulness when stakes are high.
- Refusal rate: Signals where the corpus is thin or expectations aren't aligned (see the metrics sketch after this list).
- Topic distribution: Guides what to add to the knowledge base.
- Accuracy checks: Guardrail against drift and outdated material.
- Student ratings and comments: Validate trust, usefulness, and friction points.
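To make these trackable week to week, a small script over the conversation logs goes a long way. The sketch below assumes each logged message carries an ISO timestamp, a topic tag, and a refusal flag; the field names are hypothetical, so map them to whatever your platform exports.

```python
from collections import Counter
from datetime import datetime

def weekly_metrics(messages, exam_dates, pre_exam_days=3):
    # messages: list of dicts like {"timestamp": "...", "topic": "...", "refused": bool}
    # exam_dates: list of datetime.date objects for upcoming assessments
    by_day = Counter()
    topics = Counter()
    refusals = 0
    pre_exam = 0
    for m in messages:
        ts = datetime.fromisoformat(m["timestamp"])
        by_day[ts.date()] += 1
        topics[m.get("topic", "untagged")] += 1
        refusals += bool(m.get("refused", False))
        if any(0 <= (exam - ts.date()).days <= pre_exam_days for exam in exam_dates):
            pre_exam += 1
    total = len(messages)
    return {
        "messages_per_day": dict(by_day),
        "topic_mix": topics.most_common(),
        "refusal_rate": refusals / total if total else 0.0,
        "pre_exam_share": pre_exam / total if total else 0.0,
    }
```

Compare refusal rate and topic mix against the previous week to decide what to add to the corpus next.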
Common pitfalls and how to fix them
- "It can't answer my question." Expand the corpus; add a hybrid mode with clear labels and verification tips.
- Overly long answers. Add a "concise mode" instruction to the system prompt; teach students to request brevity.
- Low second-cohort usage. Students may prefer general chatbots. Highlight the benefits of source-grounded answers for exams and ensure the assistant can still support broader questions with labels.
- Prompt sensitivity. Provide templates, quick examples, and in-class demos. Short workshops help.
Where this is going
Two improvements stand out. First, hybrid grounding: clearly mark grounded vs general responses so students can judge trust at a glance. Second, more active learning: add a Socratic mode for regular study and a direct-answer mode before exams.
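One lightweight way to offer both study styles is a mode flag that swaps the instruction block: Socratic questioning during regular weeks, direct answers near exams. A sketch under those assumptions (mode names and wording are not from the study):

```python
# Hypothetical mode toggle: the same grounded assistant, two instruction blocks.
MODES = {
    "socratic": (
        "Do not give the answer outright. Ask one guiding question at a time, "
        "grounded in the course materials, and confirm each step before moving on."
    ),
    "direct": (
        "Give a concise, cited answer first, then offer one short self-test "
        "question the student can try."
    ),
}

def build_system_prompt(base_prompt: str, mode: str = "socratic") -> str:
    # Append the active mode's instructions to the base grounding rules.
    return f"{base_prompt}\n\nMode: {mode}\n{MODES[mode]}"
```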
For deeper synthesis across topics, consider a simple knowledge graph that maps concept relationships and links back to course sources. This keeps accuracy high while supporting cross-topic understanding.
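That graph does not need heavy tooling to start. A plain mapping from each concept to related concepts and back to course sources is enough for cross-topic answers to keep their citations; the entries below are illustrative placeholders, not actual course content.

```python
# Illustrative concept graph: related concepts plus links back to course sources.
CONCEPT_GRAPH = {
    "corticospinal tract": {
        "related": ["upper motor neuron lesion", "internal capsule"],
        "sources": ["Lecture 3, slides 10-18"],
    },
    "upper motor neuron lesion": {
        "related": ["corticospinal tract", "spasticity"],
        "sources": ["Lecture 7, slides 4-9"],
    },
}

def related_sources(concept, graph, hops=1):
    # Collect sources for a concept and its neighbors up to `hops` hops away.
    seen, frontier, sources = {concept}, [concept], []
    for _ in range(hops + 1):
        next_frontier = []
        for node in frontier:
            entry = graph.get(node, {})
            sources.extend(entry.get("sources", []))
            for neighbor in entry.get("related", []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return sources
```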
Quick start checklist
- Pick a platform that supports RAG with source citations.
- Build a clean, tagged corpus of course materials.
- Write a system prompt that enforces grounding, brevity on request, and polite refusals.
- Launch with a 15-minute demo and prompt templates.
- Monitor usage, refusals, and accuracy weekly; expand content where students struggle.
- Time targeted content drops a few days before assessments.
Upskilling your team
If your faculty or students need fast, practical AI training, especially for prompt design and classroom use, consider these resources from Complete AI Training: Prompt engineering guides and courses.