Enhancing Medical Journal Clubs with AI: A Proof of Concept Using Retrieval-Augmented Language Models

A RAG-enhanced LLM improved journal club engagement by summarizing articles and answering queries, strengthening preparation and raising discussion quality. Challenges included the need for precise prompts and occasional inaccuracies.

Published on: May 09, 2025

Transforming Education: Tackling the Two Sigma Problem with AI in Journal Clubs – A Proof of Concept

Abstract

Introduction
Journal clubs are key to medical education, encouraging critical thinking and evidence-based learning. Yet, their impact can be limited by uneven participation, dependence on faculty expertise, and the difficulty of complex research articles. Generative AI, especially Large Language Models (LLMs), offers a promising solution. However, general LLMs can produce inaccurate information, known as “hallucinations.” Retrieval-Augmented Generation (RAG) addresses this by combining AI-generated content with curated knowledge, improving accuracy and relevance. This study presents the development and early evaluation of a RAG-enhanced LLM to support journal club discussions.

Materials and Methods

A specialized LLM was deployed using Microsoft Azure’s GPT-4o. Journal club articles were embedded into a vector database using text-embedding-ada-002 (Version 2) for efficient retrieval. A user-friendly website was created for access. Using a design-based research approach, residents and faculty engaged with the LLM before and during journal club sessions. Feedback was collected through focus group discussions and questionnaires to assess engagement, usability, and impact.

Results

Thirteen residents and three faculty members participated. Half of the residents reported a positive experience, while the others had a neutral view, noting both benefits and limitations. Residents found the LLM helpful for summarizing articles, answering queries, and boosting engagement. Faculty observed improved discussion quality and preparation. Challenges included the need for precise prompts and occasional misleading answers.

Conclusion

This study demonstrates the promise of RAG-enhanced LLMs to improve journal club engagement and learning. Advances in AI and open-source models may reduce costs and increase access, making further research worthwhile.

Introduction

Journal clubs are central to ongoing medical education, promoting lifelong learning and critical appraisal. However, participation varies, and research articles often contain complex methods or statistics that are hard to grasp without support. These sessions usually depend on faculty or senior residents to guide discussions. When these leaders are unavailable or unable to adapt to learners’ needs, discussion quality can decline.

Benjamin Bloom’s Two Sigma Problem, identified in the 1980s, showed that students receiving one-on-one tutoring perform two standard deviations better than those in typical classrooms. This highlights the value of personalized learning but also the challenge, as individualized tutoring demands considerable time and resources—often scarce in medical education.

Generative AI, particularly LLMs, can generate human-like text and hold complex conversations. Trained on vast datasets, these models offer new educational possibilities. Yet, their use in practical teaching settings is limited. General-purpose LLMs are designed for broad tasks and may produce incorrect or misleading information (“hallucinations”) because they lack domain-specific precision.

Fine-tuned, domain-specific LLMs provide accurate and context-rich responses, mimicking personalized tutoring. However, training such models is resource-intensive and costly.

Retrieval-Augmented Generation (RAG) offers a cost-effective alternative by combining AI with targeted information retrieval from curated documents. This reduces hallucinations and improves response accuracy by grounding answers in verified sources like PDFs, text files, and spreadsheets. RAG is particularly useful in resource-limited environments, leveraging existing data without heavy computational demands. Despite its potential, RAG’s role in education needs more exploration.
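The retrieval step that RAG adds can be illustrated with a minimal sketch. The bag-of-words embedder, example chunks, and query below are stand-ins for illustration only; the study itself used text-embedding-ada-002 and a managed vector database:

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words embedding; a real system would call an embedding model.
    cleaned = text.lower().replace("?", " ").replace(".", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    # Rank stored chunks by similarity to the query (the vector-search step).
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks):
    # Ground the model's answer in retrieved text (the "augmented" step).
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Peri-implantitis is an inflammatory condition around dental implants.",
    "A sinus lift augments bone in the posterior maxilla before implant placement.",
]
prompt = build_prompt("What is peri-implantitis?", chunks)
```

Grounding the prompt in retrieved passages, rather than relying on the model's parametric memory alone, is what reduces hallucinations in this design.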

This article introduces a custom-purpose LLM using RAG to aid journal club discussions, aiming to enhance personalized, accurate learning while managing costs.

Materials and Methods

Model Development

A specialized LLM was built to support journal club sessions using RAG on Microsoft Azure’s GPT-4o. Development occurred in Microsoft’s Chat Playground, allowing fine-tuning of parameters to balance response quality and accuracy. Settings included a maximum response length of 800 characters, a top-P value of 0.9, and a temperature of 0.1 to limit randomness and hallucinations.
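The decoding settings above can be sketched as a chat-completion request payload. The system message and helper function are illustrative, not taken from the study, and no network call is made here; note that the underlying API measures response length in tokens:

```python
# Decoding settings mirroring those reported above.
GENERATION_PARAMS = {
    "max_tokens": 800,    # response length cap (the API counts tokens)
    "top_p": 0.9,         # nucleus sampling: keep the top 90% probability mass
    "temperature": 0.1,   # low randomness to limit hallucinations
}

def build_request(messages):
    # Assemble a payload in the shape a chat-completion endpoint expects
    # (sketch only; a real deployment would send this to Azure OpenAI).
    return {"messages": messages, **GENERATION_PARAMS}

request = build_request([
    {"role": "system", "content": "Answer only from the supplied article."},
    {"role": "user", "content": "Summarize the methods section."},
])
```

Low temperature narrows the sampling distribution toward the most likely tokens, which suits a tool whose answers must stay faithful to source articles.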

To create a knowledge base, journal club articles and related documents were collected as PDFs. Faculty selected relevant materials. These files were embedded using text-embedding-ada-002 (Version 2) in chunks of 1024 tokens, capturing semantic meaning. Embeddings were stored in a vector database optimized for similarity searches, enabling the model to retrieve relevant content based on queries.
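The chunk-and-embed step can be illustrated with a minimal splitter. The 1024 chunk size comes from the text above; whitespace tokenization stands in for the embedding model's real tokenizer:

```python
def chunk_tokens(tokens, size=1024, overlap=0):
    # Split a token sequence into fixed-size chunks for embedding.
    # An optional overlap keeps context that straddles chunk boundaries.
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

# A 2500-"token" document splits into chunks of 1024, 1024, and 452 tokens.
tokens = ("word " * 2500).split()
chunks = chunk_tokens(tokens, size=1024)
```

Each chunk would then be embedded and stored in the vector database, so that a query retrieves only the most semantically similar passages rather than whole PDFs.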

A dedicated website built on Google Sites provided user access. Three modules were developed covering topics such as peri-implantitis, implant length comparisons, and sinus lift procedures. Article sources and citations were documented to ensure transparency.

Data Collection

The study applied a design-based research method involving residents and faculty from the Dentistry Section at Aga Khan University, Pakistan. Participants received 20-30 minutes of training to use the LLM effectively. Interaction with the model occurred before and during journal club sessions, totaling 2-3 hours per participant.

Data were gathered via focus group discussions and questionnaires focusing on usability, engagement, and educational impact.

Results

Six Prosthodontic residents, seven Operative Dentistry residents, and three faculty members participated. After the intervention, half of the residents reported a positive experience using the LLM for journal club preparation, while the rest were neutral, acknowledging both pros and cons.

Residents found the LLM useful for generating summaries, answering questions, and analyzing data, which improved their preparation and understanding. Faculty noted enhanced discussion quality and better session readiness. One resident commented that the LLM saved time by highlighting key sections of a paper on request.

Challenges included the need for precise prompts and occasional incomplete or misleading responses. Nevertheless, navigation and usage were generally smooth.

Overall, both learners and faculty saw benefits in knowledge and engagement compared to traditional journal clubs, suggesting further refinement and repeated use could strengthen outcomes.

Discussion

Using RAG-enhanced LLMs fits well with the flipped classroom approach, where learners prepare independently before engaging in active discussions. This model offers personalized support, helping learners bridge gaps and better grasp complex research. It may help address Bloom’s Two Sigma Problem by providing adaptive feedback and motivation, improving individual and group performance.

Facilitators benefit from reduced prep time and more focused guidance, promoting consistent discussion quality regardless of individual expertise or availability. This approach encourages collaboration and critical thinking among participants.

The LLM supports higher-order skills by helping users analyze research methods, synthesize findings across studies, and evaluate evidence critically. However, maintaining this system requires ongoing effort to select relevant materials and align AI outputs with educational goals.

Low temperature and top-P settings help reduce hallucinations by generating more deterministic responses. The choice not to clean PDF text preserved tables and figures, balancing data integrity with embedding quality, though this area needs further study.

Costs for running a session averaged $350, which may seem high initially. However, growing availability of open-source LLMs like Mistral and Llama, along with hosting on institutional platforms, is expected to lower expenses. Residents adapted quickly with minimal training, supported by the increasing accessibility of AI tools and no-code platforms.

Future faculty development should include AI literacy to maximize benefits from these tools in education.

Conclusion

This initial study shows that RAG-enhanced LLMs can improve preparation, engagement, and critical learning in journal clubs while easing faculty workload. Despite challenges like cost and occasional inaccuracies, advances in AI and open-source alternatives promise greater accessibility. Continued development and evaluation will help integrate this approach more widely in medical education.