New tools for understanding AI and learning outcomes
March 4, 2026 · Global Affairs
Education is one of AI's most promising frontiers. Tools like ChatGPT can give every student on-demand support, but the real question for schools is simple: do these tools help students learn better over time, not just score higher on a single test?
Early research on study mode showed encouraging gains, yet exposed a gap. Most methods measure narrow performance signals and miss how students actually learn with AI day to day, and whether those habits persist.
Why current methods fall short
Short windows and single endpoints hide what matters: durable gains, healthy study behaviors, and transfer across courses. Results also struggle to generalize across curricula and systems. Educators need a flexible way to evaluate AI against their own goals and standards over months, not hours.
What's new: the Learning Outcomes Measurement Suite
Built with the University of Tartu and the SCALE Initiative at the Stanford Accelerator for Learning, the Learning Outcomes Measurement Suite supports longitudinal measurement across contexts. Validation is underway through a randomized controlled trial and broader collaborations in the Learning Lab with institutions such as Arizona State University, UCL Knowledge Lab, and MIT Media Lab.
- System instructions to refine model behavior: Align the AI with evidence-based pedagogy using natural-language guidelines.
- Learning interaction classifiers: Detect "learning moments" in de-identified transcripts and label features like engagement or error correction.
- Learning quality graders: Score moments against tutoring rubrics, flag failure modes, and check whether the learner met their objective (see the sketch after this list).
- Longitudinal learning graders: Track how a learner's engagement, persistence, and metacognitive strategies change over time, both individually and by cohort.
- Standardized cognitive and metacognitive measures: Validated third-party instruments delivered pre/during/post access to assess critical thinking, creativity, memory, and related skills.
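To make the classifier and grader outputs concrete, here is a minimal sketch of what a labeled learning moment might look like. The field names and rubric dimensions are illustrative assumptions, not the suite's published schema.

```python
from dataclasses import dataclass, field

# Hypothetical record for one detected "learning moment"; the suite's
# real output format is not specified in this post.
@dataclass
class LearningMoment:
    transcript_id: str                 # de-identified session reference
    turn_span: tuple                   # (start_turn, end_turn) within the transcript
    labels: list = field(default_factory=list)         # e.g. ["engagement", "error_correction"]
    rubric_scores: dict = field(default_factory=dict)  # rubric dimension -> score
    objective_met: bool = False        # did the learner reach their stated goal?

moment = LearningMoment(
    transcript_id="anon-7f3a",
    turn_span=(12, 19),
    labels=["error_correction", "self_explanation"],
    rubric_scores={"scaffolding": 4, "answer_giving": 1},  # strong scaffolding, little answer-giving
    objective_met=True,
)
print(moment)
```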
The suite outputs structured views of learning moments, cohort dashboards over time, model indicators against teaching rubrics, and outcome measures aligned to standardized assessments and short check-ins. Where partners provide ground truth (e.g., exam scores, observations, attendance), it can integrate that data for a clearer picture. All data is de-identified.
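As one illustration of that integration step, the sketch below joins de-identified interaction summaries with partner-provided ground truth on an anonymous learner ID. The column names and the pandas-based approach are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical per-learner summaries produced by the graders, plus
# partner-provided ground truth keyed by the same de-identified ID.
moments = pd.DataFrame({
    "learner_id": ["anon-7f3a", "anon-9c21"],
    "engaged_moments_per_week": [11, 4],
    "mean_rubric_score": [3.8, 2.9],
})
ground_truth = pd.DataFrame({
    "learner_id": ["anon-7f3a", "anon-9c21"],
    "exam_score": [86, 71],
    "attendance_rate": [0.95, 0.80],
})

# Join on the de-identified key for a cohort-level view.
cohort = moments.merge(ground_truth, on="learner_id")
print(cohort.describe())  # simple dashboard-style summary across the cohort
```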
Early findings from study mode
In a randomized study with 300+ college students preparing for neuroscience and microeconomics exams, participants were assigned to either (1) a control group using traditional online resources with AI overviews disabled, or (2) one of two study mode variants that guided learning in slightly different ways. Baseline quizzes and surveys let the analysis adjust for prior exposure, study habits, confidence, and AI familiarity. Sessions were timed (about 40 minutes) under realistic study conditions, and outcomes were analyzed by assigned group (intention-to-treat).
Results varied by subject. In neuroscience, differences trended positive for study mode but were not distinguishable from control, with some onboarding and technical friction limiting study time. In microeconomics, students offered study mode scored about 15% higher on the exam, relative to the no-AI control.
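To show what an intention-to-treat estimate of this kind involves, here is a minimal sketch on synthetic data: outcomes are compared by assigned group, adjusting for a baseline covariate, and the relative difference is computed from group means. The model specification and column names are assumptions, not the study's actual analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "assigned_group": rng.choice(["control", "study_mode"], size=n),
    "baseline_quiz": rng.normal(60, 10, size=n),
})
df["exam_score"] = (
    0.5 * df["baseline_quiz"]
    + 5 * (df["assigned_group"] == "study_mode")  # assumed treatment effect
    + rng.normal(0, 8, size=n)
)

# Intention-to-treat: analyze by assignment, adjusting for baseline scores.
fit = smf.ols("exam_score ~ C(assigned_group) + baseline_quiz", data=df).fit()
print(fit.params)

# A relative difference like the ~15% reported for microeconomics:
means = df.groupby("assigned_group")["exam_score"].mean()
print((means["study_mode"] - means["control"]) / means["control"])
```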
The takeaway: fixed, short-term endpoints miss the core mechanism of AI in learning, namely ongoing, personalized interactions that evolve with the learner. We need measures that capture durability, behavior change, and trade-offs across capabilities, not just short-term recall.
What the suite helps you track over time
- Autonomous Motivation: How much students drive their own study versus relying on the model.
- Productive Engagement: Frequency, variety, and quality of meaningful learning interactions.
- Task Persistence: Willingness to sit with and work through cognitive challenges.
- Metacognition: Planning, reflection, and monitoring of study strategies.
- Recall: Accuracy of remembering content across sessions.
This broadens the focus beyond raising test scores to include the capabilities that make learning stick. There's no single metric to optimize. Systems and educators decide which trade-offs fit their pedagogy and context.
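As a rough illustration of tracking such signals over time, the sketch below fits a per-learner trend for one signal across weekly sessions and aggregates by cohort. The metric names and data layout are assumptions for illustration, not the suite's actual interface.

```python
import numpy as np
import pandas as pd

# Hypothetical per-session signals emitted by the longitudinal graders.
sessions = pd.DataFrame({
    "learner_id":  ["anon-7f3a"] * 4 + ["anon-9c21"] * 4,
    "cohort":      ["neuro"] * 4 + ["micro"] * 4,
    "week":        [1, 2, 3, 4] * 2,
    "persistence": [0.40, 0.50, 0.60, 0.70, 0.60, 0.55, 0.50, 0.45],
})

def weekly_slope(g: pd.DataFrame) -> float:
    """Least-squares slope over weeks: positive means the signal is growing."""
    return float(np.polyfit(g["week"], g["persistence"], 1)[0])

per_learner = sessions.groupby("learner_id")[["week", "persistence"]].apply(weekly_slope)
print(per_learner)                                        # per-learner trajectory
print(sessions.groupby("cohort")["persistence"].mean())   # cohort-level view
```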
Practical steps for schools and universities
- Define success upfront: Choose the outcomes that matter most for your context, e.g., critical thinking, persistence, or course pass rates.
- Start with an opt-in pilot: Align with local policies, secure approvals, and communicate data handling clearly. Keep transcripts de-identified.
- Instrument interactions: Use classifiers and graders to label learning moments and calibrate against your tutoring rubrics.
- Measure across time: Run brief pre/during/post checks using validated instruments alongside course assessments (see the sketch after this list).
- Watch for trade-offs: Track whether gains in recall affect persistence, motivation, or creativity, and adjust system instructions accordingly.
- Invest in teacher capability: Offer hands-on training for lesson planning, feedback workflows, and formative assessment with AI. See the AI Learning Path for Teachers.
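For the pre/during/post step, here is a minimal sketch of how a school might summarize change on a validated instrument once scores are collected. The paired effect size shown (Cohen's d) is one common summary, assumed here rather than prescribed by the suite.

```python
import numpy as np

# Hypothetical pre/post scores on a validated instrument (e.g., a
# critical-thinking scale), one pair per student; values are illustrative.
pre  = np.array([21, 18, 25, 19, 22, 24, 17, 20], dtype=float)
post = np.array([24, 19, 27, 22, 23, 27, 18, 23], dtype=float)

diff = post - pre
d = diff.mean() / diff.std(ddof=1)  # Cohen's d for paired samples
print(f"mean change: {diff.mean():.2f} points, paired d = {d:.2f}")
```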
Validation at scale
Large-scale studies with the University of Tartu and Stanford's SCALE Initiative include a nation-level rollout in Estonia with nearly 20,000 students aged 16-18. Work is coordinated with local leaders to ensure safety and alignment with the curriculum.
"This research allows us to learn quickly while also laying the groundwork for a deeper understanding of how AI can be thoughtfully integrated into schools in ways that truly matter. We want to understand how these tools can support rigorous academic learning while also cultivating higher-order thinking, creativity, curiosity, and students' confidence in themselves as learners." - Susanna Loeb, Stanford University, SCALE Initiative
"Estonia has always approached education as a system we continuously improve. With AI becoming part of that picture, the big question is how we measure AI's long-term impact on learning. That's what we're figuring out in collaboration with partners. Students are keen to be involved, and many want to learn how to support learning with AI." - Jaan Aru, University of Tartu
Additional research is examining how AI intersects with student pathways and labor outcomes across institutions including Bocconi University, Innova Schools, the Tuck School of Business at Dartmouth, San Diego State University, and Stony Brook University.
Stay informed
As longer-term studies progress, findings will be shared with the education community and made available as a public resource. For context on partners, see the Stanford SCALE Initiative and the University of Tartu.