Blog post proposes using reinforcement learning on timestamped web cache to address data leakage in AI forecasting

Metal Ivy proposed a timestamped web cache to train AI forecasters and fix data leakage. This addresses flaws in 2024 papers claiming superhuman prediction skills.

Published on: Jun 29, 2026

A new proposal from Metal Ivy, crossposted to LessWrong, outlines a method to train a superhuman forecaster using reinforcement learning on a timestamped internet cache. The approach aims to solve a persistent data-leakage problem that has inflated scores on AI forecasting benchmarks. While the proposal is speculative and lacks empirical results, it directly addresses a methodological gap that researchers have criticized in recent high-profile forecasting claims.

The proposed training setup

The idea is to use a large, static cache of historical web content as the training environment. An RL agent would be rewarded for making accurate predictions about future events, but it could access only information published before each question's resolution date. This keeps knowledge of the outcome out of the agent's inputs, so the outcome enters training only through the reward. That avoids the leakage that occurs when models are trained on web data that includes post-resolution content.
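To make the time-gating idea concrete, here is a minimal sketch of a cache that reveals only pages published before a cutoff. The blog post does not specify an implementation, so `CachedPage`, `TimestampedCache`, the substring search, and the example URLs and dates are all hypothetical stand-ins. The cutoff used here is an assumed forecast date that falls before the question resolves.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CachedPage:
    """One page in the (hypothetical) static web cache."""
    url: str
    published: datetime  # timestamp recorded when the page was cached
    text: str


class TimestampedCache:
    """Illustrative cache that hides every page published at or after a cutoff."""

    def __init__(self, pages):
        # Sort once so that search results come back in chronological order.
        self._pages = sorted(pages, key=lambda p: p.published)

    def search(self, query, cutoff):
        """Return pages mentioning `query` that were published strictly before `cutoff`."""
        needle = query.lower()
        return [
            p for p in self._pages
            if p.published < cutoff and needle in p.text.lower()
        ]


# Made-up example data: one pre-event page and one post-event page.
pages = [
    CachedPage("https://example.com/a", datetime(2024, 1, 10), "Poll shows candidate X leading"),
    CachedPage("https://example.com/b", datetime(2024, 3, 5), "Candidate X wins the election"),
]
cache = TimestampedCache(pages)

# The agent forecasts on 2024-02-01, so the post-election page must stay hidden.
visible = cache.search("candidate x", cutoff=datetime(2024, 2, 1))
visible_urls = [p.url for p in visible]
```

In this sketch the filter is applied inside the cache rather than by the agent, so the agent has no way to request a page that postdates its cutoff. A real system would also need trustworthy publication timestamps, which the article notes is a substantial engineering problem.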

The author argues that this setup mirrors the reinforcement learning loops that produced superhuman performance in Go and chess, but applied to open-ended world-event forecasting rather than a constrained game. The agent would learn to search for and weigh evidence from the cached internet, much as a human forecaster might research a question.

Data leakage in current benchmarks

Several papers in 2024 claimed that large language model-based forecasters rivaled or exceeded human forecasters. A prominent LessWrong critique, titled "Contra papers claiming superhuman AI forecasting," argued those claims suffer from methodological problems, including data leakage, non-representative question sets, and comparisons to weak human baselines. Data leakage occurs when a model's training data includes web content from after a question was resolved, allowing it to effectively cheat on the benchmark.

The cached-internet proposal specifically targets the leakage critique. By restricting the agent to a snapshot of the internet at each point in time, the training process would force the model to reason from genuinely available information, not from hindsight.

What's missing and what to watch

The blog post is a conceptual proposal, not a research paper with empirical results. Building and maintaining a high-quality, timestamped web cache at the required scale is a substantial engineering challenge. Whether the reward signal from forecasting accuracy is rich enough to drive the kind of capability gains seen in game-playing agents remains an open question.
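As one illustration of what a reward based on forecasting accuracy could look like, the sketch below scores a probability forecast on a binary question with the negative Brier score. The choice of the Brier score is an assumption made here for illustration, and the article does not say which scoring rule the proposal would use. The function names `brier_reward` and `mean_reward` are hypothetical.

```python
def brier_reward(forecast_prob, outcome):
    """Negative Brier score for one binary question.

    Returns 0.0 for a perfect forecast and -1.0 for a maximally wrong one.
    Assumed reward shape for illustration, not taken from the proposal.
    """
    if not 0.0 <= forecast_prob <= 1.0:
        raise ValueError("forecast_prob must be in [0, 1]")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return -((forecast_prob - outcome) ** 2)


def mean_reward(forecasts, outcomes):
    """Average the per-question reward over a batch of resolved questions."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally sized, non-empty lists")
    total = sum(brier_reward(p, o) for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# A hedged 50% forecast on an event that happened, and a confident correct one.
batch_reward = mean_reward([0.5, 1.0], [1, 1])
```

Under a scheme like this, each resolved question yields a single scalar, however much research the agent did beforehand. That sparsity is one way to read the open question about whether the signal is rich enough.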

Practitioners interested in this direction should watch for follow-up empirical work testing whether the training loop actually produces the forecasting gains the proposal anticipates. No such experiments have been reported yet.

Why this matters for Science & Research

For researchers working on prediction, forecasting, or decision-support systems, the proposal highlights a concrete path to address a well-documented methodological weakness. Even as a speculative idea, it reframes the conversation around AI forecasting from "can we beat human forecasters?" to "can we build a training setup that cleanly measures forecasting ability?" The proposal's value lies in its direct engagement with the leakage problem, which remains a central challenge for AI forecasting research. Tracking whether anyone tests this approach will be more informative than the proposal itself.

