Hugging Face releases ML Intern, an open source AI agent that autonomously runs machine learning research and outperforms Claude Code on scientific reasoning benchmarks

Hugging Face released ML Intern, an open source AI agent that runs the full ML research loop autonomously. It outscored Claude Code on scientific reasoning and beat Codex on a healthcare benchmark by 60%.

Published on: Apr 23, 2026

Hugging Face released ML Intern, an open source AI agent that autonomously researches, writes and runs machine learning code. Early benchmark results show it outperforming Anthropic's Claude Code on scientific reasoning and OpenAI's Codex on healthcare evaluation tasks.

The agent automates the full research loop: finding papers on arXiv, selecting datasets, writing code, launching training jobs on GPUs, and evaluating results. Hugging Face is provisioning $1,000 in GPU resources and Anthropic credits for early users. The tool is available today as a command-line interface and web app.

How the agent performs on benchmarks

On a scientific reasoning task, ML Intern found NVIDIA research papers through citation searches, then fine-tuned Qwen3-1.7B across 12 training passes. It achieved a 32% score on the GPQA benchmark in under 10 hours. Claude Code's best result on the same task was 22.99%.

For a healthcare evaluation, the agent identified that existing datasets were too low quality. It wrote a script to generate 1,100 synthetic data points covering emergency, client and multilingual communication scenarios, then upsampled the data 50 times for training. The final model beat Codex on HealthBench by 60%.
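The upsampling step is mechanically simple: repeat the synthetic set many times so each example is seen more often during training, then shuffle so duplicates are spread across batches. A minimal sketch, using placeholder data in place of the agent's 1,100 generated examples:

```python
import random

# Illustrative stand-in for the 1,100 synthetic data points described above.
synthetic = [{"scenario": "emergency", "text": f"case {i}"} for i in range(1100)]

# Upsample 50x: repeat the whole set, then shuffle so repeats are interleaved.
upsampled = synthetic * 50
random.shuffle(upsampled)
```

Whether 50x repetition helps or causes memorization depends on the training setup; the article does not say how ML Intern chose that factor.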

On a competitive mathematics task, the agent wrote a full training script, launched it on A100 GPUs, and, when initial rewards collapsed, ran ablations until training succeeded.

What the tool does

ML Intern runs up to 300 iterations per task, with a context manager that handles message history and automatically compacts it. The agent can access Hugging Face documentation, datasets and training jobs, search GitHub code, and execute code in a sandboxed environment.
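Automatic compaction of message history typically means replacing old messages with a summary once the context grows past a budget, while keeping recent turns verbatim. A minimal sketch of that idea, with hypothetical names and thresholds not taken from ML Intern's code:

```python
def compact_history(messages: list[dict], max_messages: int = 20,
                    keep_recent: int = 10) -> list[dict]:
    """Collapse older messages into one summary entry when over budget."""
    if len(messages) <= max_messages:
        return messages  # under budget: nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # A production agent would summarize with an LLM; we use a placeholder.
    summary = {"role": "system",
               "content": f"[Summary of {len(old)} earlier messages]"}
    return [summary] + recent
```

Over a 300-iteration run, a policy like this keeps the prompt bounded while preserving the most recent tool calls and results.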

The command-line interface installs via uv and accepts any inference provider model ID. The default configuration uses Anthropic's Claude models.

The practical question

The agent performed well on curated benchmarks, but its real-world performance remains untested. Real-world datasets often involve messy data quality, consent issues and licensing concerns. Since Hugging Face built ML Intern as open source software on its own ecosystem, the research community will be able to test those limits publicly.

For researchers looking to automate parts of the post-training workflow, AI Research Courses and AI Coding Courses can help you understand how agents like this work and where they fit into your research process.

