Executable Knowledge Graphs (xKG) Bring Missing Details Into Code Generation
Automating research replication is hard because papers leave out implementation details. Retrieval-augmented agents hit a wall when those details sit in references, footnotes, or scattered code.
Researchers from Zhejiang University propose Executable Knowledge Graphs (xKG): a structured, runnable knowledge base built from papers and repositories. It organizes technical insights and code into a hierarchy that agents can query, reason over, and compose into working implementations.
Why agents stall on replication
Most systems retrieve text but miss the fine-grained links between methods, assumptions, and code. They also struggle to assemble end-to-end solutions from partial snippets.
xKG tackles both issues by encoding concepts and their executable counterparts, then exposing them through a graph that reflects how real projects are built: techniques, sub-tasks, and the code that makes them run.
What xKG is
xKG is a hierarchical, multi-relational graph extracted from arXiv papers and GitHub repositories. It includes paper nodes, technique nodes, and code nodes connected by structural and implementation edges.
This lets an agent move from "what the paper says" to "which component implements it" to "which code runs it," with enough granularity to reassemble a full pipeline.
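As a rough illustration, here is a minimal sketch of how those node and edge types could be represented. The class names, fields, and relation labels are assumptions made for illustration, not the authors' released schema.

```python
from dataclasses import dataclass, field

# Minimal sketch of xKG-style node and edge types.
# Names and fields are illustrative assumptions, not the released schema.

@dataclass
class PaperNode:
    arxiv_id: str
    title: str

@dataclass
class TechniqueNode:
    name: str
    description: str
    sub_tasks: list = field(default_factory=list)  # child TechniqueNodes

@dataclass
class CodeNode:
    repo: str
    path: str
    snippet: str  # runnable, modularized code

@dataclass
class Edge:
    src: object
    dst: object
    relation: str  # e.g. "describes", "decomposes_into", "implemented_by"

# A tiny fragment: paper -> technique -> sub-task -> code
paper = PaperNode("2501.00000", "Example replication target")
attn = TechniqueNode("sparse attention", "core method from the paper")
mask = TechniqueNode("block mask construction", "sub-task of sparse attention")
attn.sub_tasks.append(mask)
code = CodeNode("github.com/example/repo", "model/mask.py", "def build_mask(...): ...")

edges = [
    Edge(paper, attn, "describes"),
    Edge(attn, mask, "decomposes_into"),
    Edge(mask, code, "implemented_by"),
]
```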
How xKG is built
- Corpus curation: select target papers, references, and corresponding repositories.
- Technique extraction: identify core methods and their key components from paper text.
- Code linking: map techniques and sub-tasks to concrete, runnable code snippets.
- Modularization: refactor code into well-scoped components that are easy to execute and reuse.
- Knowledge filtering: verify, prune, and align nodes/edges for accuracy and relevance.
The result is a knowledge graph where techniques branch into sub-tasks and connect to specific, documented implementations.
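The sketch below strings the five stages into a pipeline skeleton. The function bodies are placeholders and the helper names are hypothetical; the actual extraction and verification logic used to build xKG is not reproduced here.

```python
# Illustrative pipeline skeleton for the five construction stages above.
# All function bodies are stubs; names are hypothetical.

def curate_corpus(paper_ids):
    """Collect target papers, their references, and matching repositories."""
    return [{"paper": pid, "repo": None} for pid in paper_ids]  # stub

def extract_techniques(paper):
    """Identify core methods and key components from the paper text."""
    return []  # stub: would return technique records

def link_code(technique, repo):
    """Map a technique or sub-task to concrete, runnable snippets in the repo."""
    return []  # stub: would return code records

def modularize(snippet):
    """Refactor a snippet into a well-scoped, executable component."""
    return snippet  # stub

def filter_knowledge(graph):
    """Verify, prune, and align nodes/edges for accuracy and relevance."""
    return graph  # stub

def build_xkg(paper_ids):
    """Assemble the graph: papers -> techniques -> sub-tasks -> code."""
    graph = {"nodes": [], "edges": []}
    for entry in curate_corpus(paper_ids):
        for tech in extract_techniques(entry["paper"]):
            graph["nodes"].append(tech)
            for snippet in link_code(tech, entry["repo"]):
                module = modularize(snippet)
                graph["nodes"].append(module)
                graph["edges"].append((tech, "implemented_by", module))
    return filter_knowledge(graph)
```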
Where it fits in agent workflows
The team integrated xKG into three agent frameworks (BasicAgent, IterativeAgent, and PaperCoder), paired with two different language models. Instead of guessing missing steps, agents retrieve the exact modules they need and stitch them together, with fewer hallucinations and fewer dead ends.
In short, xKG upgrades agents from "text-guided scaffolding" to assembling complete, functional repositories grounded in verified code.
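To make the retrieval step concrete, here is a hedged sketch of how an agent might query such a graph for the modules it needs. The interface and matching logic are assumptions for illustration, not the paper's integration code.

```python
# Hypothetical retrieval interface over an xKG-style store.

def retrieve_modules(graph, query_terms):
    """Return code nodes whose linked technique matches any query term."""
    hits = []
    for edge in graph["edges"]:
        if edge["relation"] == "implemented_by":
            technique, code = edge["src"], edge["dst"]
            if any(term in technique["name"].lower() for term in query_terms):
                hits.append(code)
    return hits

# Example: instead of guessing a missing step, the agent pulls the verified
# snippet and stitches it into the generated repository.
graph = {
    "edges": [
        {"relation": "implemented_by",
         "src": {"name": "block mask construction"},
         "dst": {"path": "model/mask.py", "snippet": "def build_mask(...): ..."}},
    ]
}
modules = retrieve_modules(graph, ["mask"])
print([m["path"] for m in modules])  # -> ['model/mask.py']
```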
Benchmark results: PaperBench
On PaperBench, which checks functional correctness of generated repositories against a rubric, xKG delivered consistent gains. With the o3-mini model, the improvement reached 10.9%.
The study notes evaluation variance and cost, plus reduced effectiveness when reference papers are unavailable. Still, across agent types and models, the boost is clear.
Practical takeaways for research teams
- Use hierarchical knowledge: link claims to components, and components to runnable code.
- Prefer modular repositories: small, documented units make retrieval and recomposition tractable.
- Capture hidden assumptions: defaults, preprocessing, seeds, and evaluation metrics belong in the graph (see the sketch after this list).
- Treat code as knowledge: verified snippets are as valuable as text for replication.
- Plan for missing references: add fallbacks when paper links or repos are incomplete.
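As a small illustration of the "capture hidden assumptions" and "treat code as knowledge" points, a code node can carry its unstated defaults as structured metadata. The keys and values below are hypothetical, not taken from the paper.

```python
# Illustrative only: attach unstated defaults to a code node so they travel
# with the snippet instead of staying implicit.
code_node = {
    "path": "train/run.py",
    "snippet": "def train(cfg): ...",
    "assumptions": {
        "random_seed": 42,  # often unstated in papers
        "preprocessing": "lowercase + whitespace tokenization",
        "eval_metric": "macro-F1",
        "learning_rate_default": 3e-4,
    },
}

# An agent can surface these alongside the code it retrieves.
for key, value in code_node["assumptions"].items():
    print(f"{key}: {value}")
```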
Limitations and next steps
xKG works best when reference papers and repositories are accessible. Future work will test whether code-based knowledge organization transfers to tasks beyond replication.
The authors also situate xKG alongside related efforts such as ExeKG, noting differences in approach and scope, and they have released code to encourage further research.
Resources
- Executable Knowledge Graphs for Replicating AI Research (arXiv)
- AI courses for research and engineering roles - Complete AI Training