How LLMs Do Theory of Mind with Tiny Sparse Circuits, and Why Rotary Positional Encoding Could Cut Energy Costs

LLMs do Theory of Mind with tiny parameter clusters yet fire up the whole model, wasting compute. RoPE guides how beliefs are tracked, hinting at lean, selective routing.

Categorized in: AI News, Science and Research
Published on: Nov 12, 2025

Sparse Circuits, Big Insight: How LLMs Do Theory of Mind (and Why It's Inefficient)

Researchers at Stevens Institute of Technology found that large language models perform Theory-of-Mind (ToM) reasoning using a small, specialized subset of parameters, while still activating the entire network every time. That mismatch is the headline: selective internal circuitry doing the work, wrapped inside a compute-hungry process.

The team also shows that positional encoding, especially rotary positional encoding (RoPE), is central to how models track beliefs and perspectives. In other words, the way a model encodes word positions quietly steers its social reasoning.

A quick mental model

Think of the classic false-belief setup: someone hides a chocolate bar in a box, then another person moves it to a drawer. You know the bar is in the drawer. You also know the first person will look in the box. Humans do this in seconds, using only a small slice of neural resources.

LLMs can do something similar, but they light up almost their entire network to produce the answer, whether the prompt is trivial or complex. That's a serious efficiency gap.

Key findings

  • Sparse circuits: ToM relies on tiny clusters of parameters. Perturbing as little as 0.001% of these ToM-sensitive parameters causes a measurable drop in ToM performance and harms contextual processing.
  • Crucial encoding: RoPE strongly influences how models represent beliefs and perspectives. Changes here alter attention geometry (e.g., the angle between queries and keys) and disrupt dominant frequency channels tied to context; a minimal RoPE sketch follows this list.
  • Efficiency gap: Humans recruit a small neural subset for social reasoning; LLMs light up almost everything. Understanding these sparse circuits points the way to selective, energy-efficient computation.
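
To make the geometry concrete, here is a minimal NumPy sketch of standard RoPE (an illustration, not the study's code): queries and keys are rotated by position-dependent angles, so the attention logit between them depends only on their relative offset. That relative-angle structure is exactly the kind of geometry the reported perturbations disturb.

```python
import numpy as np

def rope_rotate(vec, pos, base=10000.0):
    """Rotate a vector in 2-D chunks by position-dependent angles (standard RoPE)."""
    d = len(vec)
    out = np.empty(d)
    for i in range(0, d, 2):
        theta = pos / base ** (i / d)      # lower frequencies for later dimension pairs
        c, s = np.cos(theta), np.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i], out[i + 1] = x * c - y * s, x * s + y * c
    return out

# Same query/key content at different positions: the attention logit depends
# only on the relative offset between them, i.e. on the query-key angle.
rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
for offset in (1, 4, 16):
    logit = rope_rotate(q, pos=20) @ rope_rotate(k, pos=20 - offset)
    print(f"relative offset {offset:2d} -> attention logit {logit:+.3f}")
```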

Why this matters for your roadmap

If you build or evaluate LLM systems, this is a blueprint for lowering inference cost without sacrificing capability. The study suggests future models can toggle relevant parameter subsets on demand, similar to how the brain recruits specialized regions for a task.

That means fewer wasted FLOPs, smaller energy bills, and better latency for workloads that include social reasoning, dialogue safety, and multi-agent simulations.

Under the hood: what's actually happening

  • Parameter sparsity with global activation: Only a small internal cluster is critical for ToM, yet the full network still runs. This is avoidable overhead.
  • RoPE as a control dial: The positional encoding routine, specifically RoPE, modulates angles between queries and keys and emphasizes frequency bands that help the model localize beliefs across context.
  • Fragility reveals function: Micro-perturbations to ToM-sensitive parameters degrade both ToM and broader contextual processing, indicating these parameters sit at a structural choke point for social inference; a toy perturbation sketch follows this list.
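
Here is a hedged PyTorch sketch of such a micro-perturbation experiment. It assumes a PyTorch model, a precomputed dictionary of boolean masks marking the ToM-sensitive entries, and a hypothetical `tom_eval` benchmark function; it illustrates the ablation idea, not the paper's actual procedure.

```python
import torch

@torch.no_grad()
def perturb_sensitive(model, sensitive_mask, noise_scale=1e-2):
    """Add small Gaussian noise to the parameter entries flagged as ToM-sensitive.

    `sensitive_mask` maps parameter names to boolean tensors; in the study the
    flagged set is tiny (on the order of 0.001% of all weights).
    """
    for name, param in model.named_parameters():
        mask = sensitive_mask.get(name)
        if mask is not None:
            param += mask * noise_scale * torch.randn_like(param)

# Hypothetical usage: `tom_eval` scores the model on a false-belief benchmark.
# baseline = tom_eval(model)
# perturb_sensitive(model, tom_sensitive_mask)
# print("ToM accuracy drop:", baseline - tom_eval(model))
```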

What to build next

  • Conditional computation: Introduce routing that activates only task-relevant experts or blocks for ToM-like workloads. Pair with confidence gating to keep quality stable.
  • Position-aware gating: Use RoPE-driven signals to trigger selective activation. If ToM cues are detected (e.g., belief states, perspective shifts), route to sparse ToM modules, as in the sketch after this list.
  • Targeted pruning and quantization: Preserve ToM-sensitive clusters while trimming less relevant paths. Validate with ToM benchmarks.
  • Instrumentation: Track attention geometry and frequency activations tied to ToM during training. Use controlled ablations to verify causal pathways.
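
As a sketch of what conditional computation with position-aware gating could look like, here is a minimal, hypothetical PyTorch module (not from the study): a cheap gate decides per sequence whether to run a small ToM expert, and the expert is skipped entirely when the gate stays closed. In a real system the gate could consume RoPE-derived features such as query-key angle statistics.

```python
import torch
import torch.nn as nn

class GatedToMBlock(nn.Module):
    """Run a small ToM expert only for sequences where a cheap gate fires."""

    def __init__(self, d_model, d_expert=256, threshold=0.5):
        super().__init__()
        self.gate = nn.Linear(d_model, 1)           # cheap per-sequence score
        self.tom_expert = nn.Sequential(            # hypothetical sparse ToM module
            nn.Linear(d_model, d_expert), nn.GELU(), nn.Linear(d_expert, d_model)
        )
        self.threshold = threshold

    def forward(self, hidden):                      # hidden: (batch, seq, d_model)
        score = torch.sigmoid(self.gate(hidden.mean(dim=1))).squeeze(-1)  # (batch,)
        fire = score > self.threshold
        out = hidden.clone()
        if fire.any():
            # Only sequences flagged by the gate pay for the expert pass;
            # everything else flows through unchanged.
            out[fire] = hidden[fire] + self.tom_expert(hidden[fire])
        return out
```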

For teams in science and research

  • ML engineers: Add ToM probes to your eval suite; log query-key angle shifts under RoPE (a minimal helper follows this list). Test micro-ablations to map sensitive parameter sets.
  • Product leads: Expect inference savings from conditional compute. Prioritize features that help the model "use less to do more."
  • Neuroscience collaborators: This is a clean bridge to brain-inspired selective activation. Plan joint studies around belief tracking and perspective-taking.
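
For the angle-logging item above, here is a small, hedged helper (assumed names, not a specific library API): given query and key tensors captured from one attention head, for example via forward hooks on the projection layers, it reports the mean pairwise query-key angle, which you can compare between ToM prompts and matched controls.

```python
import torch

def qk_angle_stats(queries, keys):
    """Mean pairwise query-key angle (degrees) for one attention head.

    `queries`, `keys`: tensors of shape (seq, head_dim), captured however your
    stack allows (e.g., hooks on the q/k projection layers).
    """
    q = torch.nn.functional.normalize(queries, dim=-1)
    k = torch.nn.functional.normalize(keys, dim=-1)
    cos = (q @ k.T).clamp(-1.0, 1.0)          # (seq, seq) cosine similarities
    return torch.rad2deg(torch.acos(cos)).mean().item()

# Hypothetical usage: compare attention geometry under ToM vs. control prompts.
# print("ToM prompt:", qk_angle_stats(q_tom, k_tom))
# print("Control:   ", qk_angle_stats(q_ctrl, k_ctrl))
```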

Key questions answered

  • What did researchers discover about AI social reasoning?
    LLMs rely on a small, specialized set of internal connections and positional encoding patterns to perform Theory-of-Mind reasoning.
  • Why does this matter for AI efficiency?
    Current models activate most parameters for every task; mapping sparse ToM circuits enables selective activation and lower energy use.
  • What's next for LLM design?
    Build models that activate only task-specific parameters, more like the brain, to cut compute and improve throughput.

Source and research

Source: Stevens Institute of Technology

Original Research: "How large language models encode theory-of-mind: a study on sparse parameter patterns," published in npj Artificial Intelligence.

Learn more

If you're upskilling your team on LLM systems, interpretability, or efficient inference, explore practical training paths here: Latest AI courses.

