Titans + MIRAS: Helping AI have long-term memory
AI models break down on very long inputs because traditional attention scales quadratically with sequence length. Titans and the MIRAS framework attack that head-on by updating a model's core memory while it's running - no offline retraining loop, no static state that forgets the details.
Think of Titans as the tool and MIRAS as the blueprint. Together they combine RNN-like speed with transformer-level accuracy by learning what to keep (and what to drop) in real time.
Why long context is hard
Transformers compare every token to every other token. That gets expensive fast. Linear RNNs and state space models compress everything into a fixed-size state, which is efficient but loses nuance in very long streams like full documents, DNA, or multi-hour logs.
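A toy way to see the tradeoff (not either paper's code): full attention has to materialize a score for every token pair, so its cost grows with the square of the length, while a fixed-state recurrence carries the same small summary no matter how long the stream gets.

```python
import numpy as np

n, d = 4096, 64                      # sequence length, feature dimension
x = np.random.randn(n, d)

# Full attention: pairwise scores grow quadratically with length.
scores = x @ x.T                     # shape (n, n) -> ~16.7M entries for n = 4096

# Fixed-state recurrence: memory stays (d, d) no matter how long the stream is.
state = np.zeros((d, d))
for t in range(n):
    state += np.outer(x[t], x[t])    # compress everything into a d x d summary

print(scores.shape, state.shape)     # (4096, 4096) vs (64, 64)
```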
We need something that keeps the details that matter, scales linearly, and adapts on the fly. That's the gap Titans and MIRAS fill.
Titans in one line
Titans adds a neural long-term memory module - a deep multilayer perceptron - that summarizes the past and feeds that summary back into the context before attention runs. Attention can use it or ignore it. The key is that this memory is learned and updated during inference.
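Here is a minimal sketch of that idea, loosely in the spirit of the memory-as-context setup: an assumed NeuralMemory MLP is queried with the current segment, and its output is prepended to the tokens that attention sees. Module names and sizes are illustrative, not Titans' actual implementation.

```python
import torch
import torch.nn as nn

class NeuralMemory(nn.Module):
    """A deep MLP that maps a query to a retrieved summary of the past (hypothetical sketch)."""
    def __init__(self, dim: int, depth: int = 2):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(dim, dim), nn.SiLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        return self.net(q)            # retrieved memory summary

dim = 256
memory = NeuralMemory(dim)
attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

chunk = torch.randn(1, 128, dim)          # current segment of tokens
summary = memory(chunk)                   # read memory, using the segment as the query
ctx = torch.cat([summary, chunk], dim=1)  # memory tokens + current tokens
out, _ = attn(ctx, ctx, ctx)              # attention can use the summary or ignore it
```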
Surprise-driven updates (what gets stored, and why)
Titans uses a "surprise" signal based on the model's internal error (the gradient of the memory's loss on the incoming token). If the next token is expected, nothing major changes. If the input conflicts with what the model believes, the surprise is high and the memory updates immediately.
Two refinements make this practical: momentum, which carries recent surprise forward so related tokens are captured together, and adaptive forgetting (a form of weight decay) that clears space as sequences grow extremely long.
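A minimal sketch of what one such test-time update could look like, assuming a key/value association per token and fixed coefficients (lr, beta, decay) where the paper uses learned, data-dependent gates; the helper name update_memory is ours, not the paper's.

```python
import torch

def update_memory(memory, key, value, momentum, lr=0.1, beta=0.9, decay=0.05):
    """One test-time step of a surprise-driven memory update (illustrative sketch).

    memory:   the long-term memory module (e.g. a small MLP), updated in place
    key/value: the association the memory should store for this token
    momentum: running "surprise" carried over from recent tokens
    """
    # Surprise = gradient of the memory's prediction error on the new input.
    pred = memory(key)
    loss = torch.nn.functional.mse_loss(pred, value)
    grads = torch.autograd.grad(loss, list(memory.parameters()))

    new_momentum = []
    with torch.no_grad():
        for p, g, m in zip(memory.parameters(), grads, momentum):
            m = beta * m - lr * g          # momentum: carry recent surprise forward
            p.mul_(1.0 - decay)            # adaptive forgetting via weight decay
            p.add_(m)                      # write the surprising information
            new_momentum.append(m)
    return new_momentum

# Usage sketch:
# memory = NeuralMemory(dim)  # e.g. the MLP from the previous snippet
# momentum = [torch.zeros_like(p) for p in memory.parameters()]
# momentum = update_memory(memory, key_t, value_t, momentum)
```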
MIRAS: the general blueprint
MIRAS reframes sequence models as associative memories. Different architectures are just different answers to the same question: how do we combine fresh input with prior state without losing essential information? Every design comes down to four choices (sketched in code after the list):
- Memory architecture: What holds information (vector, matrix, or a deep network like Titans).
- Attentional bias: The internal objective that decides what is prioritized.
- Retention gate: The regularizer that balances new learning against keeping what matters.
- Memory algorithm: The optimizer used to update memory.
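To make the blueprint concrete, here is a toy instance of the four choices: a single matrix as the memory, squared error as the attentional bias, a Euclidean pull toward the previous state as the retention gate, and one gradient step as the memory algorithm. Real MIRAS variants swap in different choices at each slot; this is a sketch, not the paper's code.

```python
import torch

# Memory architecture: here, a single matrix M mapping keys to values.
d = 16
M = torch.zeros(d, d, requires_grad=True)
M_prev = M.detach().clone()

# Attentional bias: the objective deciding what gets prioritized (squared error here).
def attentional_bias(M, k, v):
    return ((M @ k - v) ** 2).sum()

# Retention gate: the regularizer balancing new learning against the old state.
def retention_gate(M, M_prev, lam=0.5):
    return lam * ((M - M_prev) ** 2).sum()

# Memory algorithm: the optimizer used to apply the update (one gradient step here).
def miras_step(M, M_prev, k, v, lr=0.1):
    loss = attentional_bias(M, k, v) + retention_gate(M, M_prev)
    (grad,) = torch.autograd.grad(loss, M)
    with torch.no_grad():
        M -= lr * grad
    return M.detach().clone()          # becomes M_prev for the next token

k, v = torch.randn(d), torch.randn(d)
M_prev = miras_step(M, M_prev, k, v)
```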
Moving beyond one-size-fits-all losses
Most models lean on mean squared error or dot-product similarity for both the attentional bias and the retention gate. That can overreact to outliers and limit flexibility. MIRAS opens a broader design space: non-Euclidean objectives, different regularizers, and more controlled updates.
Three MIRAS models (attention-free)
- YAAD: Uses Huber-style penalties to be less sensitive to outliers, like random typos or noisy spikes (see the sketch after this list).
- MONETA: Tests stricter generalized norms for prioritization and forgetting to stabilize long-term memory behavior.
- MEMORA: Constrains memory to act like a probability map, making each update balanced and predictable.
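To see why the Huber-style choice matters, this small comparison (just the loss behavior, not YAAD's actual update rule) shows how a single noisy spike dominates squared error but is tempered under a Huber penalty:

```python
import torch
import torch.nn.functional as F

# Squared error reacts quadratically to outliers; Huber caps their influence.
pred   = torch.tensor([0.1, 0.0, 5.0])      # one noisy spike in the last slot
target = torch.tensor([0.0, 0.0, 0.0])

mse_per_elem   = F.mse_loss(pred, target, reduction="none")
huber_per_elem = F.huber_loss(pred, target, reduction="none", delta=1.0)

print(mse_per_elem)     # tensor([0.0100, 0.0000, 25.0000])  -> the spike dominates
print(huber_per_elem)   # tensor([0.0050, 0.0000,  4.5000])  -> the spike is tempered
```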
What the experiments showed
Across language modeling datasets (C4, WikiText) and zero-shot reasoning benchmarks (HellaSwag, PIQA), Titans and the MIRAS variants beat strong baselines of comparable size, including Transformer++, Mamba-2, and Gated DeltaNet. Perplexity improved (lower is better; it measures how "surprised" a model is by text), and accuracy rose on the reasoning tasks.
Titans generalized beyond text to genomics and time-series forecasting. On extreme long-context tests like BABILong, it outperformed top baselines - including very large models - while using far fewer parameters. It also scaled to context windows beyond 2 million tokens.
Ablation studies highlighted a simple rule: for the same memory size, deeper memory modules worked better and kept performance steady as sequences grew.
Why this matters for applied research
- If your workloads involve very long documents, continuous logs, or streaming data, consider architectures that update memory at test time rather than storing a fixed state.
- Model behavior improves when the memory is deep, not just wide. Prioritize depth when tuning the memory's parameter budget.
- Use surprise thresholds and momentum to capture meaningful shifts, not just single spiky tokens. Pair with adaptive forgetting to manage capacity.
- When inputs are messy, losses like Huber (YAAD) help avoid overreacting to outliers. For stability, constraints like MEMORA's probability-style updates can keep learning smooth.
- Training remains parallelizable, and inference runs in linear time, which keeps deployment costs predictable.
Get practical with long-context AI
If you want structured ways to skill up on sequence modeling and long-context systems, browse Complete AI Training: Courses by job. Pick a path that matches your role and build from there.