Mira Murati's $2B bet on deterministic AI

Mira Murati's Thinking Machines Lab is tackling LLM consistency by taming GPU-driven randomness. Backed by $2B, it plans a near-term product and an open Connectionism series.

Categorized in: AI News, Science and Research
Published on: Sep 12, 2025

Ex-OpenAI CTO's startup targets AI consistency - and why that matters for research

Thinking Machines Lab, led by Mira Murati, is studying how to make large language models produce consistent, reproducible responses. The team highlighted inference-time randomness caused by GPU kernel orchestration and suggested tighter control of this process could stabilize outputs. The company has raised US$2 billion in seed funding and plans to launch its first product in the coming months. It will also publish ongoing research and code in a new series called "Connectionism."

Why this matters for science and enterprise

Reproducibility is a non-negotiable for regulated workflows, scientific experiments, and model evaluation. If the same prompt yields different answers across runs, audit trails break, A/B tests lose meaning, and RL training becomes noisy. Consistency at inference unlocks cleaner metrics, safer deployments, and smoother collaboration across teams.

Where randomness creeps in

  • Floating-point math: non-associativity changes results with different reduction orders.
  • GPU kernel scheduling: thread timing and fused kernels alter operation order between runs.
  • Library heuristics: cuBLAS/cuDNN algorithm selection varies by shape, hardware, or driver.
  • Mixed precision: BF16/FP16 rounding can amplify tiny deviations.
  • Decoding: temperature, top-k/p, and RNG states introduce variance even with fixed seeds.
  • Engine/tooling: graph capture, quantization, and compilers (e.g., TensorRT) can change numerics.
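The first bullet, floating-point non-associativity, is easy to see without a GPU. A minimal Python demonstration: the same three numbers, reduced in two groupings, give different answers because the small value is either absorbed by the large one or survives the cancellation.

```python
# Floating-point addition is not associative: (a + b) + c != a + (b + c).
# The same reordering happens when GPU kernels reduce in a different order.
vals = [1.0, 1e16, -1e16]

forward = (vals[0] + vals[1]) + vals[2]   # 1.0 is absorbed by 1e16, then cancelled
backward = vals[0] + (vals[1] + vals[2])  # 1e16 cancels first, so 1.0 survives

print(forward)   # 0.0
print(backward)  # 1.0
```

Scale this up to billion-element reductions across thousands of GPU threads, and run-to-run differences in accumulation order become unavoidable unless the order is pinned.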

What Thinking Machines Lab is testing

Researcher Horace He outlined how tighter orchestration of GPU kernels during inference could reduce run-to-run variation. That likely means pinning algorithm choices, constraining kernel fusion, and standardizing execution order. If successful, enterprises and labs could get repeatable outputs without sacrificing too much throughput.
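One way to picture "standardizing execution order" is to fix how a reduction is chunked. The sketch below is a hypothetical CPU-only illustration (the real work involves GPU kernels, and `chunked_sum` is not the lab's code): a pinned chunk size yields bit-identical results across runs, while a chunk size that varies between runs, as a dynamic scheduler's might, changes the answer.

```python
def chunked_sum(values, chunk_size):
    """Reduce in fixed-size chunks, then combine partial sums left to right.
    The chunk size stands in for a kernel's scheduling decision."""
    partials = []
    for i in range(0, len(values), chunk_size):
        s = 0.0
        for v in values[i:i + chunk_size]:
            s += v
        partials.append(s)
    total = 0.0
    for p in partials:
        total += p
    return total

vals = [1e16, -1e16, 1.0] * 4

# A "pinned" schedule: the same chunk size every run gives bit-identical sums.
run_a = chunked_sum(vals, chunk_size=3)
run_b = chunked_sum(vals, chunk_size=3)
print(run_a == run_b)  # True

# A schedule that varies between runs changes the reduction order,
# and with it the floating-point result.
print(chunked_sum(vals, 3))  # 4.0
print(chunked_sum(vals, 2))  # 0.0
```

The trade-off is visible even here: pinning the schedule means giving up some of the scheduler's freedom to pick whatever chunking is fastest on the current hardware, which is why determinism can cost throughput.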

Potential impact

  • Auditable inference: stable outputs improve traceability and compliance reviews.
  • Reinforcement learning: lower variance makes reward modeling and policy evaluation faster and more stable when customizing models for business tasks.
  • Benchmark integrity: reproducible metrics across hardware, drivers, and deployments.
  • Operations: more reliable A/B tests, incident triage, and SLA adherence in production.

Practical steps you can use now

  • Set seeds end to end and persist RNG states across services.
  • Enable deterministic modes and disable autotuning where possible (e.g., PyTorch and cuDNN). See PyTorch notes on reproducibility.
  • Pin versions: drivers, CUDA, libraries, model weights, tokenizers, and compilers.
  • Control numerics: consider disabling TF32, constrain mixed precision in sensitive ops, and calibrate quantization consistently.
  • Decode deterministically: temperature=0 (greedy) or fixed top-k/top-p; log sampling configs with outputs.
  • Standardize engines: compile once (same flags/hardware) and reuse; avoid runtime kernel changes.
  • Measure variance: run N replicates, track output drift, and set acceptance thresholds before deployment.
  • Record provenance: GPU model, SM count, clock, driver/CUDA versions, and env flags with every run.
  • Review NVIDIA's reproducibility recommendations for GPU-specific guidance.
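The deterministic-decoding and variance-measurement steps can be sketched together. This is a stdlib-only illustration under stated assumptions: `fake_model` is a hypothetical stand-in for a pinned-version inference call returning per-step logits, not a real API.

```python
import hashlib

def greedy_decode(logits_per_step):
    """Deterministic decoding (temperature=0): always pick the
    highest-scoring token, removing sampling variance entirely."""
    return [max(range(len(step)), key=step.__getitem__) for step in logits_per_step]

def output_fingerprint(tokens):
    """Hash an output so replicate runs can be compared exactly."""
    return hashlib.sha256(repr(tokens).encode()).hexdigest()

def fake_model(prompt):
    """Hypothetical stand-in for an inference endpoint; in practice this
    would be a version-pinned model call returning per-step logits."""
    return [[0.1, 0.9, 0.0], [0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]

# Run N replicates of the same prompt and check for drift before deployment:
# if every replicate hashes to the same fingerprint, outputs are bit-identical.
N = 5
fingerprints = {output_fingerprint(greedy_decode(fake_model("hello")))
                for _ in range(N)}
drift = len(fingerprints) - 1
print(drift)  # 0 -> all replicates agree
```

Against a real nondeterministic endpoint, `drift` greater than zero quantifies how many distinct outputs the replicates produced, which is a usable acceptance threshold for the variance-measurement step above.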

Open research stance

The lab plans to publish work regularly under "Connectionism," sharing ideas and code early. That transparency contrasts with more closed development models and may attract researchers who prefer open collaboration and verifiable claims.

What to watch next

  • Whether the team's determinism techniques ship in the first product.
  • Benchmarks showing variance reductions across GPUs, drivers, and batch sizes.
  • Trade-offs: throughput, latency, and cost impacts of enforcing determinism.
  • Tooling: configs, kernels, or compilers that make deterministic inference easy to adopt.

If you're building reproducible AI workflows for research or production, explore focused training paths on AI systems and MLOps at Complete AI Training.