Mira Murati's $2B bet on deterministic AI

Mira Murati's Thinking Machines Lab is tackling LLM consistency by taming GPU-driven randomness. Backed by $2B, it plans a near-term product and an open Connectionism series.

Categorized in: AI News, Science and Research
Published on: Sep 12, 2025

Ex-OpenAI CTO's startup targets AI consistency - and why that matters for research

Thinking Machines Lab, led by Mira Murati, is studying how to make large language models produce consistent, reproducible responses. The team highlighted inference-time randomness caused by GPU kernel orchestration and suggested tighter control of this process could stabilize outputs. The company has raised US$2 billion in seed funding and plans to launch its first product in the coming months. It will also publish ongoing research and code in a new series called "Connectionism."

Why this matters for science and enterprise

Reproducibility is a non-negotiable for regulated workflows, scientific experiments, and model evaluation. If the same prompt yields different answers across runs, audit trails break, A/B tests lose meaning, and RL training becomes noisy. Consistency at inference unlocks cleaner metrics, safer deployments, and smoother collaboration across teams.

Where randomness creeps in

  • Floating-point math: non-associativity changes results with different reduction orders.
  • GPU kernel scheduling: thread timing and fused kernels alter operation order between runs.
  • Library heuristics: cuBLAS/cuDNN algorithm selection varies by shape, hardware, or driver.
  • Mixed precision: BF16/FP16 rounding can amplify tiny deviations.
  • Decoding: temperature, top-k/p, and RNG states introduce variance even with fixed seeds.
  • Engine/tooling: graph capture, quantization, and compilers (e.g., TensorRT) can change numerics.
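The first bullet, floating-point non-associativity, is easy to see without a GPU. A minimal Python demonstration: the same three numbers, reduced in two groupings, give different answers because the small value is either absorbed by the large one or survives the cancellation.

```python
# Floating-point addition is not associative: (a + b) + c != a + (b + c).
# The same reordering happens when GPU kernels reduce in a different order.
vals = [1.0, 1e16, -1e16]

forward = (vals[0] + vals[1]) + vals[2]   # 1.0 is absorbed by 1e16, then cancelled
backward = vals[0] + (vals[1] + vals[2])  # 1e16 cancels first, so 1.0 survives

print(forward)   # 0.0
print(backward)  # 1.0
```

Scale this up to billion-element reductions across thousands of GPU threads, and run-to-run differences in accumulation order become unavoidable unless the order is pinned.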

What Thinking Machines Lab is testing

Researcher Horace He outlined how tighter orchestration of GPU kernels during inference could reduce run-to-run variation. That likely means pinning algorithm choices, constraining kernel fusion, and standardizing execution order. If successful, enterprises and labs could get repeatable outputs without sacrificing too much throughput.
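One way to picture "standardizing execution order" is to fix how a reduction is chunked. The sketch below is a hypothetical CPU-only illustration (the real work involves GPU kernels, and `chunked_sum` is not the lab's code): a pinned chunk size yields bit-identical results across runs, while a chunk size that varies between runs, as a dynamic scheduler's might, changes the answer.

```python
def chunked_sum(values, chunk_size):
    """Reduce in fixed-size chunks, then combine partial sums left to right.
    The chunk size stands in for a kernel's scheduling decision."""
    partials = []
    for i in range(0, len(values), chunk_size):
        s = 0.0
        for v in values[i:i + chunk_size]:
            s += v
        partials.append(s)
    total = 0.0
    for p in partials:
        total += p
    return total

vals = [1e16, -1e16, 1.0] * 4

# A "pinned" schedule: the same chunk size every run gives bit-identical sums.
run_a = chunked_sum(vals, chunk_size=3)
run_b = chunked_sum(vals, chunk_size=3)
print(run_a == run_b)  # True

# A schedule that varies between runs changes the reduction order,
# and with it the floating-point result.
print(chunked_sum(vals, 3))  # 4.0
print(chunked_sum(vals, 2))  # 0.0
```

The trade-off is visible even here: pinning the schedule means giving up some of the scheduler's freedom to pick whatever chunking is fastest on the current hardware, which is why determinism can cost throughput.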

Potential impact

  • Auditable inference: stable outputs improve traceability and compliance reviews.
  • Reinforcement learning: lower variance makes reward modeling and policy evaluation faster and more stable when customizing models for business tasks.
  • Benchmark integrity: reproducible metrics across hardware, drivers, and deployments.
  • Operations: more reliable A/B tests, incident triage, and SLA adherence in production.

Practical steps you can use now

  • Set seeds end to end and persist RNG states across services.
  • Enable deterministic modes and disable autotuning where possible (e.g., PyTorch and cuDNN). See PyTorch notes on reproducibility.
  • Pin versions: drivers, CUDA, libraries, model weights, tokenizers, and compilers.
  • Control numerics: consider disabling TF32, constrain mixed precision in sensitive ops, and calibrate quantization consistently.
  • Decode deterministically: temperature=0 (greedy) or fixed top-k/top-p; log sampling configs with outputs.
  • Standardize engines: compile once (same flags/hardware) and reuse; avoid runtime kernel changes.
  • Measure variance: run N replicates, track output drift, and set acceptance thresholds before deployment.
  • Record provenance: GPU model, SM count, clock, driver/CUDA versions, and env flags with every run.
  • Review NVIDIA's reproducibility recommendations for GPU-specific guidance.
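The deterministic-decoding and variance-measurement steps can be sketched together. This is a stdlib-only illustration under stated assumptions: `fake_model` is a hypothetical stand-in for a pinned-version inference call returning per-step logits, not a real API.

```python
import hashlib

def greedy_decode(logits_per_step):
    """Deterministic decoding (temperature=0): always pick the
    highest-scoring token, removing sampling variance entirely."""
    return [max(range(len(step)), key=step.__getitem__) for step in logits_per_step]

def output_fingerprint(tokens):
    """Hash an output so replicate runs can be compared exactly."""
    return hashlib.sha256(repr(tokens).encode()).hexdigest()

def fake_model(prompt):
    """Hypothetical stand-in for an inference endpoint; in practice this
    would be a version-pinned model call returning per-step logits."""
    return [[0.1, 0.9, 0.0], [0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]

# Run N replicates of the same prompt and check for drift before deployment:
# if every replicate hashes to the same fingerprint, outputs are bit-identical.
N = 5
fingerprints = {output_fingerprint(greedy_decode(fake_model("hello")))
                for _ in range(N)}
drift = len(fingerprints) - 1
print(drift)  # 0 -> all replicates agree
```

Against a real nondeterministic endpoint, `drift` greater than zero quantifies how many distinct outputs the replicates produced, which is a usable acceptance threshold for the variance-measurement step above.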

Open research stance

The lab plans to publish work regularly under "Connectionism," sharing ideas and code early. That transparency contrasts with more closed development models and may attract researchers who prefer open collaboration and verifiable claims.

What to watch next

  • Whether the team's determinism techniques ship in the first product.
  • Benchmarks showing variance reductions across GPUs, drivers, and batch sizes.
  • Trade-offs: throughput, latency, and cost impacts of enforcing determinism.
  • Tooling: configs, kernels, or compilers that make deterministic inference easy to adopt.

If you're building reproducible AI workflows for research or production, explore focused training paths on AI systems and MLOps at Complete AI Training.