Marketing's new meter runs on tokens - agencies are split on who picks up the bill

AI runs on tokens, and at scale that meter adds up. Agencies are testing pricing models, capping usage, and focusing on speed and outcomes over pennies per prompt.

Published on: Mar 04, 2026

The new marketing currency: making sense of AI token costs

Generative AI doesn't run on vibes. It runs on tokens. Every prompt and response is metered, and while each token costs a fraction of a cent, volume flips the bill from trivial to material fast. One recent brand push logged roughly 70,000 prompts and millions of tokens. That's real money.
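As a back-of-envelope sketch of how that meter adds up (every figure below is an illustrative assumption, not the campaign's actual volumes or rates):

```python
# Rough campaign cost estimate. All inputs are placeholder assumptions -
# swap in your own workload numbers and the provider's live rate card.
prompts = 70_000
avg_input_tokens = 800        # assumed prompt + context size per call
avg_output_tokens = 400       # assumed response length per call
rate_in = 3.00 / 1_000_000    # assumed $ per input token
rate_out = 15.00 / 1_000_000  # assumed $ per output token

total_in = prompts * avg_input_tokens
total_out = prompts * avg_output_tokens
cost = total_in * rate_in + total_out * rate_out
print(f"{total_in + total_out:,} tokens, roughly ${cost:,.2f}")
```

Even at these modest per-call sizes, one campaign crosses tens of millions of tokens, which is why the bill flips from trivial to material.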

The catch: there's no single model that does it all. Teams mix providers and models, each with different token rules and prices. That variability is forcing agencies to pick a stance on how to price, report, and control usage - or risk margin leaks and client confusion.

How agencies are pricing AI today

  • Metered pass-through (production-style): Merge bills clients for token usage case by case. Big Spaceship treats compute like a standard production line item.
  • Subscription/seat model: Silverside prices access like SaaS seats - predictable for clients, pooled control for the studio.
  • Agency absorbs the cost: RPA covers tokens while results prove out. Anomaly is wary of passing costs through at all - "feels like a money grab."
  • Generation credits with tiers: Brandtech's Pencil sells "generations" (chat, image, video-second). Clients commit to volume; Pencil negotiates better rates with model providers and passes scale benefits back.
  • Cost-recovery for platform rollouts: Horizon Media's Blu charges a nominal fee during early usage. Most token spend sits in onboarding and ongoing development until users scale into the tens of thousands.
  • Embedded in retainer: Kepler folds AI into delivery without separate token deals. Impact is the metric; tokens are the fuel.
  • Bulk-buy, pass at cost: Lerma/ secures a bulk agreement, includes tokens in upfront estimates, no markup. If they overbuy, they deliver extra assets.

What actually moves margins

Cheap tokens aren't the prize. Labor compression is. As one consultant put it, if runtime replaces reporting teams, repetitive dashboards, and middleware, the upside dwarfs any bulk discount. That's the game: fewer handoffs, fewer dashboards, faster cycles.

There's also a positioning risk. Competing on the lowest token rate drags the conversation into pennies, not value. You win by outcomes and architecture - not arbitrage.

Set your token policy: a practical playbook

  • 1) Model your workload: List use cases, model choices, typical context sizes, and output lengths. Estimate tokens per task and per project. Price it using live rate cards from providers like OpenAI or Anthropic.
  • 2) Choose a pricing framework:
    • High-variance, production-heavy work: metered pass-through with caps.
    • Retainer/performance-led: absorb usage or bundle into a fixed platform fee.
    • Platform access: seat + credits with volume tiers and "top-up."
    • Enterprise scale: bulk deals, then tiered credits aligned to committed volume.
  • 3) Build safeguards: Project-level token caps, real-time meters, alerts at 50/80/100%, and auto "top-up" rules approved in writing.
  • 4) Set contracts and transparency: Decide on pass-through at cost vs. a declared margin. Include indemnification for enterprise plans. Make credits non-transferable across clients. Log usage by project.
  • 5) Tie usage to outcomes: Report tokens alongside cycle time saved, iterations shipped, CPA shifts, lift, and revenue impact. If you bill by usage, align incentives with client results.
  • 6) Plan for audits: Expect token audits to sit next to media audits. Keep verifiable logs: model, version, prompt class, tokens in/out, and cost center.
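Steps 3 and 6 above can be sketched as a single per-project guard: a token cap with alerts at 50/80/100% and a verifiable usage log. The class, field, and threshold names here are invented for illustration, not a real platform API:

```python
# Minimal per-project token budget guard (hypothetical, for illustration).
# Implements the playbook's caps, alert thresholds, and audit-log fields.
from dataclasses import dataclass, field

@dataclass
class ProjectBudget:
    project: str
    token_cap: int
    used: int = 0
    alerts_fired: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def record(self, model: str, version: str, prompt_class: str,
               tokens_in: int, tokens_out: int, cost_center: str) -> bool:
        """Log one call; return False (hard stop) if the cap would be exceeded."""
        spend = tokens_in + tokens_out
        if self.used + spend > self.token_cap:
            return False  # hard stop: require a written top-up approval
        self.used += spend
        # Audit-ready log line: model, version, prompt class, tokens, cost center.
        self.log.append({"model": model, "version": version,
                         "prompt_class": prompt_class, "tokens_in": tokens_in,
                         "tokens_out": tokens_out, "cost_center": cost_center})
        for pct in (50, 80, 100):
            if self.used * 100 >= self.token_cap * pct and pct not in self.alerts_fired:
                self.alerts_fired.add(pct)
                print(f"[{self.project}] {pct}% of token cap reached")
        return True

budget = ProjectBudget(project="brand-push", token_cap=1_000_000)
budget.record("model-x", "2026-01", "ideation", 600_000, 0, "CLIENT-42")
```

The hard stop returns False rather than raising, so the calling workflow can pause and request the written top-up approval from step 3 instead of crashing mid-job.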

Cost control without killing quality

  • Right-size the model: smaller models for ideation and drafts; larger models for final polish and safety.
  • Reduce context bloat: retrieval over giant prompts, summaries over raw dumps, tight schemas.
  • Cache and batch: reuse system prompts, template common tasks, and batch generation runs.
  • Short prompts, structured outputs: fewer tokens in, cleaner tokens out.
  • Governance in the UI: visible token meters, budget per project, and hard stops.
  • Train the team: prompt patterns, model selection, and a shared library of approved workflows.
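The first bullet, right-sizing the model, can be as simple as a routing table keyed by task type. The model names below are placeholders, not real provider SKUs:

```python
# Sketch of model right-sizing: cheap model by default, premium only where it pays.
# "small-model" / "large-model" are placeholder names for illustration.
ROUTES = {
    "ideation": "small-model",  # fast, cheap drafts
    "draft":    "small-model",
    "final":    "large-model",  # polish and safety review
    "safety":   "large-model",
}

def pick_model(task_type: str) -> str:
    # Default to the cheap model; escalate only for tasks that need it.
    return ROUTES.get(task_type, "small-model")
```

Defaulting unknown task types to the cheap model keeps new workflows from silently burning premium tokens.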

What clients will ask next

"Show me the receipts." CFOs will want pass-through clarity, and consultants may be asked to audit token usage just like media. Be ready with logs, caps, and proof that spending tokens created business returns - not just more content.

A simple decision cheat sheet

  • Heavy production or one-off shoots: Pass-through, metered, line item.
  • Always-on retainers: Absorb or fixed platform fee with sensible caps.
  • Self-serve platform: Seat + credits, tiered by volume, "top-up" if needed.
  • Enterprise scale: Bulk-buy plus tiered pricing; indemnification for larger plans.
  • Early-stage/testing: Agency absorbs to de-risk; revisit after proof of impact.

Bottom line

Tokens are fuel. Don't build your story around the gas bill. Build it around speed to concept, the volume and quality of iterations, and measurable lifts in performance. Price with clarity, measure what matters, and keep incentives aligned with outcomes.

If you're standing up AI programs or need to train managers on usage, pricing, and measurement, explore our AI Learning Path for Marketing Managers.

