Token effort: AI tokens are surging - but are profits?
In the late '90s, startups sold the dream with "clicks" and "eyeballs." Today, it's "tokens." They're easy to count, easy to pitch, and they make AI demand look unstoppable. But if you run a P&L, you know simple metrics can hide messy economics.
Rising token counts don't guarantee cash in the bank; they can even mask shrinking margins. If you want signal instead of noise, measure what turns usage into profit.
Why tokens became the scorecard
- They're the core unit of LLM work: input + output are priced per 1,000 tokens on most platforms.
- Finance teams can forecast spend with a single lever: (tokens used ÷ 1,000) × price per 1,000 tokens (a quick sketch follows this list).
- Vendors love it because it suggests traction, even if revenue is discounted or deferred.
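As a quick illustration, that forecast is just volume times unit price. The token volume and the $5 price below are made-up assumptions, not vendor quotes.

```python
# Spend forecast: (tokens used / 1,000) x price per 1,000 tokens.
monthly_tokens = 250_000_000   # assumed tokens processed per month
price_per_1k = 5.00            # assumed price per 1,000 tokens (USD)

forecast_spend = monthly_tokens / 1_000 * price_per_1k
print(f"Forecast monthly spend: ${forecast_spend:,.0f}")  # -> $1,250,000
```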
The caveats that skew the story
- Tokens ≠ revenue: Free tiers, promo credits, and heavy discounting inflate usage without cash flow.
- Inconsistent tokenization: Different models count tokens differently; cross-vendor comparisons can mislead.
- Prompt bloat: Long contexts and verbose system prompts pump tokens without improving outcomes.
- Caching and RAG: Good engineering cuts repeat inference, so token usage falls even though user value holds steady.
- Efficiency gains: New models do the same task with fewer tokens, pressuring the top line under usage-based pricing.
- Mix shift: Low-value spam, synthetic runs, or batch jobs can swell tokens while unit margins erode.
- Heavy fixed costs: GPUs, energy, and data-center leases dominate. Token growth doesn't always lift utilization enough to help margins.
- Reseller and platform cuts: Marketplaces and API layers take their share; list prices overstate take-home.
- Commit mechanics: Prepaid credits and minimums can flatter reported demand while deferring revenue recognition.
What to track instead
- Revenue per 1,000 tokens (RPT): Actual realized price after discounts and credits (see the sketch after this list).
- Gross profit per 1,000 tokens (GPPT): Subtract GPU, energy, networking, and model licensing.
- Contribution margin by workload: Segment chat, code, search, batch ops; kill low-margin flows.
- Utilization: GPU hours used ÷ GPU hours available. Idle capacity is silent margin leakage.
- Cohort retention and ARPU: Do paid cohorts expand usage, or churn when credits end?
- Price realization vs. list: Monitor variance by segment and channel.
- Cache hit rate and RAG efficiency: Higher hits, lower cost per answer.
- Success and latency SLAs: Re-runs and timeouts add cost without adding revenue.
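To make those metrics concrete, here's a minimal sketch that computes RPT, GPPT, gross margin by workload, and utilization from monthly figures. Every workload name and number is an illustrative assumption, not a standard reporting schema.

```python
# Illustrative monthly figures per workload; all values are assumptions for the sketch.
workloads = {
    # thousands of tokens, realized revenue, variable cost (GPU, energy, networking, licensing)
    "chat":      {"k_tokens": 400_000, "revenue": 2_000_000, "variable_cost": 1_280_000},
    "batch_ops": {"k_tokens": 600_000, "revenue": 1_200_000, "variable_cost": 1_050_000},
}

gpu_hours_used, gpu_hours_available = 5_200, 7_000  # assumed fleet numbers

for name, w in workloads.items():
    rpt = w["revenue"] / w["k_tokens"]                           # revenue per 1,000 tokens
    gppt = (w["revenue"] - w["variable_cost"]) / w["k_tokens"]   # gross profit per 1,000 tokens
    print(f"{name}: RPT=${rpt:.2f}  GPPT=${gppt:.2f}  gross margin={gppt / rpt:.0%}")

print(f"utilization={gpu_hours_used / gpu_hours_available:.0%}")  # idle capacity is margin leakage
```

In this made-up example, batch ops clear only $0.25 of gross profit per 1,000 tokens despite generating more tokens than chat, which is exactly the kind of low-margin flow worth killing.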
A quick unit model you can plug into your P&L
- Start with RPT: say $5 per 1,000 tokens realized.
- Estimate cost per 1,000: GPUs + energy + networking + model fees = $3.20.
- GPPT = $1.80; gross margin = 36%.
- Layer in platform/reseller fees (e.g., 10%), support, and SRE overhead to get contribution margin.
- Scenario test: 20% price cuts, 25% efficiency gains, or a 15% mix shift to cheaper models (worked through in the sketch below).
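Here's that unit model as a short script, using the per-1,000 figures above. The monthly volume, the fixed overhead, and how each scenario is mapped onto the inputs are illustrative assumptions, not prescriptions.

```python
# Unit model with the figures from the text: $5.00 realized per 1,000 tokens,
# $3.20 variable cost per 1,000, a 10% platform/reseller fee, plus assumed
# fixed overhead for support and SRE. Volume and overhead are illustrative.
def monthly_pnl(k_tokens=500_000, rpt=5.00, cost_per_1k=3.20,
                platform_fee_pct=0.10, fixed_overhead=300_000):
    revenue = k_tokens * rpt
    gross_profit = revenue - k_tokens * cost_per_1k
    contribution = gross_profit - revenue * platform_fee_pct - fixed_overhead
    return gross_profit / k_tokens, gross_profit / revenue, contribution

gppt, gm, contrib = monthly_pnl()
print(f"baseline: GPPT=${gppt:.2f}  gross margin={gm:.0%}  contribution=${contrib:,.0f}")

# Scenario tests from the list above; each mapping is a simplifying assumption.
scenarios = {
    "20% price cut":            monthly_pnl(rpt=5.00 * 0.80),
    "25% efficiency gain":      monthly_pnl(k_tokens=500_000 * 0.75),  # fewer tokens, same tasks
    "15% mix shift to cheaper": monthly_pnl(cost_per_1k=3.20 * 0.85),  # assume 15% lower unit cost
}
for name, (g, m, c) in scenarios.items():
    print(f"{name}: GPPT=${g:.2f}  margin={m:.0%}  contribution=${c:,.0f}")
```

Note how the efficiency-gain scenario leaves per-1,000 margins untouched but shrinks contribution dollars, which is why usage-priced revenue needs repricing or more throughput when models get leaner.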
This forces the real question: Do discounts and model switches help you win profitable volume, or just subsidize usage?
Signals to watch over the next year
- GPU supply and pricing: Changes to availability and cost ripple straight into GPPT.
- Model efficiency: If tasks need fewer tokens, you'll need better pricing or higher throughput to keep revenue flat.
- Enterprise commits: Seat-based and hybrid plans reduce token volatility but can hide breakage.
- Open-source adoption: Moving certain tasks in-house can slash cost per 1,000 but adds ops burden.
- Regulatory push on data and AI safety: Compliance overhead squeezes margins if unpriced.
Finance playbook: turn tokens into profit
- Define a single source of truth for tokens, revenue, and cost per 1,000; reconcile weekly.
- Cap token inflation: set internal limits on context size, enforce caching, and penalize prompt bloat.
- Route by margin: choose models per task based on GPPT and SLA, not hype.
- Price on value, not usage alone: add tiers by outcome (documents processed, issues resolved) with token overage only as a backstop.
- Renegotiate commits quarterly: swap idle credits for longer terms or better take-rates.
- Capex discipline: tie hardware buys to contracted demand and hard utilization thresholds.
- Forecast with sensitivity: volume ±20%, price ±20%, cost per 1,000 ±20%; know your breakpoints (see the sketch after this list).
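One way to run that sensitivity check is sketched below: sweep volume, realized price, and cost per 1,000 at ±20% around the illustrative unit model from earlier, and flag every combination where contribution turns negative. The baseline and overhead numbers are assumptions, not benchmarks.

```python
from itertools import product

# Baseline assumptions (illustrative): same unit economics as the model above.
BASE_K_TOKENS, BASE_RPT, BASE_COST = 500_000, 5.00, 3.20
PLATFORM_FEE, FIXED_OVERHEAD = 0.10, 300_000

def contribution(k_tokens, rpt, cost_per_1k):
    revenue = k_tokens * rpt
    return revenue - k_tokens * cost_per_1k - revenue * PLATFORM_FEE - FIXED_OVERHEAD

# Sweep each lever at -20%, baseline, and +20%, and surface the breakpoints.
swings = (0.8, 1.0, 1.2)
breakpoints = []
for v, p, c in product(swings, repeat=3):
    result = contribution(BASE_K_TOKENS * v, BASE_RPT * p, BASE_COST * c)
    if result < 0:
        breakpoints.append((v, p, c, result))

for v, p, c, result in sorted(breakpoints, key=lambda x: x[3]):
    print(f"volume x{v:.1f}, price x{p:.1f}, cost x{c:.1f} -> contribution ${result:,.0f}")
```

Anything that shows up in that list is a combination you want a plan for before it happens.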
If you need a baseline on token pricing models, see common structures on vendor pages like OpenAI pricing. For the cost side, GPU trends from major suppliers matter; start with public data on NVIDIA data center.
Want practical resources for finance teams building with AI? Explore curated tools and courses: AI tools for finance and courses by job.