Health systems are racing to adopt AI. Can they prove its value?
Health systems are rolling out AI across clinical and business operations - ambient documentation, summarization, revenue cycle, patient access, the lot. The hard part isn't enthusiasm. It's proof. CFO-ready ROI is still elusive for many tools, even as IT, operations and clinical leaders push to scale.
Leaders at HLTH 2025 agreed: AI can simplify administration and boost productivity. But direct cost reductions that show up cleanly on the income statement are rare. The value story often lives in time saved, burnout reduced and patient experience improved - benefits that require a few more steps to connect back to dollars.
The economic pressure is real
Medicaid cuts and the potential expiration of enhanced ACA subsidies could push millions off coverage. More uninsured patients mean less revenue and more uncompensated care. In that context, AI investments need a clear path to margin improvement - even if the path is indirect.
What to measure besides cash
Some tools tie to money quickly. Revenue cycle AI can speed days to collect and reduce denials. Those gains are easy to track.
Others, like ambient documentation, are trickier. Evidence so far suggests meaningful reductions in clinician burnout and documentation time, while the impact on productivity and net financial performance is mixed. The Peterson Health Technology Institute has flagged that nuance in its reviews of ambient scribes.
Peterson Health Technology Institute and KLAS Research both point to a common thread: if you can't see dollars yet, track leading indicators that reasonably roll up to dollars later.
A practical framework to prove AI value
Use this before you sign the contract and again at 30/60/90 days post-go-live.
- Step 0: Define the problem. One sentence: what's broken, for whom and how will we know it's fixed?
- Step 1: Baseline. Lock a 6-12 week pre-implementation baseline for each metric.
- Step 2: Primary outcome. Pick one: margin lift, days to collect, wRVUs/visit, no-show rate, documentation time per note.
- Step 3: Secondary outcomes. Clinician burnout, turnover/retention, patient satisfaction, coding accuracy, prior auth approvals.
- Step 4: Guardrails. Quality, safety, bias tests, PHI exposure, hallucination rate, escalation paths.
- Step 5: Attribution plan. Control group or staggered rollout. Note confounders (seasonality, staffing, benefit changes).
- Step 6: Instrumentation. Decide who captures what, where and how often; where the dashboard lives; and who owns it.
- Step 7: Decision rule. Pre-agree on "scale," "fix" or "sunset" thresholds.
Metric library by AI use case
- Ambient documentation/scribes
- Documentation time per note
- Note completion before end of day (%)
- Provider schedule capacity (visits/day)
- E/M level distribution shift and coding accuracy
- Burnout score (e.g., Mini-Z), turnover/retention
- Patient satisfaction with provider communication
- Revenue cycle
- First-pass claim acceptance (%)
- Days in A/R, net collection rate
- Denial rate by reason code; preventable denials
- Cost-to-collect and staff productivity (accounts/FTE)
- Patient access
- No-show rate, reschedule time, contact center AHT
- Self-service completion rate; leakage reduction
- Referral-to-appointment cycle time
- Clinical summarization/triage
- Chart prep time; message handling time
- Response time to patient messages
- Safety: agreement with clinician judgment; escalation rate
Turn "soft" metrics into dollars
- Avoided turnover. Replacing a physician often costs 2-3x annual salary. If ambient scribing drops annual exits by a few FTEs, that's six to seven figures retained.
- Time-to-capacity. Minutes saved per note x notes per day x loaded cost per minute (loaded hourly rate / 60). If capacity allows one extra visit/day, use average contribution margin per visit.
- Coding lift. Shift in E/M levels x payer mix x payment per level, minus audit risk and write-offs.
- Faster cash. Days in A/R reduced x average daily cash equals working capital relief; add interest or reinvestment benefit if applicable.
- Fewer no-shows. Appointments recovered x conversion to completed visits x contribution margin.
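Three of the translations above are straight multiplication, which makes them easy to sanity-check. A minimal sketch, with all input figures shown as illustrative assumptions:

```python
def time_to_capacity_value(min_saved_per_note: float, notes_per_day: float,
                           loaded_cost_per_min: float) -> float:
    """Minutes saved per note x notes per day x loaded cost per minute."""
    return min_saved_per_note * notes_per_day * loaded_cost_per_min

def faster_cash_relief(ar_days_reduced: float, avg_daily_cash: float,
                       annual_rate: float = 0.0) -> tuple[float, float]:
    """Days in A/R reduced x average daily cash = working capital relief,
    plus optional interest/reinvestment benefit at an annual rate."""
    relief = ar_days_reduced * avg_daily_cash
    return relief, relief * annual_rate

def no_show_recovery(appointments_recovered: float, completion_rate: float,
                     margin_per_visit: float) -> float:
    """Appointments recovered x conversion to completed x contribution margin."""
    return appointments_recovered * completion_rate * margin_per_visit

# Assumed inputs: 4 min/note saved, 22 notes/day, $2/min loaded rate
print(time_to_capacity_value(4, 22, 2.0))       # 176.0 per provider per day

# Assumed inputs: 5 A/R days cut, $400k average daily cash, 5% reinvestment
relief, benefit = faster_cash_relief(5, 400_000, 0.05)
print(relief, benefit)                          # 2000000 100000.0
```

Keeping each translation as its own function forces the assumptions (rates, margins, conversion) into the open, where finance can challenge them.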
Governance that keeps you out of trouble
- Data use: PHI handling, BAAs, access logging, red-teaming for prompt/response leakage
- Quality: bias testing by cohort, error thresholds, human-in-the-loop for high-risk tasks
- Safety: clinical validation, override workflow, incident reporting
- Security: model and vendor risk reviews; clear SLAs, uptime, and rollback plan
- Change management: opt-in pilots, training, quick-reference guides, feedback loop
Scorecard template (use as-is)
- Objective: Reduce documentation time by 30% and improve patient communication scores by 10% in 90 days.
- Scope: Internal medicine, 40 providers, two clinics.
- Baseline: 13 min/note; 72% top-box communication.
- Targets: 9 min/note; 79% top-box; no increase in addenda or safety events.
- Financial proxy: 4 min saved x 22 notes/day x $2/min loaded rate = $176/provider/day; validate with capacity or message backlog reduction.
- Decision rule: Scale if targets met with no safety flags and provider NPS ≥ +30.
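The template's financial proxy can be reproduced and annualized in a few lines. The per-note and rate figures come from the template itself; the provider count is from the scope line, and the 230 clinic days per year is an added assumption you should replace with your own calendar:

```python
# Figures from the scorecard template above
baseline_min, target_min = 13, 9          # min/note, baseline -> target
notes_per_day = 22
loaded_rate_per_min = 2.0                 # $/min loaded rate
providers = 40                            # pilot scope: two clinics

# Assumption (not in the template): ~230 clinic days per provider per year
clinic_days_per_year = 230

saved_per_note = baseline_min - target_min            # 4 minutes
daily_proxy = saved_per_note * notes_per_day * loaded_rate_per_min
annual_proxy = daily_proxy * providers * clinic_days_per_year

print(f"${daily_proxy:.0f}/provider/day")             # $176/provider/day
print(f"${annual_proxy:,.0f}/year across the pilot")  # $1,619,200/year
```

The annualized figure is an upper bound on the proxy, not realized cash, which is why the template asks you to validate it against observed capacity or message backlog reduction.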
How leaders are stress-testing AI today
Set success criteria up front. Compare against a clear baseline. Expect returns to be correlative before they're causal. If the tool doesn't at least pay for itself or deliver an undeniable patient or clinician win, cut it.
That mindset is helping teams pick winners, contain hype and reallocate budget fast. It also builds trust with clinicians who want relief from administrative work without adding risk.
Next steps for your system
- Pick one high-friction workflow and run a tightly scoped pilot
- Instrument metrics before you buy, not after go-live
- Translate time saved into capacity and margin, conservatively
- Publish results internally and decide to scale, fix or sunset
If you need structured upskilling for clinicians, RCM and analytics teams working with AI workflows, explore concise programs by job role at Complete AI Training.