AI in Healthcare: What's Working, What Isn't, and How to Move Forward
Bringing new tech into clinical care is hard for good reasons: patient safety, regulation, and established workflows. If a tool compromises either safety or efficiency, it won't last. That's not resistance to change; that's good governance.
AI is no exception. The question isn't "Can it pass the demo?" The question is "Does it improve outcomes or ROI in the messiness of real practice?"
Why automation stalls inside hospitals and clinics
Automation thrives on complete, consistent data. Healthcare rarely has that. Decisions and billing both break when even a small segment of the record is missing.
Think about claims: prior auth, supporting notes, lab results, imaging. Leave one piece out and denials spike. Same for clinical support: if the radiology report doesn't reach the EHR, an autonomous agent can't act safely. Interoperability has improved, but there are still too many systems, vendors, and data hubs to align without deliberate effort.
The risk is real: if a procedure is performed at one center and the results don't show up elsewhere, nobody has the full picture. That's a patient safety problem, not a tech problem.
Evidence and reimbursement are the choke points
Every new tool consumes money, time, and attention. To justify that spend, it has to show one or more of the following: better outcomes, higher clinical productivity, fewer administrative hours, increased revenue, or lower cost.
Clearance alone doesn't prove any of that. For example, several radiology AI tools earned regulatory clearance by matching radiologist performance on specific findings. But many haven't shown outcome improvement in real-world use, so payers won't reimburse. Adoption stalls, and out-of-pocket models don't scale.
There's another issue: external validity. AI models can drop in performance on new scanners, sites, or patient populations. Post-market monitoring is inconsistent, and many deployments lack ongoing checks for drift, bias, and subgroup performance.
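What would those checks look like in practice? Below is a minimal sketch of a per-site and per-subgroup performance report for a binary classifier; the column names (y_true, y_score, site, age_band) and the file name are illustrative assumptions, not any vendor's schema.

```python
# Minimal sketch: per-site and per-subgroup performance for a binary classifier,
# assuming a labeled, scored validation set with illustrative column names.
import pandas as pd
from sklearn.metrics import roc_auc_score

def performance_by_group(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """AUROC and prevalence per group, with group sizes for context."""
    rows = []
    for group, g in df.groupby(group_col):
        # AUROC is undefined when a group contains only one class; flag it with NaN.
        auroc = roc_auc_score(g["y_true"], g["y_score"]) if g["y_true"].nunique() > 1 else float("nan")
        rows.append({group_col: group, "n": len(g),
                     "prevalence": g["y_true"].mean(), "auroc": auroc})
    return pd.DataFrame(rows).sort_values("auroc", na_position="first")

validation = pd.read_csv("validation_scored.csv")  # hypothetical scored validation export
print(performance_by_group(validation, "site"))
print(performance_by_group(validation, "age_band"))
```

A report like this, refreshed on a schedule, is the difference between "we validated it once" and actual post-market monitoring.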
What's actually getting traction
In the last 18 months, the fastest adoption has shown up in lower-risk, human-in-the-loop use cases: ambient documentation, EHR copilots, and to a lesser extent, autonomous coding. These tools sit inside existing workflows and keep clinicians in charge.
Early signals suggest reduced cognitive load and burnout, but chart summaries can contain errors that require physician review. Financial ROI remains mixed and context-dependent. One recent evaluation of AI scribes highlighted clinician time savings but found system-level ROI still uncertain; it's worth reading if you're planning a pilot.
Peterson Health Technology Institute: Generative AI in Clinical Documentation
How to evaluate AI vendors without wasting a year
- Start with low-risk pilots. Keep a human in the loop. Define success up front: minutes saved per note, after-hours EHR time, denial rate, throughput, patient wait time, and clinician satisfaction.
- Demand real-world evidence. Ask for external validation on data from sites unlike the training set, subgroup performance (age, sex, race, language), and calibration reports. Require a plan for post-go-live monitoring and drift detection.
- Integrate before you automate. Map the data path: EHR, PACS, LIS, RIS, RCM, HIE. Test interfaces in a sandbox. Confirm read/write permissions, identity matching, and error handling. No shortcuts here.
- Safety guardrails. Define clinical ownership, escalation paths, and immediate rollback criteria. Log every AI suggestion and human override for audit and learning.
- Measure what matters. Track note completion latency, coding accuracy, claim first-pass yield, denial overturn rate, staff overtime, and patient throughput (a quick KPI sketch follows this list). Publish results internally, good or bad.
- Plan for reimbursement. Map to CPT/HCPCS where relevant, confirm payer policies, and ensure documentation artifacts support audits. No evidence, no payment.
- Security and privacy. Require BAA, PHI minimization, encryption, access logs, model update controls, and data deletion terms. Clarify who owns derivative data and embeddings.
- Change management. Appoint clinical champions and super users. Start small, train thoroughly, collect feedback weekly, and iterate.
- Post-market monitoring. Shadow mode first, then phased rollout. Build dashboards for performance, bias, and drift (see the drift sketch after this list). Revalidate after EHR upgrades, scanner changes, or population shifts.
- Contract for outcomes. Tie part of vendor fees to agreed metrics (e.g., documentation time, denial rate). Include service levels, uptime, remediation timelines, and exit clauses.
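A couple of the revenue-cycle numbers from "Measure what matters" are simple ratios once you have a claims export. A minimal sketch, assuming hypothetical column names rather than any particular RCM vendor's schema:

```python
# Minimal sketch of two revenue-cycle KPIs, assuming a claims table with boolean
# columns "paid_first_pass", "denied", "appealed", and "overturned" (illustrative names).
import pandas as pd

def claims_kpis(claims: pd.DataFrame) -> dict:
    total = len(claims)
    appealed = claims["appealed"].sum()
    return {
        # Share of claims paid without rework on the first submission.
        "first_pass_yield": claims["paid_first_pass"].sum() / total if total else None,
        # Share of submitted claims denied at least once.
        "denial_rate": claims["denied"].sum() / total if total else None,
        # Of the denials you appealed, the share that were overturned.
        "denial_overturn_rate": claims["overturned"].sum() / appealed if appealed else None,
    }

claims = pd.read_csv("claims_export.csv")  # hypothetical export from your RCM system
print(claims_kpis(claims))
```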
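And for the monitoring dashboards, input drift checks can start as simply as a population stability index (PSI) against a go-live baseline. Another sketch, with synthetic data standing in for your own feature values; the 0.25 threshold is a common rule of thumb, not a regulatory standard:

```python
# Minimal sketch of an input-drift check using the population stability index (PSI),
# comparing a feature's current distribution to a go-live baseline window.
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between two samples of one numeric feature, using baseline-derived bins."""
    edges = np.quantile(baseline, np.linspace(0.0, 1.0, bins + 1))
    expected, _ = np.histogram(baseline, bins=edges)
    # Clip current values into the baseline range so every observation lands in a bin.
    actual, _ = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)
    eps = 1e-6  # avoids division by zero and log(0) for empty bins
    expected = expected / expected.sum() + eps
    actual = actual / actual.sum() + eps
    return float(np.sum((actual - expected) * np.log(actual / expected)))

# Synthetic stand-ins: a go-live baseline and a shifted "current month" sample.
baseline = np.random.default_rng(0).normal(0.30, 0.10, 5000)
current = np.random.default_rng(1).normal(0.45, 0.10, 1200)
value = psi(baseline, current)
print(f"PSI = {value:.3f}", "-> investigate" if value > 0.25 else "-> stable enough")
```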
Data readiness: the hidden cost
Most "AI failures" are data problems. Fix identity matching, close gaps in results routing, and reduce free-text ambiguity before you expect automation to work. Join your HIE, clean up interfaces, and standardize vocabularies where you can.
If you can't reliably assemble a near-complete patient record for the use case, don't automate it yet. You'll burn trust and lose time.
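One way to make that call before you commit is to audit how often the required elements are actually present for the target population. A minimal sketch, assuming a hypothetical one-row-per-patient extract with illustrative column names:

```python
# Minimal sketch of a data-readiness audit: what share of patients have every
# element the use case needs? Swap in the elements your own workflow depends on.
import pandas as pd

REQUIRED_ELEMENTS = ["demographics", "problem_list", "med_list", "recent_labs", "imaging_report"]

def readiness_report(records: pd.DataFrame) -> pd.Series:
    """Completeness per required element, plus the share of patients complete on all of them."""
    present = records[REQUIRED_ELEMENTS].notna()
    report = present.mean()                           # fraction of patients with each element
    report["all_elements_present"] = present.all(axis=1).mean()
    return report.sort_values()

records = pd.read_parquet("patient_snapshot.parquet")  # hypothetical one-row-per-patient extract
print(readiness_report(records))
```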
Where to place your bets in the next 12 months
- Ambient documentation and summarization to cut clerical time, with strict validation before use in patient-facing communication.
- Assistive coding and CDI to improve capture and reduce denials, with coder review in the loop.
- Operational copilots for inbox triage, prior auth preparation, and patient messaging drafts, always reviewed by staff.
- Focused imaging AI with demonstrated outcome improvement and site-level validation, plus a clear path to reimbursement.
Bottom line
AI can help, but only where the data are connected, the workflow is owned, and the evidence is real. Start with safer assistive use cases, measure relentlessly, and make vendors prove value in your environment.
If you need structured upskilling for clinical and administrative teams, this curated catalog can help you prioritize tools and skills by role: AI courses by job.