AI Across Federal Agencies: Acceleration, Caution, and a Playbook for Doing It Right
AI isn't just hitting the private sector. It's moving fast inside government, too.
The Washington Post reported 2,987 uses of AI across the executive branch last year, with hundreds labeled "high impact." NASA jumped from 18 reported applications in 2024 to 420 in 2025. The Department of Health and Human Services, overseen by Robert F. Kennedy Jr., now reports 398 uses, up from 255. The Department of Energy and the Commerce Department also saw major increases after the White House moved in April 2025 to clear barriers to AI adoption.
Concerns You Shouldn't Ignore
Rapid adoption raises familiar risks: bias, hallucinations, and the memory of a chaotic AI-enabled overhaul tied to the quasi-official Department of Government Efficiency during Elon Musk's brief orbit near power. The fear isn't abstract: bad rollouts hit real people.
"It's not clear using AI for most government tasks is necessary, or preferable to conventional software," says Chris Schmitz of the Hertie School. Legacy systems tempt "quick wins" that act more like Band-Aids than real modernization.
Why Smart Experimentation Still Matters
Others argue that tested, careful use of AI can be responsible governance. "We never really properly moved government into the internet era," says Jennifer Pahlka, a former U.S. deputy CTO. That gap shows up as delayed services and unmet public needs.
Pahlka's point is simple: it's early, so testing is appropriate, provided you build tight feedback loops. "You want ways of experimenting that give you clear and effective feedback, so you catch problems before broad rollout."
Denice Ross, a former White House chief data scientist, adds the non-negotiable: evaluate rigorously. "Collect and analyze data about how [a tool] performs, and the outcomes for different populations." If you can't measure impact, you can't claim it works.
What Federal Leaders Should Do Now
- Start with the problem, not the tool. If conventional software solves it, use that. AI is a poor fix for broken processes.
- Run small, time-boxed pilots. Clear goals, documented risks, pre-defined exit criteria. No endless experiments.
- Build feedback loops. Human review, user testing, red-teaming, and measurable service outcomes before scaling.
- Measure equity impacts. Disaggregate outcomes by population. Flag gaps fast and pause if harm appears (a minimal measurement sketch follows this list).
- Adopt a risk framework. Use recognized guidance such as the NIST AI Risk Management Framework (NIST AI RMF) and OMB direction (M-24-10).
- Protect data. Lock down PII, set retention limits, require clear data-use terms, and audit vendor handling.
- Document everything. Model cards, intended use, known limits, evaluation results, human-in-the-loop steps.
- Train your workforce. Teach prompt craft, review practices, and risk basics, tailored to each role.
- Update procurement. Require audit rights, incident reporting, content provenance, and the ability to switch or turn off models.
- Plan for incidents. Create a clear path to report failures, pause systems, notify affected users, and ship fixes.
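To make the equity measurement point concrete, here is a minimal Python sketch. The record fields, the agreement-rate metric, and the five-point gap threshold are illustrative assumptions, not policy; the idea is simply to compare outcomes across groups and recommend a pause when the gap gets large.

```python
# Minimal sketch: disaggregate pilot outcomes by population group and flag gaps.
# Record fields ("group", "ai_decision", "human_decision") are illustrative, not a real schema.
from collections import defaultdict

def disparity_report(records, gap_threshold=0.05):
    """Compare AI/human agreement rates across groups and flag large gaps."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["group"]] += 1
        agree[r["group"]] += int(r["ai_decision"] == r["human_decision"])

    rates = {g: agree[g] / total[g] for g in total}
    worst, best = min(rates.values()), max(rates.values())
    return {
        "agreement_by_group": rates,
        "max_gap": best - worst,
        "pause_recommended": (best - worst) > gap_threshold,  # pause if harm appears
    }

# Example: a gap above five percentage points triggers a pause recommendation.
sample = [
    {"group": "A", "ai_decision": "approve", "human_decision": "approve"},
    {"group": "B", "ai_decision": "deny", "human_decision": "approve"},
]
print(disparity_report(sample))
```

In practice, agencies would swap in the outcome metrics and population categories defined in their own evaluation plans.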
Where AI Adds Real Value Today
- Summarization and triage. Case notes, complaints, and comment intake, routed to the right queue faster with human checks.
- Search over large records. Retrieval-augmented search on public guidance and internal knowledge bases (see the sketch after this list).
- Developer productivity. Code suggestions, test generation, and documentation (with review and logging).
- Anomaly spotting. Fraud indicators, duplicate records, and data quality flags, always with analyst oversight.
- Language access. Translation and plain-language drafts, reviewed by qualified staff.
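As a rough illustration of the retrieval-augmented search item above, here is a minimal sketch. The in-memory knowledge base, keyword-overlap scoring, and prompt format are stand-ins, not a recommended stack; a real deployment would use a vetted search index and an approved model endpoint, and the model call itself is omitted here.

```python
# Minimal retrieval-augmented search sketch over an in-memory knowledge base.
# Scoring is simple keyword overlap; doc_ids and texts are invented examples.

KNOWLEDGE_BASE = [
    {"doc_id": "guidance-001", "text": "Applicants may appeal a denial within 60 days."},
    {"doc_id": "guidance-002", "text": "Field offices must log every AI-assisted decision."},
]

def retrieve(query, k=2):
    """Rank documents by how many query terms they contain."""
    terms = set(query.lower().split())
    scored = [
        (sum(t in doc["text"].lower() for t in terms), doc)
        for doc in KNOWLEDGE_BASE
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query):
    """Ground the model's answer in retrieved passages, cited by doc_id."""
    passages = retrieve(query)
    context = "\n".join(f"[{d['doc_id']}] {d['text']}" for d in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long does an applicant have to appeal?"))
```

Grounding answers in retrieved passages, cited by document ID, is what lets reviewers trace a response back to the guidance it came from.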
Guardrails for Sensitive Decisions
- Keep humans in the loop for benefits eligibility, enforcement actions, and determinations that affect rights or liberty.
- Explain decisions. Provide reason codes, source documents, and a clear appeals path.
- Records and transparency. Log prompts, model versions, and outputs to meet records and FOIA obligations (a minimal logging sketch follows this list).
- No shadow deployments. Use ATO pathways and security reviews; publish an AI use inventory with risk tiers.
- Vendor accountability. Require red-teaming results, bias testing, and model update notifications before changes go live.
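A minimal sketch of what the logging guardrail could look like in code. The file path, field names, and JSON-lines format are assumptions for illustration; the point is that every call records the prompt, model version, and output so records and FOIA reviews can reconstruct what the system did.

```python
# Minimal audit-log sketch: capture prompt, model version, and output for each call.
# LOG_PATH and the entry schema are illustrative, not an agency standard.
import json
import hashlib
from datetime import datetime, timezone

LOG_PATH = "ai_audit_log.jsonl"  # hypothetical location; retention rules are set by the agency

def log_interaction(user_id, prompt, model_version, output):
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "model_version": model_version,
        "prompt": prompt,
        "output": output,
        # A content hash makes later tamper checks easier; it is not a substitute for access controls.
        "record_hash": hashlib.sha256((prompt + output).encode()).hexdigest(),
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

log_interaction("analyst-42", "Summarize complaint intake for today", "model-2025-06", "Summary text...")
```

Append-only logs with a content hash make later audits easier, but they complement, rather than replace, the retention limits and access controls noted above.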
A 90-Day Plan You Can Execute
- Days 1-30: Pick two low-risk use cases. Define success metrics, risks, and human review steps. Set up logging and privacy controls.
- Days 31-60: Pilot with a small user group. Run red-team tests. Compare outcomes across populations. Fix or exit.
- Days 61-90: If metrics hold, expand carefully. Publish a one-page model card, impact summary, and contact for issues.
The Bottom Line
Adoption is accelerating. The question isn't "AI or no AI"; it's whether agencies use it with discipline, transparency, and measurable benefit.
Experiment, yes. But do it with guardrails, clear outcomes, and the humility to stop what doesn't work. That's how AI actually helps government serve people better.