AI is catching on in government. The Beeck Center wants to help
After years of drafting policies and debating risk, state and local agencies are stepping into a new phase of generative AI: careful, scaled experiments. The mood has shifted from "write the rules" to "test what works."
At Georgetown University's Beeck Center for Social Impact + Innovation, a new AI innovation and incubation fellow, Andrew Merluzzi, is focused on exactly that: finding the small set of uses that create real value and anticipating the second- and third-order effects that come with them. The goal: help governments move beyond pilots and into durable, low-risk improvements.
From policy to pilots
Los Angeles is rolling out Google's AI productivity tools to tens of thousands of workers. Early priorities include multilingual communications and, over time, smarter traffic and lighting operations.
Maryland has been testing AI for months to speed up website builds, stand up chatbots, and draft talking points. Adoption isn't uniform, though. Some CIOs are prioritizing legacy system replacements over marginal AI gains, and that's a rational call in resource-constrained environments.
Early movers and fast followers
Vermont stood up one of the first statewide AI leadership roles and reports maturity across governance, data retention, and practical outcomes for agencies. The state shares its work openly and treats AI as a team sport.
Colorado is pursuing a second-mover strategy. With a tech-savvy governor, Jared Polis, the guidance to IT has been clear: be bold, but keep guardrails. The state borrowed a "fail fast" approach: fund many ideas, cut losers quickly, scale the few that work. Out of roughly 200 projects, about 5% have delivered outsized results.
One standout: AI-assisted training for staff handling unemployment insurance calls and emergency intake. Time to full productivity was cut in half, which matters in high-turnover roles.
What the Beeck Center is zeroing in on
Merluzzi is focused on uses that fit AI's strengths, such as pattern recognition and summarizing large volumes of text, and then on helping states share the wins so every locality doesn't repeat the same pilot mistakes. Think "learning ecosystem," not isolated experiments.
Good examples already exist. Stanford's Regulation, Evaluation, and Governance Lab (RegLab) has used large language models to sift through regulations, flag redundancies, and save staff countless hours. The lab has also supported local efforts to identify and address racial covenants in land records to meet state requirements. These are the kinds of "high impact, low risk" projects worth copying.
High-impact, low-risk uses to prioritize
- Summarizing regulations, policies, and grant guidance for staff and the public.
- Multilingual drafts for notices, SMS updates, and service pages.
- Knowledge assistants for frontline staff to surface policy snippets and SOPs fast.
- Form assistance and intake guidance to reduce incomplete submissions.
- Website QA to catch broken links, outdated content, and readability issues (a small script sketch follows this list).
- Training and scenario simulations for call centers and benefits eligibility teams.
- First-draft support for RFPs and memos with mandatory human review and sourcing.
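For the website QA item above, here's a minimal sketch of what that can look like in practice: a short Python script that crawls a single page and flags broken links. The packages (requests, beautifulsoup4) and the start URL are assumptions for illustration, not a recommendation of any particular tool.

```python
# Minimal sketch of a website QA pass: scan one page and flag broken links.
# Assumes the `requests` and `beautifulsoup4` packages; the start URL is a placeholder.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

START_URL = "https://example.gov/services"  # hypothetical page to audit

def check_links(page_url: str, timeout: float = 10.0) -> list[tuple[str, str]]:
    """Return (link, problem) pairs for links that fail to resolve."""
    problems = []
    html = requests.get(page_url, timeout=timeout).text
    for tag in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        link = urljoin(page_url, tag["href"])
        if not link.startswith("http"):
            continue  # skip mailto:, tel:, and in-page anchors
        try:
            resp = requests.head(link, timeout=timeout, allow_redirects=True)
            if resp.status_code >= 400:
                problems.append((link, f"HTTP {resp.status_code}"))
        except requests.RequestException as exc:
            problems.append((link, f"request failed: {exc}"))
    return problems

if __name__ == "__main__":
    for link, problem in check_links(START_URL):
        print(f"{problem}: {link}")
```

A script like this can run on a schedule and feed a simple report to content owners, which keeps the QA work routine rather than heroic.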
Plan for the risks that people actually face
Many agencies promise a human will make the final decision. That's good, but automation bias is real: people tend to trust system outputs even when they shouldn't. Simply putting a person "in the loop" doesn't guarantee independent judgment.
- Require decision logs that show what the AI suggested and why a human agreed or disagreed (see the sketch after this list).
- Red-team prompts and datasets for each use case; publish known failure modes for staff.
- Run bias, privacy, and records-retention reviews early; assume FOIA discoverability.
- Put "trust but verify" into training: when to ignore the model and escalate.
- Add visible sourcing and confidence notes to AI-generated drafts.
- Track concrete outcomes: time saved, error rates, backlog reduction, user satisfaction.
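As referenced in the decision-log item above, here's a minimal sketch of what one log entry could capture, assuming a simple append-only JSONL audit file. The field names and example values are illustrative, not a records-management standard.

```python
# Minimal sketch of a decision log entry: record what the AI suggested,
# what the human decided, and why. Field names and values are illustrative.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class DecisionLogEntry:
    case_id: str            # internal case or ticket identifier (placeholder format)
    ai_suggestion: str      # what the model proposed
    human_decision: str     # "accepted", "modified", or "rejected"
    rationale: str          # reviewer's reason for agreeing or disagreeing
    reviewer: str           # who made the final call
    timestamp: str = ""

    def to_jsonl(self) -> str:
        if not self.timestamp:
            self.timestamp = datetime.now(timezone.utc).isoformat()
        return json.dumps(asdict(self))

# Example: append one entry to an audit file (path and values are placeholders).
entry = DecisionLogEntry(
    case_id="UI-2024-00123",
    ai_suggestion="Claim appears eligible; recommend approval.",
    human_decision="modified",
    rationale="Wage records incomplete; requested additional documentation.",
    reviewer="staff_reviewer_07",
)
with open("decision_log.jsonl", "a", encoding="utf-8") as f:
    f.write(entry.to_jsonl() + "\n")
```

Even a log this simple gives auditors and FOIA officers something concrete, and it makes automation bias visible when "accepted" starts approaching 100%.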
There's another wrinkle: efficiency can increase demand. Make permits easier to file and you'll get more applications, plus more inspections, cleanups, and customer questions. This is the Jevons-like effect in public service. Plan for it.
- Set capacity triggers (e.g., queue lengths, SLAs) that auto-initiate staffing or vendor support; a small sketch follows this list.
- Tie AI wins to downstream budgets: inspections, facilities, and community engagement.
- Model demand scenarios before rollout; update quarterly as behavior changes.
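For the capacity-trigger bullet above, a minimal sketch of the idea: compare live queue metrics against a backlog limit and the SLA, and surface the reasons support should kick in. The thresholds and metric names are assumptions, not agency standards.

```python
# Minimal sketch of a capacity trigger: compare live queue metrics to
# thresholds and flag when to add staffing or vendor support.
# Thresholds and metric names are assumptions, not agency standards.
from dataclasses import dataclass

@dataclass
class QueueMetrics:
    open_requests: int          # items waiting in the queue
    median_wait_days: float     # current median time to first response
    sla_days: float             # service-level target for first response

def capacity_triggered(m: QueueMetrics, backlog_limit: int = 500) -> list[str]:
    """Return the reasons (if any) that capacity support should kick in."""
    reasons = []
    if m.open_requests > backlog_limit:
        reasons.append(f"backlog {m.open_requests} exceeds limit {backlog_limit}")
    if m.median_wait_days > m.sla_days:
        reasons.append(f"median wait {m.median_wait_days}d exceeds SLA {m.sla_days}d")
    return reasons

# Illustrative example: an easier permit process doubles filings and both triggers fire.
print(capacity_triggered(QueueMetrics(open_requests=740, median_wait_days=9.5, sla_days=7.0)))
```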
A practical 90-day plan
- Pick three use cases from the list above; define success metrics and sunset criteria (one way to encode them is sketched after this list).
- Stand up a secure sandbox with audit logging; limit data exposure to what's necessary.
- Create one-page guardrails: approved tools, prohibited data, review steps, and contacts.
- Launch small, time-boxed pilots (6-10 weeks). Share results openly, even the misses.
- Scale one "clear win" to a second agency; document playbooks as you go.
- Join peer networks and compare notes to avoid repeating failed experiments.
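To make the "success metrics and sunset criteria" step concrete, here is one illustrative way to encode them so every pilot gets the same scale/iterate/sunset call. The metrics and thresholds are placeholders an agency would set for itself, not recommended values.

```python
# Minimal sketch of pilot evaluation against pre-defined success metrics
# and sunset criteria. All metrics, thresholds, and values are placeholders.
from dataclasses import dataclass

@dataclass
class PilotResult:
    name: str
    hours_saved_per_week: float
    error_rate: float           # share of outputs needing correction
    user_satisfaction: float    # 0-1 survey score

def evaluate(p: PilotResult,
             min_hours: float = 20.0,
             max_error_rate: float = 0.05,
             min_satisfaction: float = 0.7) -> str:
    """Decide whether a pilot scales, iterates, or sunsets against its criteria."""
    if p.error_rate > max_error_rate:
        return "sunset: error rate above threshold"
    if p.hours_saved_per_week >= min_hours and p.user_satisfaction >= min_satisfaction:
        return "scale: clear win"
    return "iterate: promising but below success metrics"

print(evaluate(PilotResult("benefits call-center training", 34.0, 0.03, 0.82)))
```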
Upskill the team that has to use this
Most productivity gains won't come from the tool; they'll come from staff who know how to frame problems, prompt well, and verify outputs. Invest in short, role-based training and keep it close to daily work.
Browse AI learning paths by job role to build consistent skills across policy, legal, operations, and frontline teams.
Keep the focus where it belongs
The point isn't to deploy AI everywhere. It's to save hours, reduce backlogs, improve access, and make services clearer for residents-without eroding judgment or trust. Test, measure, share, and scale the few things that work.
If you do that, the "toddler phase" will pass quickly, and your agency will be ready for bigger, smarter steps.