Governments Are Using AI To Draft Legislation. What Could Possibly Go Wrong?
UK officials faced 50,000+ consultation responses on a water-sector overhaul. An in-house tool, Consult (part of the Humphrey suite), grouped them into themes in roughly two hours. Claimed cost: £240, plus 22 hours of expert review. Scaled across departments, officials say that could save 75,000 days of manual analysis a year.
The draw is obvious: free civil servants from sorting and summarizing so they can make decisions. The risk: once AI sits in the middle of public input, the method that filters those voices becomes a decision-maker itself. That's a legitimacy issue, not a tooling quirk.
Where AI is already in the legislative pipeline
- Italy's Senate clusters similar amendments, finds overlaps, and flags likely filibuster plays so staff can compare substance faster.
- The European Commission is procuring multilingual chatbots to help users understand obligations under the AI Act and the Digital Services Act. See the official overview of the EU AI approach here.
- Italy's Chamber of Deputies backs GENAI4LEX-B to summarize committee amendments and check drafting standards.
- Brazil's Chamber of Deputies is expanding Ulysses, rolling out "Ulysses Chat," and allowing staff to use external LLMs under security and transparency commitments.
- New Zealand's Parliamentary Counsel Office tested AI for first-draft explanatory notes, with explicit guardrails on data sovereignty and over-reliance.
- Estonia's Prime Minister Kristen Michal has urged parliament to use AI to catch errors after a public "vibe-coded" bug-finder exposed a loophole that let online casinos dodge taxes, at an estimated cost of €2 million a month.
The quiet flood: AI-generated participation
It's not just analysis. It's volume. Public consultations can be swamped by mass-generated submissions, template amendments, and copied objections. That's a legislative DDoS: not hacking systems, but overwhelming them with plausible noise.
One obvious fix: one citizen, one submission, verified. Without basic guardrails, public consent can be diluted fast, and foreign actors can skew the signal. There are already tools that auto-draft objections to UK planning applications. That genie won't go back in the bottle.
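What would that guardrail look like in practice? A minimal sketch, assuming each submission already carries a verified citizen ID; the field names and the 0.9 similarity threshold are illustrative, not taken from any live system.

```python
# Sketch: collapse exact and near-duplicate consultation submissions.
# Assumes each submission dict has a verified "citizen_id" and "text";
# field names and the 0.9 similarity threshold are illustrative.
import hashlib
import re
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/whitespace so trivial edits still collide."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()

def dedupe(submissions: list[dict], threshold: float = 0.9) -> list[dict]:
    seen_ids: set[str] = set()   # one verified person, one submission
    clusters: list[dict] = []    # one representative per template
    for sub in submissions:
        if sub["citizen_id"] in seen_ids:
            continue
        seen_ids.add(sub["citizen_id"])
        norm = normalize(sub["text"])
        digest = hashlib.sha256(norm.encode()).hexdigest()
        match = None
        for cluster in clusters:
            if cluster["digest"] == digest or \
               SequenceMatcher(None, norm, cluster["norm"]).ratio() >= threshold:
                match = cluster
                break
        if match:
            match["count"] += 1  # templated copy: count it, don't re-summarize it
        else:
            clusters.append({"digest": digest, "norm": norm,
                             "count": 1, "example": sub})
    return clusters
```

Each cluster keeps one representative and a count, so a templated campaign registers as one argument made many times rather than many separate arguments. A production system would likely swap the pairwise comparison for embeddings or MinHash, but the principle is the same.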
Trust and transparency are already thin
In 11 of 28 countries surveyed in Edelman's Trust Barometer, government is more distrusted than trusted. In the UK, just 29% say they trust their government to use AI accurately and fairly. If AI becomes a black box in rulemaking, that number drops, not rises.
If the tool shapes the process, say so up front, or you'll own the blowback. Public disclosure of where AI was used, how it was tested, and where humans overruled it isn't a "nice to have." It's the price of legitimacy. See the public trust data from Edelman here.
Legal exposure: the US warning shot
US agencies have discussed using commercial models to draft supporting text for rulemaking. That invites a simple court test: did the agency follow required procedures, or did AI short-circuit reasoned analysis? If a judge sees "arbitrary and capricious," the rule can be tossed. Speed is good until it erases the record you need to defend your work.
Guardrails that actually work (and survive scrutiny)
- Identity and rate-limiting: One verified person, one submission. Use government-backed identity, verified accounts, or verifiable credentials. Deduplicate templated inputs and weight unique contributions.
- Provenance on inputs: Flag likely AI-generated content. Prefer signed submissions and adopt content provenance standards where feasible. Don't ban AI-authored input; just label it and weight it appropriately.
- Transparent use of AI: Publish a plain-English note with each consultation or bill: models used, version/date, prompts/workflows, datasets (where possible), known limits, and what humans checked.
- Human-in-the-loop by default: Set minimum expert review thresholds. Require dissent capture and minority-view summaries so clustering doesn't erase edge cases.
- Model resilience and vendor independence: Keep multi-model redundancy. Record model versions, prompts, seeds, and outputs. Plan for outages, forced upgrades, and exit. Prefer data residency that matches your statute.
- Auditability: Store prompts, intermediate notes, and final outputs with timestamps for FOI/discovery. Make decisions reproducible or, at least, explainable to a layperson. A minimal record format is sketched after this list.
- Security and red-teaming: Test systems against spam floods, prompt injection, and targeted manipulation. Run continuous dedup checks. Monitor sudden shifts in themes that suggest coordinated campaigns.
- Process compliance: AI can draft; it cannot replace notice-and-comment, evidentiary records, or reasoned responses. Every claim in a rule needs a sourced basis a human can defend.
- Quality gates: Pre-deployment evals for legal accuracy, bias, and plain-language clarity. Define "error budgets" for drafts and halt if exceeded.
- Cost realism: Track not just time saved but error costs, rework, and challenge risk. Pilots first; kill switches always.
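To make the transparency and auditability items concrete, here is a minimal sketch of the record worth writing for every AI-assisted step. The field names are assumptions for illustration, not a mandated schema.

```python
# Sketch of an audit record for one AI-assisted step in a consultation.
# Field names are illustrative; the point is that prompts, model versions,
# outputs, and the human sign-off are all captured with timestamps.
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AIStepRecord:
    consultation_id: str
    step: str                          # e.g. "theme_clustering", "draft_summary"
    model_name: str
    model_version: str
    prompt: str
    output: str
    human_reviewer: str | None = None
    human_changes: str | None = None   # what the reviewer overruled or rewrote
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def append_record(path: str, record: AIStepRecord) -> None:
    """Append one JSON line per step so the file doubles as an FOI/discovery log."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

An append-only log like this is what lets you answer, months later, "what did the model produce, and what did the human change?" whether the question comes from a FOI request or a judge.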
A simple workflow you can actually run
- Intake: Verify identity; label suspected AI-generated content; strip duplicates.
- Triage: Cluster themes with two different models; compare overlaps; log discrepancies (see the sketch after this list).
- Synthesis: Produce a summary plus a "minority views" addendum. No single summary without the addendum.
- Expert review: Policy lead and counsel sign off. Track changes between AI draft and human final.
- Disclosure: Publish the AI-use note and method. Invite targeted feedback on the synthesis, not just the raw policy.
- Archive: Store versions, prompts, and model IDs for audit and potential litigation.
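The triage step is where a second model earns its keep. Below is a rough sketch of comparing theme labels from two independent clustering functions, assuming each returns a mapping from submission ID to theme label; the clusterers themselves are placeholders for whatever models or pipelines you actually run.

```python
# Sketch: compare theme labels from two independent clustering functions and
# flag submissions where they disagree. The clustering functions are
# placeholders for whatever models/pipelines are actually in use.
from collections import Counter
from typing import Callable

Clusterer = Callable[[dict[str, str]], dict[str, str]]  # submission_id -> theme label

def compare_triage(submissions: dict[str, str],
                   cluster_a: Clusterer, cluster_b: Clusterer) -> dict:
    labels_a = cluster_a(submissions)
    labels_b = cluster_b(submissions)
    disagreements = [sid for sid in submissions
                     if labels_a.get(sid) != labels_b.get(sid)]
    return {
        "total": len(submissions),
        "agreement_rate": 1 - len(disagreements) / max(len(submissions), 1),
        "disagreements": disagreements,      # route these to human review first
        "themes_a": Counter(labels_a.values()),
        "themes_b": Counter(labels_b.values()),
    }
```

Submissions the two models disagree on are exactly the ones a human should read first, and the agreement rate is a cheap early warning of model drift.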
Metrics that matter (so you know it's working)
- Verified unique participants vs. total submissions (computed in the sketch after this list)
- Deduplication rate and suspected-AI share
- Turnaround time vs. human review hours
- Policy-relevant error rate found in review
- Outages/model-drift incidents and time to recover
- Number of legal challenges tied to process, and their outcomes
- Public trust delta on AI use across consultation cycles
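Most of these numbers fall straight out of the intake and triage records. A rough sketch, assuming per-submission records with the illustrative fields used in the intake step above:

```python
# Sketch: derive headline metrics from per-submission intake records.
# Record fields ("citizen_id", "is_duplicate", "suspected_ai") are assumptions
# matching the intake sketch above, not a standard schema.
def consultation_metrics(records: list[dict],
                         ai_hours: float, human_review_hours: float) -> dict:
    total = len(records)
    unique_people = len({r["citizen_id"] for r in records})
    duplicates = sum(1 for r in records if r.get("is_duplicate"))
    suspected_ai = sum(1 for r in records if r.get("suspected_ai"))
    return {
        "verified_unique_participants": unique_people,
        "total_submissions": total,
        "deduplication_rate": duplicates / total if total else 0.0,
        "suspected_ai_share": suspected_ai / total if total else 0.0,
        "turnaround_hours_ai": ai_hours,
        "human_review_hours": human_review_hours,
    }
```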
Use AI where it helps; avoid it where it hurts
- Good fits: Clustering similar amendments, cross-referencing statutes, drafting explanatory notes, plain-language summaries for the public, comparative analysis across jurisdictions.
- Bad fits: Final policy judgment, skipping statutory procedures, suppressing minority input, and any step you cannot explain or reproduce if questioned.
The bottom line
AI can make legislative work cleaner, faster, and more consistent. It can also undermine the very consent you need to govern if it becomes a shortcut for listening or a black box in the record.
Treat these systems like industrial tools: strong guards, clear labels, and trained operators. If you design for legitimacy first (verification, transparency, human accountability), you get the gains without handing opponents an easy win in court or the press.
Want to skill up your team responsibly?
If you're standing up internal AI workflows and need practical training on risk, prompting, and evaluation, see our latest programs for public-sector roles here.