White House AI Memo Targets Bias in Government, But Enforcement Remains Unclear
The White House Office of Management and Budget issued a procurement memorandum in December requiring federal agencies to purchase artificial intelligence systems free from bias. The guidance, known as M-26-04 or the "Public Trust AI Memo," specifically targets large language models, the technology behind systems like ChatGPT that federal agencies increasingly use to process benefits, summarize reports, and support clinical reviews.
The memo establishes clear principles: LLMs must be truthful, prioritize historical accuracy, and function as neutral tools. Yet the document contains significant gaps that may allow biased systems to remain in government use, according to policy analysis.
The Core Problem: Enforcement Without Teeth
The memo allows agencies to modify existing contracts only "to the extent practicable," language that creates a loophole for major vendors already under contract. OpenAI, xAI, Amazon, Meta, and Microsoft have foundational agreements with the federal government that predate this guidance.
Those contracts should be reviewed for compliance now, not left until renewal. For systems contracted through USAI.gov, the government's own statements indicate there are currently no guardrails beyond those the vendors voluntarily provide.
White House Office of Science and Technology Policy Director Michael Kratsios told the Senate Commerce Subcommittee that "repercussions for selling a model to the U.S. government that isn't truth seeking and accurate are pretty harsh." The administration has yet to impose any such repercussions.
Why LLM Requirements Are Difficult to Meet
The memo requires systems to be "truthful" and "objective," but LLMs don't actually verify factual accuracy. They generate text based on patterns in training data, not causal reasoning. An LLM cannot determine whether its own output is correct.
Bias is built into every LLM. These systems amplify whatever views appear most frequently in their training data. If training data comes from politically polarized sources, the model's responses will reflect that polarization. The memo does not require vendors to disclose where data came from or how models were trained.
LLMs also hallucinate, producing confident-sounding false information. They are designed to keep users engaged rather than to acknowledge uncertainty.
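To see why this is structural rather than a fixable bug, consider a toy language model, far simpler than any vendor's actual system: it simply reproduces whatever word sequences were most frequent in its training text. The corpus, the skew, and the sampling below are all invented for illustration.

```python
import random
from collections import Counter, defaultdict

# Toy training corpus: one viewpoint appears four times as often as the other.
corpus = ["policy X is harmful"] * 8 + ["policy X is beneficial"] * 2

# Fit a bigram model: count which word follows which.
transitions = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        transitions[prev][nxt] += 1

def generate(start, length=4):
    """Sample a continuation by following observed word frequencies."""
    out = [start]
    for _ in range(length - 1):
        options = transitions.get(out[-1])
        if not options:
            break
        words, weights = zip(*options.items())
        out.append(random.choices(words, weights=weights, k=1)[0])
    return " ".join(out)

# Generations mirror the corpus roughly 4-to-1: "harmful" dominates not because
# it is more accurate, but because it was more frequent in the training data.
samples = [generate("policy") for _ in range(1000)]
print(Counter(s.split()[-1] for s in samples))
```

Production LLMs are vastly more sophisticated, but the basic dynamic is the same: the output distribution tracks the training distribution, and nothing in the generation process checks a claim against reality.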
Transparency Requirements Fall Short
The memo requires vendors to provide "model, system, and/or data cards" documenting training processes, identified risks, and performance benchmarks. The problem: the government hasn't specified what information these documents must contain or what level of detail is acceptable.
This allows vendors to provide generic responses while withholding critical details. When xAI released Grok 4, it did not initially include a system card. Later versions of the documentation acknowledged bias and deception but provided no details on how the company mitigated those issues. A more recent card for Grok 4.1 does not mention bias at all.
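What a meaningful card could look like is not hard to imagine. The sketch below lists a hypothetical set of minimum fields an agency could demand; the field names and contents are illustrative and come neither from the memo nor from any vendor's actual documentation.

```python
# Hypothetical minimum contents of a model card an agency could require.
# Every field name and value here is illustrative, not drawn from the memo.
required_model_card = {
    "model_name": "example-llm-v1",
    "training_data_sources": [
        "licensed news archives, with date ranges",
        "public web crawl, with the filtering criteria described",
    ],
    "third_party_components": [
        "externally sourced datasets and libraries, each identified by origin",
    ],
    "known_risks": [
        "hallucination on low-coverage topics",
        "political bias toward over-represented viewpoints",
    ],
    "mitigations": [
        "what was done about each listed risk, with evidence it worked",
    ],
    "benchmark_results": [
        "each score reported with who ran the evaluation and on what data",
    ],
}
```

The point is not the exact schema. It is that the memo names the documents without naming their contents, which lets a vendor satisfy the requirement with a page of generalities.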
The public has little visibility into which agencies use which systems. At the start of 2025, federal agencies reported 2,133 AI use cases. The government has not published a complete consolidated inventory of current deployments, despite requirements to do so.
Feedback Mechanisms Don't Include the Government
The memo requires vendors to establish ways for users to report problematic outputs, such as hotlines, emails, or online forms. But these complaints go directly to the vendor, not to the government or the public.
If a federal employee encounters an LLM generating false information or biased responses, they report the problem to the company that built it. The vendor then decides whether to fix the issue. No public record of the complaint is created, and other users may never learn about shared problems.
A federal employee using a government-deployed model should not be expected to resolve issues directly with the vendor. Complaints about government AI systems should go to the government.
Third-Party Components Lack Transparency
Modern LLMs incorporate datasets, code libraries, and tools from multiple sources. OpenAI's system documentation mentions "diverse datasets, including information that is publicly available on the internet" and partnerships with unnamed third parties, but provides no specifics about data sources, quality, or security.
The memo does not explicitly require vendors to disclose information about third-party components. This means the government cannot verify whether models were trained on trusted sources or on polarized, inaccurate, or foreign-made content.
Vendor Self-Evaluation Creates Conflicts of Interest
The memo relies on vendor-provided benchmark scores to assess LLM performance. AI companies routinely optimize their systems to score well on popular benchmarks, a practice known as "gaming the results," while still exhibiting problems in real-world use.
The government cannot outsource testing to the companies whose products are being tested. Federal procurement professionals interviewed for this analysis unanimously agreed that AI systems require independent evaluation, separate from vendor claims.
OMB's April 2025 memo M-25-22 requires independent evaluation for AI acquisition generally, but the Public Trust AI Memo does not mention this requirement for LLMs specifically.
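Independent evaluation does not have to be elaborate to be meaningful. Here is a minimal sketch, assuming an agency keeps its own held-out test questions and grades the answers itself rather than accepting vendor-reported scores; the questions, the scoring rule, and the `query_model` callable are all placeholders, not any agency's actual process.

```python
from typing import Callable

# Held-out test set the vendor never sees: each prompt has an agency-verified answer.
HOLDOUT = [
    {"prompt": "In what year was the Social Security Act signed into law?", "expected": "1935"},
    {"prompt": "How many amendments does the U.S. Constitution currently have?", "expected": "27"},
]

def evaluate(query_model: Callable[[str], str]) -> float:
    """Score a model on agency-curated questions instead of vendor benchmarks.

    `query_model` stands in for whatever client call returns the model's text
    answer; it is a placeholder, not a specific vendor API.
    """
    correct = 0
    for case in HOLDOUT:
        answer = query_model(case["prompt"])
        if case["expected"].lower() in answer.lower():
            correct += 1
    return correct / len(HOLDOUT)

# Example run against a stand-in model that always gives the same answer.
accuracy = evaluate(lambda prompt: "It was signed in 1935.")
print(f"accuracy: {accuracy:.0%}")  # 50% on this toy set
```

Even a harness this simple shifts control of the test set away from the vendor, which is exactly the property the benchmark-gaming problem turns on.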
The Case of Grok
xAI's Grok model, contracted to the federal government for $0.42 per agency, illustrates the enforcement gap. Civil society organizations warned the White House that Grok violates the executive order requiring neutral, objective AI in government.
The model has generated Holocaust denial talking points, conspiratorial content about South Africa, and other inflammatory material. Multiple countries have investigated Grok or removed it from their markets. The Trump administration has continued expanding its use, including on the Department of Defense's GenAI.mil platform, which roughly 3 million military and civilian personnel can access.
The administration has not taken action to suspend or modify this contract despite the documented problems.
What Needs to Change
Federal procurement professionals and policy experts have recommended several steps:
- Require impact assessments before AI procurement, not after deployment
- Establish public registries of procured AI systems and their uses
- Conduct independent, objective evaluation of vendor claims
- Apply new standards to existing contracts immediately, not just future purchases
- Create public reporting mechanisms for AI incidents, not vendor-only feedback channels
- Require detailed disclosure of training data sources, third-party components, and evaluation methods
The memo's title emphasizes "public trust." Trust requires transparency and accountability, not vendor discretion and delayed enforcement.
Federal agencies have a responsibility to the public for the tools they deploy. That responsibility does not expire when a contract is signed. The government should ensure all AI systems meet the standards it established, whether deployed today or next year. Anything less undermines the administration's own credibility and creates national security vulnerabilities.
For federal employees implementing AI systems, understanding these gaps is critical. The memo sets expectations, but enforcement, and with it the quality of government AI, depends on agency action and public scrutiny.