Japan's Digital Agency and OpenAI to co-develop generative AI for government, ChatGPT coming to officials
Japan's Digital Agency will work with OpenAI to bring generative AI into government operations, easing routine tasks. Priorities include security, rigorous evaluation, and the possible adoption of domestically developed models.

Japan's Digital Agency partners with OpenAI to deploy generative AI for government work
October 3, 2025 - Tokyo. Japan's Digital Agency announced a collaboration with OpenAI to develop and use generative AI applications across government agencies. The goal is clear: streamline administrative tasks and reduce repetitive workload.
What's being rolled out
The agency will explore OpenAI's foundation models to build specialized applications for public-sector workflows. Officials will also gain access to ChatGPT for day-to-day duties, bringing AI assistance to routine research, drafting, and inquiry handling.
The Digital Agency has already developed 20 types of generative AI applications specialized for administrative work, including a "Diet response search AI" and a "legal system research support AI." This partnership focuses on expanding and strengthening that application stack for government use.
Why OpenAI was selected
The Digital Agency evaluated OpenAI's systems across multiple metrics and found the performance strong enough to adopt for this initiative. At the same time, the agency is considering future use of domestically developed AI to diversify its options and reduce vendor lock-in risk.
Technical implications for government IT
- Data governance: Set strict guardrails for PII handling, retention, redaction, and audit trails. Define prompt/response logging policies early (see the redaction and audit-log sketch after this list).
- Security model: Use private networking, scoped API keys, and policy-based access. Consider role-based prompt libraries for common tasks.
- Model strategy: Start with base models plus Retrieval-Augmented Generation (RAG) over agency docs before considering fine-tuning; a minimal retrieval sketch follows this list. Track context window usage and quality drift.
- Evaluation: Build a repeatable evaluation harness with task-specific test sets, automatic metrics (e.g., BLEU/ROUGE where relevant), and human review for accuracy and tone (sketched below).
- Latency and cost: Cache frequent queries, batch long-running jobs, and monitor token spend; see the caching sketch below. Define SLAs for critical workflows.
- Compliance: Ensure outputs meet legal and policy standards. Add human-in-the-loop for sensitive responses and legal interpretations.
- Localization: Optimize prompts and embeddings for the Japanese language and public-sector terminology to reduce hallucinations.
- Operations: Version prompts, documents, and model choices. Instrument with observability for failure modes, abuse detection, and performance regression.
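
To make the data-governance point concrete, here is a minimal sketch of a redaction layer with an append-only audit trail. The regex patterns, the Individual Number format, and the `audit.jsonl` path are illustrative assumptions; a real deployment would rely on a vetted, Japanese-aware PII detector and the agency's own logging policy.

```python
import hashlib
import json
import re
import time

# Illustrative PII patterns only; a real deployment would use a vetted,
# Japanese-aware PII detector, not ad-hoc regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone_jp": re.compile(r"\b0\d{1,4}-\d{1,4}-\d{4}\b"),
    "my_number": re.compile(r"\b\d{4} ?\d{4} ?\d{4}\b"),  # 12-digit Individual Number
}

def redact(text: str) -> str:
    """Replace matched PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}_REDACTED]", text)
    return text

def audit_log(user_id: str, prompt: str, response: str, path: str = "audit.jsonl") -> None:
    """Append an audit record. Storing hashes instead of raw text keeps the
    trail useful for audits without retaining sensitive content."""
    record = {
        "ts": time.time(),
        "user": user_id,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

safe_prompt = redact("Contact the applicant at taro@example.jp or 03-1234-5678.")
print(safe_prompt)  # Contact the applicant at [EMAIL_REDACTED] or [PHONE_JP_REDACTED].
```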
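
The retrieval side of a RAG setup can be prototyped without any external services. The sketch below ranks documents by bag-of-words cosine similarity and drops stale entries, standing in for the freshness checks on legal/policy updates mentioned in the action items. Whitespace tokenization is an English-only simplification; Japanese text would need a morphological analyzer (e.g., MeCab), and production retrieval would use a proper embedding model and vector store.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    text: str
    updated: str  # ISO date, used for the freshness check below

def tokens(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[Doc], k: int = 3,
             min_updated: str = "2024-01-01") -> list[Doc]:
    """Rank documents by similarity; drop stale ones so superseded policy
    text never reaches the prompt."""
    fresh = [d for d in corpus if d.updated >= min_updated]
    q = tokens(query)
    return sorted(fresh, key=lambda d: cosine(q, tokens(d.text)), reverse=True)[:k]

corpus = [
    Doc("reg-101", "procedure for residence certificate issuance", "2025-04-01"),
    Doc("reg-042", "outdated procedure for certificate issuance", "2021-06-15"),
]
context = retrieve("how to issue a residence certificate", corpus, k=1)
prompt = f"Answer using only this context:\n{context[0].text}\n\nQuestion: ..."
print(prompt)
```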
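
A repeatable evaluation harness can start very small. The sketch below scores model output against reference answers with a ROUGE-1-style unigram F1 and flags low scores for human review; the test set, threshold, and `generate` stub are hypothetical placeholders.

```python
from collections import Counter

def token_f1(reference: str, candidate: str) -> float:
    """ROUGE-1-style unigram overlap F1; a stand-in for a fuller metric suite."""
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    overlap = sum((ref & cand).values())
    if not overlap:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical task-specific test set: (input, reference answer) pairs.
TEST_SET = [
    ("Summarize inquiry 123", "resident asks about renewing a my number card"),
]

def evaluate(generate, threshold: float = 0.5) -> list[dict]:
    """Run the model over the test set; low scores are queued for human review."""
    results = []
    for prompt, reference in TEST_SET:
        candidate = generate(prompt)
        score = token_f1(reference, candidate)
        results.append({"prompt": prompt, "score": score,
                        "needs_human_review": score < threshold})
    return results

# `generate` would call the deployed model; here a stub keeps the sketch runnable.
print(evaluate(lambda p: "resident asks about renewing a my number card"))
```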
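
For the latency and cost point, a thin wrapper around the model call can provide exact-match caching and a running spend estimate. The per-token price and the four-characters-per-token heuristic are assumptions for illustration; real SDK responses report actual token usage.

```python
import hashlib
import threading

class CachedClient:
    """Wraps a model call with an exact-match cache and a token-spend counter.
    `call_model` is a placeholder for whichever SDK call the team adopts."""

    def __init__(self, call_model, price_per_1k_tokens: float = 0.01):
        self._call = call_model
        self._cache: dict[str, str] = {}
        self._lock = threading.Lock()
        self.tokens_used = 0
        self.price_per_1k = price_per_1k_tokens

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        with self._lock:
            if key in self._cache:
                return self._cache[key]  # cache hit: no spend, no latency
        text = self._call(prompt)
        # Crude token estimate (~4 chars/token); real SDKs report exact usage.
        self.tokens_used += (len(prompt) + len(text)) // 4
        with self._lock:
            self._cache[key] = text
        return text

    @property
    def spend(self) -> float:
        return self.tokens_used / 1000 * self.price_per_1k

client = CachedClient(lambda p: "stub answer")
client.complete("What is the office closing time?")
client.complete("What is the office closing time?")  # served from cache
print(f"estimated spend: ${client.spend:.4f}")
```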
Action items for engineering teams
- Identify 3-5 high-volume text tasks (e.g., inquiry summarization, policy lookup, response drafting) and run a contained pilot.
- Stand up a RAG pipeline with a vetted document store, strict access controls, and freshness checks for legal/policy updates.
- Create a redaction layer for inputs/outputs and set confidence thresholds that route low-confidence answers to human review (a routing sketch follows this list).
- Ship an internal prompt library with templates for common scenarios and enforce versioning through CI/CD; see the versioned-template sketch below.
- Track metrics that matter: accuracy, time saved, escalation rate, unit cost per task, and user satisfaction (a minimal record structure is sketched below).
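
The confidence-threshold routing from the third action item can be expressed in a few lines. How confidence is scored (a judge model, log-probabilities, or retrieval overlap) is left open; the threshold and review queue here are placeholders to be tuned against the evaluation harness.

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class Answer:
    text: str
    confidence: float  # however the team chooses to score it, e.g. a judge model

human_review: Queue = Queue()

def route(answer: Answer, threshold: float = 0.8) -> str:
    """Auto-release high-confidence answers; everything else goes to a reviewer.
    The threshold is a placeholder to be tuned against the evaluation harness."""
    if answer.confidence >= threshold:
        return answer.text
    human_review.put(answer)
    return "Your inquiry has been forwarded to a staff member for review."

print(route(Answer("The deadline is March 31.", confidence=0.93)))
print(route(Answer("Possibly April?", confidence=0.41)))
print(f"items awaiting review: {human_review.qsize()}")
```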
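
Prompt versioning can begin as a pinned registry before graduating to a full CI/CD workflow. The template names, versions, and wording below are hypothetical.

```python
from string import Template

# Versioned templates; in practice these would live in a reviewed repo and be
# pinned per release through CI/CD rather than hard-coded.
PROMPTS = {
    ("inquiry_summary", "1.2.0"): Template(
        "Summarize the following resident inquiry in three sentences:\n$inquiry"
    ),
    ("inquiry_summary", "1.1.0"): Template(
        "Summarize this inquiry:\n$inquiry"
    ),
}

def render(name: str, version: str, **params: str) -> str:
    """Fail loudly on an unknown name/version or a missing parameter so drift
    is caught in CI, not in production."""
    try:
        return PROMPTS[(name, version)].substitute(**params)
    except KeyError as err:
        raise LookupError(f"unknown prompt {name}@{version} or missing param") from err

print(render("inquiry_summary", "1.2.0", inquiry="Garbage pickup schedule?"))
```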
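
Finally, the metrics list maps naturally onto a single record type per task, which keeps pilot reporting consistent across teams. The field names and sample values below are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class TaskMetrics:
    task: str
    accuracy: float         # share of answers accepted without correction
    minutes_saved: float    # vs. the pre-AI baseline for the same task
    escalation_rate: float  # share routed to human review
    cost_per_task: float    # model spend divided by completed tasks
    satisfaction: float     # e.g. 1-5 survey average

# Hypothetical pilot numbers, for illustration only.
pilot = TaskMetrics("inquiry_summarization", accuracy=0.91, minutes_saved=6.5,
                    escalation_rate=0.12, cost_per_task=0.03, satisfaction=4.2)
assert pilot.escalation_rate < 0.2, "escalations above pilot target"
```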
Bottom line: this collaboration signals a push to make AI a standard utility in government operations. For IT and development teams, the work starts with secure architecture, strong evaluation, and disciplined rollout of high-impact use cases.