IBM, Nvidia, and Red Hat Back New AI-Native Document Format
A working group hosted by the Linux Foundation is developing DocLang, an open specification for documents designed specifically for AI systems rather than human readers. IBM, Nvidia, and Red Hat founded the effort, with ABBYY and HumanSignal contributing to development.
The specification addresses a practical problem: current document formats like PDFs and Word files were built for human consumption, forcing AI systems to waste computational effort extracting meaning. DocLang defines a structured, machine-readable format similar to how JSON standardizes data, allowing any tool to implement it and any pipeline to consume it.
The working group builds on Docling, an existing toolkit that converts human-readable documents into structured data. DocLang extends that work into a vendor-neutral standard for enterprise use.
Why This Matters for IT Teams
Organizations increasingly rely on generative AI and agentic systems to process business documents. The current fragmented approach, in which PDFs, JPEGs, spreadsheets, and other formats are each handled differently, introduces complexity, raises costs, and reduces reliability when extracting information at scale.
A standard format would let teams automate document preprocessing. When a user uploads a document to an AI agent, a preprocessing skill could convert it to DocLang format automatically, reducing token consumption and improving efficiency.
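As a rough illustration of what such a preprocessing step might look like, here is a minimal Python sketch. It does not implement DocLang: the article does not describe the specification's schema, so the `StructuredDoc` and `Block` types, the `preprocess_upload` function, the converter registry, and the heading heuristic are all hypothetical stand-ins. The sketch only shows the general shape of the idea, which is to route an upload to a converter by file type and hand the agent compact structured output instead of the raw file.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Block:
    kind: str  # hypothetical block types: "heading" or "paragraph"
    text: str


@dataclass
class StructuredDoc:
    # Hypothetical structure; NOT the DocLang schema.
    source_name: str
    blocks: List[Block] = field(default_factory=list)


def convert_plain_text(name: str, raw: str) -> StructuredDoc:
    """Split plain text on blank lines and label each chunk."""
    doc = StructuredDoc(source_name=name)
    for chunk in raw.split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Toy heuristic: a short single line with no closing
        # punctuation is treated as a heading.
        is_heading = (
            "\n" not in chunk
            and len(chunk) < 60
            and not chunk.endswith((".", "?", "!", ":"))
        )
        kind = "heading" if is_heading else "paragraph"
        # Collapse internal whitespace and line breaks.
        doc.blocks.append(Block(kind, " ".join(chunk.split())))
    return doc


# Registry mapping file extensions to converters. A real pipeline
# would register converters for PDFs, images, spreadsheets, etc.
CONVERTERS: Dict[str, Callable[[str, str], StructuredDoc]] = {
    ".txt": convert_plain_text,
}


def preprocess_upload(name: str, raw: str) -> str:
    """Convert an uploaded document to structured JSON for an agent."""
    suffix = Path(name).suffix.lower()
    converter = CONVERTERS.get(suffix)
    if converter is None:
        raise ValueError(f"no converter registered for {suffix!r}")
    return json.dumps(asdict(converter(name, raw)), indent=2)
```

Calling `preprocess_upload("memo.txt", text)` returns a JSON string of labeled blocks, while an unregistered file type raises `ValueError` rather than passing an unprocessed file through. Keeping the converters in a registry is one possible design choice: it lets new formats be added without changing the entry point the agent calls.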
The approach also supports exporting AI-generated outputs, such as visualizations or structured data, back to formats humans can use outside AI tools.
Standards Need to Evolve
Existing document standards served their purpose for decades but weren't designed for AI workflows. Carmi Levy, an independent technology analyst, said documents in the AI era are more iterative and dynamic than static file formats allow.
"DocLang represents an early, best hope of achieving some kind of foundational baseline for document standards, one that will hopefully allow more intelligent, more efficient, lower-risk workflows than is currently the case," Levy said.
Taking an open-source, vendor-agnostic approach mirrors how earlier standards for networking, documentation, the web, and cloud computing enabled broad digital collaboration rather than locking users into proprietary systems.
Governance Questions Remain
Jason Andersen, principal analyst at Moor Insights & Strategy, supports automated preprocessing but warns the standard must preserve user choice. "These standards need to preserve the fact that humans can still do what they want, and do not need to know any coding to be proficient," he said.
Yaz Palanichamy, senior research analyst at Info-Tech Research Group, flagged a different concern: organizations adopting DocLang will need to implement and review controls to scale its use securely and maintain accountability. The governance framework around how documents flow through AI systems remains undefined.
The specification is still in development, and the working group is accepting additional contributors. For IT and development professionals working with AI, understanding DocLang's role in document processing pipelines will become relevant as adoption grows in enterprise environments.