Vertesia unveils semantic document prep service to boost AI accuracy and speed app development

Vertesia’s new semantic document preparation service automates the conversion of raw documents into structured XML, targeting a step that Vertesia says consumes up to 50% of generative AI development time. By preserving each document’s original context, it aims to reduce AI hallucinations and produce more accurate outputs.

Published on: Jun 04, 2025
Vertesia Launches Semantic Document Preparation Service to Boost AI Accuracy and Speed Development

Vertesia Inc., provider of a low-code platform for building and deploying custom generative AI applications, has introduced a new semantic document preparation service. The service targets a significant challenge in AI development: preparing documents so that AI outputs are reliable and accurate, while also reducing development time.

According to Vertesia's research, up to 50% of the time spent developing generative AI applications goes into preparing documents. This process is often complex and resource-intensive. The newly launched service aims to automate and simplify document preparation by providing developers with APIs that convert raw documents into richly structured, semantically tagged XML without altering the original content.

Addressing AI Hallucinations and Data Preparation Challenges

One of the biggest issues in generative AI is hallucination, in which a large language model confidently outputs incorrect or fabricated information. These errors stem from various factors, such as noisy or incomplete data and limitations in contextual understanding.

Vertesia’s semantic document preparation service tackles this by preserving the original document’s structure, relationships, and context. This enables large language models (LLMs) to interpret the data correctly, reducing hallucinations and improving the accuracy and relevancy of AI responses.

How the Service Works

The document transformation engine breaks down documents at the page level and selects the best AI model for each content type—whether dense text, tables, or images. It applies a combination of LLMs, optical character recognition (OCR), and vision models to generate high-fidelity XML outputs.

  • Maintains original document structure without rewriting
  • Generates semantically rich, tagged XML ready for AI ingestion
  • Supports complex documents like reports and regulatory filings

This hybrid approach helps maintain consistency across content types and is intended to give the AI accurate, well-structured data to work with.
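The page-level routing described above can be illustrated with a minimal Python sketch. It is not Vertesia's implementation: the `Page` type, the content-type labels, the tag names, and the handler functions are all hypothetical, and each handler is a placeholder standing in for an LLM, OCR, or vision model. The sketch only shows the general idea of dispatching each page to a handler by content type and wrapping the content in tags without rewriting it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from xml.sax.saxutils import escape


@dataclass
class Page:
    number: int
    kind: str      # hypothetical content types: "text", "table", "image"
    content: str   # stand-in for the raw page content


def tag_text(page: Page) -> str:
    # Placeholder for an LLM-based tagger; the text is wrapped, not rewritten.
    return f'<text page="{page.number}">{escape(page.content)}</text>'


def tag_table(page: Page) -> str:
    # Placeholder for a table-extraction model.
    return f'<table page="{page.number}">{escape(page.content)}</table>'


def tag_image(page: Page) -> str:
    # Placeholder for OCR or a vision model.
    return f'<figure page="{page.number}">{escape(page.content)}</figure>'


# Hypothetical mapping from content type to handler.
HANDLERS: Dict[str, Callable[[Page], str]] = {
    "text": tag_text,
    "table": tag_table,
    "image": tag_image,
}


def prepare_document(pages: List[Page]) -> str:
    # Route each page to the handler for its content type;
    # unknown types fall back to the text handler.
    parts = [HANDLERS.get(p.kind, tag_text)(p) for p in pages]
    return "<document>" + "".join(parts) + "</document>"
```

Calling `prepare_document` with a list of `Page` objects returns a single XML string in which every page keeps its original content and page number. A real engine would also need to detect the content type of each page, which this sketch takes as given.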

Integration and Use Cases

The Semantic DocPrep service is accessible via API, allowing developers to plug it directly into their development pipelines. Documents can be sent for processing and returned as XML outputs ready for chunking, indexing, and model ingestion.
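As a rough illustration of the chunking step, the Python sketch below splits a small XML document into one chunk per section using only the standard library. The XML schema shown (`document`, `section`, `para`, and the `title` attribute) is an assumption made for the example; the article does not describe the actual tags the service emits, and the sample text is invented.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical output; the real tag names and structure may differ.
SAMPLE_XML = """
<document>
  <section title="Risk Factors">
    <para>Interest rate changes may affect earnings.</para>
    <para>Currency exposure is hedged quarterly.</para>
  </section>
  <section title="Liquidity">
    <para>Cash reserves cover twelve months of operations.</para>
  </section>
</document>
"""


def chunk_by_section(xml_text: str) -> List[Dict[str, str]]:
    # Parse the XML and emit one chunk per <section>,
    # keeping the section title as metadata for indexing.
    root = ET.fromstring(xml_text.strip())
    chunks = []
    for section in root.iter("section"):
        text = " ".join(p.text.strip() for p in section.iter("para") if p.text)
        chunks.append({"title": section.get("title", ""), "text": text})
    return chunks


if __name__ == "__main__":
    for chunk in chunk_by_section(SAMPLE_XML):
        print(chunk["title"], "->", chunk["text"])
```

Chunking along tagged section boundaries, rather than at fixed character counts, keeps related sentences together and lets each chunk carry its heading into the index.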

No additional setup or model training is needed, making it a practical addition for teams building custom generative AI applications or retrieval-augmented generation (RAG) pipelines. RAG pipelines improve AI output by retrieving relevant content at query time and supplying it to the model as context, so well-structured source documents directly affect the accuracy of the results.

Vertesia’s new service complements its existing platform, which supports organizations in building, deploying, and managing custom AI applications and agents at scale.

For IT and development professionals aiming to optimize AI application workflows, this document preparation service offers a clear path to reduce development time and improve results.
