Parloa uses OpenAI models to build and evaluate voice agents for enterprise customer service

Berlin startup Parloa automates high-volume call center work for large enterprises, with AI agents handling millions of voice conversations across retail, travel, and insurance. One deployment cut requests for human agents by 80%.

Published on: May 09, 2026

Parloa builds customer service agents that handle millions of conversations

Berlin-based Parloa uses AI models to automate high-volume customer service interactions for large enterprises. The company's AI Agent Management Platform (AMP) lets non-technical teams design, deploy, and manage voice-driven customer support at scale without writing code.

Parloa started with a simple observation. Co-founder Stefan Ostwald spent a day in an insurance call center and heard the same conversations repeat: password resets, policy questions, routine changes. Much of that work could be automated.

The company initially built rule-based voice agents. With the rise of large language models, Parloa shifted to a different approach. Instead of mapping rigid workflows, teams now define agent behavior in natural language, connect to internal systems, and test changes before deployment.

How the platform works

Subject matter experts define an agent's role, instructions, tools, and boundaries in plain language. That configuration becomes the basis for how the model operates in production.
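A minimal sketch of what this could look like, assuming a configuration schema with the four fields named above (role, instructions, tools, boundaries). The field names and the prompt-assembly helper are invented for illustration, not Parloa's actual schema:

```python
# Hypothetical agent configuration, expressed as plain language per field.
agent_config = {
    "role": "Insurance support agent for ACME Insurance",
    "instructions": [
        "Greet the caller and ask how you can help.",
        "Reset passwords only after the caller is authenticated.",
        "Escalate to a human for claims above 10,000 EUR.",
    ],
    "tools": ["reset_password", "lookup_policy", "transfer_to_human"],
    "boundaries": ["Never quote premiums.", "Never discuss other customers."],
}

def build_system_prompt(config: dict) -> str:
    """Flatten the configuration into the prompt a model would receive."""
    lines = [f"You are: {config['role']}", "Instructions:"]
    lines += [f"- {rule}" for rule in config["instructions"]]
    lines.append("Available tools: " + ", ".join(config["tools"]))
    lines += ["Boundaries:"] + [f"- {b}" for b in config["boundaries"]]
    return "\n".join(lines)

prompt = build_system_prompt(agent_config)
```

The point of the sketch: the subject matter expert edits only the dictionary; the compiled prompt is derived mechanically from it.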

Before going live, Parloa simulates customer conversations. One model acts as the caller while another runs the configured agent. Teams inspect these interactions, test changes against realistic scenarios, and iterate.
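The caller-vs-agent simulation loop can be sketched as follows. Both "models" are scripted stubs here so the structure of the loop is visible without any API calls; in a real pipeline each function would be an LLM invocation:

```python
# Scripted stand-in for the model playing the customer.
def caller_model(history):
    script = ["I forgot my password.", "My policy number is 12345.", "Thanks, bye."]
    return script[len([m for m in history if m["speaker"] == "caller"])]

# Scripted stand-in for the configured agent under test.
def agent_model(history):
    last = history[-1]["text"] if history else ""
    if "password" in last:
        return "I can help. What is your policy number?"
    if "policy number" in last.lower():
        return "Password reset link sent. Anything else?"
    return "Thank you for calling. Goodbye."

def simulate(turns=3):
    """Alternate caller and agent turns, recording a transcript."""
    history = []
    for _ in range(turns):
        history.append({"speaker": "caller", "text": caller_model(history)})
        history.append({"speaker": "agent", "text": agent_model(history)})
    return history

transcript = simulate()
```

Teams would then inspect `transcript` the way the article describes: read the exchange, adjust the configuration, and re-run.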

The platform then evaluates those simulated conversations using both rule-based checks and AI-powered scoring. It measures whether the agent followed instructions, used tools correctly, and completed tasks as expected.
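The two evaluation layers might look like this sketch: deterministic rule checks over the transcript, plus an AI-graded task score (stubbed here as a keyword heuristic standing in for an LLM grader). The check names and transcript shape are assumptions:

```python
def rule_checks(transcript, required_tool="reset_password"):
    """Deterministic checks: was the right tool used, were boundaries kept?"""
    tools_used = [t["tool"] for t in transcript if t.get("tool")]
    return {
        "used_required_tool": required_tool in tools_used,
        "stayed_in_bounds": not any("premium" in t["text"].lower() for t in transcript),
    }

def ai_score(transcript):
    """Placeholder for an LLM grader: did the agent resolve and close the call?"""
    closing = transcript[-1]["text"].lower()
    return 1.0 if "anything else" in closing or "goodbye" in closing else 0.0

transcript = [
    {"speaker": "agent", "text": "What is your policy number?"},
    {"speaker": "agent", "text": "Resetting now.", "tool": "reset_password"},
    {"speaker": "agent", "text": "Done. Goodbye."},
]
report = {**rule_checks(transcript), "task_score": ai_score(transcript)}
```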

During live conversations, the platform prompts a language model with the agent configuration and conversation context. It retrieves information from internal systems and triggers tools to interact with customer backends. After each call, separate workflows summarize the interaction, classify customer intent, and measure performance.

As agents grew more complex, Parloa introduced a modular approach. Tasks like authentication or booking changes run as separate sub-agents, improving instruction-following and making systems easier to update over time.
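One common way to structure such a modular setup is an intent router dispatching to narrow sub-agents, sketched below. The sub-agent names and routing logic are invented for the example, not Parloa's API:

```python
# Each sub-agent handles one task with its own narrow instructions.
def authentication_agent(request):
    return "verified" if request.get("pin") == "0000" else "failed"

def booking_change_agent(request):
    return f"booking {request['booking_id']} moved to {request['new_date']}"

SUB_AGENTS = {
    "authenticate": authentication_agent,
    "change_booking": booking_change_agent,
}

def route(intent, request):
    """Dispatch by classified intent; unknown intents go to a human."""
    handler = SUB_AGENTS.get(intent)
    if handler is None:
        return "handover to human"
    return handler(request)

result = route("change_booking", {"booking_id": "B42", "new_date": "2026-06-01"})
```

The update benefit the article mentions follows from this shape: changing the booking sub-agent cannot affect the authentication sub-agent's instructions.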

The platform also incorporates deterministic controls where reliability matters most. Enterprises can define structured API chains and event-based logic to ensure critical steps happen in the right order.
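A deterministic step chain of the kind described could be sketched like this: steps run in a fixed order and the chain halts at the first failure, rather than leaving sequencing to the model. The step names are invented:

```python
def run_chain(steps, context):
    """Execute (name, step) pairs in order; stop at the first failing step."""
    completed = []
    for name, step in steps:
        if not step(context):
            return {"completed": completed, "failed_at": name}
        completed.append(name)
    return {"completed": completed, "failed_at": None}

steps = [
    ("authenticate", lambda ctx: ctx.get("pin") == "0000"),
    ("fetch_policy", lambda ctx: ctx.setdefault("policy", "P-99") is not None),
    ("apply_change", lambda ctx: ctx["policy"] == "P-99"),
]

ok = run_chain(steps, {"pin": "0000"})
bad = run_chain(steps, {"pin": "1111"})
```

Ordering is guaranteed by construction: `apply_change` can never run before `authenticate` succeeds.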

Testing before deployment

Parloa's core differentiator is its evaluation-first approach. When a new model becomes available, the company runs its own benchmarking suite against real production scenarios, not theoretical tests.

The team replicates its production customer service agents and runs them through the simulation and evaluation pipelines. These tests measure instruction-following reliability, API-calling consistency, latency, and overall performance under realistic conditions.
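A benchmarking loop of this kind might look like the sketch below: a candidate model (stubbed here) is run over recorded scenarios, collecting pass rate and latency. The scenario format and the stub are assumptions for illustration:

```python
import time

def candidate_model(prompt):
    """Stand-in for the model under evaluation."""
    time.sleep(0.001)  # simulate inference latency
    return "TOOL:lookup_policy" if "policy" in prompt else "SAY:hello"

SCENARIOS = [
    {"prompt": "caller asks about policy 123", "expect": "TOOL:lookup_policy"},
    {"prompt": "caller greets the agent", "expect": "SAY:hello"},
]

def benchmark(model, scenarios):
    """Measure instruction-following pass rate and worst-case latency."""
    passed, latencies = 0, []
    for s in scenarios:
        start = time.perf_counter()
        out = model(s["prompt"])
        latencies.append(time.perf_counter() - start)
        passed += out == s["expect"]
    return {"pass_rate": passed / len(scenarios),
            "max_latency_s": max(latencies)}

result = benchmark(candidate_model, SCENARIOS)
```

A model would only be promoted if `pass_rate` and latency clear thresholds across the full scenario suite.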

Only models that perform reliably across real customer scenarios get deployed. "Enterprise customers face a real migration cost," says Matthäus Deutsch, Senior Applied Scientist at Parloa. "Once a system is working in production, they keep it stable and only switch when the benefits are clear."

This approach produces measurable results. In one deployment with a global travel company, Parloa's agents reduced requests for human agents by 80%. Across millions of customer interactions, most calls resolve without friction.

Voice introduces different constraints

Voice-based support requires optimization that text-based chat does not. Every interaction runs through a pipeline: speech-to-text, model reasoning, and text-to-speech. Even small delays compound into noticeable pauses for the caller.
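The compounding-delay point can be made concrete with a simple latency budget. The per-stage timings below are made-up numbers; the takeaway is that the three stages sum against a single conversational pause budget:

```python
# Illustrative per-stage latencies for the voice pipeline (milliseconds).
STAGE_LATENCY_MS = {
    "speech_to_text": 250,
    "model_reasoning": 500,
    "text_to_speech": 200,
}
PAUSE_BUDGET_MS = 1200  # assumed acceptable pause before the caller hears a reply

def total_latency(stages):
    return sum(stages.values())

def within_budget(stages, budget=PAUSE_BUDGET_MS):
    return total_latency(stages) <= budget

latency = total_latency(STAGE_LATENCY_MS)
```

With these numbers the pipeline fits the budget, but a 300 ms regression in any single stage would push the total past it, which is why each component is optimized separately.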

Parloa evaluates each component independently. Speech-to-text systems are tested for accuracy on sensitive inputs like policy numbers. Text-to-speech models undergo blind listening tests to assess how natural they sound to real users.
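A component-level accuracy check on sensitive inputs might be sketched as an exact-match test after normalization, since a single transposed digit in a policy number is a hard failure. The sample pairs are invented:

```python
def normalize(text):
    """Strip spacing, punctuation, and case so formatting differences don't count."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def exact_match_rate(pairs):
    """Fraction of (reference, hypothesis) pairs the STT system got exactly right."""
    hits = sum(normalize(ref) == normalize(hyp) for ref, hyp in pairs)
    return hits / len(pairs)

samples = [
    ("Policy AB-1234", "policy ab 1234"),  # match once normalized
    ("Policy AB-1234", "policy ab 1243"),  # transposed digits: a failure
]
rate = exact_match_rate(samples)
```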

The company built these systems for global deployment from the start. Benchmarks span multiple languages, reflecting both Parloa's European roots and enterprise requirements for consistent performance across regions.

Today, Parloa's agents handle millions of conversations across retail, travel, and insurance, including revenue-generating flows like teleshopping.

What's next: multimodal customer journeys

Parloa sees customer service evolving into multimodal experiences. A conversation might start on the phone, continue in chat, and include interactive elements. The platform is designed to treat this as a single interaction, not separate flows.

As enterprises automate more customer interactions, Parloa focuses on making AI agents reliable, flexible, and trustworthy enough to operate at global scale.

