OpenAI’s gpt-realtime Shakes Up Voice AI Market as T-Mobile Trials Humanlike Support Agents
OpenAI’s gpt-realtime model streamlines AI voice agents with unified audio processing and emotional nuance. Startups relying on basic telephony may face tougher competition.

OpenAI’s New Speech-to-Speech Model Challenges Voice AI Startups
OpenAI recently launched its most advanced speech-to-speech model, gpt-realtime, alongside the general release of its Realtime API. These tools simplify creating AI voice agents, particularly for phone-based customer support. The new API supports image inputs and remote MCP servers, making voice agents more versatile and capable.
One standout feature is SIP telephony support, which allows developers to connect phone numbers from services like Twilio directly to OpenAI’s interface. This streamlines building voice-over-phone applications, a common need in customer support environments.
The Impact on Voice AI Startups
Many startups that simply provide phone network interfaces for existing speech-to-speech AI services face increased competition. According to Andreas Granig, CEO at Sipfront, the voice interface for AI assistants is becoming a commodity. Startups that rely on third-party telephony services without unique offerings will find it harder to stand out. However, companies focusing on advanced tool integrations remain less affected, as that requires specialized expertise.
What Makes the gpt-realtime Model Stand Out?
- Unified audio understanding: The model processes audio input and output without needing separate transcription or voice synthesis steps.
- Faster responses: Handling all tasks within one model reduces latency.
- Emotional nuance: It can detect and replicate subtle sounds like laughter or sighs, improving naturalness.
- Instruction following: The model manages complex conversations with multiple turns effectively.
- Customization: Developers can modify pace, tone, style, and even roleplay characters.
- Robustness: Better at understanding unclear audio and long alphanumeric strings such as phone or license numbers.
These features make it appealing for customer support teams aiming to automate or improve voice interactions.
Cost and Control Considerations
Despite the benefits, gpt-realtime remains relatively expensive: $32 per million audio input tokens and $64 per million output tokens. That is roughly four times the cost of a traditional pipeline that chains speech-to-text, a language model, and text-to-speech.
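To make the pricing concrete, here is a rough back-of-the-envelope sketch in Python. Only the two per-million-token prices and the "roughly four times" ratio come from the figures above; the token counts for the sample call are hypothetical, since the article does not say how many tokens a typical call consumes.

```python
# Prices quoted above, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = 32.0
OUTPUT_PRICE_PER_MILLION = 64.0


def estimate_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one call's audio tokens."""
    total = (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical call: the token counts are illustrative assumptions only.
cost = estimate_call_cost(input_tokens=3_000, output_tokens=6_000)

# The article puts chained pipelines at roughly one quarter of this cost.
chained_estimate = cost / 4

print(f"gpt-realtime estimate: ${cost:.2f}")
print(f"chained pipeline estimate: ${chained_estimate:.2f}")
```

Teams can substitute token counts measured from their own calls to see how the gap scales with call volume.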
Additionally, some experts point out limited control in the current model. Unlike “chained” systems, it lacks the ability to vary voice, apply guardrails, or switch models mid-conversation. For voice AI companies requiring granular customization, this could be a drawback.
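The control that chained systems offer is easier to see in code. The sketch below is a minimal illustration, not any vendor's API: `transcribe`, `generate_reply`, `synthesize`, and `redact_digits` are hypothetical stand-ins that operate on plain strings and bytes. The point is that a chained pipeline exposes text between stages, so a guardrail (or a different model) can be swapped in at that seam.

```python
from typing import Callable


# Hypothetical stand-ins for the three stages of a chained pipeline.
def transcribe(audio: bytes) -> str:
    """Pretend speech-to-text: treat the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def generate_reply(text: str) -> str:
    """Pretend language model: echo the caller's words back."""
    return f"You said: {text}"


def synthesize(text: str) -> bytes:
    """Pretend text-to-speech: encode the reply text as bytes."""
    return text.encode("utf-8")


def redact_digits(text: str) -> str:
    """Example guardrail: mask every digit before the reply is spoken."""
    return "".join("#" if ch.isdigit() else ch for ch in text)


def chained_turn(
    audio: bytes,
    guardrail: Callable[[str], str] = redact_digits,
) -> bytes:
    """Run one conversational turn, applying the guardrail to the text reply."""
    text = transcribe(audio)
    reply = generate_reply(text)
    safe_reply = guardrail(reply)  # the seam a single unified model does not expose
    return synthesize(safe_reply)


result = chained_turn(b"my card is 4111")
print(result)
```

Because each stage is a separate function, any one of them could be replaced mid-conversation, which is the kind of granular control described above.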
Enterprise Adoption: T-Mobile’s Use Case
T-Mobile has been testing OpenAI’s speech models for six months and recently gained access to gpt-realtime. The company reports significant improvements in customer engagement, especially on device upgrade calls.
Julianne Roberson, Director of AI at T-Mobile, demonstrated how the AI assistant guides customers through selecting phones under $300, checking compatibility with satellite services, and confirming plan eligibility. The assistant handles unpredictable conversations smoothly while recognizing emotions and accepting multimodal inputs such as images.
This approach supports T-Mobile’s goal to deliver expert-level service across all channels using AI. Their close collaboration with OpenAI might offer insights into the future of voice-based customer support.
What This Means for Customer Support Professionals
If you work in customer support, OpenAI’s new tools could shift how voice interactions are automated and managed. The ability to deploy production-ready voice agents with emotional awareness and multimodal capabilities promises more natural and efficient service.
However, the higher cost and less granular control mean teams should carefully evaluate whether gpt-realtime fits their needs or if existing chained solutions remain more practical.
For those interested in expanding their AI skills relevant to customer support automation, exploring targeted courses can provide an edge. Resources like Complete AI Training’s customer support AI courses offer practical guidance on integrating AI tools effectively.