Winning with AI Comes from Scenarios, Not Base Models
Ask a product team what makes AI useful, and you'll hear it: the scenario. That's the core of Zhang Hao's message as CTO of Huolala. Base models are public utilities; your advantage is how you apply them to your data, processes, and user moments.
Huolala operates in over 400 cities with nearly 20 million monthly active users and 2 million active drivers. The business is simple to explain and hard to execute: match shippers and drivers faster, safer, and with less friction. That puts two targets at the center: operational efficiency and user experience.
How They Prioritized AI (and What They Skipped)
Two years ago, the team assessed where AI would move the needle most. They borrowed a practical approach: job surveys, task breakdowns, and automation-difficulty ratings, similar to the method discussed in Goldman Sachs's 2023 AI analysis. Tasks with high data density and heavy labor input got priority; high-certainty analytics waited their turn.
More importantly, they stopped chasing a proprietary foundation model. The pace of base model progress outstrips most in-house efforts. Instead, they doubled down on three assets: digital data, business APIs, and institutional know-how.
That decision led to a different bet: build an internal AI application platform so every improvement in public models instantly boosts outcomes across their stack.
The Three Internal Platforms
- Wukong Platform: Lets non-technical users assemble intelligent agents in minutes.
  - Visual process orchestration to connect company APIs and data assets.
  - Zero-code agent creation via natural language.
  - Enterprise tool library and MCP (Model Context Protocol) compatibility to standardize capabilities.
- Dolphin Platform: One-stop workflow for ML teams, from data prep and training to deployment and lifecycle management. The goal: reduce overhead so algorithm engineers spend time on models, not plumbing.
- Evaluation & Annotation Platform: AB testing and head-to-head model comparisons with tight segmentation, plus the Lala Intelligence Evaluation system. Good launches need repeatable, audited outcomes; this platform makes that real.
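The evaluation platform's core loop can be pictured as scoring candidate models against a gold dataset, segment by segment, before any head-to-head comparison. A minimal sketch, assuming a simple exact-match metric; the case data and model stubs are hypothetical, since the internals of the Lala Intelligence Evaluation system are not public:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    segment: str   # e.g. "billing", "safety" -- illustrative segments
    prompt: str
    gold: str      # expected answer from the gold dataset

def score(model, cases):
    """Per-segment exact-match accuracy for one model over a gold set."""
    hits, totals = {}, {}
    for c in cases:
        totals[c.segment] = totals.get(c.segment, 0) + 1
        if model(c.prompt) == c.gold:
            hits[c.segment] = hits.get(c.segment, 0) + 1
    return {s: hits.get(s, 0) / n for s, n in totals.items()}

# Hypothetical head-to-head: two candidate models on the same gold set.
cases = [
    EvalCase("billing", "invoice status?", "issued"),
    EvalCase("billing", "refund eta?", "3 days"),
    EvalCase("safety", "goods allowed?", "no"),
]
model_a = lambda p: "issued" if "invoice" in p else "no"
model_b = lambda p: "no"
print(score(model_a, cases))  # {'billing': 0.5, 'safety': 1.0}
```

Keeping results segmented rather than averaged is what makes a launch decision auditable: a model that wins overall but regresses on one segment is visible immediately.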
Application Scenarios That Actually Shipped
- AI security prevention and control: Real-time detection of illegal passenger-carrying, dangerous goods, and risky driving using voice, images, and unstructured signals. Short intervention windows demand fast decisions and high accuracy.
- AI coding in R&D: Adopted by ~90% of engineers and teams; ~60% penetration across the product-to-deploy pipeline. Net throughput gain sits around 10% once verification and testing overhead is counted. Strong for new projects and front-end work; complex business logic still needs human steering.
- "Take a Photo to Select a Vehicle": Point-cloud segmentation estimates cargo volume from a single photo and matches the right vehicle within ~10 seconds. Solves uncertainty for first-time or infrequent shippers.
- User feedback analyzer: A small model handles fast classification; an LLM summarizes patterns. Example: it quickly surfaced an invoice-issuance bottleneck that was previously easy to miss.
- AI product knowledge expert: Pulls from PRDs, repos, configs, and more to answer "who/why/how" behind features. Reduces knowledge blind spots across departments.
- SMS content optimization: LLM rewrites shorter, clearer messages and pre-checks for compliance risks. Result: ~12% annual cost reduction and fewer brand risks at scale.
- AI digital human business partner: An ASR → LLM → TTS pipeline with hot-word lists and acoustic-model tuning. Dialect-aware voice increased perceived trust and naturalness. Metrics: ~94% semantic ASR accuracy, ~92% perceived human-likeness.
- Emotion-aware support: Question rewriting, scenario routing, and a multi-agent setup improved resolution rates and accuracy for anxious or angry users.
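The feedback analyzer's two-stage pattern, cheap classification on every item first and LLM summarization only on the aggregate, can be sketched as follows. The keyword rules and category names are placeholder assumptions standing in for the real small model, which is not public:

```python
from collections import Counter

# Stage 1: a cheap classifier stands in for the "small model".
# Keyword rules are a placeholder for a trained lightweight classifier.
CATEGORIES = {
    "invoice": "billing",
    "driver": "driver-conduct",
    "late": "timeliness",
}

def classify(feedback: str) -> str:
    for keyword, category in CATEGORIES.items():
        if keyword in feedback.lower():
            return category
    return "other"

# Stage 2: only the aggregated digest would go to an LLM for narrative
# summarization; here we just build the digest it would receive.
def summarize(feedbacks):
    counts = Counter(classify(f) for f in feedbacks)
    top, n = counts.most_common(1)[0]
    return f"Top issue: {top} ({n}/{len(feedbacks)} reports)"

print(summarize([
    "Invoice took two weeks to arrive",
    "Cannot download my invoice",
    "Driver was late",
]))  # Top issue: billing (2/3 reports)
```

The design choice is cost-driven: the small model sees every feedback item at near-zero latency, while the expensive LLM only sees clustered counts, which is how a low-volume but systemic issue like invoice delays becomes visible.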
What Moved the Numbers
- Risky orders (dangerous goods, illegal passenger-carrying): down ~30%.
- Order reminder coverage in safety workflows: ~100%.
- AI coding: ~10% net efficiency gain; ~60% pipeline penetration.
- SMS costs: ~12% annual savings.
- ASR semantic accuracy: ~94%; perceived human-likeness: ~92%.
The broader takeaway: in service-heavy O2O businesses, AI's average effect is a modest 5-10% efficiency gain. Some roles see larger shifts, but the consistent wins are cost reduction, risk prevention, and smoother operations.
Practical Playbook for Product Teams
- Start with a work map: Job × task × error tolerance × data density. Prioritize high-volume, high-friction workflows. A similar framing appears in Goldman Sachs's generative AI research.
- Treat base models as interchangeable parts: Your moat is data, APIs, and process know-how. Build an application layer and tool registry (MCP-compatible) that survives model swaps.
- Institutionalize evaluation: Gold datasets, adversarial tests, and online AB as first-class citizens. Every release should be explainable and repeatable.
- Optimize for latency and accuracy: If you chain ASR → LLM → TTS, measure end-to-end. Consider end-to-end multimodal models as they mature to cut hops and drift.
- Quantify wins that matter to P&L: Incident rate, time-to-resolution, cost per message, % AI-generated code deployed, safety reminders delivered, and CSAT shifts.
- Design for trust: Dialect, prosody, and context memory matter in voice. Small changes in tone can lift perceived credibility.
- Plan for orchestration: One "digital human" is useful; many, coordinated from upstream to downstream, are where compounding gains show up.
What's Next
Base models keep improving, and that alone will lift well-built applications. Huolala's path points to a near-term focus: multimodal, end-to-end pipelines to reduce latency and error surfaces, plus orchestration of multiple agents across the full process.
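Measuring a chained pipeline end to end, as both the playbook and this roadmap imply, means recording per-hop latency so you know which stage an end-to-end multimodal model would collapse first. A sketch with stub stages standing in for real ASR/LLM/TTS calls:

```python
import time

def timed(name, fn):
    """Wrap a pipeline stage so each run records its latency in a trace."""
    def wrapper(x, trace):
        t0 = time.perf_counter()
        out = fn(x)
        trace[name] = time.perf_counter() - t0
        return out
    return wrapper

# Placeholder stages; real ASR/LLM/TTS clients would be wrapped the same way.
asr = timed("asr", lambda audio: "transcript")
llm = timed("llm", lambda text: "reply")
tts = timed("tts", lambda text: b"audio")

def pipeline(audio):
    trace = {}
    out = tts(llm(asr(audio, trace), trace), trace)
    trace["total"] = sum(trace.values())
    return out, trace

audio_out, trace = pipeline(b"...")
# trace now holds asr/llm/tts/total latencies in seconds, giving a
# per-hop baseline to compare against an end-to-end multimodal model.
```

Once each hop is on a dashboard, the case for a single multimodal model becomes quantitative: it must beat the summed hop latencies and the error compounding across them, not just any one stage.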