FriendliAI secures $20M to boost AI inference speed and cut costs for developers
FriendliAI raised $20M to speed up AI inference and cut costs by up to 90%. Its continuous batching boosts large language model throughput over tenfold.

FriendliAI Raises $20M to Boost AI Inference Workloads
FriendliAI Corp., a startup focused on speeding up AI model inference, has secured $20 million in funding. Capstone Partners led this seed extension round, with participation from Sierra Ventures, Alumni Ventures, KDB, and KB Securities. This follows the company's initial $5 million raise back in 2021.
Cutting Inference Costs and Improving Speed
The company offers a software platform called the Friendli Engine, which can reduce inference costs by up to 90% while also improving AI response times. The engine achieves these gains through low-level optimizations applied directly to customers' AI workloads.
Large language models (LLMs) typically process user requests in batches for efficiency. With conventional static batching, when one request finishes earlier than the others, its slot sits idle and its result is held until every prompt in the batch completes. This can slow response times significantly.
Continuous Batching: Faster Processing Without Delay
FriendliAI addresses this problem with a technique called continuous batching. Rather than waiting for an entire batch to finish, the scheduler works at the iteration level: as soon as a request completes, its slot is immediately refilled with a waiting request, minimizing unnecessary idle time. The company claims continuous batching can increase LLM throughput by more than ten times in some cases.
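The difference between the two scheduling strategies can be sketched with a toy simulation. This illustrates iteration-level batching in general, not FriendliAI's engine; the function names and the "slot-iterations" cost metric are invented for the example:

```python
def static_batching(requests, batch_size=4):
    """Static batching: every slot in a batch stays occupied until the
    slowest request in that batch finishes. `requests` is a list of
    decode-step counts; returns total GPU slot-iterations consumed."""
    slot_iters = 0
    for i in range(0, len(requests), batch_size):
        batch = requests[i:i + batch_size]
        # all slots are held for as long as the longest request runs
        slot_iters += max(batch) * len(batch)
    return slot_iters

def continuous_batching(requests, batch_size=4):
    """Continuous (iteration-level) batching: a finished request frees
    its slot immediately, and a queued request takes its place."""
    queue = list(requests)
    slots = []          # remaining decode steps per active request
    slot_iters = 0
    while queue or slots:
        while queue and len(slots) < batch_size:
            slots.append(queue.pop(0))          # refill free slots
        slot_iters += len(slots)                # one decode iteration
        slots = [s - 1 for s in slots if s > 1] # drop finished requests
    return slot_iters
```

With one long request batched alongside three short ones (e.g. `[10, 1, 1, 1]`), static batching consumes 40 slot-iterations while continuous batching consumes 13, since the three short requests release their slots after the first iteration instead of idling behind the long one.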
In addition, FriendliAI recently added support for N-gram speculative decoding. This technique lets an LLM draft candidate tokens by matching n-grams already present in the prompt or earlier output, then verify those drafts in a single model pass, which is more efficient than generating every token from scratch.
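A minimal sketch of the idea, assuming greedy decoding (the helper names and the toy stand-in model are invented for illustration; this is the general n-gram/prompt-lookup scheme, not FriendliAI's implementation):

```python
def ngram_draft(tokens, n=2, max_draft=4):
    """Propose draft tokens by finding an earlier occurrence of the
    context's final n-gram and copying the tokens that followed it."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    # scan earlier context for the same n-gram, most recent match first
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == tail:
            return tokens[i + n:i + n + max_draft]
    return []

def speculative_step(tokens, target_next_token):
    """One accept/verify step: the target model checks the drafted
    tokens, accepts the longest agreeing prefix, and always emits one
    token of its own. `target_next_token(ctx)` stands in for a real
    model's greedy next-token prediction."""
    ctx = list(tokens)
    for drafted in ngram_draft(ctx):
        if target_next_token(ctx) == drafted:  # draft matches the model
            ctx.append(drafted)
        else:
            break
    ctx.append(target_next_token(ctx))  # model's own token past the prefix
    return ctx
```

When the text is repetitive, one verification step can accept several drafted tokens at once, so multiple output tokens cost roughly one model pass instead of one pass each.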
Three Offerings for Different Needs
FriendliAI commercializes its technology through three main products:
- Friendli Container: Enables organizations to run FriendliAI’s software on private GPU clusters.
- Cloud Service for Open-Source Models: Offers inference capabilities without the need for customers to maintain infrastructure.
- Friendli Dedicated Endpoints: Supports custom LLMs and automatically adjusts GPU allocation based on workload demands.
Growing Client Base and Future Plans
According to Crunchbase, FriendliAI currently serves between 25 and 30 large clients, and these customers are expected to help the company grow revenue by as much as 600% this year. Although not yet profitable, FriendliAI maintains strong gross margins.
The new funding will support expanding go-to-market initiatives in North America and Asia. The company also plans to enhance its inference software and acquire additional GPUs for its cloud services.