Stability AI Extends Audio Generation to 6+ Minutes With New Model Family
Stability AI has released Stable Audio 3.0, a family of four audio models that can generate music and sound effects up to six minutes and 20 seconds long. The largest model, available only through the API and paid self-hosting, more than doubles the maximum generation length of the company's previous version from 2024.
The company is open-sourcing three of the four models: a small effects model (459M parameters), a small general model (459M parameters), and a medium model (1.4B parameters). The large model (2.7B parameters) requires a paid license, and companies with more than $1 million in revenue need an enterprise agreement.
The small models handle on-device generation of up to two minutes. The medium and large models maintain musical structure and melodic coherence across full compositions, something Stable Audio 2.0 could not sustain.
Licensing Strategy Differentiates Approach
Stability AI built these models on fully licensed data, the company said. In 2024, the company signed deals with Warner Music Group and Universal Music Group to develop music creation tools.
This licensing approach distinguishes Stability AI from competitors like Suno and Udio, both currently defending against lawsuits over data sourcing. Google and ElevenLabs are also releasing music generation models, but the legal status of training data remains contested across the industry.
Professional Music Tools in Development
Stability AI is building a suite of products for professional musicians but provided no specifics on features or timeline. Ethan Kaplan, former chief digital officer at Universal Audio and Fender, joined the company to lead the professional music division.
The hire reflects a broader trend: AI companies are recruiting music industry veterans to build credibility. Suno hired Jeremy Sirota, former CEO of Merlin, as chief commercial officer earlier this year. ElevenLabs brought on Derek Cournoyer from music publisher Kobalt as a strategy lead for music business development.
For product developers, the open-source models offer accessible baselines for building audio generation into applications. The licensing arrangement signals that companies building on Stability's work may stand on clearer contractual ground than competitors still litigating data rights.
Generative AI continues to expand beyond text. For teams building AI into products, both licensing and model architecture shape deployment decisions.