Stability AI Unveils Fast Text-to-Audio Model

Stability AI has unveiled a new text-to-audio generation model, developed in collaboration with Arm. Announced on Wednesday, the model is named Stable Audio Open Small and creates short audio samples from text prompts. The London-based AI company said the lightweight model is optimized to run entirely on Arm CPUs and generates audio quickly enough for a range of use cases. The open-source audio model is available on GitHub and Hugging Face.

Stability AI Releases Stable Audio Open Small

In a recent news post, Stability AI detailed its new model. It is a distilled version of Stable Audio Open, which launched in June 2024 and can generate audio up to 47 seconds long. The new text-to-audio model prioritizes smaller size and faster generation.

The Stable Audio Open Small model has 341 million parameters and can generate audio samples up to 11 seconds long. The firm claims it can produce a sample in less than eight seconds while running locally on a smartphone. The partnership with Arm on generative audio was announced at Mobile World Congress (MWC) 2025.
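To put those figures in perspective, the short Python sketch below works through the arithmetic. It uses only the numbers quoted above (341 million parameters, 11-second clips, under eight seconds per generation). The bytes-per-parameter values are assumptions for illustration, since the announcement does not state the precision of the shipped weights or whether the parameter count includes the text encoder; the helper names are hypothetical.

```python
# Back-of-the-envelope figures for Stable Audio Open Small.
# Inputs come from the article; precision choices are illustrative assumptions.

def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute per second of audio; below 1.0 is faster than real time."""
    return generation_seconds / audio_seconds


def weight_size_mb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Approximate size of the weights in megabytes (1 MB = 1e6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


PARAMS = 341_000_000      # parameter count quoted by Stability AI
CLIP_SECONDS = 11.0       # maximum clip length
MAX_GEN_SECONDS = 8.0     # claimed upper bound on smartphone generation time

# Upper bound on the real-time factor implied by the claim.
rtf_upper_bound = real_time_factor(MAX_GEN_SECONDS, CLIP_SECONDS)

# Hypothetical storage precisions; the article does not say which one is used.
fp32_mb = weight_size_mb(PARAMS, 4)
fp16_mb = weight_size_mb(PARAMS, 2)
int8_mb = weight_size_mb(PARAMS, 1)

print(f"Real-time factor (upper bound): {rtf_upper_bound:.2f}")
print(f"Weights at 32-bit: {fp32_mb:.0f} MB")
print(f"Weights at 16-bit: {fp16_mb:.0f} MB")
print(f"Weights at 8-bit:  {int8_mb:.0f} MB")
```

A real-time factor below 1.0 means a clip is generated in less time than it takes to play, which is what the "less than eight seconds" claim implies for an 11-second sample.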

Regarding its architecture and training, Stable Audio Open Small uses a latent diffusion model based on a transformer architecture. It was trained on a dataset of 486,492 licensed audio recordings. A publicly available pre-trained T5 model handles text conditioning. In addition, the Adversarial Relativistic-Contrastive (ARC) algorithm was applied in the post-training phase to improve prompt adherence and inference speed.

The company says the text-to-audio model is well suited to producing drum loops, foley, instrument riffs, and ambient textures. Its compact size enables deployment on Arm-powered smartphones and edge devices, which makes it particularly useful in scenarios that require real-time audio generation and rapid response.

The model weights for Stable Audio Open Small can be downloaded from the company’s Hugging Face listing, and the code base is available on GitHub. The model is released under the permissive Stability AI Community License, which allows both commercial and non-commercial use.
