Stability AI Unveils Fast Text-to-Audio Model

Stability AI has unveiled a new text-to-audio generation model developed in collaboration with Arm. Announced on Wednesday, the model is named Stable Audio Open Small and is designed to create short audio samples from text prompts. The London-based AI company said the lightweight model is optimized to run entirely on Arm CPUs and generates audio quickly enough for on-device use. The open-source audio model can be downloaded from GitHub and Hugging Face.

Stability AI Releases Stable Audio Open Small

In a recent news post, Stability AI elaborated on the new model. It is a distilled version of Stable Audio Open, which launched in June 2024 and can generate audio of up to 47 seconds. The new text-to-audio model prioritizes smaller size and faster generation.

Stable Audio Open Small has 341 million parameters and can generate audio samples of up to 11 seconds. The firm claims that it can produce an audio sample in less than eight seconds while running locally on a smartphone. Notably, the partnership behind this on-device generative audio work was announced at Mobile World Congress (MWC) 2025.
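To put the quoted figures in perspective, the short Python sketch below computes a real-time factor (generation time divided by audio duration). It assumes the "less than eight seconds" claim refers to a full 11-second clip, which the announcement as reported here does not state explicitly. The function and variable names are illustrative only.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by audio duration.

    A value below 1.0 means the audio is produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Figures quoted in the article. Pairing them is an assumption:
# up to 11 s of audio, generated in under 8 s on a smartphone.
AUDIO_SECONDS = 11.0
GENERATION_SECONDS_UPPER_BOUND = 8.0

rtf_upper_bound = real_time_factor(GENERATION_SECONDS_UPPER_BOUND, AUDIO_SECONDS)
print(f"Real-time factor is below {rtf_upper_bound:.2f}")
```

Under that assumption the factor stays below 1.0, meaning a clip would finish generating before it could be played back in full.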

In terms of architecture and training, Stable Audio Open Small uses a latent diffusion model based on a transformer architecture. It was trained on a dataset of 486,492 licensed audio recordings. Text conditioning relies on a publicly available pre-trained T5 model. Additionally, the Adversarial Relativistic-Contrastive (ARC) algorithm was applied in the post-training phase to improve prompt adherence and inference speed.

The company asserts that this text-to-audio model is ideal for producing drum loops, foley, instrument riffs, and ambient textures. Its compact size enables deployment on Arm-powered smartphones and edge devices, which makes it a fit for scenarios that need real-time audio generation and rapid response.

The model weights for Stable Audio Open Small are available for download through the AI firm’s Hugging Face listing, while the code base can be accessed via the GitHub listing. The AI model is licensed under the permissive Stability AI Community Licence, allowing for both commercial and non-commercial use.
