Stability AI Unveils “Stable Audio”: A Text-to-Audio AI Platform for Creative Sound Generation

London-based generative AI company Stability AI has introduced “Stable Audio,” a platform that uses artificial intelligence to convert text prompts into audio. It is the company’s first product for music and sound generation. Stable Audio can produce tracks of up to 90 seconds, which makes it useful to creators working on commercials, audiobooks, and video games.

Stability AI is best known for its work on AI-generated images. With the launch of Stable Audio, the company now competes directly with larger players such as OpenAI, Google, and Meta.

At the core of the Stable Audio platform is a diffusion model, the same type of AI model that drives Stability AI’s image platform, Stable Diffusion. For Stable Audio, however, the model was trained on audio data instead of images. This lets users generate songs or background audio of varying lengths, up to the platform’s limit, for a range of creative projects.

What sets Stable Audio apart is how it addresses a limitation of conventional audio diffusion models, which are typically trained to produce clips of a fixed duration. Stable Audio was trained specifically on music and conditioned on text metadata that specifies each song’s start and end times, so users can generate audio of varying durations instead of fixed-length clips.
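The idea of attaching timing metadata to each training example can be sketched in a few lines of Python. This is a minimal illustration, not Stability AI's implementation: the names `TimingCondition`, `seconds_start`, `seconds_total`, `sample_training_window`, and `inference_condition` are assumptions made for this example, as is the choice to record the window's start time together with the full track length.

```python
import random
from dataclasses import dataclass


@dataclass
class TimingCondition:
    # Hypothetical fields: where the window begins and how long the full track is.
    seconds_start: float
    seconds_total: float


def sample_training_window(track_seconds: float, window_seconds: float,
                           rng: random.Random) -> TimingCondition:
    """Pick a fixed-size window from a track and record its timing metadata."""
    # Tracks shorter than the window can only start at 0.
    latest_start = max(0.0, track_seconds - window_seconds)
    start = rng.uniform(0.0, latest_start)
    return TimingCondition(seconds_start=start, seconds_total=track_seconds)


def inference_condition(requested_seconds: float) -> TimingCondition:
    """At generation time, ask for a clip that starts at 0 with the requested length."""
    return TimingCondition(seconds_start=0.0, seconds_total=requested_seconds)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_training_window(180.0, 95.0, rng))
    print(inference_condition(45.0))
```

Because the model sees where each training window sits inside its source track, a user can in principle request a specific length at generation time by supplying the matching timing values alongside the text prompt.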

In a statement reported by The Verge, Stability AI said, “Stable Audio represents the cutting-edge audio generation research by Stability AI’s generative audio research lab, Harmonai. We continue to improve our model architectures, datasets, and training procedures to enhance output quality, controllability, inference speed, and output length.”

Stable Audio was trained on a dataset of more than 800,000 audio files, including music, sound effects, and individual instrument stems, together with their text metadata. The data was sourced from AudioSparx, a stock music licensing company, and totals about 19,500 hours of audio. Stability AI says it secured permission to use the copyrighted material through its partnership with AudioSparx.

For users eager to harness the capabilities of Stable Audio, Stability AI has introduced three distinct pricing tiers:

  1. Free Version: This entry-level tier lets users generate up to 20 tracks per month, each up to 45 seconds long. Audio generated on this tier may not be used for commercial purposes.
  2. Professional Level ($11.99 per month): This tier allows the creation of 500 tracks per month, each up to 90 seconds long.
  3. Enterprise Subscription: Tailored to the needs of companies seeking customized usage plans and pricing structures, this tier provides flexibility and scalability for larger-scale projects.
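
To put the track limits in perspective, the short Python sketch below works out the maximum amount of audio each self-serve tier could yield in a month. It assumes every track is generated at the tier's maximum length and that the Professional tier's 500-track allowance is monthly, like the Free tier's; the `TIERS` table and the `max_monthly_minutes` helper are names invented for this example.

```python
# (tracks per month, maximum seconds per track), taken from the tier list above.
# The Enterprise tier is omitted because its limits are custom.
TIERS = {
    "Free": (20, 45),
    "Professional": (500, 90),
}


def max_monthly_minutes(tracks: int, seconds_per_track: int) -> float:
    """Upper bound on generated audio per month, in minutes."""
    return tracks * seconds_per_track / 60


if __name__ == "__main__":
    for name, (tracks, seconds) in TIERS.items():
        print(f"{name}: {max_monthly_minutes(tracks, seconds):.1f} minutes")
```

Under these assumptions the Free tier tops out at 15 minutes of audio per month, while the Professional tier allows up to 750 minutes, or 12.5 hours.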

Text-to-audio generation is not an entirely new concept, but Stable Audio aims to improve on earlier tools in accessibility, quality, and versatility. Meta and Google have already released generative AI audio tools, AudioCraft and MusicLM respectively. However, such offerings are often restricted to researchers and select audio professionals, which makes Stable Audio’s pricing tiers an attractive option for a broader range of creatives.

As AI continues to reshape creative work, Stable Audio gives musicians, content creators, and game developers a new tool for producing audio content. As the technology matures, further advances in AI-driven audio production are likely to follow.
