Stability AI introduces Stable Audio for precision-controlled music generation using latent diffusion

On the back of significant advancements in generative AI models for imaging, Stability AI is once again making waves, but now in the audio domain. Today, the company unveiled Stable Audio, its latest tool designed to generate high-fidelity, 44.1 kHz music through a mechanism known as latent diffusion.

Stability AI touts Stable Audio, which was trained not just on audio metadata but also on the durations and start times of audio files, as the next frontier in AI-driven music creation. The model has approximately 1.2 billion parameters and gives creators more control over both content and length than previous generative music tools.
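To make the idea of timing-aware training concrete, here is a minimal sketch of how a start time and a duration could travel alongside a text prompt. Everything in it is an assumption for illustration: the class name, the field names `seconds_start` and `seconds_total`, and both helper functions are hypothetical and are not taken from Stability AI's code. The only figure borrowed from the article is the 90-second ceiling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingCondition:
    """Hypothetical conditioning record: a prompt plus two timing values."""
    prompt: str
    seconds_start: float   # where the audio chunk begins inside its source file
    seconds_total: float   # full length of the source file (or requested length)


def training_condition(prompt, crop_start, file_duration):
    """Describe a training chunk cropped from a longer audio file."""
    if file_duration <= 0:
        raise ValueError("file_duration must be positive")
    if not 0 <= crop_start < file_duration:
        raise ValueError("crop_start must fall inside the file")
    return TimingCondition(prompt, float(crop_start), float(file_duration))


def generation_condition(prompt, desired_seconds, max_seconds=90.0):
    """Describe a generation request: start at zero, run for the desired length."""
    if not 0 < desired_seconds <= max_seconds:
        raise ValueError("desired_seconds must be in (0, max_seconds]")
    return TimingCondition(prompt, 0.0, float(desired_seconds))


# A chunk taken 45 seconds into a 3-minute track, and a 30-second request.
train_example = training_condition("ambient techno", 45, 180)
request_example = generation_condition("ambient techno", 30)
```

In a sketch like this, the model would see the same two timing fields at training time and at generation time, which is one plausible way a user-specified duration could steer the output length.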

Ed Newton-Rex, VP of audio for Stability AI, shared with TechCrunch, "We began our journey with Stable Diffusion, and have since expanded our horizons to languages, coding, and now music. Our vision is centered around multimodality in generative AI."

Contrary to popular belief, Stable Audio wasn't exclusively the brainchild of Harmonai, a research organization previously funded by Stability. The new model was inspired by Dance Diffusion, an earlier model by Stability AI. The subsequent development and training of Stable Audio were a collaborative effort between Stability's newly formed audio team and Harmonai.

Highlighting the advancements, Newton-Rex explained the key differences between Dance Diffusion and Stable Audio. While the former churned out short, arbitrary audio clips, the latter can generate longer audio sequences. Users can provide text prompts and specify the desired duration to guide the generation process. The model excels particularly with genres like EDM, ambient music, and beat-driven tracks.

Yet, as of now, users can only access Stable Audio via a web application. Stability AI has not announced any intentions to make the model behind Stable Audio open-source, diverging from its historical approach to open research.

One of Stable Audio's standout features is its ability to produce coherent music pieces of up to roughly 90 seconds. By comparison, many AI music models lose coherence after just a few seconds. The secret sauce is latent diffusion, a technique that gradually refines a noisy starting point until it matches a text description.
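The refinement idea can be illustrated with a toy loop. This is a sketch under heavy assumptions, not Stable Audio's method: the `target` list stands in for the latent a text prompt would call for, and `toy_denoiser` is a hypothetical placeholder for the trained neural network that a real system would use to predict noise. A real latent diffusion model would also decode the final latent back into audio, a step omitted here.

```python
import random


def toy_denoiser(latent, target):
    """Placeholder for a trained network: reports the noise as latent - target."""
    return [x - t for x, t in zip(latent, target)]


def refine(target, steps=50, seed=0):
    """Start from random noise and remove a share of the predicted noise each step."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in target]  # noisy starting point
    errors = []
    for step in range(steps):
        predicted_noise = toy_denoiser(latent, target)
        # Remove a growing share of the noise; the last step removes all of it.
        fraction = 1.0 / (steps - step)
        latent = [x - fraction * n for x, n in zip(latent, predicted_noise)]
        errors.append(max(abs(x - t) for x, t in zip(latent, target)))
    return latent, errors


# Hypothetical 4-number latent standing in for "what the prompt describes".
target = [0.5, -1.0, 0.25, 2.0]
final_latent, errors = refine(target)
print(f"error after first step: {errors[0]:.4f}, after last step: {errors[-1]:.2e}")
```

The distance to the target shrinks at every step, which mirrors the "gradually refines" description. The toy works in a tiny 4-number space, whereas real systems operate on compressed representations of audio that are far larger.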

Beyond music, Stable Audio can mimic sounds like a passing car or a drum solo, further showcasing its versatility.

Stability AI joined forces with AudioSparx, a commercial music library, to train Stable Audio. The collaboration led to the sourcing of nearly 800,000 songs from a diverse catalog of independent artists. In order to sidestep potential legal and ethical concerns over "deepfaked" vocals, vocal tracks were diligently filtered out.

Interestingly, unlike tools such as Google's MusicLM, Stable Audio doesn't appear to filter prompts that might land Stability AI in legal disputes. When probed about the potential to recreate songs in the style of major artists, Newton-Rex clarified that while the tool's scope is limited by its training data, the potential for 'style emulation' is very much present.

Discussing the broader implications of AI in music, Newton-Rex mentioned Stability AI's proactive measures to address emerging risks. "We're actively working to combat emerging risks in AI by implementing content authenticity standards and watermarking in our imaging models," he said.

In the evolving landscape of AI-driven music, copyright is still a gray area. Stability AI believes users of Stable Audio can monetize their works, but they may not necessarily hold the copyright to them.

As for the business model, Stability AI's Pro tier for Stable Audio is priced at $11.99 per month and lets users generate up to 500 commercial tracks per month, each up to 90 seconds long. Free-tier users are capped at 20 tracks per month at 20 seconds each, and those tracks cannot be used commercially.
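For a sense of scale, the tier limits can be turned into total audio per month. The snippet below is a back-of-the-envelope sketch that uses only the figures quoted above. The function and variable names are made up for illustration, and it assumes every track runs to its maximum length.

```python
def monthly_audio_seconds(tracks, seconds_per_track):
    """Total seconds of audio if every track is generated at full length."""
    return tracks * seconds_per_track


PRO_PRICE_USD = 11.99  # monthly Pro price quoted in the article

pro_seconds = monthly_audio_seconds(500, 90)   # Pro: 500 tracks x 90 s
free_seconds = monthly_audio_seconds(20, 20)   # Free: 20 tracks x 20 s

pro_hours = pro_seconds / 3600
free_minutes = free_seconds / 60
pro_cost_per_track = PRO_PRICE_USD / 500

print(f"Pro:  {pro_seconds} s ({pro_hours} hours), ${pro_cost_per_track:.4f} per track")
print(f"Free: {free_seconds} s ({free_minutes:.2f} minutes)")
```

On these assumptions, the Pro tier tops out at 12.5 hours of audio per month for about 2.4 cents per track, while the free tier tops out at under 7 minutes.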

Amidst all these developments, the question remains: will Stable Audio be the game-changer Stability AI needs? Only time will tell, but it's evident that Stability AI is determined to push the envelope in the realm of generative AI.