Nvidia Says AI Model Generates ‘Sounds Never Heard Before’

Nvidia has unveiled an AI model it dubs “a Swiss Army knife for sound.”

Fugatto (or “Foundational Generative Audio Transformer Opus 1”) is an artificial intelligence (AI) tool that can take prompts using any mix of text and audio files to generate or transform any combination of sounds, music and voices, the tech giant said Monday (Nov. 25).

“For example, it can create a music snippet based on a text prompt, remove or add instruments from an existing song, change the accent or emotion in a voice — even let people produce sounds never heard before,” the company wrote on its blog.

Nvidia argues that Fugatto, which supports numerous audio generation and transformation tasks, is the first foundational generative AI model that showcases emergent properties — capabilities stemming from the interaction of its various trained abilities — and the ability to meld free-form instructions.

“Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale,” said Rafael Valle, a manager of applied audio research at Nvidia. An orchestral conductor and composer, he is among the dozen-plus people who helped develop Fugatto.

Valle noted that music producers could use Fugatto to quickly prototype or edit an idea for a song, testing different styles, voices and instruments, or add effects and improve the overall sound quality of an existing track.

But the tool’s use goes beyond music, the company said. Ad agencies could employ Fugatto to target campaigns for multiple regions or situations, applying a range of different accents and emotions to voiceovers.

And video game companies could use the tool to modify prerecorded audio to fit changing action as players progress in a game.

The launch of Fugatto comes days after Nvidia released quarterly earnings showing a 94% jump in revenue. And as covered here last week, CEO Jensen Huang is not resting on his laurels after reaching that milestone.

“Many AI services are running 24/7, just like any factory,” Huang said during an earnings call.

“We’re going to see this new type of system come online. And I call it [the company’s data centers] an AI factory because that’s really close to what it is. It’s unlike a data center of the past. And these fundamental trends are really just beginning. We expect this to happen, this growth, this modernization and the creation of a new industry to go on for several years.”

As PYMNTS wrote, Huang and CFO Colette Kress clearly believe that the company’s best days are ahead of it, despite analysts wondering whether or not it can keep up the pace in several areas: large language model (LLM) development, AI usage scale and the rapid-fire revenue growth it has enjoyed over the past two years.
