AI-generated art is popping up everywhere, but that’s just the beginning. Microsoft recently released a new AI tool called VALL-E, which is similar to DALL-E but for voices. After listening to just three seconds of audio, VALL-E can replicate any voice.
If that sounds scary, that’s because it is. That’s not all, either. According to AITemas, Microsoft’s new tool easily blends emotion and pitch, something many voice AI tools struggle with. The team trained VALL-E on approximately 60,000 hours of English speech data. The model demonstrated in-context learning abilities and was even able to replicate words it had never heard before.
The report says that VALL-E is capable of prompt-based text-to-speech (TTS), follows context, and does not need pre-designed acoustic features or any additional structural engineering to deliver a high-quality audio sample. Basically, this new AI tool is pretty awesome. All VALL-E needs is to hear about three seconds of any voice, and it can quickly and easily imitate (or replicate) that voice.
There are several audio examples of the tool on GitHub, and while some sound great, others are less than impressive and have a robotic tone. But when it works, it works great. That said, these are still early days for VALL-E, and things will likely get better with time. Also, if the team used longer audio samples, the results would probably be more accurate.
It’s important to note that VALL-E isn’t available to the public, at least not yet, so we can all breathe a sigh of relief. If it is ever released, there will undoubtedly be a host of security, social, and ethical concerns, to say the least. While this technology certainly sounds impressive, it’s also pretty wild.
via Windows Central