Microsoft is working on an artificial intelligence model called VALL-E that can clone someone's voice from a three-second audio clip.
VALL-E was trained on 60,000 hours of English speech and can mimic a voice in "zero-shot scenarios", meaning it can imitate a speaker whose voice never appeared in its training data and make that voice say words it has never heard it speak. According to the accompanying 16-page research paper, VALL-E is a text-to-speech system that converts written text into "high-quality personalized" speech.
VALL-E was trained on recordings of more than 7,000 real speakers from LibriLight, an audiobook dataset consisting of public-domain texts read by volunteers. Microsoft has released audio samples that demonstrate how VALL-E clones a speaker's voice.
The AI tool is not currently available to the public, and Microsoft has not said what it is intended for.
Meanwhile, Microsoft announced Monday that it will make OpenAI's ChatGPT available through its own services, after expressing interest in investing $10 billion in OpenAI, the company behind the AI chatbot.