Microsoft AI tool VALL-E can clone your voice from a 3-second audio clip

Microsoft is developing an artificial intelligence model called VALL-E that can clone a person’s voice from a three-second audio clip. VALL-E is trained on 60,000 hours of English speech and can mimic a voice in “zero-shot scenarios”, meaning it can reproduce a voice it was not trained on and make that voice say words it has never heard before. According to the 16-page paper, VALL-E uses text-to-speech technology to turn written text into “high-quality personalized” speech. VALL-E was trained on recordings of more than 7,000 real speakers from LibriLight, an audiobook dataset composed of public-domain texts read…