Microsoft Develops AI Voice Generator Too Realistic for Public Use

12 July 2024

Zaker Adham

Summary

Microsoft has developed an advanced text-to-speech AI model that is so proficient at mimicking human voices that the company has decided it is too dangerous to release to the public.

Artificial intelligence tools, such as ChatGPT, are becoming increasingly sophisticated. As they advance, however, they blur the line between human and machine.

Microsoft's new AI voice generator, VALL-E 2, has reached this level of sophistication. The model can replicate human speech so accurately that its output is indistinguishable from the original speaker's voice. This capability, while impressive, raises significant concerns about potential misuse, such as fraud and impersonation.

Microsoft revealed in a pre-print paper, published on June 17, that VALL-E 2 has achieved "human parity" in text-to-speech synthesis. Internal benchmarks demonstrated that the AI could replicate human speech with remarkable accuracy and, in some cases, even exceed the quality of the human recordings.

Despite these advancements, Microsoft has labeled VALL-E 2 as "purely a research project." The company has no plans to incorporate this technology into any products or make it publicly accessible. However, Microsoft did outline potential uses for the technology, including applications in education, journalism, content creation, accessibility features, voice response systems, translation, and chatbots.

LiveScience first reported on Microsoft's quiet disclosure of this technology. The publication highlighted the dangers of making such a powerful tool available to the public: the ability to generate natural speech in the exact voice of the original speaker could lead to an increase in fraud and impersonation.