The American software company Microsoft has developed software that can mimic anyone's voice from an audio sample just three seconds long.
According to foreign media, the voice-simulation software uses artificial intelligence to imitate a person's voice after hearing only three seconds of their speech.
According to the report, 60,000 hours of English speech from 7,000 people were used to build the VALL-E language model, which can reproduce "high-quality speech" in the voice of a speaker it has never heard before.
The advantage of this software is that it does not simply copy what it hears. Given a single voice recording of a person, it can say anything in that person's voice. It can also reproduce the speaker's emotional tone and the acoustic environment of the original recording.
Its name echoes DALL-E, the program from OpenAI (a company in which Microsoft has invested heavily) that generates images, including images in the style of famous artists. The new model, however, is text-to-speech: it converts written words into sound. It can mimic the voice of anyone in the world, and all it needs is a three-second audio file. A slightly longer recording may, however, be needed for better results.
In this way, VALL-E can make a person appear to say things they never said. Like deepfake technology, which mimics a person's appearance in videos, it therefore has the potential to be misused. Experts have expressed fears that it will lead to a new flood of fake audio and fake recordings and create a range of problems.
However, Microsoft has said the technology may also have benefits. For example, if an actor leaves partway through dubbing a film to work elsewhere, the remaining dubbing could be done with the software. The AI could also handle other, similar minor tasks.
Microsoft has acknowledged other concerns, stating that because VALL-E can synthesize speech that preserves a speaker's identity, the model carries potential risks if misused, such as spoofing voice identification or impersonating a specific speaker. The software is not currently available for public use.
Microsoft said it will continue to improve VALL-E in line with its own artificial intelligence principles. It will also consider possible methods for detecting synthesized speech to reduce such risks.