Meta, the parent company of Facebook, has unveiled Voicebox, a generative AI model that produces audio clips rather than text responses like those of ChatGPT or Google’s Bard. Voicebox can synthesize speech from an audio sample just 2 seconds long. It can match the sample’s audio style, perform text-to-speech generation, and recreate portions of speech that were interrupted by external noise.
Furthermore, Voicebox works across six languages: given a speech sample and a passage of text in English, French, German, Spanish, Polish, or Portuguese, it can produce a reading of the text in that language, making it a versatile tool for multilingual audio synthesis.
Meta envisions Voicebox giving natural-sounding voices to virtual assistants and nonplayer characters in the metaverse, the digital worlds where people gather to work, play, and socialize. It could also help visually impaired people by letting them hear written messages read aloud in the voices of their friends.
Voicebox is still under development and is not yet available to the general public. Meta acknowledges the potential for misuse of this technology and says it is working on effective ways to distinguish authentic speech from audio generated by Voicebox.