OpenAI has recently unveiled Voice Engine, a technology capable of generating natural-sounding speech that closely resembles the original speaker across multiple languages. Although the tool has been in development since 2022, the company has decided to proceed with caution and engage in a dialogue about the responsible deployment of synthetic voices.
Voice Engine has the potential to transform a variety of use cases, such as providing reading assistance to non-readers and children by generating pre-scripted voice-over content with natural-sounding voices. It can also help patients who have lost their speech to oncologic or neurologic conditions recover their voices through AI-generated speech.
Partners testing Voice Engine are required to adhere to OpenAI’s usage policies, which include obtaining explicit consent from the original speaker, disclosing to the audience that the voices are AI-generated, and prohibiting the impersonation of individuals or organizations without consent. Additionally, OpenAI is implementing watermarking to trace the origin of any audio generated by Voice Engine and is proactively monitoring its usage to prevent misuse.
Despite its potential positive impact, Voice Engine also poses significant risks, such as fraudulent extortion scams targeting families and small businesses, deceptive election and marketing campaigns, and the exploitation of voice artists’ voices without their consent. OpenAI has recommended safety measures for voice technologies, including phasing out voice-based authentication for access to sensitive information and educating the public about the capabilities and limitations of AI technologies.
OpenAI has decided to delay a formal public release of Voice Engine due to safety concerns, particularly during an election year, and is taking a cautious approach to deployment. Its filing of a trademark application suggests the company may be positioning itself to compete directly with Amazon’s Alexa, signaling its market direction. Despite the risks associated with voice cloning, it is clear that this technology is here to stay.
In a recent announcement, the Federal Communications Commission (FCC) declared that calls made with AI-generated voices are classified as “artificial” under the Telephone Consumer Protection Act (TCPA), making robocalls that use voice cloning to target consumers illegal. As the landscape of voice technology continues to evolve, it is essential for regulators, companies, and individuals to prioritize safety, consent, and ethical usage to prevent potential harms and promote responsible innovation in the field.