OpenAI delays launch of voice assistant to address safety issues

OpenAI is delaying the release of a much-anticipated voice assistant feature for ChatGPT in order to ensure it can safely and effectively process requests from millions of users.

The artificial intelligence startup unveiled the voice option at a product launch event in May for GPT-4o, an updated version of its GPT-4 model that is better at handling text, audio and images in real time. In a statement, OpenAI said it had originally intended to roll out the voice feature to a small group of paid ChatGPT Plus subscribers in late June, but decided it needs another month to “reach our bar to launch.”

“We’re improving the model’s ability to detect and refuse certain content,” the company said on Tuesday. “We’re also working on improving the user experience and preparing our infrastructure to scale to millions while maintaining real-time responses.”

The delay represents a possible setback for OpenAI as it works to stay ahead of an increasingly crowded field of AI rivals. The company had introduced a more limited option for ChatGPT to talk back to users last year, but the new feature promised to be faster and pair with powerful image-recognition capabilities to turn the chatbot into a far more useful and dynamic conversational partner.

On stage at the launch event, OpenAI employees showed off ChatGPT responding almost instantly to requests such as solving a math problem on a piece of paper placed in front of a researcher’s smartphone camera. Some viewers likened the tool to the AI virtual assistant in the 2013 film Her, voiced by Scarlett Johansson. The actress later demanded one of the ChatGPT voices be removed for sounding too much like her.

On Tuesday, OpenAI said it plans to roll out the voice feature to all of its paid subscribers in the fall. OpenAI said it’s “also working” on releasing video and screen-sharing features that the company demonstrated during its May event. The company said it will let users know more about the timing for those features in the future.

As a result, when the voice option does become available to select paid users in the next month, its capabilities will likely be more limited than what was demonstrated at the event. For example, the chatbot will not be able to access a computer-vision feature that would let it offer spoken feedback on a user’s dance moves simply by using the smartphone’s camera. – Bloomberg
