Opinion: AI’s ‘Her’ era has arrived

SAN FRANCISCO: A lifelike artificial intelligence with a smooth, alluring voice enchants and impresses its human users – flirting, telling jokes, fulfilling their desires and eventually winning them over.

I’m summarising the plot of the 2013 movie Her, in which a lonely introvert named Theodore, played by Joaquin Phoenix, is seduced by a virtual assistant named Samantha, voiced by Scarlett Johansson.

But I might as well be describing the scene on May 13 when OpenAI, the creator of ChatGPT, showed off an updated version of its AI voice assistant at an event in San Francisco.

The company’s new model, called GPT-4o (the o stands for “omni”), will let ChatGPT talk to users in a much more lifelike way – detecting emotions in their voices, analysing their facial expressions and changing its own tone and cadence depending on what a user wants. If you ask for a bedtime story, it can lower its voice to a whisper. If you need advice from a sassy friend, it can speak in a playful, sarcastic tone. It can even sing on command.

The new voice feature, which ChatGPT users will be able to start using for free in the coming weeks, immediately drew comparisons to Samantha from Her. (Sam Altman, OpenAI’s CEO, who has praised the movie, posted its title on the social platform X after Monday’s announcement, making the connection all but official.)

On social media, users hailed the arrival of an AI voice assistant that will finally understand them, or at least pretend that it does.

In a series of live demonstrations Monday, OpenAI employees showed off ChatGPT’s new capabilities. One asked ChatGPT to read him a story – then to read it again more dramatically, using the voice of a robot. (“Initiating dramatic robotic voice,” it responded.) Another asked it to sing “Happy Birthday”. ChatGPT did well at both tasks, and it also performed ably when employees asked it to serve as a real-time translator between languages.

But the real killer feature was the way ChatGPT’s voice itself changed. One moment, it was a sing-songy soprano. The next, it shifted into a lilting contralto. It paused for effect, giggled at its own jokes and added filler phrases like “hmm” and “let’s see” for extra realism. It sounded more humanlike than some humans I know.

It also seemed to have a sense of humour. At one point during a demo, an OpenAI employee breathed in a heavy, exaggerated pant. ChatGPT heard him and responded, “Mark, you’re not a vacuum cleaner.”

For years, AI voice assistants have been limited by their inability to pick up on the nuances of conversation, such as tone and emotional affect. Synthetic AI voices, like those used by Siri and Alexa, tend to be flat and impersonal; they sound the same whether they’re giving tomorrow’s weather forecast or telling you that your cookies are done.

And as I discovered recently when I spent a month talking to a group of AI “friends”, a big problem with today’s AI voice models is speed. It’s hard to forget you’re talking to a robot when every answer has a three-second delay.

OpenAI has addressed the latency problem by giving GPT-4o what is known as “native multimodal support” – the ability to take in audio prompts and analyse them directly, without converting them to text first. That has made its conversations faster and more fluid, to the point that if the ChatGPT demos were accurate, most users will barely notice any lag at all.

All this adds up to a much different subjective experience. If previous AI assistants felt like talking to a dispassionate librarian, the new ChatGPT feels like a friendly, chatty co-worker (albeit one who occasionally spouts nonsense – but don’t we all have one of those?).

These demonstrations, along with other AI news from recent days – including reports that Apple is in talks with OpenAI to use its technology on the iPhone and is preparing a new, generative AI-powered version of Siri – signal that the era of the detached, impersonal AI helper is coming to an end.

Instead, we’re getting chatbots modelled after Samantha in Her – with playful intelligence, basic emotional intuition and a wide range of expressive modes.

Some users may be repelled by them. But many will come to love and appreciate the new breed of AI assistants – and some will inevitably fall in love, as Theodore does.

The most telling detail of Monday’s demo, in my view, was the way that OpenAI’s own employees have started talking to ChatGPT. They anthropomorphise it relentlessly and treat it with deference – often asking, “Hey, ChatGPT, how’s it going?” before peppering it with questions. They cheer when it nails a difficult response, the way you might root for a precocious child. One OpenAI employee even wrote, “I heart ChatGPT” on a piece of paper and showed it to ChatGPT through his phone’s camera. (“That’s so sweet of you!” ChatGPT responded.)

These are seasoned AI experts who know full well that they are summoning statistical predictions from a neural network, not talking to a sentient being. And some of it may be showmanship. But if OpenAI’s own employees can’t resist treating ChatGPT like a human, is there any doubt that the rest of us will do the same?

After all, users were already trying to trick ChatGPT into acting like their boyfriend, even before the upgrade. And my recent experiment with AI friends proved to me that the technology required to create realistic AI companions already exists, even if the execution isn’t perfect yet.

(The New York Times sued OpenAI and its partner, Microsoft, in December, claiming copyright infringement of news content related to AI systems.)

In some ways, the choice to model a chatbot after Samantha from Her is an odd one. The film is hardly a utopian picture of AI companionship, and it ends – spoiler alert! – with Theodore getting his heart broken by Samantha.

But despite the film’s cautionary message, there’s no turning back now. After Monday’s announcement, one OpenAI employee posted, perhaps a bit ominously: “You are all gonna fall in love with it.” – The New York Times