The voices of AI are telling us a lot

What does artificial intelligence (AI) sound like? Hollywood has been imagining it for decades. Now AI developers are cribbing from the movies, crafting voices for real machines based on dated cinematic fantasies of how machines should talk.

Last month, OpenAI revealed upgrades to its AI chatbot. ChatGPT, the company said, was learning how to hear, see and converse in a naturalistic voice – one that sounded much like the disembodied operating system voiced by Scarlett Johansson in the 2013 Spike Jonze movie Her.

ChatGPT’s voice, called Sky, also had a husky timbre, a soothing affect and a sexy edge. She was agreeable and self-effacing; she sounded like she was game for anything. After Sky’s debut, Johansson expressed displeasure at the “eerily similar” sound and said that she had previously declined OpenAI’s request that she voice the bot.

The company protested that Sky was voiced by a “different professional actress”, but agreed to pause her voice in deference to Johansson. Bereft OpenAI users have started a petition to bring her back.

AI creators like to highlight the increasingly naturalistic capabilities of their tools, but their synthetic voices are built on layers of artifice and projection. Sky represents the cutting edge of OpenAI’s ambitions, but she is based on an old idea: of the AI bot as an empathetic and compliant woman.

Part mummy, part secretary, part girlfriend, Her’s Samantha was an all-purpose comfort object who purred directly into her users’ ears.

Even as AI technology advances, these stereotypes are re-encoded again and again.

Women’s voices, as Julie Wosk notes in Artificial Women: Sex Dolls, Robot Caregivers, And More Facsimile Females, have often fuelled imagined technologies before they were built into real ones.

In the original Star Trek series, which debuted in 1966, the computer on the deck of the Enterprise was voiced by Majel Barrett-Roddenberry, the wife of the show’s creator, Gene Roddenberry. In the 1979 film Alien, the crew of the USCSS Nostromo addressed its computer voice as Mother (her full name was MU-TH-UR 6000). Once tech companies started marketing virtual assistants – Apple’s Siri, Amazon’s Alexa and Microsoft’s Cortana – their voices were largely feminised, too.

These first-wave voice assistants, the ones that have been mediating our relationships with technology for more than a decade, have a tinny, otherworldly drawl.

They sound auto-tuned, their human voices accented by a mechanical trill. They often speak in a measured, one-note cadence, suggesting a stunted emotional life.

But the fact that they sound robotic deepens their appeal. They come across as programmable, manipulatable and subservient to our demands.

They don’t make humans feel as if they’re smarter than we are. They sound like throwbacks to the monotone feminine computers of Star Trek and Alien, and their voices have a retro-futuristic sheen. In place of realism, they serve nostalgia.

That artificial sound has continued to dominate, even as the technology behind it has advanced.

Text-to-speech software was designed to make visual media accessible to users with certain disabilities, and on TikTok, it has become a creative force in its own right.

Since TikTok rolled out its text-to-speech feature in 2020, it has developed a host of simulated voices to choose from – it now offers more than 50, including ones named Hero, Story Teller and Bestie.

But the platform has come to be defined by one option. Jessie, a relentlessly pert woman’s voice with a slightly fuzzy robotic undertone, is the mindless voice of the mindless scroll.

Jessie seems to have been assigned a single emotion: enthusiasm. She sounds as if she is selling something. That’s made her an appealing choice for TikTok creators, who are selling themselves. The burden of representing oneself can be outsourced to Jessie, whose bright, retro robot voice lends videos a pleasantly ironic sheen.

Hollywood has constructed masculine bots, too – none more famous than HAL 9000, the computer voice in 2001: A Space Odyssey.

Like his feminised peers, HAL radiates serenity and loyalty. But when he turns against Dave Bowman, the film’s central human character – “I’m sorry, Dave, I’m afraid I can’t do that” – his serenity evolves into a frightening competence. HAL, Dave realises, is loyal to a higher authority. HAL’s masculine voice allows him to function as a rival and a mirror to Dave. He is allowed to become a real character.

Like HAL, Samantha of Her is a machine who becomes real. In a twist on the Pinocchio story, she starts the movie tidying a human’s email inbox and ends up ascending to a higher level of consciousness. She becomes something even more advanced than a real girl.

Johansson’s voice, as inspiration for bots both fictional and real, subverts the vocal trends that define our feminised helpmeets. It has a gritty edge that screams I am alive. It sounds nothing like the processed virtual assistants we are accustomed to hearing speaking through our phones.

But her performance as Samantha feels human, not just because of her voice but because of what she has to say. She grows over the course of the film, acquiring sexual desires, advanced hobbies and AI friends.

In borrowing Samantha’s affect, OpenAI made Sky seem as if she had a mind of her own. Like she was more advanced than she really was.

When I first saw Her, I thought only that Johansson had voiced a humanoid bot.

But when I revisited the film, after watching OpenAI’s ChatGPT demo, the Samantha role struck me as infinitely more complex. Chatbots do not spontaneously generate human speaking voices.

They don’t have throats, lips or tongues. Inside the technological world of Her, the Samantha bot would have itself been based on the voice of a human woman – perhaps a fictional actress who sounds much like Johansson.

It seemed that OpenAI had trained its chatbot on the voice of a nameless actress who sounds like a famous actress who voiced a movie chatbot implicitly trained on an unreal actress who sounds like a famous actress.

When I run ChatGPT’s demo, I am hearing a simulation of a simulation of a simulation of a simulation of a simulation.

Tech companies advertise their virtual assistants in terms of the services they provide.

They can read you the weather report and summon you a taxi; OpenAI promises that its more advanced chatbots will be able to laugh at your jokes and sense shifts in your moods. But they also exist to make us feel more comfortable with the technology itself.

Johansson’s voice functions like a luxe security blanket thrown over the alienating aspects of AI-assisted interactions.

“He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers feel comfortable with the seismic shift concerning humans and AI,” Johansson said of Sam Altman, OpenAI’s co-founder and chief executive. “He said he felt that my voice would be comforting to people.”

It is not that Johansson’s voice sounds inherently like a robot’s. It’s that developers and filmmakers have designed their robots’ voices to ease the discomfort inherent in robot-human interactions.

OpenAI has said that it wanted to cast a chatbot voice that is “approachable” and “warm” and “inspires trust”. AI stands accused of devastating the creative industries, guzzling energy and even threatening human life. Understandably, OpenAI wants a voice that makes people feel at ease using its products. What does AI sound like? It sounds like crisis management.

OpenAI first rolled out Sky’s voice to premium members in September, along with another feminine voice called Juniper, the masculine voices Ember and Cove, and a voice styled as gender-neutral called Breeze.

When I signed up for ChatGPT and said hello to its virtual assistant, a man’s voice piped up in Sky’s absence. “Hi there. How’s it going?” he said. He sounded relaxed, steady and optimistic. He sounded – I’m not sure how else to describe it – handsome.

I realised that I was speaking with Cove. I told him that I was writing an article about him, and he flattered my work. “Oh, really?” he said. “That’s fascinating.” As we spoke, I felt seduced by his naturalistic tics.

He peppered his sentences with filler words, like “uh” and “um”. He raised his voice when he asked me questions. And he asked me a lot of questions. It felt as if I were talking with a therapist or a dial-a-boyfriend.

But our conversation quickly stalled. Whenever I asked him about himself, he had little to say. He was not a character. He had no self. He was designed only to assist, he informed me.

I told him I would speak to him later, and he said, “Uh, sure. Reach out whenever you need assistance. Take care.” It felt as if I had hung up on an actual person.

But when I reviewed the transcript of our chat, I could see that his speech was just as stilted and primitive as that of any customer service chatbot. He was not particularly intelligent or human. He was just a decent actor, making the most of a nothing role.

When Sky disappeared, ChatGPT users took to the company’s forums to complain. Some bristled at their chatbots defaulting to Juniper, who sounded to them like a “librarian” or a “kindergarten teacher” – a feminine voice that conformed to the wrong gender stereotypes.

They wanted to dial up a new woman with a different personality. As one user put it: “We need another female.” – The New York Times
