How Amazon blew Alexa’s shot to dominate AI, according to more than a dozen employees who worked on it

“Alexa, let’s chat.”

With that phrase, David Limp, at the time Amazon’s head of devices and services, showed off a new generative AI-powered version of the company’s signature Alexa voice assistant in September 2023.

At a packed event at the Seattle-based tech giant’s lavish second headquarters in the Washington DC suburbs, Limp demonstrated the new Alexa for a room full of reporters and cheering employees.

He showed how in response to the new trigger phrase, “Alexa, let’s chat,” the digital assistant responded in a far more natural and conversational voice than the friendly-but-robotic one that hundreds of millions have become accustomed to communicating with for weather updates, reminders, timers and music requests.

Limp asked Alexa how his favourite football team – Vanderbilt University – was doing. Alexa responded in a joyful voice, then showed how it could write a message to his friends reminding them to watch the upcoming Vanderbilt football game and send it to his phone.

The new Alexa LLM, the company said, would soon be available as a free preview on Alexa-powered devices in the US. Rohit Prasad, Amazon’s SVP and head scientist for Alexa, said the news marked a “massive transformation of the assistant we love”, and called the new Alexa a “super agent”.

It was clear the company wanted to dispel the perception that the existing Alexa lacked smarts. (Microsoft CEO Satya Nadella reportedly called it “dumb as a rock” in March 2023 as OpenAI’s ChatGPT rocketed to fame).

But after the event, there was radio silence – or digital assistant silence, as the case may be. The traditional Alexa voice never changed on the half-a-billion devices that have been sold globally, and little news emerged over the coming months about the new generative AI Alexa, other than recent reports about a potential launch later this year that could include a subscription charge.

The reason, according to interviews with more than a dozen former employees who worked on AI for Alexa, is an organisation beset by structural dysfunction and technological challenges that have repeatedly delayed shipment of the new generative AI-powered Alexa. Overall, the former employees paint a picture of a company desperately behind its Big Tech rivals Google, Microsoft, and Meta in the race to launch AI chatbots and agents, and floundering in its efforts to catch up.

The September 2023 demo, the former employees emphasise, was just that – a demo. The new Alexa was not ready for a prime time rollout, and still isn’t. The Alexa large language model (LLM) that sits at the heart of the new Alexa, and which Amazon positioned as taking on OpenAI’s ChatGPT, is, according to former employees, far from state-of-the-art.

Research scientists who worked on the LLM said Amazon does not have enough data or access to the specialised computer chips needed to run LLMs to compete with rival efforts at companies like OpenAI.

Amazon has also, former employees say, repeatedly deprioritised the new Alexa in favour of building generative AI for Amazon’s cloud computing unit, AWS. And while Amazon has built a partnership and invested US$4bil (RM18.84bil) in AI startup Anthropic, whose LLM model Claude is considered competitive with OpenAI’s models, it has been unable to capitalise on that relationship to build a better Alexa.

Privacy concerns have kept Alexa’s teams from using Anthropic’s Claude model, former employees say – but so too have Amazon’s ego-driven internal politics.

An Amazon spokesperson said details provided by the former research scientists for this story were “dated” – even though many of these sources left the company in the past six months – and did not reflect the current state of the Alexa LLM. She added that the company has access to hundreds of thousands of GPUs and other AI-specific chips.

She also disputed the idea that Alexa has been deprioritised or that Anthropic’s Claude has been off-limits due to privacy concerns, but she declined to provide evidence of how Claude is being used in the new Alexa.

While aspects of Amazon’s struggle to update Alexa are unique, the company’s challenges give an indication of how difficult it is for companies to revamp digital assistants built on older technologies to incorporate generative AI. Apple, too, has faced similar struggles to integrate AI into its products, including its digital assistant Siri.

Siri and Alexa share a similar technological pedigree – in fact, Siri debuted three years prior to Alexa, in October 2011. And like Amazon, Apple underinvested in the kind of AI expertise needed to build the massive language models that underpin today’s generative AI, and in the vast clusters of graphics processing units (GPUs), the specialised computer chips such models require. Apple, like Amazon, has since launched a determined but belated effort to catch up.

Apple took some big steps towards regaining lost ground in the generative AI race with a set of highly-anticipated announcements at its WWDC conference earlier this week. The debut included a big upgrade for Siri, including a more natural-sounding voice and the potential for “on-screen awareness”, which will eventually allow Siri to take more agent-like actions across apps. Apple also announced a Siri integration with ChatGPT. Apple’s announcements only up the pressure on Amazon to deliver the new Alexa.

Unfortunately, there’s growing evidence that Amazon is ill-prepared for this renewed battle of the digital assistants – even though many assumed the company would have been perfectly positioned to take Alexa into the generative AI age.

Yesterday, Mihail Eric, a former senior machine learning scientist at Alexa AI, took to X (formerly Twitter) to say just that: In a post titled “How Alexa dropped the ball on being the top conversational system on the planet”, Eric, who left Amazon in July 2021, pointed out that Alexa had sold over 500 million devices, “which is a mind-boggling user data moat,” and that “we had all the resources, talent, and momentum to become the unequivocal market leader in conversational AI”.

But most of that tech never saw the light of day, he said, because Alexa AI “was riddled with technical and bureaucratic problems”. The more than a dozen former employees Fortune spoke to over the past month echo Eric’s account and add further details to the story of how the Everything Company has failed to do this one thing. The former employees spoke anonymously to avoid violating non-disclosure agreements or non-disparagement clauses they had signed.

Amazon Alexa was caught flat-footed by ChatGPT

Well before ChatGPT wowed the world in November 2022, there was Amazon’s Alexa. The digital assistant was launched in 2014 alongside the Echo smart speaker that served as its hardware interface. The digital assistant, Amazon said, had been inspired by the all-knowing computer featured on Star Trek (Amazon founder Jeff Bezos is a big Star Trek fan).

The product quickly became a hit with consumers, selling over 20 million devices by 2017. But Alexa was not built on the same AI models and methods that made ChatGPT groundbreaking. Instead, it was a collection of small machine learning models and thousands of hand-crafted and hard-coded rules that turned a user’s utterances into the actions Alexa performed.

Amazon had been experimenting with some early large language models – all of them much smaller than GPT-3 and GPT-4, the two models OpenAI would use to power ChatGPT – but these were nowhere near ready for deployment in a product. The company was caught flat-footed by the generative AI boom that followed ChatGPT’s late November 2022 launch, former employees say.

A frantic, frenetic few months followed as Amazon’s Alexa organisation struggled to coalesce around a vision to take the digital assistant from a stilted command-action bot to a truly conversational, helpful agent. Non-generative AI projects were deprioritised overnight, and throughout the 2022 Christmas period executives urged Amazon’s scientists, engineers and product managers to figure out how to ensure Amazon had generative AI products to offer customers.

One former Alexa AI project manager described the atmosphere at the company as “a bit panicked”. Amazon’s response almost immediately ran into trouble, as various teams within Alexa and AWS failed to coalesce around a unified plan.

Many employees were still working remotely following the Covid pandemic, leading to people being endlessly “huddled on conference calls debating the minutiae of strategic PRFAQs” (Amazon-speak for the press-release-and-FAQ document used to propose a product idea in its early stages), the Alexa AI project manager said. The company struggled, he said, to “shift from peacetime to wartime mode”.

One senior Alexa data scientist said this was especially frustrating because he had tried to sound the alarm on the coming wave of generative AI as far back as mid-2022, gathering data to show his director-level leadership, but he said he could not convince them that the company needed to change its AI strategy. Only after ChatGPT launched did the company swing into action, he explained.

The problem is, as hundreds of millions are aware from their stilted discourse with Alexa, the assistant was not built for, and has never been primarily used for, back-and-forth conversations. Instead, it always focused on what the Alexa organisation calls “utterances” – the questions and commands like “what’s the weather?” or “turn on the lights” that people bark at Alexa.

In the first months after ChatGPT launched, it was not clear that LLMs would be able to trigger these real-world actions from a natural conversation, said one Ph.D. research scientist who interned on the Alexa team during this period.

“The idea that an LLM could ‘switch on the lights’ when you said ‘I can’t see, turn it all on’ was not proven yet,” he said. “So the leaders internally clearly had big plans, but they didn’t really know what they were getting into.” (It is now widely accepted that LLMs can, at least in theory, be coupled with other technology to control digital tools.)
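It has since become standard for LLM products to do exactly this via “tool calling”: the model emits a structured call, and a thin runtime layer looks up and executes the requested action. The sketch below is a minimal, hypothetical illustration of that pattern – the function and registry names are invented for this example and do not reflect Amazon’s or any real assistant’s API.

```python
# Hypothetical sketch of LLM "tool calling". The model, given the user's
# utterance plus descriptions of the available tools, generates a JSON
# call; a thin dispatcher executes it. All names here are illustrative.
import json

# Registry of smart-home actions the assistant may invoke.
def turn_on_lights(room: str) -> str:
    return f"lights on in {room}"

TOOLS = {"turn_on_lights": turn_on_lights}

# In a real system, this JSON would be generated by the LLM itself from
# an utterance like "I can't see, turn it all on".
model_output = json.dumps(
    {"tool": "turn_on_lights", "arguments": {"room": "living room"}}
)

def dispatch(raw: str) -> str:
    call = json.loads(raw)
    fn = TOOLS[call["tool"]]        # look up the requested action
    return fn(**call["arguments"])  # execute with model-chosen arguments

result = dispatch(model_output)
```

The hard part Alexa’s engineers faced was not the dispatch layer but making the model’s side of this exchange reliable across thousands of third-party integrations.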

Instead, teams were figuring out how to implement generative AI on the fly. That included creating synthetic datasets – in this case, collections of computer-generated dialogues with a chatbot – that they could use to train an LLM. Those building AI models often use synthetic data when there isn’t enough real-world data to improve AI accuracy, or when privacy protection is needed – and remember, most of what the Alexa team had were simple, declarative “utterances”.

“(Customers were) talking in Alexa language,” one former Amazon machine learning scientist said. “So now imagine you want to encourage people to talk in language that has never happened – so where are you going to get the data from to train the model? You have to create it, but that comes with a whole lot of hurdles because there’s a gazillion ways people can say the same thing.”
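One common way to bootstrap such data is template expansion: combining utterance templates with slot values to produce many phrasings of the same intent. The toy sketch below uses assumed names throughout and does not reflect Amazon’s actual tooling.

```python
# Toy synthetic-data generator: expand utterance templates over slot
# values to produce labelled training examples for one intent.
import itertools

TEMPLATES = [
    "set a timer for {n} {unit}",
    "remind me in {n} {unit}",
    "can you start a {n} {unit} timer",
]
SLOTS = {"n": ["5", "10", "30"], "unit": ["minutes", "seconds"]}

def generate(templates, slots):
    rows = []
    for tpl in templates:
        for n, unit in itertools.product(slots["n"], slots["unit"]):
            rows.append({
                "text": tpl.format(n=n, unit=unit),  # the synthetic utterance
                "intent": "set_timer",               # the label to train on
                "slots": {"n": n, "unit": unit},
            })
    return rows

dataset = generate(TEMPLATES, SLOTS)  # 3 templates x 6 slot combos = 18 rows
```

The hurdle the scientist describes is visible even here: templates only cover the phrasings their authors thought of, while real users have “a gazillion ways” to say the same thing.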

Also, while Alexa has been integrated with thousands of third-party devices and services, it turns out that LLMs are not terribly good at handling such integrations. According to a former Alexa machine learning manager, who worked on Alexa’s smart home capabilities, even OpenAI’s latest GPT-4o model and the newest Google Gemini model – both of which can handle voice, rather than just text – struggle to go from spoken dialogue to performing a task using other software. That requires what is known as an API call, and LLMs don’t yet do this reliably.

“It’s not consistent enough, it hallucinates, gets things wrong, it’s hard to build an experience when you’re connecting to many different devices,” the former machine learning manager said.

As spring gave way to the summer of 2023, many in Alexa’s rank and file remained in the dark about how the digital assistant would meet the generative AI moment. The project lacked vision, former employees said.

“I remember my team and myself complaining a lot to our superiors that it wasn’t transparent what the vision looks like – it wasn’t transparent what exactly we’re trying to launch,” one said.

Another former manager said the new Alexa LLM was talked about in the months prior to the September demo, but it wasn’t clear what it would mean. “We were just hearing things like, ‘Oh yeah, this is coming,’” he said. “But we had no idea what it was or what it would look like.”

Alexa LLM demo did not meet ‘go/no-go’ criteria

The September 2023 Alexa demo made it seem like a widespread rollout of the new Alexa LLM was imminent. But the new language model-based Alexa ultimately “didn’t meet the go/no-go criteria,” one former employee said. LLMs are known for producing hallucinations and sometimes toxic content, and Amazon’s was no different, making broad release risky.

This, former employees say, is the reason Alexa’s “Let’s Chat” feature has never made it into wide release. “It’s very hard to make AI safe enough and test all aspects of that black box in order to release it,” a former manager said.

The September 2023 demo, he pointed out, involved different functionality than what Alexa was best known for – that is, taking a command and executing it. Ensuring Alexa could still perform these old functions while also enabling the conversational dialogue the new Alexa promised would be no easy task.

The manager said it was increasingly clear to him that the organisation would, at least temporarily, need to maintain two completely different technology stacks – one supporting Alexa’s old features and another the new ones. But managers did not want to entertain that idea, he said.

Instead, the message at the company at the time he was laid off in November 2023 was still “we need to basically burn the bridge with the old Alexa AI model and pivot to only working on the new one”.

Even as the new Alexa LLM rollout floundered, Amazon executives set ever more lofty generative AI goals. Right before the demo, Prasad, the Amazon SVP who had served as Alexa’s head scientist, was promoted to a new role designed to bring the company’s disparate research teams under a single umbrella, with a goal to develop human-level artificial general intelligence, or AGI.

The move put Amazon in direct competition with companies like OpenAI, Google DeepMind, and Anthropic, which have the creation of AGI as their founding mission. Meta CEO Mark Zuckerberg has also recently said that creating AGI is his company’s mission too.

By November 2023, there was word that Amazon was investing millions in training an AI model, codenamed Olympus, that would have 2 trillion parameters – or tunable variables. Parameters are a rough approximation of a model’s size and complexity. And Olympus’s reported parameter count would make it double the reported size of OpenAI’s most capable model, GPT-4.

The former research scientist working on the Alexa LLM said Project Olympus is “a joke”, adding that the largest model in progress is 470 billion parameters. He also emphasised that the current Alexa LLM is still the 100 billion-parameter model used for the September 2023 demo, though it has since had more pretraining and fine-tuning to improve it. (To be sure, 100 billion parameters is still a relatively powerful model. Meta’s Llama 3, as a comparison, weighs in at 70 billion parameters.)

A lack of data made it tough to ‘get some magic’ out of the LLM

In the months following the September 2023 demo, a former research scientist who worked on building the new Alexa LLM recalled how Alexa leadership, including Amazon’s generative AI leader Rohit Prasad, pushed the team to work harder and harder.

The message was to “get some magic” out of the LLM, the research scientist said. But the magic never happened. A lack of adequate data was one of the main reasons why, former employees said.

Meta’s Llama 3 was pre-trained on 15 trillion tokens; the Alexa LLM has been trained on only 3 trillion. (Unlike parameters, which count a model’s tunable settings, a token is the smallest unit of data – such as a word or word fragment – that a model processes during training.)

Meanwhile, “fine-tuning” an AI model – which takes a pre-trained model and further hones it for specific tasks – also benefits from larger datasets than what Amazon has at the ready. Meta’s Llama 3 model was fine-tuned on 10 million data points. The LLM built by Amazon’s AGI organisation has so far accumulated only around 1 million, with only 500,000 high-quality data points, the former Alexa LLM research scientist said.

One of the many reasons for that, he explained, is that Amazon insists on using its own data annotators (people responsible for labelling data so that AI models can recognise patterns) and that organisation is very slow. “So we can never never get high quality data from them after several rounds, even after one year of developing the model,” he said.

Beyond a paucity of data, the Alexa team also lacks access to the vast quantities of the latest Nvidia GPUs, the specialised chips used to train and run AI models, that the teams at OpenAI, Meta, and Google have, two sources told Fortune. “Most of the GPUs are still A100, not H100,” the former Alexa LLM research scientist added, referring to Nvidia’s H100, the most powerful GPU the chipmaker currently has available.

At times, building the new Alexa has taken a backseat to other generative AI priorities at Amazon, they said. Amazon’s main focus after ChatGPT launched was to roll out Bedrock, a new AWS cloud computing service that allowed customers to build generative AI chatbots and other applications in the cloud – which was announced in April 2023 and made generally available in September. AWS is a critical profit-driver for Amazon.

Alexa, on the other hand, is a cost center – the division reportedly loses billions each year – and is mostly viewed as a way to keep customers engaged with Amazon and as a way to gather data that can help Amazon and its partners better target advertising.

The LLM that Amazon scientists are building (a version of which will also power Alexa) is also first being rolled out to AWS’ business-focused generative AI assistant Amazon Q, said a former Alexa LLM scientist who left within the past few months, because the model is now considered good enough for specific enterprise use cases. Amazon Q also taps Anthropic’s Claude AI model. But Alexa’s LLM team has not been allowed to use Claude due to concerns about data privacy.

Amazon’s spokesperson said the assertion about Claude and privacy is false, and disputed other details about Amazon’s LLM effort that Fortune heard from multiple sources. “It’s simply inaccurate to state Amazon Q is a higher priority than Alexa. It’s also incorrect to state that we’re using the same LLM for Q and Alexa.”

Bureaucracy and infrastructure issues slowed down Alexa’s gen AI efforts

One former Alexa AI employee who has hired several people who had been working on the new Alexa LLM said that most have mentioned “feeling exhausted” by the constant pressure to ready the model for a launch that is repeatedly postponed – and frustrated because other work is on hold in the meantime. A few have also conveyed a growing scepticism as to whether the overall design of the LLM-based Alexa even makes sense, he added.

“One story I heard was that early in the project, there was a big push from senior executives who had become overconfident after experimenting with ChatGPT, and that this overconfidence has persisted among some senior leaders who continue to drive toward an unrealistic-feeling goal,” he said.

Another former Alexa LLM scientist said managers set unachievable deadlines. “Every time the managers assigned us a task related to (the) LLM, they requested us to complete it within a very short period of time (e.g., 2 days, one week), which is impossible,” he said. “It seems the leadership doesn’t know anything about LLMs – they don’t know how many people they need and what should be the expected time to complete each task for building a successful product like ChatGPT.”

Alexa never aligned with Jeff Bezos’ idea of “two-pizza teams” – that is, that teams should ideally be small enough that you could cater a full team meeting with just two pizzas. Bezos thought smaller teams drove effective decision-making and collaboration.

Instead, Alexa has historically been – and remains, for the most part – a giant division. Prior to the most recent layoffs, it had 10,000 employees. And while it has fewer now, it is still organised into large, siloed domains such as Alexa Home, Alexa Entertainment, Alexa Music and Alexa Shopping, each with hundreds of employees, along with directors and a VP at the top.

As pressure grew for each domain to work with the new Alexa LLM to craft generative AI features, each of which required accuracy benchmarks, the domains came into conflict, with sometimes counterproductive results, sources said.

For instance, a machine learning scientist working on Alexa Home recalled that while his domain was working on ways for Alexa to help users control their lights or the thermostat, the Music domain was busy working on how to get Alexa to understand very specific requests like “play Rihanna, then Tupac, and then pause 30 minutes and then play DMX”.

Each domain team had to build its own relationship with the central Alexa LLM team. “We spent months working with those LLM guys just to understand their structure and what data we could give them to fine-tune the model to make it work,” the Alexa Home scientist said. Each team wanted to fine-tune the AI model for its own domain goals.

But as it turned out, if the Home team tried to fine-tune the Alexa LLM to make it more capable for Home questions, and then the Music team came along and fine-tuned it using their own data for Music, the model would wind up performing worse.

“Catastrophic forgetting” – where what a model learns later in training degrades its ability to perform well on tasks it encountered earlier in training – is a problem with all deep learning models. “As it gets better in Music, (the model) can get less smart at Home,” the machine learning scientist said. “So finding the sweet spot where you’re trying to fine tune for 12 domains is almost a lottery.”
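The effect the scientist describes can be reproduced with a toy model: train a small logistic-regression classifier on one “domain”, then fine-tune it only on a second, and its accuracy on the first drops. The self-contained sketch below uses two synthetic tasks standing in for Home and Music; nothing here models Amazon’s actual systems.

```python
# Toy demonstration of catastrophic forgetting: sequential fine-tuning
# on a second task degrades performance on the first.
import math
import random

random.seed(0)

def logistic(z: float) -> float:
    # Numerically safe sigmoid.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n: int, label_axis: int):
    # Points in [-1, 1]^2; the label depends on only one coordinate,
    # so each "domain" cares about a different feature.
    data = []
    for _ in range(n):
        x = (random.uniform(-1, 1), random.uniform(-1, 1))
        data.append((x, 1 if x[label_axis] > 0 else 0))
    return data

def train(w, data, epochs, lr=0.5):
    # Plain SGD on logistic loss; "fine-tuning" is simply calling
    # train() again on new data, starting from the existing weights.
    for _ in range(epochs):
        for (x0, x1), y in data:
            g = logistic(w[0] * x0 + w[1] * x1 + w[2]) - y
            w[0] -= lr * g * x0
            w[1] -= lr * g * x1
            w[2] -= lr * g
    return w

def accuracy(w, data):
    hits = sum(
        (logistic(w[0] * x0 + w[1] * x1 + w[2]) > 0.5) == (y == 1)
        for (x0, x1), y in data
    )
    return hits / len(data)

task_a = make_data(300, label_axis=0)  # stand-in for the "Home" domain
task_b = make_data(300, label_axis=1)  # stand-in for the "Music" domain

w = [0.0, 0.0, 0.0]
train(w, task_a, epochs=30)
acc_before = accuracy(w, task_a)   # near-perfect on task A

train(w, task_b, epochs=100)       # fine-tune only on task B
acc_after = accuracy(w, task_a)    # task-A skill degrades
```

Because the second task’s gradients carry no information about the first task’s signal, continued training lets the new skill crowd out the old one – the same dynamic, at a vastly larger scale, behind the per-domain fine-tuning conflicts described above.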

These days, he added, LLM scientists know that fine-tuning may not be the best technique for creating a model with both rich capabilities and flexibility – other approaches, like prompt engineering, can do better. But by then, many months had gone by with little progress to show for it.

Each Alexa domain, with its own leadership, wanted to protect and expand its fiefdom, one former product manager said. “This organisation has just turned out into something like a mafia,” she said. “Let’s say, if I work for you, I’m just taking orders because it is in my best interest to agree with you. It is my best interest to not get chopped off in the next layoff – it’s quite ruthless. It’s in my best interest because you’re going to help me build my empire.”

Amazon says it stands by its commitment to Alexa

Amazon insists it is fully committed to delivering a generative AI Alexa, adding that its vision remains to build the “world’s best personal assistant.” An Amazon representative pointed out that over half a billion Alexa-enabled devices have been sold, and customers interact with Alexa tens of millions of times every hour.

She added that the implementation of generative AI comes with “huge responsibility – the details really matter” with a technical implementation of this scale, on a device that millions of customers have welcomed into their home. While the Alexa LLM “Let’s chat” feature has not been rolled out to the general public, it has been tested on small groups of customers “on an ongoing basis”.

But many of the employees Fortune spoke to said they left in part because they despaired that the new Alexa would ever be ready – or that by the time it is, it will have been overtaken by products launched by nimbler competitors, such as OpenAI. Those companies don’t have to navigate an existing tech stack and defend an existing feature set.

The former employee who has hired several who left the Alexa organisation over the past year said many were pessimistic about the Alexa LLM launch. “They just didn’t see that it was actually going to happen,” he said.

It’s possible, say some of the employees Fortune interviewed, that Amazon will finally launch an LLM-based Alexa – and that it will be an improvement to today’s Alexa. After all, there are hundreds of millions of Alexa users out there in the world who would certainly be happy if the device sitting on their desk or kitchen counter could do more than execute simple commands.

But given the challenges weighing down the Alexa LLM effort, and the gap separating it from the offerings of generative AI leaders like OpenAI and Google, none of the sources Fortune spoke with believe Alexa is close to accomplishing Amazon’s mission of being “the world’s best personal assistant”, let alone Amazon founder Jeff Bezos’ vision of creating a real-life version of the helpful Star Trek computer.

Instead, Amazon’s Alexa runs the risk of becoming a digital relic and a cautionary tale – that of a potentially game-changing technology that got stuck playing the wrong game. – Fortune.com/The New York Times