The White Clam Pizza at Frank Pepe Pizzeria Napoletana in New Haven, Connecticut, is a revelation. The crust, kissed by the intense heat of the coal-fired oven, achieves a perfect balance of crispness and chew. Topped with freshly shucked clams, garlic, oregano and a dusting of grated cheese, it is a testament to the magic that simple, high-quality ingredients can conjure.
Sound like me? It’s not. The entire paragraph, except the pizzeria’s name and the city, was generated by GPT-4 in response to a simple prompt asking for a restaurant critique in the style of Pete Wells.
I have a few quibbles. I would never pronounce any food a revelation, or describe heat as a kiss. I don’t believe in magic, and rarely call anything perfect without using “nearly” or some other hedge. But these lazy descriptors are so common in food writing that I imagine many readers barely notice them. I’m unusually attuned to them because whenever I commit a cliche in my copy, I get boxed on the ears by my editor.
He wouldn’t be fooled by the counterfeit Pete. Neither would I. But as much as it pains me to admit, I’d guess that many people would say it’s a four-star fake.
The person responsible for Phony Me is Balazs Kovacs, a professor of organisational behaviour at Yale School of Management. In a recent study, he fed a large batch of Yelp reviews to GPT-4, the technology behind ChatGPT, and asked it to imitate them.
His test subjects – people – could not tell the difference between genuine reviews and those churned out by artificial intelligence. In fact, they were more likely to think the AI reviews were real. (The phenomenon of computer-generated fakes that are more convincing than the real thing is so well known that there’s a name for it: AI hyperrealism.)
Kovacs’ study belongs to a growing body of research suggesting that the latest versions of generative AI can pass the Turing test, a scientifically fuzzy but culturally resonant standard. When a computer can dupe us into believing that language it spits out was written by a human, we say it has passed the Turing test.
It’s long been assumed that AI would eventually pass the test, first proposed by mathematician Alan Turing in 1950. But even some experts are surprised by how rapidly the technology is improving. “It’s happening faster than people expected,” Kovacs said.
The first time Kovacs asked GPT-4 to mimic Yelp, few were tricked. The prose was too perfect. That changed when Kovacs instructed the program to use colloquial spellings, emphasise a few words in all caps and insert typos – one or two in each review. This time, GPT-4 passed the Turing test.
Aside from marking a threshold in machine learning, AI’s ability to sound just like us has the potential to undermine whatever trust we still have in verbal communications, especially shorter ones. Text messages, emails, comments sections, news articles, social media posts and user reviews will be even more suspect than they already are. Who is going to believe a Yelp post about a pizza-croissant or a glowing OpenTable dispatch about a US$400 (RM1,882) omakase sushi tasting knowing that its author might be a machine that can neither chew nor swallow?
“With consumer-generated reviews, it’s always been a big question of who’s behind the screen,” said Phoebe Ng, a restaurant communications strategist in New York City. “Now it’s a question of what’s behind the screen.”
Online opinions are the grease in the wheels of modern commerce. In a 2018 survey by the Pew Research Center, 57% of Americans polled said they always or almost always read internet reviews and ratings before buying a product or service for the first time. Another 36% said they sometimes did.
For businesses, a few points in a star rating on Google or Yelp can mean the difference between making money and going under. “We live on reviews,” the manager of an Enterprise Rent-a-Car location in Brooklyn told me last week as I picked up a car.
A business traveller who needs a ride that won’t break down on the New Jersey Turnpike may be more swayed by a negative report than, say, somebody just looking for brunch. Still, for restaurant owners and chefs, Yelp, Google, TripAdvisor and other sites that let customers have their say are a source of endless worry and occasional fury.
One special cause of frustration is the large number of people who don’t bother to eat in the place they’re writing about. Before an article on Eater pointed it out last week, the first New York location of the Taiwan-based dim sum chain Din Tai Fung was being pelted by one-star Google reviews, dragging its average rating down to 3.9 of a possible 5. The restaurant hasn’t opened yet.
Some phantom critics are more sinister. Restaurants have been blasted with one-star reviews, followed by an email offering to take them down in exchange for gift cards.
To fight back against bad-faith slams, some owners enlist their nearest and dearest to flood the zone with positive blurbs. “One question is, how many aliases do all of us in the restaurant industry have?” said Steven Hall, owner of a New York public relations firm.
A step up from an organised ballot-stuffing campaign, or maybe a step down, is the practice of trading comped meals or cash for positive write-ups. Beyond that looms the vast and shadowy realm of reviewers who don’t exist.
To hype their own businesses, or kneecap their rivals, companies can hire brokers who have manufactured small armies of fictitious reviewers. According to Kay Dean, a consumer advocate who researches fraud in online reviews, these accounts are usually given an extensive history of past reviews that act as camouflage for their pay-for-play output.
In two recent videos, she pointed out a chain of mental health clinics that had received glowing Yelp reviews ostensibly submitted by satisfied patients whose accounts were littered with restaurant reviews lifted word for word from TripAdvisor.
“It’s an ocean of fakery, and much worse than people realise,” Dean said. “Consumers are getting duped, honest businesses are being harmed and trust is eroding.”
All this is being done by mere people. But as Kovacs writes in his study, “The situation now changes substantially because humans will not be required to write authentic-looking reviews.”
Dean said that if AI-generated content infiltrates Yelp, Google and other sites, it will be “even more challenging for consumers to make informed decisions.”
The major sites say they have ways to ferret out Potemkin accounts and other forms of phoniness. Yelp invites users to flag dubious reviews, and after an investigation will take down those found to violate its policies. It also hides reviews that its algorithm deems less trustworthy. Last year, according to its most recent Trust & Safety Report, the company stepped up its use of AI “to even better detect and not recommend less helpful and less reliable reviews.”
Kovacs believes that sites will need to try harder now to show that they aren’t regularly posting the thoughts of robots. They could, for instance, adopt something like the “Verified Purchase” label that Amazon sticks on write-ups of products that were bought or streamed through its site. If readers become even more suspicious of crowdsourced restaurant reviews than they already are, it could be an opportunity for OpenTable and Resy, which accept feedback only from those diners who show up for their reservations.
One thing that probably won’t work is asking computers to analyse the language alone. Kovacs ran his real and ginned-up Yelp blurbs through programs that are supposed to identify AI. Like his test subjects, he said, the software “thought the fake ones were real.”
This did not surprise me. I took Kovacs’ survey myself, confident that I would be able to spot the small, concrete details that a real diner would mention. After clicking a box to certify that I was not a robot, I quickly found myself lost in a wilderness of exclamation points and frowny faces. By the time I’d reached the end of the test, I was only guessing. I correctly identified seven out of 20 reviews, a result somewhere between tossing a coin and asking a monkey.
What tripped me up was that GPT-4 did not fabricate its opinions out of thin air. It stitched them together from bits and pieces of Yelpers’ descriptions of their afternoon snacks and Sunday brunches.
“It’s not totally made up in terms of the things people value and what they care about,” Kovacs said. “What’s scary is that it can create an experience that looks and smells like real experience, but it’s not.”
By the way, Kovacs told me that he gave the first draft of his paper to an AI editing program, and took many of its suggestions in the final copy.
It probably won’t be long before the idea of a purely human review will seem quaint. The robots will be invited to read over our shoulders, alert us when we’ve used the same adjective too many times, nudge us toward a more active verb. The machines will be our teachers, our editors, our collaborators. They’ll even help us sound human. – The New York Times