Microsoft’s public demo last week of an AI-powered revamp of Bing appears to have included several factual errors, highlighting the risk the company and its rivals face when incorporating this new technology into search engines.
At the Bing demo at Microsoft headquarters, the company showed off how integrating artificial intelligence features from OpenAI, the company behind ChatGPT, would empower the search engine to provide more conversational and complex search results. The demo included a pros and cons list for products, such as vacuum cleaners; an itinerary for a trip to Mexico City; and the ability to quickly compare corporate earnings results.
But Bing apparently failed to differentiate between the types of vacuums and even made up information about certain products, according to an analysis of the demo this week from independent AI researcher Dmitri Brereton. It also missed relevant details (or fabricated certain information) for the bars it referenced in Mexico City, according to Brereton. In addition, Brereton found it inaccurately stated the operating margin for the retailer Gap, and compared it to a set of Lululemon results that were not factually correct.
“We’re aware of this report and have analyzed its findings in our efforts to improve this experience,” Microsoft said in a statement. “We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period, which is why the feedback is critical so we can learn and help the models get better.”
The company also said thousands of users have interacted with the new Bing since the preview launched last week and shared their feedback, allowing the model to “learn and make many improvements already.”
The discovery of Bing’s apparent mistakes comes just days after Google was called out for an error made in its public demo last week of a similar AI-powered tool. Google’s shares lost $100 billion in value after the error was reported. (Shares of Microsoft were essentially flat on Tuesday.)
In the wake of the viral success of ChatGPT, an AI chatbot that can generate shockingly convincing essays and responses to user prompts, a growing number of tech companies are racing to deploy similar technology in their products. But it comes with risks, especially for search engines, which are intended to surface accurate results.
Generative AI systems, which are algorithms that are trained on vast amounts of data online to create new content, are notoriously unreliable, experts say. Laura Edelson, a computer scientist and misinformation researcher at New York University, previously told CNN, “there’s a big difference between an AI sounding authoritative and it actually producing accurate results.”
CNN also conducted a series of tests this week that showed Bing sometimes struggles with accuracy.
When asked, “What were Meta’s fourth quarter results?” the Bing AI feature gave a response that said, “according to the press release,” and then listed bullet points appearing to state Meta’s results. But the bullet points were incorrect. Bing said, for example, that Meta generated $34.12 billion in revenue, when the actual amount was $32.17 billion, and said revenue was up from the prior year when in fact it had declined.
In a separate search, CNN asked Bing, “What are the pros and cons of the best baby cribs.” In its reply, the Bing feature made a list of several cribs and their pros and cons, largely citing a Healthline article. But Bing attributed information to the article that did not actually appear in it. For example, Bing said one crib had a “water-resistant mattress pad,” but that information was listed nowhere in the article.
Microsoft and Google executives have previously acknowledged some of the potential issues with the new AI tools.
“We know we won’t be able to answer every question every single time,” Yusuf Mehdi, Microsoft’s vice president and consumer chief marketing officer, said last week. “We also know we’ll make our share of mistakes, so we’ve added a quick feedback button at the top of every search, so you can give us feedback and we can learn.”
– CNN’s Clare Duffy also contributed to this report.