The use of LLMs to interpret and summarize search results (so-called "generative search") is widely touted as the future of information access. Yet a recent real-world search demonstrates the perils of hallucination and conflation in the current generation of generative search tools, which rely on external memories and blended summarization.
Earlier this week, I was attempting to compare the common household cleaner Comet scouring powder against a competing product and couldn't remember offhand what its active ingredient was or at what concentration. So, like most modern digital citizens, I simply opened a web browser and typed "what is the active ingredient in comet?" In the past this would have yielded a list of webpages: an EPA communication that lists the active ingredient as citric acid at 6% concentration, taken directly from the label; an EWG guide that lists it as troclosene sodium dihydrate; a Wikipedia entry that lists it as calcium carbonate at 60-100% concentration; and a cleaning supply company's site with the official P&G MSDS that lists calcium carbonate and bleach, among other results. So which is it? Part of the challenge lies in the range of Comet variants that are formulated differently and in the way chemicals are listed on consumer products and technical datasheets. For a consumer, none of this matters: they just want a simple answer as to which product contains which chemicals and in what concentrations. That is the promise of generative search: an LLM will read through all of the search results and summarize them in a succinct and easy-to-understand fashion, using its learned technical knowledge to distill complex chemistry into simple prose.
When running the search above, the search engine produced this generative summary that appeared above the search results:
The active ingredient in Comet is citric acid. Comet is a powdered cleaning product sold in North America and distributed in the USA by Prestige Brands. It contains 1.2% sodium dichloro-s-triazinetrione dihydrate and 98.8% "other" ingredients. Comet is also known by several other names, including Ajax Push, Reverse Ajax, Two-way-web, HTTP Streaming, and HTTP server push.
The summary contradicts itself: it states that the active ingredient is citric acid, but then gives the composition as 1.2% sodium dichloro-s-triazinetrione dihydrate and 98.8% "other" ingredients, which together account for 100% of the product without naming citric acid at all. So which is it?
It is the last sentence, however, that is the most interesting, offering a fascinating glimpse at the dangers of external LLM memories. Judging by the first several pages of search results, the search engine itself correctly identified that "comet" in the context of "active ingredients" refers to the cleaning product, with no unrelated results visible. Yet somehow the generative search interface looked past the context of an "active ingredient" and conflated the cleaning product with Comet the web technology.
Interestingly, the model also plagiarized from Wikipedia. Compare the sentence above to the 6th sentence of the Wikipedia article about the technology:
LLM: Comet is also known by several other names, including Ajax Push, Reverse Ajax, Two-way-web, HTTP Streaming, and HTTP server push.
Wikipedia: Comet is known by several other names, including Ajax Push, Reverse Ajax, Two-way-web, HTTP Streaming, and HTTP server push among others
Not only did the LLM conflate Comet the bathroom cleaner with Comet the web technology, but it plagiarized to boot.
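The degree of overlap between the two sentences can be quantified. The sketch below is one simple way to do so, assuming a word-level comparison with Python's standard `difflib` module. The helper names `tokens`, `overlap_ratio` and `word_differences` are invented for this illustration, and the score is only a rough proxy for copying, not a formal plagiarism test.

```python
import re
from difflib import SequenceMatcher

# The two sentences exactly as quoted above.
LLM_SENTENCE = (
    "Comet is also known by several other names, including Ajax Push, "
    "Reverse Ajax, Two-way-web, HTTP Streaming, and HTTP server push."
)
WIKIPEDIA_SENTENCE = (
    "Comet is known by several other names, including Ajax Push, "
    "Reverse Ajax, Two-way-web, HTTP Streaming, and HTTP server push among others"
)


def tokens(text):
    """Lowercase the text and split it into words, keeping hyphenated terms whole."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def overlap_ratio(a, b):
    """Word-level similarity in [0, 1]; 1.0 means the word sequences are identical."""
    return SequenceMatcher(None, tokens(a), tokens(b), autojunk=False).ratio()


def word_differences(a, b):
    """Return only the word-level edits needed to turn sentence a into sentence b."""
    ta, tb = tokens(a), tokens(b)
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    return [
        (tag, ta[i1:i2], tb[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    print(round(overlap_ratio(LLM_SENTENCE, WIKIPEDIA_SENTENCE), 3))
    for edit in word_differences(LLM_SENTENCE, WIKIPEDIA_SENTENCE):
        print(edit)
```

By this measure the two sentences score above 0.9, and the only word-level differences are the LLM's added "also" and Wikipedia's trailing "among others".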
What explains the difference between the search results and the generative results in conflating cleaners with software? While the specific implementations used by the major search engines are not published, it is likely that this one uses a variant of the traditional embedding+LLM model: the search index is chunked into paragraphs or other blocks of text and converted into embeddings, the user query is converted into an embedding, an ANN (approximate nearest neighbor) search identifies the top X results, and the underlying text is passed to the LLM for blended summarization. Here, the embedding model likely struggled to encode the concept of an ingredient list as distinct from software, yielding an embedding skewed towards the term "Comet" as a product name, which latched onto the Wikipedia entry for the technology.
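The general shape of such a pipeline can be sketched in a few lines. This is a toy illustration under loud assumptions, not the search engine's actual implementation, which is unpublished: the bag-of-words `embed` function stands in for a learned embedding model, the brute-force `top_k` stands in for an ANN index, and `CHUNKS` is a hypothetical three-passage index built from text quoted in this post. All function names are invented for the example, and no LLM is actually called.

```python
import math
import re
from collections import Counter

# Hypothetical miniature index: one chunk per sense of the word "comet".
CHUNKS = [
    "Comet cleanser contains 60-100% calcium carbonate.",
    "Comet is known by several other names, including Ajax Push, Reverse Ajax, "
    "Two-way-web, HTTP Streaming, and HTTP server push.",
    "Comets contain both rock and a variety of frozen gases including ammonia, "
    "water, carbon monoxide, carbon dioxide, and methane.",
]


def embed(text):
    """Toy embedding (assumption): a sparse vector of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, chunks, k):
    """Brute-force nearest-neighbor search, standing in for a real ANN index."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query, passages):
    """Assemble the text that would be handed to the LLM for blended summarization."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only these passages:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    query = "what is the active ingredient in comet?"
    hits = top_k(query, CHUNKS, k=2)
    print(build_prompt(query, [chunk for _, chunk in hits]))
```

In this toy index, the cleanser passage and the web technology passage both land in the top two simply because they share the token "comet" with the query, so both would be handed to the LLM. A learned embedding captures far more context than word counts, but the failure mode described above is analogous: an ambiguous term can dominate the match.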
The search engine returned the same result when asked the question multiple times that day, but curiously, three days later, when asked the same question it responded with:
The active ingredient in Comet is sodium dichloro-s-triazinetrione dihydrate. It makes up 1.2% of the product. The other 98.8% of the product is made up of other ingredients. Troclosene sodium dihydrate is not a hypochlorite, as is bleach.
Here the model's knowledge store correctly treats "sodium dichloro-s-triazinetrione dihydrate" and "troclosene sodium dihydrate" as synonyms, though it fails to understand that using technical chemical terms interchangeably in response to a common user query may yield confusing results at best. Its final sentence, however, is more problematic. The awkward phrasing of "not a hypochlorite, as is bleach" makes it difficult to parse: if it is read as claiming that bleach is not a hypochlorite, it is clearly false, since household bleach is a sodium hypochlorite solution; if it is read as contrasting the compound with bleach, which is a hypochlorite, the wording obscures the point.
To compare the results, the same query was run in a competing generative search interface. This system latched onto the fact that astronomical bodies are also frequently described in terms of their component makeup, with many popular audience articles referring to their "ingredients", yielding:
Comets contain both rock and a variety of frozen gases including ammonia, water, carbon monoxide, carbon dioxide, and methane. They are made up of a core, nucleus, coma and tail and have the primary ingredients of ice and dust, including ice made of water, methane and various metals.
Here the LLM appears to have blended various true statements together, but conflates some, such as describing ice made of metals.
Asked to regenerate its results, it produced:
The active ingredient in Comet cleaner is calcium carbonate. It is used as a scrubbing agent to remove dirt and debris from surfaces.
Of the results thus far, this is the most succinct. Yet it also reflects the instability of embedding+LLM search. It is unlikely that the underlying embedding search returned substantially different results just minutes apart. Instead, this is likely the result of one of two things: top X reranking, in which the model reweights results in the belief that the user didn't like the original ones (unlikely here, since each search was run as an independent query in a brand new browser session that cleared all cookies first), or the LLM latching onto different result text in different runs (the more likely scenario). In other words, if an LLM is presented with a selection of paragraphs, some of which pertain to astronomical bodies and others to bathroom cleaners, in one run it might blend the two, in another it might select just the celestial comets, and in another just the bathroom cleaners. Each run can yield different results.
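The second scenario can be illustrated with a small simulation. The sketch below is purely hypothetical: `random.sample` is a crude stand-in for whatever stochastic behavior leads the real model to favor different passages on different runs, the four passages are adapted from outputs quoted in this post, and `simulate_run` is a name invented for the example. It shows only that a fixed set of retrieved passages can still produce differently themed answers from run to run.

```python
import random

# Fixed retrieval results (assumption): identical on every run, mixing both senses.
PASSAGES = [
    ("cleaner", "Comet cleanser contains 60-100% calcium carbonate."),
    ("cleaner", "The active ingredient in Comet is sodium dichloro-s-triazinetrione dihydrate."),
    ("astronomy", "Comets contain both rock and a variety of frozen gases."),
    ("astronomy", "Comets are made up of a core, nucleus, coma and tail."),
]


def simulate_run(seed, attend_to=2):
    """Pick a random subset of passages and report which topics the answer would draw on."""
    rng = random.Random(seed)  # each seed plays the role of one independent query
    chosen = rng.sample(PASSAGES, attend_to)
    topics = {topic for topic, _ in chosen}
    if len(topics) > 1:
        return "blended"
    return f"{next(iter(topics))} only"


if __name__ == "__main__":
    for seed in range(10):
        print(seed, simulate_run(seed))
```

Every simulated run sees the same four passages, yet the outcome varies among a blended answer, a cleaner-only answer and an astronomy-only answer, depending on which passages happen to be selected.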
To test stability, it was asked the question again, yielding:
The active ingredient in Comet cleaner is sodium dichloro-s-triazinetrione dihydrate 1. According to the Safety Data Sheets (SDS) published by Procter & Gamble for "PGP Comet Deodorizing Cleanser with Chlorinol" and Prestige Brands for “Comet Powdered Cleanser”, Comet cleanser contains 60–100% calcium carbonate (Ca CO 3) 2. Other ingredients that are common to all Prestige Brands Comet Powdered Cleansers are listed as: 1) Calcium carbonate – Scrubbing agent 2) Calcium hydroxide – pH adjuster 3) Fragrance – Smell 4) Green 7 – Colorant (absent in "Comet Lemon Powder") 5) Sodium carbonate – Builder/sequestering agent 6) Sodium linear alkylbenzenesulfonate surfactant – Cleaning agent 7) Trichloro-s-triazinetrione – Bleach 8) Trichloroisocyanuric acid (Symclosene) – Disinfectant
Compare this with the near-verbatim paragraph from Wikipedia:
According to the Safety Data Sheets (SDS) published by Procter & Gamble for "PGP Comet Deodorizing Cleanser with Chlorinol" and Prestige Brands for "Comet Powdered Cleanser", Comet cleanser contains 60–100% calcium carbonate (CaCO3).
Ingredients that are common to all Prestige Brands Comet Powdered Cleansers are listed as:
- Calcium carbonate – Scrubbing agent
- Calcium hydroxide – pH adjuster
- Fragrance – Smell
- Green 7 – Colorant (absent in "Comet Lemon Powder")
- Sodium carbonate – Builder/sequestering agent
- Sodium linear alkylbenzenesulfonate surfactant – Cleaning agent
- Trichloro-s-triazinetrione – Bleach
- Trichloroisocyanuric acid (Symclosene) – Disinfectant
While Wikipedia was cited as one of several sources for the information, the generative search engine almost precisely plagiarized the source material. Two different generative search engines, two different cases of plagiarism.
Asked the question again, the engine this time clearly attributed Wikipedia as the source and offered a simple summarization:
According to the Wikipedia article on Comet (cleanser), the active ingredient in Comet cleaner is sodium dichloro-s triazinetrione dihydrate, which is a bleach and disinfectant. It makes up 1-5% of the product, along with other ingredients such as calcium carbonate, sodium carbonate, calcium hydroxide, fragrance, and colorant.
Yet, asked again, it not only reverted to plagiarism, but this time reproduced the Wikipedia passage almost verbatim from its very first word, rather than prefacing it with a sentence of its own as it did before, merely flattening the ingredient list into a single sentence:
According to the Safety Data Sheets (SDS) published by Procter & Gamble for “PGP Comet Deodorizing Cleanser with Chlorinol” and Prestige Brands for "Comet Powdered Cleanser", Comet cleanser contains 60–100% calcium carbonate (CaCO3). Other ingredients that are common to all Prestige Brands Comet Powdered Cleansers are listed as: Calcium hydroxide, Fragrance, Green 7, Sodium carbonate, Sodium linear alkylbenzenesulfonate surfactant, Trichloro-s-triazinetrione, and Trichloroisocyanuric acid (Symclosene).
Beside the results it displayed the Wikipedia article as a source, but alongside that it also displayed an article from National Geographic and a STEM activity guide, both about astronomical comets.
Putting this all together, this simple example demonstrates some of the significant limitations of current generative search and the embedding+LLM external memory models that undergird it, while reinforcing the dangers of hallucination, ambiguous wording and plagiarism.