Generative search in the form of Retrieval Augmented Generation (RAG) has been widely hyped as the future of search, with LLMs positioned between traditional embedding-based search results and the end user, summarizing and synthesizing those results into a succinct, targeted answer to the user's question rather than a page of search results. Yet our tests to date increasingly call into question the accuracy and utility of current LLMs in RAG applications: given the density of hallucinations, coherence failures, stale information and excluded results, it is unclear whether the outputs they generate are truly more useful than simply providing more advanced embedding-based snippet results. The dangers of RAG are especially apparent when it comes to technical questions.
Our experiment last week applying GPT-4 + Bing generative search vividly demonstrates the limitations of RAG even in simple applications. In both of the cases below, GPT-4 translated the query into a reasonable search query, and in both cases Bing returned the correct answer in multiple of the top search results. Yet in the first case, GPT-4 scanned the search results from Bing, skipped past all of the top results (correct and authoritative pages from GCP's official documentation) and instead summarized two outdated articles from the web that are more than a year old, giving the user a wrong answer that contained just one of the five supported machine series and failed to note the restriction on the one it did list. In the second case, despite once again being provided the correct answer in multiple top search results, GPT-4 merely produced a generic summary of what an SSD disk is, rather than answering which GCE machine types support it. In short, RAG search may not be a good fit for technical searches – one of the very areas for which it is frequently hyped.
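The pipeline described above can be sketched in a few lines. Everything here is hypothetical and stubbed out – `search` stands in for Bing's retrieval step and `llm` for GPT-4 – but it isolates where the failures below occur: in the final generation step, nothing enforces that the model actually use the ranked evidence it is handed.

```python
# Minimal sketch of the RAG flow under discussion. All names and data
# are hypothetical stand-ins, not real APIs.

def search(query: str) -> list[dict]:
    """Stubbed retriever: returns ranked results, as Bing would."""
    return [
        {"rank": 1, "url": "https://cloud.google.com/official-docs",
         "snippet": "Authoritative answer excerpted from official documentation."},
        {"rank": 2, "url": "https://example-dev-blog.test/2022/old-article",
         "snippet": "Outdated third-party coverage from a year ago."},
    ]

def llm(prompt: str) -> str:
    """Stubbed generator: a real LLM is free to ignore the top-ranked
    evidence and latch onto whatever appears most frequently."""
    return "A summary that may or may not reflect the top-ranked result."

def rag_answer(question: str) -> str:
    # 1. Translate the user question into a search query (elided here).
    results = search(question)
    # 2. Pack the ranked evidence into the prompt.
    context = "\n".join(
        f"[{r['rank']}] {r['url']}: {r['snippet']}" for r in results
    )
    # 3. Ask the model to answer only from the evidence. Nothing in the
    #    pipeline enforces this instruction -- the crux of the problem.
    return llm(f"Answer using only these results:\n{context}\n\nQ: {question}")

print(rag_answer("Which machine types support Hyperdisk Extreme?"))
```

The two experiments below are concrete instances of step 3 going wrong even when steps 1 and 2 succeed.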
Last week we asked GPT-4 to list the machine types on GCP that support Hyperdisk Extreme block storage devices:
Which machine types on GCP are Hyperdisk Extreme disks supported for?
For reference, the correct answer from the Hyperdisk documentation is:
- C3 with 88 or more vCPUs
- M1 with 80 or more vCPUs
- M2
- M3 with 64 or more vCPUs
- N2 with 80 or more vCPUs
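The documented support matrix amounts to a simple series/minimum-vCPU lookup, which makes GPT-4's incomplete answer below easy to see. A sketch of that lookup (the machine-type parsing is simplified and assumes the standard `series-shape-vcpus` naming pattern, e.g. `c3-standard-88`):

```python
# Hyperdisk Extreme support matrix, transcribed from the GCP
# documentation quoted above: machine series -> minimum vCPU count
# (None = no vCPU restriction).
HYPERDISK_EXTREME_SUPPORT = {
    "c3": 88,
    "m1": 80,
    "m2": None,
    "m3": 64,
    "n2": 80,
}

def supports_hyperdisk_extreme(machine_type: str) -> bool:
    """Check a GCE machine type name like 'c3-standard-88'.

    Simplified sketch: assumes the trailing field of the name is the
    vCPU count, which holds for the common machine-type shapes.
    """
    parts = machine_type.lower().split("-")
    series = parts[0]
    if series not in HYPERDISK_EXTREME_SUPPORT:
        return False
    min_vcpus = HYPERDISK_EXTREME_SUPPORT[series]
    if min_vcpus is None:
        return True
    return int(parts[-1]) >= min_vcpus

print(supports_hyperdisk_extreme("c3-standard-88"))  # True
print(supports_hyperdisk_extreme("n2-standard-32"))  # False
```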
Rather than attempt to answer from its internal knowledge, GPT-4 analyzed the query and translated it into a Bing search that accurately captured the gist of the question. The top search results returned by Bing contain the correct answer; in fact, the top result is the documentation page above, and Bing even excerpted the answer. Yet rather than correctly excerpt or summarize those results, GPT-4 fixated on the statistical frequency of the C3 machine series throughout them (a product of widespread news coverage of C3's support for Extreme disks). It then produced the following response, which lists only C3 support, fails to mention C3's minimum vCPU requirement, and omits the other four supported series entirely. Worse, rather than cite the official Hyperdisk documentation from GCP, it cites two articles on two developer websites dating from October and November of 2022. In other words, GPT-4 scanned the search results from Bing, skipped past the correct and authoritative results from GCP's official documentation, and instead summarized two outdated web articles that are more than a year old:
I did a [quick search] and here's what I found.
Hyperdisk Extreme disks in Google Cloud Platform (GCP) are designed for high-performance and data-intensive workloads, and they are specifically optimized for use with the C3 machine series. The C3 series includes custom Intel Infrastructure Processing Units (IPUs) and 4th Gen Xeon Scalable processors, providing a significant performance enhancement over previous generations. This makes them particularly suitable for demanding applications such as Hadoop and Microsoft SQL Server data workloads, as well as for high-end database management systems like SAP HANA, SQL Server, and others.
The C3 virtual machines, when paired with Hyperdisk, deliver four times more throughput and a tenfold improvement in IOPS compared to the previous C2 instances. This enhancement is a result of the unique architecture that combines CPUs and an IPU, leading to more efficient use of processing resources. Therefore, for applications requiring substantial storage and networking capabilities, C3 VMs with Hyperdisk are a highly effective choice.
For detailed information and updates on GCP's Hyperdisk Extreme disks and compatible machine types, it is recommended to refer to Google Cloud's official documentation or contact their sales representatives【Source】【Source】.
Similarly, we asked GPT-4 to list the machine types supported by Persistent SSD (pd-ssd) disks:
Which machine types on GCP are Persistent SSD disks supported for?
Once again, GPT-4 translates the query into a Bing search that returns the correct answer in several of the top search results, yet in summarizing them it merely describes what an SSD PD is in GCE, rather than answering the actual question of which machine types support it:
I did a [quick search] and here's what I discovered.
SSD Persistent Disks (pd-ssd) in Google Cloud Platform's Compute Engine are supported across all machine types. These disks are designed for fast and reliable block storage, making them suitable for high-performance database needs and enterprise applications. They are backed by Solid-State Drives (SSDs) and provide higher IOPS compared to Standard Persistent Disks. The SSD Persistent Disks can be either zonal or regional, allowing for flexibility in deployment and redundancy. This makes them a versatile option for a wide range of computing needs on the Google Cloud Platform. For more detailed information, you can refer to [Architecture Simplified](https://architecturesimplified.com/gcp-google-compute-engine-disks-options/).
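Both failures above are mechanically detectable: the correct machine series are enumerated in the authoritative top-ranked snippet, so a simple containment check of the generated answer against that snippet flags the missing series. A minimal sketch of such a check, with the data hardcoded from the Hyperdisk example (the regex assumes GCE's short series names like `C3` or `N2`):

```python
import re

# Machine series enumerated in the authoritative top-ranked result,
# hardcoded from the Hyperdisk documentation quoted earlier.
DOCUMENTED_SERIES = {"C3", "M1", "M2", "M3", "N2"}

def missing_series(answer: str) -> set[str]:
    """Return the documented machine series absent from a generated answer."""
    mentioned = set(re.findall(r"\b([CMN]\d)\b", answer.upper()))
    return DOCUMENTED_SERIES - mentioned

# GPT-4's Hyperdisk answer named only C3:
gpt4_answer = ("Hyperdisk Extreme disks are specifically optimized "
               "for use with the C3 machine series.")
print(sorted(missing_series(gpt4_answer)))  # ['M1', 'M2', 'M3', 'N2']
```

A check this crude would not generalize beyond this one question, but it illustrates that the ground truth was sitting in the retrieved evidence all along – the generation step simply discarded it.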