One of the challenges with generative AI models has been that they tend to hallucinate responses. In other words, they will present an answer that is factually incorrect, but will be confident in doing so, sometimes even doubling down when you point out that what they’re saying is wrong.
“[Large language models] can be inconsistent by nature with the inherent randomness and variability in the training data, which can lead to different responses for similar prompts. LLMs also have limited context windows, which can cause coherence issues in extended conversations, as they lack true understanding, relying instead on patterns in the data,” said Chris Kent, SVP of marketing for Clarifai, an AI orchestration company.
Retrieval-augmented generation (RAG) is gaining traction because, when applied to LLMs, it can help reduce the occurrence of hallucinations, as well as offer other benefits.
“The goal of RAG is to marry up local data, or data that wasn’t used in training the actual LLM itself, so that the LLM hallucinates less than it otherwise would,” said Mike Bachman, head of architecture and AI strategy at Boomi, an iPaaS company.
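To make that idea concrete, here is a minimal sketch of the RAG loop in Python: find the local documents most relevant to a question, prepend them to the prompt, and only then call the model. The word-overlap scoring and the `call_llm` stub are assumptions made purely for illustration; a real system would use embedding-based retrieval and an actual model API, and nothing here represents Boomi's or Clarifai's implementation.

```python
# Minimal RAG sketch: retrieve local context, then prompt the model with it.
# Word-overlap retrieval and call_llm are placeholders for a real vector
# search and a real model API.

LOCAL_DOCS = [
    "The Model X-200 shipped in March 2024 and replaced the X-100.",
    "Support for the X-100 ends in December 2025.",
    "The X-200 adds a 48-hour battery and USB-C charging.",
]

def score(question: str, doc: str) -> int:
    """Count shared words between question and document (naive retrieval)."""
    return len(set(question.lower().split()) & set(doc.lower().split()))

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents that best match the question."""
    return sorted(docs, key=lambda d: score(question, d), reverse=True)[:k]

def call_llm(prompt: str) -> str:
    """Stand-in for an actual LLM call (hosted API or local model)."""
    return f"<model response to:\n{prompt}>"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question, LOCAL_DOCS))
    prompt = (
        "Answer using only the context below. If the context does not "
        "cover the question, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(answer("When did the X-200 ship?"))
```

Because the model is instructed to lean on the retrieved context rather than its training data, answers can reflect information that changed after the model's cutoff.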
He explained that LLMs are typically trained on very general, and often older, data. Additionally, because it takes months to train these models, by the time a model is ready, its training data is even further out of date.
For instance, the free version of ChatGPT uses GPT-3.5, whose training data cuts off in January 2022, nearly 28 months ago at this point. The paid version that uses GPT-4 gets you a bit more up to date, but still only has information from up to April 2023.
“You’re missing all of the changes that have happened from April of 2023,” Bachman said. “In that particular case, that’s a whole year, and a lot happens in a year, and a lot has happened in this past year. And so what RAG will do is it could help shore up data that’s changed.”
As an example, in 2010 Boomi was acquired by Dell, but in 2021 Dell divested the company and now Boomi is privately owned again. According to Bachman, earlier versions of GPT-3.5 Turbo were still making references to Dell Boomi, so they used RAG to supply the LLM with up-to-date knowledge of the company so that it would stop making those incorrect references to Dell Boomi.
RAG can also be used to augment a model with private company data to provide personalized results or to support a specific use case.
“I think where we see a lot of companies using RAG, is they’re just trying to basically handle the problem of how do I make an LLM have access to real-time information or proprietary information beyond the time period or data set under which it was trained,” said Pete Pacent, head of product at Clarifai.
For instance, if you’re building a copilot for your internal sales team, you could use RAG to be able to supply it with up-to-date sales information, so that when a salesperson asks “how are we doing this quarter?” the model can actually respond with updated, relevant information, said Pacent.
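A hypothetical version of that sales copilot might look like the sketch below: at question time, the current quarter's figures are pulled from an internal system and injected into the prompt. The function names, the figures, and `call_llm` are all placeholders for illustration, not a real product's API.

```python
# Hypothetical sales-copilot sketch: fetch live figures at question time
# and inject them into the prompt. All names and numbers are placeholders.

def fetch_quarterly_sales() -> dict:
    """Stand-in for a query against the company's live sales system."""
    return {"quarter": "Q2 FY2024", "bookings": "$4.2M", "pipeline": "$11.8M"}

def call_llm(prompt: str) -> str:
    """Stand-in for the actual model call."""
    return f"<model response to:\n{prompt}>"

def sales_copilot(question: str) -> str:
    figures = fetch_quarterly_sales()
    context = "\n".join(f"{key}: {value}" for key, value in figures.items())
    prompt = (
        "You are an internal sales assistant. Answer using the figures "
        "below, which were retrieved just now, rather than anything from "
        f"your training data.\n\nCurrent figures:\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)

print(sales_copilot("How are we doing this quarter?"))
```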
The challenges of RAG
Given the benefits of RAG, why hasn’t it seen greater adoption so far? According to Clarifai’s Kent, there are a couple of factors at play. First, for RAG to work, it needs access to multiple different data sources, which can be quite difficult depending on the use case.
RAG might be easy for a simple use case, such as conversational search across text documents, but it becomes much more complex when you apply that same use case to patient records or financial data. At that point you’re dealing with data from different sources with varying sensitivity, classification, and access levels.
It’s also not enough to just pull in that data from different sources; that data also needs to be indexed, requiring comprehensive systems and workflows, Kent explained.
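As a rough illustration of that indexing step, the sketch below chunks documents pulled from different systems and builds a simple inverted index so they can be searched at query time. The source names are illustrative, and the inverted index stands in for a production vector database; it is a sketch of the workflow Kent describes, not a specific product's pipeline.

```python
# Sketch of the indexing step: documents from different sources must be
# chunked and indexed before retrieval can work. An inverted index stands
# in for a production vector store.

from collections import defaultdict

def chunk(text: str, size: int = 50) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(sources: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Map each word to the (source, chunk) pairs that contain it."""
    index: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for source_name, text in sources.items():
        for piece in chunk(text):
            for word in set(piece.lower().split()):
                index[word].append((source_name, piece))
    return index

# Documents pulled from different systems (names are illustrative only).
sources = {
    "wiki/export.txt": "Support hours are 9 to 5 Eastern on weekdays.",
    "crm/notes.txt": "Renewal for the Acme account is due in September.",
}

index = build_index(sources)
print(index["renewal"])  # -> [('crm/notes.txt', 'Renewal for the Acme ...')]
```

In practice, each source would also carry its own access controls and classification, which is part of why scaling this across an organization gets difficult.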
And finally, scalability can be an issue. “Scaling a RAG solution across maybe a server or small file system can be straightforward, but scaling across an org can be complex and really difficult,” said Kent. “Think of complex systems for data and file sharing now in non-AI use cases and how much work has gone into building those systems, and how everyone is scrambling to adapt and modify to work with workload intensive RAG solutions.”
RAG vs fine-tuning
So, how does RAG differ from fine-tuning? With fine-tuning, you are providing additional information to update or refine an LLM, but the result is still a static model. With RAG, you’re providing additional information on top of the LLM at query time. “They enhance LLMs by integrating real-time data retrieval, offering more accurate and current/relevant responses,” said Kent.
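The distinction can be summarized in a small sketch: fine-tuning changes the model's weights in an offline job, after which the knowledge is frozen again, while RAG leaves the weights alone and changes what the model sees on every request. Both function names below are placeholders for illustration, not any vendor's API.

```python
# Contrast sketch: where the new information enters the system.
# train_finetune_job and call_llm are placeholder names for illustration.

def train_finetune_job(base_model: str, examples: list[dict]) -> str:
    """Fine-tuning: an offline training run bakes knowledge into new
    weights, which stay fixed until the next run."""
    return f"{base_model}-finetuned"  # stand-in for a long, costly job

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for the actual model call."""
    return f"<{model} response to: {prompt}>"

def rag_answer(model: str, question: str, retrieved_context: str) -> str:
    """RAG: the weights never change; fresh data is injected into the
    prompt at query time, so answers can track changing data."""
    prompt = f"Context:\n{retrieved_context}\n\nQuestion: {question}"
    return call_llm(model, prompt)
```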
Fine-tuning might be a better option for a company dealing with the above-mentioned challenges, however. Generally, fine-tuning a model is less infrastructure intensive than running a RAG pipeline.
“So performance vs cost, accuracy vs simplicity, can all be factors,” said Kent. “If organizations need dynamic responses from an ever-changing landscape of data, RAG is usually the right approach. If the organization is looking for speed around knowledge domains, fine-tuning is going to be better. But I’ll reiterate that there are a myriad of nuances that could change those recommendations.”