Agentic RAG is currently getting a lot of attention as a practical way to reduce (or, depending on the vendor’s boldness, eliminate) hallucinations from generative artificial intelligence (genAI) tools. Unfortunately, it may not reduce hallucinations, and it could open the door to other problems.
To be clear, there’s nothing inherently wrong with agentic RAG (the RAG stands for retrieval-augmented generation). It works well for some users, but for others it’s disappointing, expensive, and labor-intensive, and it doesn’t always deliver on its key promises.
Agentic RAG is designed to integrate additional databases and data sources so that a genAI model can draw on a broader range of information for its initial results. But using AI to manage AI (in a nutshell, adding even more AI to the equation) does not always produce better results.
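In practice, the “agentic” part usually amounts to a control loop in which the model decides which source to consult next. Here is a minimal, illustrative Python sketch of that idea; query_llm, search_wiki, and search_sales_db are hypothetical stand-ins for a real model API and real data sources, not any vendor’s implementation.

```python
# Minimal sketch of an agentic-RAG control loop (illustrative only).
# query_llm() and the search_* functions are hypothetical placeholders.
from itertools import count

_demo_turn = count()

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; returns canned decisions for the demo."""
    step = next(_demo_turn)
    if step == 0:
        return "SEARCH_SALES: Q3 revenue"
    if step == 1:
        return "DONE"
    return "Stubbed final answer written from the retrieved context."

def search_wiki(query: str) -> str:
    return f"[wiki passage about {query.strip()}]"

def search_sales_db(query: str) -> str:
    return f"[sales record matching {query.strip()}]"

def agentic_rag(question: str, max_steps: int = 3) -> str:
    context = ""
    for _ in range(max_steps):  # the extra loop that makes plain RAG "agentic"
        decision = query_llm(f"Question: {question}\nContext: {context}\nNext action?")
        if decision.startswith("SEARCH_WIKI:"):
            context += search_wiki(decision.split(":", 1)[1]) + "\n"
        elif decision.startswith("SEARCH_SALES:"):
            context += search_sales_db(decision.split(":", 1)[1]) + "\n"
        else:  # the model signals it has enough context
            break
    return query_llm(f"Answer the question using this context:\n{context}{question}")

print(agentic_rag("What was Q3 revenue?"))
```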
I spoke to two genAI experts who should know: Alan Nichol, CTO of Rasa, and agent specialist Sandi Besen.
“Agentic RAG is an unnecessary buzzword,” Nichol said. “It simply means adding a loop around your [large language model] and retrieval calls. The market is in a strange situation where adding an extra ‘while’ loop or ‘if’ statement to code is touted as a revolutionary new method. State-of-the-art web agents only achieve a 25% success rate, which is unacceptable in any software context.
“Companies and developers should explicitly write some business logic in regular code,” he said. “They can use LLMs to convert user input into structured formats and paraphrase search results, making them sound more natural.”
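That division of labor can be sketched in a few lines of Python. In this illustrative example, the lookup itself is plain, deterministic code; the llm helper is a hypothetical stand-in for whatever model API is actually in use, and it is only asked to parse the request into a structured form and to rephrase the result.

```python
# Sketch of the alternative Nichol describes: deterministic business logic in
# ordinary code, with the LLM confined to the edges. llm() is a hypothetical
# placeholder for a real model call, and ORDERS is toy data.
import json

def llm(prompt: str) -> str:
    """Placeholder for an LLM call; canned outputs for the demo."""
    if prompt.startswith("Extract"):
        return json.dumps({"intent": "order_status", "order_id": "A-1042"})
    return "Your order A-1042 shipped and should arrive this week."

ORDERS = {"A-1042": {"status": "shipped", "eta": "this week"}}

def handle_request(user_text: str) -> str:
    # 1. The LLM turns free text into a structured request.
    parsed = json.loads(llm(f"Extract intent and order_id as JSON: {user_text}"))
    # 2. Plain business logic does the actual lookup; no agent loop involved.
    order = ORDERS.get(parsed.get("order_id"))
    if order is None:
        return "Sorry, I can't find that order."
    # 3. The LLM paraphrases the structured result so it reads naturally.
    return llm(f"Rephrase for the customer: {json.dumps(order)}")

print(handle_request("Where is my order A-1042?"))
```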
Nichol argued that Agentic RAG is often the wrong approach to enterprise data analytics needs. “Agentic RAG is the wrong way to think about the problem,” he said. “Any RAG that works well is just a simple search engine with a little LLM magic sprinkled on it.”
While that tactic may work, IT should stop thinking that “the way to solve this [hallucination] problem is to add another LLM call,” Nichol said. “People expect that this kind of approach will magically solve the root problem.”
And what is the underlying problem? The quality of the data.
Nichol said he often sees companies that have “built a bad retrieval system, because they haven’t cleaned up their data. It’s boring and unsexy to clean up outdated information, handle version control, and resolve data conflicts. Instead, they add seven more calls to the LLM to paper over all the data issues they have. It’s just going to put a lot of work onto the LLM, and it’s not going to work very well.”
“It won’t solve your problem, but it will feel that way.”
Besen, an applied AI researcher at IBM, argued that agentic approaches can indeed reduce hallucinations, but agreed with Nichol that they might not always be the best business approach.
Besen also warned that adding complexity to an already complex genAI deployment can lead to unexpected problems.
“When you increase the number of agents, you inherently increase the variability of a solution,” Besen said. “However, with the right architecture in place (that is, the right team of agents, built effectively and given the right prompts), there should be a lower likelihood of hallucinations, because you can incorporate evaluation and reasoning. For example, you can have one agent that retrieves content and another that evaluates whether the retrieved information is relevant to answering the original question. With traditional RAG, there was no natural-language reasoning check to determine whether the retrieved information was relevant.”
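That retrieve-then-check pattern can be sketched as two small “agents” wired in sequence. In the illustration below, retriever_agent and relevance_checker are hypothetical placeholders; in a real system the first would be backed by a search index or vector store and the second by an LLM doing the relevance reasoning.

```python
# Sketch of the retrieve-then-evaluate pattern Besen describes: one agent
# fetches a passage, a second judges whether it actually answers the question.
# Both functions are hypothetical stand-ins for LLM-backed agents.

def retriever_agent(question: str) -> str:
    """Placeholder: a real agent would query a search index or vector store."""
    return "The 2019 annual report lists total revenue of $4.2B."

def relevance_checker(question: str, passage: str) -> bool:
    """Placeholder: a real agent would ask an LLM to reason in natural
    language about whether the passage answers the question."""
    return "revenue" in question.lower() and "revenue" in passage.lower()

def answer(question: str) -> str:
    passage = retriever_agent(question)
    if not relevance_checker(question, passage):
        # Refusing is preferable to generating an unsupported answer.
        return "I couldn't find relevant information for that question."
    return f"Based on the retrieved document: {passage}"

print(answer("What was total revenue in 2019?"))
```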
Like anything else in programming, this may or may not yield the desired results. “There is a way for it to be very successful and a way for it to not be. The trick is to match our expectations to the capabilities of the technology,” Besen said. “The ability of an agent is only as good as the language model that underpins it. The ability to reason depends on the language model.”
That said, Besen stressed that despite what some AI vendors claim, even the best implementation of agentic RAG will never make hallucinations go away. “It’s impossible to completely eliminate hallucinations at this point. But there could be a reduction in hallucinations.”
IT executives must decide whether they can live with that uncertainty and the risk of getting wrong answers from time to time. “If you want the same result every time, don’t use genAI,” Besen said. As for accepting occasional hallucinations, Besen suggested that IT departments consider how they would react if an employee or contractor got things wrong at a similar rate.
“Are you okay with having an employee who is wrong 10% of the time?”