Keyword search is all you need: Achieving RAG-Level Performance without vector databases using agentic tool use

arXiv:2602.23368v1 (announce type: cross)

Abstract: While Retrieval-Augmented Generation (RAG) has proven effective for generating accurate, context-based responses based on existing knowledge bases, it presents several challenges including retrieval quality dependencies, integration complexity and cost. Recent advances in agentic-RAG and tool-augmented LLM architectures have introduced alternative approaches to information retrieval and processing. We question how much additional value vector databases and semantic search bring to RAG over simple, agentic keyword search in documents for question-answering. In this study, we conducted a systematic comparison between RAG-based systems and tool-augmented LLM agents, specifically evaluating their retrieval mechanisms and response quality when the agent only has access to basic keyword search tools. Our empirical analysis demonstrates that tool-based keyword search implementations within an agentic framework can attain over 90% of the performance metrics compared to traditional RAG systems without using a standing vector database. Our approach is simple to implement, cost effective, and is particularly useful in scenarios requiring frequent updates to knowledge bases.

Executive Summary

This study questions whether vector databases are necessary for retrieval-backed question-answering. In a systematic comparison of RAG-based systems against tool-augmented LLM agents limited to basic keyword search tools, the authors report that the agentic keyword approach attains over 90% of the performance of traditional RAG systems without maintaining a standing vector database. They describe the approach as simple to implement, cost-effective, and particularly useful when the knowledge base is updated frequently. The result suggests that, for question-answering over documents, agentic keyword search is a viable alternative to vector-based retrieval, although the reported figure indicates near-parity rather than parity.

Key Points

  • An LLM agent equipped only with basic keyword search tools can answer questions over documents without a standing vector database
  • Tool-based keyword search within an agentic framework attains over 90% of the performance of traditional RAG systems
  • The approach is simple to implement, cost-effective, and well suited to knowledge bases that are updated frequently
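To make the idea of a "basic keyword search tool" concrete, the following is a minimal sketch and not the authors' implementation. The names `Hit`, `keyword_search`, and `first_successful_search` are hypothetical, the documents are assumed to be held in memory as plain strings, and a fixed list of candidate queries stands in for the reformulations an LLM agent would generate after an empty result.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    line_no: int
    text: str


def keyword_search(docs: dict[str, str], query: str, max_hits: int = 5) -> list[Hit]:
    """Return lines that contain every query term (case-insensitive, whole words)."""
    terms = re.findall(r"\w+", query.lower())
    hits: list[Hit] = []
    if not terms:
        return hits
    for doc_id, body in docs.items():
        for line_no, line in enumerate(body.splitlines(), start=1):
            words = set(re.findall(r"\w+", line.lower()))
            if all(term in words for term in terms):
                hits.append(Hit(doc_id, line_no, line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits


def first_successful_search(docs: dict[str, str], candidate_queries: list[str]):
    """Try queries in order and stop at the first one that returns hits.

    Assumption: in an agentic system the LLM would propose these
    reformulations itself; here they are supplied by the caller.
    """
    for query in candidate_queries:
        hits = keyword_search(docs, query)
        if hits:
            return query, hits
    return None, []


docs = {
    "faq.txt": "Refunds are issued within 14 days.\nShipping takes 3 days.",
    "policy.txt": "Returns require a receipt.",
}
used_query, results = first_successful_search(docs, ["money back", "refunds"])
```

Because the search runs directly over the current text, adding or editing a document requires no separate indexing step in this sketch, which is the property the summary associates with frequently updated knowledge bases.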

Merits

Efficiency and Cost-Effectiveness

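By dropping the standing vector database, the proposed approach removes a component that must be provisioned, populated, and kept in sync with the source documents. This underlies the authors' claims of lower cost and simpler integration, and it matters most when the knowledge base changes often.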

Demerits

Limited Contextual Understanding

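Keyword search matches surface terms, so it may miss passages that express the same idea in different words or that depend on nuanced contextual relationships. For example, a query about "refunds" will not by itself match a passage that only mentions "money back". This could reduce response quality in scenarios where the agent fails to reformulate its query.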

Expert Commentary

The results add to the ongoing discussion about the role of vector databases in RAG systems by showing that agentic keyword search can come close to vector-based retrieval on question-answering. The practical appeal is greatest where knowledge bases require frequent updates. Further research is still needed on where the approach falls short, in particular on queries that depend on nuanced contextual relationships and semantics. Practitioners will need to weigh the trade-offs between efficiency, cost, and response quality when choosing a retrieval mechanism.

Recommendations

  • Further research into the limitations and potential applications of agentic keyword search in RAG systems
  • Exploration of hybrid approaches that combine the benefits of vector databases and agentic keyword search
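One common way to combine a keyword ranking with a vector-search ranking is reciprocal rank fusion (RRF), sketched below. This is an illustration of the hybrid idea only; the paper summarized here does not propose or evaluate it. The function name, the document IDs, and the two input rankings are hypothetical, and `k = 60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores the sum of 1 / (k + rank) over the lists that
    contain it, with ranks starting at 1. Ties are broken by document ID.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a keyword tool and from a vector index.
keyword_ranking = ["a", "b", "c"]
vector_ranking = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

RRF uses only rank positions, so it needs no calibration between keyword match scores and embedding similarities, which makes it a low-effort starting point for a hybrid experiment.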
