Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG
arXiv:2602.23374v1 Announce Type: cross Abstract: The integration of Large Language Models (LLMs) into enterprise knowledge management systems has been catalyzed by the Retrieval-Augmented Generation (RAG) paradigm, which augments parametric memory with non-parametric external data. However, the transition from proof-of-concept to production-grade RAG systems is hindered by three persistent challenges: low retrieval precision for complex queries, high rates of hallucination in the generation phase, and unacceptable latency for real-time applications. This paper presents a comprehensive analysis of the Higress RAG MCP Server, a novel, enterprise-centric architecture designed to resolve these bottlenecks through a "Full-Link Optimization" strategy. Built upon the Model Context Protocol (MCP), the system introduces a layered architecture that orchestrates a sophisticated pipeline of Adaptive Routing, Semantic Caching, Hybrid Retrieval, and Corrective RAG (CRAG). We detail the technical implementation of key innovations, including the Higress-Native Splitter for structure-aware data ingestion, the application of Reciprocal Rank Fusion (RRF) for merging dense and sparse retrieval signals, and a 50ms-latency Semantic Caching mechanism with dynamic thresholding. Experimental evaluations on domain-specific Higress technical documentation and blogs verify the system's architectural robustness. The results demonstrate that by optimizing the entire retrieval lifecycle - from pre-retrieval query rewriting to post-retrieval corrective evaluation - the Higress RAG system offers a scalable, hallucination-resistant solution for enterprise AI deployment.
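The abstract's hybrid-retrieval stage merges dense (vector) and sparse (e.g. BM25) result lists with Reciprocal Rank Fusion. RRF itself is standard: each document scores the sum over lists of 1/(k + rank), with k = 60 the common default. The sketch below illustrates the fusion step only; the retriever names and document IDs are illustrative, not taken from the Higress implementation.

```python
# Reciprocal Rank Fusion: merge ranked lists from multiple retrievers.
# score(d) = sum over lists of 1 / (k + rank_of_d_in_list); k=60 is the
# widely used default from the original RRF formulation.

def rrf_fuse(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: one dense and one sparse ranking over hypothetical doc IDs.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_b", "doc_d", "doc_a"]
print(rrf_fuse([dense, sparse]))
```

Because RRF uses only ranks, not raw scores, it needs no score normalization between the dense and sparse signals, which is what makes it attractive for hybrid retrieval.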
Executive Summary
The article introduces Higress-RAG, a holistic optimization framework for enterprise retrieval-augmented generation. It addresses challenges in retrieval precision, hallucination, and latency by implementing a full-link optimization strategy. The system features a layered architecture with adaptive routing, semantic caching, hybrid retrieval, and corrective RAG. Experimental evaluations demonstrate the system's robustness and scalability, offering a hallucination-resistant solution for enterprise AI deployment.
Key Points
- ▸ Higress-RAG framework for enterprise retrieval-augmented generation
- ▸ Full-link optimization strategy to address retrieval precision, hallucination, and latency
- ▸ Layered architecture with adaptive routing, semantic caching, hybrid retrieval, and corrective RAG
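The semantic-caching idea mentioned above can be sketched as follows: a query whose embedding is close enough to a previously cached query reuses the cached answer instead of triggering retrieval and generation. This is a minimal illustration only; the paper describes dynamic thresholding, so the fixed 0.9 threshold, the embeddings, and the class name here are assumptions, not the system's actual parameters.

```python
# Minimal semantic-cache sketch: reuse a cached answer when the cosine
# similarity between query embeddings clears a threshold. The threshold
# value and toy 2-d embeddings are illustrative.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def lookup(self, query_emb):
        # Return the best-matching cached answer, or None on a cache miss.
        best, best_sim = None, 0.0
        for emb, answer in self.entries:
            sim = cosine(emb, query_emb)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.threshold else None

    def store(self, query_emb, answer):
        self.entries.append((query_emb, answer))

cache = SemanticCache(threshold=0.9)
cache.store([1.0, 0.0], "cached answer")
print(cache.lookup([0.99, 0.1]))   # near-duplicate query: hit
print(cache.lookup([0.0, 1.0]))    # unrelated query: miss (None)
```

A hit short-circuits the whole retrieval pipeline, which is how a cache layer like this can serve answers within the tens-of-milliseconds budget the abstract cites.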
Merits
Scalability
The system demonstrates scalability and robustness in experimental evaluations
Hallucination Resistance
The corrective RAG mechanism reduces hallucination in the generation phase
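The corrective step in CRAG-style pipelines can be illustrated in a few lines: a relevance grader scores each retrieved chunk against the query, low-scoring chunks are discarded, and an empty result triggers re-retrieval rather than generation over weak evidence. The grader interface, threshold, and action labels below are hypothetical placeholders, not the paper's implementation.

```python
# Illustrative CRAG-style corrective filter. `grade` is a stand-in for a
# relevance evaluator (in practice often an LLM or trained scorer) that
# maps (query, chunk) to a score in [0, 1].

def corrective_filter(query, chunks, grade, threshold=0.5):
    kept = [c for c in chunks if grade(query, c) >= threshold]
    # If no chunk survives grading, correct course instead of generating
    # from irrelevant context (a common hallucination source).
    action = "generate" if kept else "re-retrieve"
    return kept, action

# Toy grader: exact-substring match as a stand-in for a learned scorer.
grade = lambda q, c: 1.0 if q in c else 0.0
kept, action = corrective_filter("higress", ["higress gateway docs", "unrelated"], grade)
print(kept, action)
```

Filtering before generation is what gives the mechanism its hallucination resistance: the generator only ever sees chunks that passed an explicit relevance check.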
Demerits
Complexity
The system's layered architecture and multiple components may add complexity to implementation and maintenance
Expert Commentary
The Higress-RAG framework represents a significant advancement in retrieval-augmented generation, addressing key challenges in retrieval precision, hallucination, and latency. The system's full-link optimization strategy and layered architecture demonstrate a nuanced understanding of the complexities involved in enterprise AI deployment. However, further research is needed to fully explore the implications of this framework, particularly with regard to explainability, data quality, and regulatory compliance.
Recommendations
- ✓ Further evaluation of the Higress-RAG framework in diverse enterprise settings to assess its generalizability and adaptability
- ✓ Investigation into the potential applications of the Higress-RAG framework in other domains, such as healthcare or finance