
Why Agent Caching Fails and How to Fix It: Structured Intent Canonicalization with Few-Shot Learning


Abhinaba Basu

arXiv:2602.18922v1 Announce Type: new Abstract: Personal AI agents incur substantial cost via repeated LLM calls. We show existing caching methods fail: GPTCache achieves 37.9% accuracy on real benchmarks; APC achieves 0-12%. The root cause is optimizing for the wrong property -- cache effectiveness requires key consistency and precision, not classification accuracy. We observe cache-key evaluation reduces to clustering evaluation and apply V-measure decomposition to separate these on n=8,682 points across MASSIVE, BANKING77, CLINC150, and NyayaBench v2, our new 8,514-entry multilingual agentic dataset (528 intents, 20 W5H2 classes, 63 languages). We introduce W5H2, a structured intent decomposition framework. Using SetFit with 8 examples per class, W5H2 achieves 91.1%+/-1.7% on MASSIVE in ~2ms -- vs 37.9% for GPTCache and 68.8% for a 20B-parameter LLM at 3,447ms. On NyayaBench v2 (20 classes), SetFit achieves 55.3%, with cross-lingual transfer across 30 languages. Our five-tier cascade handles 85% of interactions locally, projecting 97.5% cost reduction. We provide risk-controlled selective prediction guarantees via RCPS with nine bound families.
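The abstract's central observation is that cache-key evaluation reduces to clustering evaluation, with V-measure decomposing quality into homogeneity (a precision-like property: each key holds one intent) and completeness (a consistency-like property: each intent maps to one key). The sketch below implements that decomposition from its standard entropy definitions on toy labels, not the paper's data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence (nats)."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())

def conditional_entropy(labels, given):
    """H(labels | given): entropy of labels within each 'given' group, weighted."""
    n = len(labels)
    groups = {}
    for lab, g in zip(labels, given):
        groups.setdefault(g, []).append(lab)
    return sum(len(members) / n * entropy(members) for members in groups.values())

def v_measure(true_intents, cache_keys):
    # Homogeneity: each cache key contains a single intent (key precision).
    h = 1.0 - conditional_entropy(true_intents, cache_keys) / entropy(true_intents)
    # Completeness: each intent maps to a single cache key (key consistency).
    c = 1.0 - conditional_entropy(cache_keys, true_intents) / entropy(cache_keys)
    return h, c, 2 * h * c / (h + c)  # V-measure is their harmonic mean

# Toy example: one utterance of intent 0 was keyed inconsistently (key 1),
# dragging both homogeneity and completeness below 1.
true_intents = [0, 0, 0, 1, 1, 2, 2, 2]
cache_keys   = [0, 0, 1, 1, 1, 2, 2, 2]
h, c, v = v_measure(true_intents, cache_keys)
print(f"homogeneity={h:.3f} completeness={c:.3f} v_measure={v:.3f}")
```

A canonicalizer that keys paraphrases inconsistently is penalized by completeness even when its per-utterance classification accuracy looks good, which is the failure mode the authors attribute to accuracy-optimized baselines.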

Executive Summary

This article summarizes a paper on the cost of personal AI agents, which repeatedly invoke Large Language Models (LLMs) for queries that could be served from cache. The authors argue that existing caching methods fail because they optimize classification accuracy rather than the properties a cache key actually needs: consistency and precision. They propose W5H2, a structured intent canonicalization framework trained with few-shot learning, and evaluate it on MASSIVE, BANKING77, CLINC150, and their new multilingual NyayaBench v2 dataset. A five-tier cascade built on W5H2 handles 85% of interactions locally, projecting a 97.5% cost reduction, and risk-controlled selective prediction via RCPS provides statistical guarantees on when the cache may answer. The approach has clear implications for building cheaper, more reliable AI agents.
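The projected savings follow from expected-cost arithmetic over the cascade. The sketch below uses a simplified two-tier model with a hypothetical local cost of zero; the function name and cost values are illustrative, since the abstract reports only the 85% local-handling rate and the 97.5% projection:

```python
def projected_cost(local_fraction, local_cost, llm_cost):
    """Expected per-query cost when `local_fraction` of traffic is resolved
    by cheap local tiers and the rest falls back to the LLM."""
    return local_fraction * local_cost + (1.0 - local_fraction) * llm_cost

baseline = projected_cost(0.0, 0.0, 1.0)   # every query hits the LLM (cost 1.0)
cascade  = projected_cost(0.85, 0.0, 1.0)  # 85% local, local cost ~0 (assumed)
reduction = 1.0 - cascade / baseline
print(f"projected reduction from local handling alone: {reduction:.0%}")
```

Note that this two-tier arithmetic yields only 85%; the paper's higher 97.5% projection presumably also counts savings among the remaining 15% of traffic (for example, cheaper intermediate tiers before the full LLM), a breakdown the abstract does not give.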

Key Points

  • Existing caching methods such as GPTCache and APC perform poorly (37.9% and 0-12% accuracy, respectively) because they optimize classification accuracy rather than the key consistency and precision that cache effectiveness actually requires.
  • A novel structured intent canonicalization framework, W5H2, is introduced to address the limitations of current caching methods.
  • Few-shot learning with SetFit (8 examples per class) drives the gains: 91.1% on MASSIVE in about 2 ms, versus 37.9% for GPTCache and 68.8% for a 20B-parameter LLM at 3,447 ms.
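The paper's W5H2 schema is not reproduced here, but the caching benefit of a structured canonical key over raw strings can be illustrated with a minimal sketch. The field names below are hypothetical, loosely following a who/what/how reading of "W5H2", and the keyword mapper is a toy stand-in for the paper's few-shot SetFit classifier:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntentKey:
    # Hypothetical structured fields; the paper's actual W5H2 schema may differ.
    action: str         # roughly the "what"
    target: str         # roughly the "who"/"where"
    modifier: str = ""  # "how", "how much", etc.

def canonicalize(utterance: str) -> IntentKey:
    # Toy stand-in for the paper's classifier: a keyword-based mapper.
    text = utterance.lower()
    if "weather" in text:
        return IntentKey(action="get_weather", target="local")
    return IntentKey(action="unknown", target="")

cache: dict[IntentKey, str] = {}
for query in ["What's the weather?", "weather please", "Weather today??"]:
    key = canonicalize(query)
    if key not in cache:                         # paraphrases hash to one key,
        cache[key] = f"answer for {key.action}"  # so the expensive call runs once
print(len(cache))
```

Because the frozen dataclass is hashable, all three paraphrases collapse to a single cache entry; a raw-string key would miss on every one of them.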

Merits

Strength of the proposed framework

The authors present a comprehensive and well-motivated framework for structured intent canonicalization, evaluated across four benchmarks (MASSIVE, BANKING77, CLINC150, and the new 8,514-entry NyayaBench v2). The framework delivers large gains in cache effectiveness and accuracy over existing baselines, making it a promising approach to the limitations of current caching methods.

Demerits

Limitation of the proposed framework

The approach depends on few-shot learning, which may not suit every domain or scenario, and the gap between 91.1% on MASSIVE and 55.3% on NyayaBench v2 suggests that harder multilingual agentic settings remain challenging. Implementing and fine-tuning the framework may also require nontrivial computational resources and expertise.

Expert Commentary

The paper's core contribution is reframing cache-key quality as a clustering problem: V-measure decomposition cleanly separates key consistency from key precision, which explains why accuracy-optimized baselines such as GPTCache and APC fail in practice. The few-shot SetFit approach is attractive operationally, delivering roughly 2 ms inference with no LLM in the loop, though the weaker 55.3% result on NyayaBench v2 shows the method is not yet solved for multilingual agentic workloads. The RCPS-based selective prediction guarantees, covering nine bound families, are a valuable step toward deployable, risk-aware caching. Overall, the framework is a useful addition to the field, with practical implications for anyone operating personal AI agents at scale.

Recommendations

  • Future research should focus on evaluating the proposed framework in more diverse and complex scenarios to assess its robustness and generalizability.
  • The authors should document implementation and fine-tuning details, including the computational resources and expertise required for deployment.
