RACER: Risk-Aware Calibrated Efficient Routing for Large Language Models
arXiv:2603.06616v1
Announce Type: new
Abstract: Efficiently routing queries to the optimal large language model (LLM) is crucial for optimizing the cost-performance trade-off in multi-model systems. …
Sai Hao, Hao Zeng, Hongxin Wei, Bingyi Jing