ConsRoute: Consistency-Aware Adaptive Query Routing for Cloud-Edge-Device Large Language Models
arXiv:2603.21237v1. Abstract: Large language models (LLMs) deliver impressive capabilities but incur substantial inference latency and cost, which hinders their deployment in latency-sensitive …
Haoyu Qiao, Hao Zhang, Shanwen Mao, Siyao Cheng, Jie Liu