This platform requires JavaScript for full functionality. Please enable JavaScript in your browser settings.

Quality follows upgrading

Sunghyeon Woo, Ahreum Seo, Jaegwang Lee, Jaeeun Kil, Hanbae Seo, Joonghoon Kim, Baeseong Park, Se Jung Kwon, Dongsoo Lee

Articles by Sunghyeon Woo, Ahreum Seo, Jaegwang Lee, Jaeeun Kil, Hanbae Seo, Joonghoon Kim, Baeseong Park, Se Jung Kwon, Dongsoo Lee

Academic · 1 min

SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

arXiv:2603.02599v1 Announce Type: new Abstract: In multi-model LLM serving, decode execution remains inefficient due to model-specific resource partitioning: since cross-model batching is not possible, memory-bound …

18 views Mar 7

Sunghyeon Woo, Ahreum Seo, Jaegwang Lee, Jaeeun Kil, Hanbae Seo, Joonghoon Kim, Baeseong Park, Se Jung Kwon, Dongsoo Lee

Articles by Sunghyeon Woo, Ahreum Seo, Jaegwang Lee, Jaeeun Kil, Hanbae Seo, Joonghoon Kim, Baeseong Park, Se Jung Kwon, Dongsoo Lee

SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

JCG, PC

HSOLLC Co., Ltd.