Beyond Test-Time Compute Strategies: Advocating Energy-per-Token in LLM Inference
arXiv:2603.20224v1 Announce Type: new

Abstract: Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks but come with substantial energy and computational costs, particularly in …
Patrick Wilhelm, Thorsten Wittkopp, Odej Kao