Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability
arXiv:2603.10384v1 Announce Type: new Abstract: Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework …
Xinyan Jiang, Ninghao Liu, Di Wang, Lijie Hu
11 views