Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching
arXiv:2602.22871v1 Announce Type: new Abstract: Reasoning with large language models often benefits from generating multiple chains-of-thought, but existing aggregation strategies are typically trajectory-level (e.g., selecting …
Roy Miles, Aysim Toker, Andreea-Maria Oncescu, Songcen Xu, Jiankang Deng, Ismail Elezi
6 views