Why Any-Order Autoregressive Models Need Two-Stream Attention: A Structural-Semantic Tradeoff
arXiv:2602.16092v1 Announce Type: new Abstract: Any-order autoregressive models (AO-ARMs) offer a promising path toward efficient masked diffusion by enabling native key-value caching, but competitive performance …
Patrick Pynadath, Ruqi Zhang
4 views