SR-TTT: Surprisal-Aware Residual Test-Time Training
arXiv:2603.06642v1
Abstract: Test-Time Training (TTT) language models achieve theoretically infinite context windows with an O(1) memory footprint by replacing the standard exact-attention …
Swamynathan V P