Why Grokking Takes So Long: A First-Principles Theory of Representational Phase Transitions
arXiv:2603.13331v1 (Announce Type: new)
Abstract: Grokking is the sudden generalization that appears long after a model has perfectly memorized its training data. Although this phenomenon …
Truong Xuan Khanh, Truong Quynh Hoa, Luu Duc Trung, Phan Thanh Duc