Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale
arXiv:2603.06592v1 Announce Type: new Abstract: Contemporary studies have uncovered many puzzling phenomena in the neural information processing of Transformer-based language models. Building a robust, unified …
Jonas Rohweder, Subhabrata Dutta, Iryna Gurevych
22 views