Simplifying Outcomes of Language Model Component Analyses with ELIA
arXiv:2602.18262v1
Abstract: While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has …
Aaron Louis Eidt, Nils Feldhus