Causally Grounded Mechanistic Interpretability for LLMs with Faithful Natural-Language Explanations
arXiv:2603.09988v1 (cross-list)
Abstract: Mechanistic interpretability identifies internal circuits responsible for model behaviors, yet translating these findings into human-understandable explanations remains an open problem. …
Ajay Pravin Mahale