Delta-Crosscoder: Robust Crosscoder Model Diffing in Narrow Fine-Tuning Regimes
arXiv:2603.04426v1 (Announce Type: new)
Abstract: Model diffing methods aim to identify how fine-tuning changes a model's internal representations. Crosscoders approach this by learning shared dictionaries …
Aly Kassem, Thomas Jiralerspong, Negar Rostamzadeh, Golnoosh Farnadi