DistillLens: Symmetric Knowledge Distillation Through Logit Lens
arXiv:2602.13567v1 Announce Type: new Abstract: Standard Knowledge Distillation (KD) compresses Large Language Models (LLMs) by optimizing final outputs, yet it typically treats the teacher's intermediate …
Manish Dhakal, Uthman Jinadu, Anjila Budathoki, Rajshekhar Sunderraman, Yi Ding