Task-Specific Knowledge Distillation via Intermediate Probes
arXiv:2603.12270v1 Announce Type: cross
Abstract: Knowledge distillation from large language models (LLMs) assumes that the teacher's output distribution is a high-quality training signal. On reasoning …
Ryan Brown, Chris Russell
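The abstract refers to the standard distillation setup in which the student is trained against the teacher's output distribution. Below is a minimal sketch of that baseline soft-target objective (temperature-scaled KL divergence mixed with hard-label cross-entropy); it illustrates the general technique only, not the paper's probe-based method, and the hyperparameters and function names are illustrative assumptions.

```python
# Minimal sketch of standard soft-target knowledge distillation.
# All names and hyperparameters are illustrative assumptions,
# not details taken from the paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine a soft-target KL term (teacher distribution as the
    training signal) with the usual hard-label cross-entropy."""
    # Soften both distributions with the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between student and teacher, rescaled by T^2 so
    # gradient magnitudes stay comparable across temperatures.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```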